CN-122021840-A - Knowledge graph construction method and system
Abstract
The invention discloses a knowledge graph construction method and system. Feature extraction is performed on acquired multi-source operation and maintenance text data to generate context vector features, static word vector features and statistical text features. The three features are fused by weighting to generate a text representation, with a time sequence and causal ontology serving as the basis for weight adjustment: the greater the similarity between the multi-source operation and maintenance text data and the ontology, the greater the weight coefficient corresponding to the context vector features. The time sequence and causal ontology is a structured knowledge framework about time sequence rules and causal mechanisms. Entities and entity association relationships are identified based on the generated text representation, a target result is generated through entity alignment and information fusion processing, and a knowledge graph is finally constructed according to the target result. By combining multi-dimensional feature extraction with a weighted fusion strategy guided by the time sequence and causal ontology, the embodiments of the invention identify entities and entity association relationships accurately, which in turn improves the accuracy of knowledge graph construction.
Inventors
- WANG YIDAN
- FENG JUN
- QI DONGLIAN
- YAN YUNFENG
- GONG SHICHAO
- GAO BO
- CAI YANAN
- CHEN ZUGE
Assignees
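- State Grid Zhejiang Electric Power Co., Ltd. Information and Communication Branch (国网浙江省电力有限公司信息通信分公司)
- State Grid Zhejiang Electric Power Co., Ltd. (国网浙江省电力有限公司)
- Zhejiang University (浙江大学)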
Dates
- Publication Date
- 20260512
- Application Date
- 20260413
Claims (10)
- 1. A knowledge graph construction method, characterized by comprising the following steps: acquiring multi-source operation and maintenance text data; performing feature extraction on the multi-source operation and maintenance text data to obtain context vector features, static word vector features and statistical text features; performing weighted fusion on the context vector features, the static word vector features and the statistical text features to generate a text representation, wherein the greater the similarity between the multi-source operation and maintenance text data and a pre-constructed time sequence and causal ontology, the greater the weight coefficient corresponding to the context vector features, and the time sequence and causal ontology is a structured knowledge framework about time sequence rules and causal mechanisms; identifying entities and entity association relationships based on the text representation to obtain an initial identification result; performing entity alignment and information fusion processing on the initial identification result to obtain a target result; and constructing a knowledge graph according to the target result.
- 2. The knowledge graph construction method of claim 1, wherein performing weighted fusion on the context vector features, the static word vector features and the statistical text features to generate a text representation comprises: obtaining the time sequence and causal ontology, and encoding the multi-source operation and maintenance text data into a query vector; calculating the cosine similarity between the query vector and the mode vector; determining a first target weight coefficient, a second target weight coefficient and a third target weight coefficient based on the cosine similarity, wherein the first target weight coefficient is positively correlated with the cosine similarity and corresponds to the context vector features, and the second target weight coefficient and the third target weight coefficient are each negatively correlated with the cosine similarity and correspond to the static word vector features and the statistical text features, respectively; and fusing the context vector features, the static word vector features and the statistical text features based on the first target weight coefficient, the second target weight coefficient and the third target weight coefficient to generate the text representation.
- 3. The knowledge graph construction method of claim 2, wherein determining the first target weight coefficient, the second target weight coefficient and the third target weight coefficient based on the cosine similarity comprises: acquiring a first initial weight coefficient, a second initial weight coefficient and a third initial weight coefficient, wherein the first initial weight coefficient corresponds to the context vector features, the second initial weight coefficient corresponds to the static word vector features, and the third initial weight coefficient corresponds to the statistical text features; outputting a causal activation vector based on the cosine similarity, wherein the causal activation vector is positively correlated with the cosine similarity; passing the causal activation vector sequentially through a fully connected network and a sigmoid function to generate a gating signal; and adjusting the first initial weight coefficient, the second initial weight coefficient and the third initial weight coefficient according to the gating signal to obtain the first target weight coefficient, the second target weight coefficient and the third target weight coefficient, wherein the first target weight coefficient is positively correlated with the gating signal and corresponds to the context vector features, and the second target weight coefficient and the third target weight coefficient are each negatively correlated with the gating signal and correspond to the static word vector features and the statistical text features, respectively.
- 4. The knowledge graph construction method of claim 1, wherein identifying entities and entity association relationships based on the text representation to obtain an initial identification result comprises: processing the text representation through a bidirectional long short-term memory (BiLSTM) network to obtain, for each time step in the multi-source operation and maintenance text data, a matching degree score against each label in a predefined label set; for the current time step, taking each label in the predefined label set as a candidate label of the current time step; calculating a path score of each transfer path by combining the matching degree score of the candidate label and the transfer score from the optimal label of the previous time step to the candidate label; selecting the optimal label of the current time step according to the path scores; when all time steps have been traversed, outputting an optimal label sequence, wherein the optimal label sequence is generated from the optimal labels of all time steps; and identifying the entities and entity association relationships of the multi-source operation and maintenance text data based on the optimal label sequence to obtain the initial identification result.
- 5. The knowledge graph construction method of claim 4, wherein the optimal label of the first time step is the label with the highest matching degree score in the predefined label set.
- 6. The knowledge graph construction method of claim 4, wherein calculating the path score of each transfer path by combining the matching degree score of the candidate label and the transfer score from the optimal label of the previous time step to the candidate label comprises: when the optimal label of the previous time step represents the end of one event and the candidate label of the current time step represents the start of another event, extracting, based on the multi-source operation and maintenance text data, an event dependency relationship to be verified between the event corresponding to the previous time step and the event corresponding to the current time step; setting a time sequence causal score to a preset negative value when the event dependency relationship to be verified does not meet the requirements of the time sequence and causal ontology; when the optimal label of the previous time step does not represent the end of one event or the candidate label of the current time step does not represent the start of another event, setting the time sequence causal score to a preset intermediate value, wherein the preset intermediate value is greater than the preset negative value and less than the preset positive value; and calculating the path score of each transfer path by combining the time sequence causal score, the matching degree score of the candidate label and the transfer score from the optimal label of the previous time step to the candidate label.
- 7. The knowledge graph construction method of claim 6, wherein extracting the event dependency relationship to be verified between the event corresponding to the previous time step and the event corresponding to the current time step comprises: when the optimal label of the previous time step represents the end of one event and the candidate label of the current time step represents the start of another event, locating, in the multi-source operation and maintenance text data, the text content corresponding to the optimal label of the previous time step and the text content corresponding to the candidate label of the current time step, and extracting event dependency information; and generating the event dependency relationship to be verified based on the located text content and the event dependency information.
- 8. The knowledge graph construction method of claim 6, wherein calculating the path score of each transfer path by combining the time sequence causal score, the matching degree score of the candidate label and the transfer score from the optimal label of the previous time step to the candidate label comprises: adding the time sequence causal score, the matching degree score of the candidate label and the transfer score from the optimal label of the previous time step to the candidate label to obtain the path score of each transfer path, wherein the absolute value of the preset negative value is greater than a preset maximum value, and the preset maximum value is the sum of the preset upper limit of the matching degree score and the preset upper limit of the transfer score.
- 9. A knowledge graph construction system, comprising: a data acquisition module for acquiring multi-source operation and maintenance text data; a feature extraction module for performing feature extraction on the multi-source operation and maintenance text data to obtain context vector features, static word vector features and statistical text features; a feature fusion module for performing weighted fusion on the context vector features, the static word vector features and the statistical text features to generate a text representation, wherein the greater the similarity between the multi-source operation and maintenance text data and a pre-constructed time sequence and causal ontology, the greater the weight coefficient corresponding to the context vector features, and the time sequence and causal ontology is a structured knowledge framework about time sequence rules and causal mechanisms; an entity identification module for identifying entities and entity association relationships based on the text representation to obtain an initial identification result; an alignment and fusion module for performing entity alignment and information fusion processing on the initial identification result to obtain a target result; and a knowledge graph construction module for constructing a knowledge graph according to the target result.
- 10. The knowledge graph construction system of claim 9, wherein the feature fusion module is specifically configured to: obtain the time sequence and causal ontology, and encode the multi-source operation and maintenance text data into a query vector; calculate the cosine similarity between the query vector and the mode vector; determine a first target weight coefficient, a second target weight coefficient and a third target weight coefficient based on the cosine similarity, wherein the first target weight coefficient is positively correlated with the cosine similarity and corresponds to the context vector features, and the second target weight coefficient and the third target weight coefficient are each negatively correlated with the cosine similarity and correspond to the static word vector features and the statistical text features, respectively; and fuse the context vector features, the static word vector features and the statistical text features based on the first target weight coefficient, the second target weight coefficient and the third target weight coefficient to generate the text representation.
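The label decoding recited in claims 4 to 8 can be illustrated with a minimal sketch. It is not the patented implementation. The matching degree scores that the claims obtain from a BiLSTM are supplied here as plain dictionaries. The `B-`/`E-` label prefixes, the constants `NEGATIVE`, `INTERMEDIATE` and `POSITIVE`, and the `dependency_ok` callback (standing in for the check against the time sequence and causal ontology) are all hypothetical. Awarding the preset positive value when the dependency check passes is also an assumption, since the claims only name that value. Matching and transfer scores are assumed to lie in [0, 4], so the absolute value of `NEGATIVE` exceeds the sum of their upper limits, as claim 8 requires.

```python
# Hypothetical preset values; |NEGATIVE| = 10 > 4 + 4 (sum of assumed score upper limits).
NEGATIVE, INTERMEDIATE, POSITIVE = -10.0, 0.0, 1.0


def temporal_causal_score(prev_label, cand_label, dependency_ok):
    """Score a transition against the ontology (simplified, label-level check)."""
    ends_event = prev_label.startswith("E-")      # previous optimal label closes an event
    starts_event = cand_label.startswith("B-")    # candidate label opens another event
    if ends_event and starts_event:
        # Assumption: a dependency that passes the check earns the positive value.
        return POSITIVE if dependency_ok(prev_label, cand_label) else NEGATIVE
    return INTERMEDIATE


def decode(emissions, transitions, labels, dependency_ok):
    """Step-by-step decoding: keep only the optimal label of each time step.

    emissions:   list of {label: matching degree score}, one dict per time step
    transitions: {(previous_label, candidate_label): transfer score}
    """
    # First time step: the label with the highest matching degree score (claim 5).
    sequence = [max(labels, key=lambda lab: emissions[0][lab])]
    for scores in emissions[1:]:
        prev = sequence[-1]

        def path_score(cand):
            # Claim 8: the path score is the sum of the three components.
            return (scores[cand]
                    + transitions[(prev, cand)]
                    + temporal_causal_score(prev, cand, dependency_ok))

        sequence.append(max(labels, key=path_score))
    return sequence
```

Because the negative value outweighs any achievable sum of matching and transfer scores, a transition that violates the ontology cannot win a time step as long as some alternative label has a non-negative path score.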
Description
Knowledge graph construction method and system
Technical Field
The invention relates to the technical field of data processing, and in particular to a knowledge graph construction method and a knowledge graph construction system.
Background
During system operation and maintenance, operation and maintenance data are scattered across various system carriers and gradually form a multi-source heterogeneous text data pool. This directly produces prominent operation and maintenance knowledge islands, so that cross-system data calls and cross-scenario intelligent decision-making are difficult to put into practice. By virtue of its strong semantic representation and association organization capabilities, the knowledge graph has become the core technical direction for breaking down operation and maintenance knowledge barriers and solving the "island" problem. However, current knowledge-graph-based operation and maintenance knowledge management schemes still have an obvious defect. Entity identification is a key link in graph construction, yet the prior art mostly relies on single feature extraction, which cannot fully cover the deep semantic information of operation and maintenance text. As a result, entity identification precision is low, and the constructed knowledge graph is ultimately insufficiently accurate.
Disclosure of Invention
In view of this, the invention provides a knowledge graph construction method and system to address the insufficient accuracy of entity identification and knowledge graph construction caused by single feature extraction in the prior art.
To achieve the above object, an embodiment of the present invention provides a knowledge graph construction method, comprising: acquiring multi-source operation and maintenance text data; performing feature extraction on the multi-source operation and maintenance text data to obtain context vector features, static word vector features and statistical text features; performing weighted fusion on the context vector features, the static word vector features and the statistical text features to generate a text representation, wherein the greater the similarity between the multi-source operation and maintenance text data and a pre-constructed time sequence and causal ontology, the greater the weight coefficient corresponding to the context vector features, and the time sequence and causal ontology is a structured knowledge framework about time sequence rules and causal mechanisms; identifying entities and entity association relationships based on the text representation to obtain an initial identification result; performing entity alignment and information fusion processing on the initial identification result to obtain a target result; and constructing a knowledge graph according to the target result.
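The similarity-guided weighted fusion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The mode vector is assumed to be an encoding of the time sequence and causal ontology, and the three feature vectors are assumed to share one dimension. The fully connected network of claim 3 is replaced by a hypothetical scalar affine map (`gain`, `bias`) followed by a sigmoid. The rule that scales the context weight by `1 + gate` and the other two by `1 - gate` is one simple choice that satisfies the stated positive and negative correlations.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fusion_weights(similarity, initial=(1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0),
                   gain=4.0, bias=0.0):
    """Return (context, static, statistical) weights that sum to 1.

    gain and bias are hypothetical stand-ins for a trained fully connected layer.
    """
    gate = sigmoid(gain * similarity + bias)   # gating signal in (0, 1), rises with similarity
    raw = (initial[0] * (1.0 + gate),          # context weight grows with the gate
           initial[1] * (1.0 - gate),          # static word vector weight shrinks
           initial[2] * (1.0 - gate))          # statistical feature weight shrinks
    total = sum(raw)
    return tuple(w / total for w in raw)


def fuse(context_vec, static_vec, stat_vec, weights):
    """Element-wise weighted sum of the three feature vectors."""
    w1, w2, w3 = weights
    return [w1 * c + w2 * s + w3 * t
            for c, s, t in zip(context_vec, static_vec, stat_vec)]
```

For example, `fuse(ctx, static, stat, fusion_weights(cosine_similarity(query, mode)))` yields a text representation in which the context vector features dominate more strongly the closer the query vector is to the mode vector.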
To achieve the above object, an embodiment of the present invention further provides a knowledge graph construction system, comprising: a data acquisition module for acquiring multi-source operation and maintenance text data; a feature extraction module for performing feature extraction on the multi-source operation and maintenance text data to obtain context vector features, static word vector features and statistical text features; a feature fusion module for performing weighted fusion on the context vector features, the static word vector features and the statistical text features to generate a text representation, wherein the greater the similarity between the multi-source operation and maintenance text data and a pre-constructed time sequence and causal ontology, the greater the weight coefficient corresponding to the context vector features, and the time sequence and causal ontology is a structured knowledge framework about time sequence rules and causal mechanisms; an entity identification module for identifying entities and entity association relationships based on the text representation to obtain an initial identification result; an alignment and fusion module for performing entity alignment and information fusion processing on the initial identification result to obtain a target result; and a knowledge graph construction module for constructing a knowledge graph according to the target result.
Compared with the prior art, the knowledge graph construction method and system disclosed by the embodiments of the invention first acquire multi-source operation and maintenance text data, then perform feature extraction on the multi-source operation and maintenance text data to generate context vector features, static word vector features and statistical text features, and then perform weighted fusion on the three features to generate text representa