US-12626191-B2 - Knowledge graph fusion method based on iterative completion

US12626191B2US 12626191 B2US12626191 B2US 12626191B2US-12626191-B2

Abstract

Provided is a knowledge graph fusion method based on iterative completion, which includes: obtaining multiple knowledge graphs, and identifying each of entities of the multiple knowledge graphs; performing structure vector representation learning on each of entities to obtain a structure vector of each of entities, and performing entity name vector representation learning on each of entities to obtain an entity name vector of each of entities; determining a structural similarity between the entities according to the structure vector of each of entities, and determining an entity name similarity between the entities according to the entity name vector of each of entities; constructing a degree-aware-based co-attention network, and calculating an entity similarity between fused entities through the degree-aware-based co-attention network; and obtaining a high-confidence entity pair according to the entity similarity between the fused entities, and performing knowledge graph completion by iterative training to obtain fused knowledge graphs.

Inventors

Xiang Zhao
Weixin Zeng
Jiuyang TANG
Hongbin Huang
Jibing WU
Deke Guo
Lailong Luo

Assignees

NATIONAL UNIVERSITY OF DEFENSE TECHNOLOGY

Dates

Publication Date: 20260512
Application Date: 20230116

Claims (7)

1 . A knowledge graph fusion method based on iterative completion, executed by a processor and comprising: step 1, obtaining a plurality of knowledge graphs, and identifying each of entities of the plurality of knowledge graphs; step 2, performing structure vector representation learning on each of entities of the plurality of knowledge graphs to obtain a structure vector of each of entities of the plurality of knowledge graphs, and performing entity name vector representation learning on each of entities of the plurality of knowledge graphs to obtain an entity name vector of each of entities of the plurality of knowledge graphs; step 3, determining a structural similarity between the entities of the plurality of knowledge graphs according to the structure vector of each of entities of the plurality of knowledge graphs, and determining an entity name similarity between the entities of the plurality of knowledge graphs according to the entity name vector of each of entities of the plurality of knowledge graphs; step 4, constructing a degree-aware-based co-attention network, and calculating an entity similarity between fused entities through the degree-aware-based co-attention network; wherein the calculating the entity similarity between fused entities through the degree-aware-based co-attention network comprises: step 401, constructing a feature matrix of each of entities of the multiple knowledge graphs, where the feature matrix of each entity is composed of the entity name vector {right arrow over (N)}(e) of the entity, the structure vector {right arrow over (Z)}(e) of the entity, and an entity degree vector {right arrow over (g)} e of the entity; the entity degree vector {right arrow over (g)} e is expressed as g → e = M → · h → e ∈ ℝ d g , where {right arrow over (h)} e represents a one-hot vector of a degree of the entity, {right arrow over (M)} represents a full connection parameter matrix, and d g represents a dimension of the entity degree vector; for an entity e 1 of the entities of the plurality of knowledge graphs, a feature vector thereof is expressed as: F → e 1 = [ N → ( e 1 ) ; Z → ( e 1 ) ; ⋅ ⁢ g → e 1 ] ∈ ℝ 3 × d m , where; represents concatenation along columns, d m =max {d n , d s , d g }, d n represents a dimension of the entity name vector of the entity e 1 , and d s represents a dimension of the structure vector of the entity e 1 ; and step 402, constructing a co-attention similarity matrix S∈ 3×3 for dynamically depicting a relationship between the feature matrix {right arrow over (F)} e of the entity e 1 and a feature matrix {right arrow over (F)} e 2 of an entity e 2 , wherein a similarity between an i-th feature of the entity e 1 and a j-th feature of the entity e 2 is expressed as: S → ij = α ⁡ ( F → e 1 i : , F → e 2 j : ) ∈ ℝ , where F → e 1 i : represents an i-th row vector of the feature matrix F → e 1 , F → e 2 j : represents a j-th row vector of the feature matrix {right arrow over (F)} e 2 , i=1,2,3; j=1,2,3, α({right arrow over (u)}, {right arrow over (v)})={right arrow over (w)} T ({right arrow over (u)}⊕{right arrow over (v)}⊕({right arrow over (u)} ∘ {right arrow over (v)})) represents a trainable scalar function for generating similarity, {right arrow over (w)}∈ 3d m represents a parameter vector, ⊕ represents concatenation along rows, and ∘ represents an element-wise multiplication operation; and step 403, generating attention matrices {right arrow over (att 1 )} and {right arrow over (att 2 )} using the co-attention similarity matrix {right arrow over (S)}, comprising: inputting the co-attention similarity matrix {right arrow over (S)} into a softmax layer of the degree-aware-based co-attention network and then inputting into an average layer of the degree-aware-based co-attention network, to thereby generate the attention matrices {right arrow over (att 1 )} and {right arrow over (att 2 )}, wherein the attention matrix {right arrow over (att 1 )} represents a correlation degree of a feature of the entity e 1 and a feature of the entity e 2 , {right arrow over (att 2 )} represents a correlation degree of the feature of the entity e 2 and the feature of the entity e 1 ; obtaining the entity similarity between the fused entities through multiplying similarities of different features of the fused entities by weights of the similarities of different features of the fused entities respectively, the entity similarity between fused entities is expressed as: Sim ( e 1 , e 2 ) = Sim s ( e 1 , e 2 ) · att → 1 s + Sim t ( e 1 , e 2 ) · att → 1 t , where att → 1 s ⁢ and ⁢ att → 1 t are a first value and a second value of the attention matrix {right arrow over (att 1 )}, respectively, and represent a weight of the structural similarity Sim s (e 1 , e 2 ) and a weight of the entity name similarity Sim t (e 1 , e 2 ), respectively; step 5, obtaining a high-confidence entity pair according to the entity similarity between the fused entities, and performing knowledge graph completion by iterative training to obtain fused knowledge graphs; and step 6, deploying the fused knowledge graphs to a question answering system, wherein the question answering system is configured to utilize knowledge in the fused knowledge graphs to generate responses to user queries.
2 . The knowledge graph fusion method based on iterative completion according to claim 1 , wherein the entity name vector is a power average word vector; wherein one entity of the entities of the plurality of knowledge graphs has an entity name s, a word vector of words constituting the entity name s is expressed in a matrix form as: {right arrow over (W)}=[{right arrow over (w)} 1 , . . . , {right arrow over (w)} l ]∈ l×d , where l represents a quantity of the words, and d represents an embedded dimension; wherein the power average word vector is generated through performing a power average operation on {right arrow over (w)} 1 , . . . , {right arrow over (w)} l , where {right arrow over (w)} 1 , . . . , {right arrow over (w)} l ∈ d ; and the power average operation is expressed by a following formula: H p ( W → ) = ( w 1 ⁢ i p + … + w li p l ) 1 / p , ∀i=1, . . . , d, p∈ ∪±∞, where H p ({right arrow over (W)})∈ d represents the generated power average word vector after processing {right arrow over (w)} 1 , . . . , {right arrow over (w)} l .
3 . The knowledge graph fusion method based on iterative completion according to claim 1 , wherein the entity name vector is a concatenated K-power average word vector, wherein one entity of the entities of the plurality of knowledge graphs has an entity name s, a word vector of words constituting the entity name s is expressed in a matrix form as: {right arrow over (W)}=[{right arrow over (w)} 1 , . . . , {right arrow over (w)} l ]∈ l×d , where l represents a quantity of the words, and d represents an embedded dimension; wherein the concatenated K-power average word vector {right arrow over (n)} s ∈ d×K is obtained by calculating K-power average word vectors of the word vector of the words of the entity name, and then concatenating the K-power average word vectors; wherein the concatenated K-power average word vector {right arrow over (n)} s ∈ d×K is expressed as a following formula: {right arrow over (n)} s =H p 1 ({right arrow over (W)})⊕ . . . ⊕ H p K (W), where ⊕ represents concatenation along rows, and p 1 , . . . , p K represent K different power mean values.
4 . The knowledge graph fusion method based on iterative completion according to claim 3 , wherein each of the K different power mean values is one selected from the group consisting of 1, negative infinity and positive infinity.
5 . The knowledge graph fusion method based on iterative completion according to claim 1 , wherein the structural similarity Sim s (e 1 , e 2 ) is a cosine similarity between the structure vector {right arrow over (Z)}(e 1 ) of the entity e 1 and the structure vector {right arrow over (Z)}(e 2 ) of the entity e 2 , and the entity name similarity Sim t (e 1 , e 2 ) is a cosine similarity between the entity name vector {right arrow over (N)}(e 1 ) of the entity e 1 and the entity name vector {right arrow over (N)}(e 2 ) of the entity e 2 .
6 . The knowledge graph fusion method based on iterative completion according to claim 1 , wherein the obtaining the high-confidence entity pair according to the entity similarity between the fused entities comprises: assuming that, for the entity e 1 in an original knowledge graph, the entity e 2 in an external knowledge graph has a most similarity with the entity e 1 , an entity in the external e 2 ′ in the external knowledge has a second most similarity with the entity e 1 , and a similarity difference therebetween is Δ 1 = Δ Sim ⁡ ( e 1 , e 2 ) - Sim ⁡ ( e 1 , e 2 ′ ) ; and for the entity e 2 in the external knowledge graph, the entity e 1 in the original knowledge graph has a most similarity with the entity e 2 , the entity e 1 ′ in the original knowledge graph has a second most similarity with the entity e 2 , and a similarity difference therebetween is Δ 2 = Δ Sim ⁡ ( e 2 , e 1 ) - S ⁢ i ⁢ m ⁡ ( e 2 , e 1 ′ ) , and determining (e 1 , e 2 ) as the high-confidence entity pair, if the similarity differences Δ 1 and Δ 2 are each higher than a preset value; wherein the iterative training of the knowledge graph completion comprises a plurality of rounds; during the iterative training, for each triplet in the external knowledge graph, if a head entity of the triplet and a tail entity of the triplet are in the original knowledge graph, an entity in the external knowledge graph is replaced with a corresponding entity in the original knowledge graph and added to the original knowledge graph to obtain a knowledge graph after adding; and the knowledge graph after adding is used to: re-perform the structure vector representation learning, re-calculate the entity similarity, generate a new high-confidence entity pair, and re-perform the knowledge graph completion, and the iterative training is stopped until a stop condition is satisfied.
7 . A knowledge graph fusion method based on iterative completion, executed by a processor and comprising: step 1, obtaining a plurality of knowledge graphs, and identifying each of entities of the plurality of knowledge graphs; step 2, performing structure vector representation learning on each of entities of the plurality of knowledge graphs to obtain a structure vector of each of entities of the plurality of knowledge graphs, and performing entity name vector representation learning on each of entities of the plurality of knowledge graphs to obtain an entity name vector of each of entities of the plurality of knowledge graphs; step 3, determining a structural similarity between the entities of the plurality of knowledge graphs according to the structure vector of each of entities of the plurality of knowledge graphs, and determining an entity name similarity between the entities of the plurality of knowledge graphs according to the entity name vector of each of entities of the plurality of knowledge graphs; step 4, constructing a degree-aware-based co-attention network, and calculating an entity similarity between fused entities through the degree-aware-based co-attention network; step 5, obtaining a high-confidence entity pair according to the entity similarity between the fused entities, and performing knowledge graph completion by iterative training to obtain fused knowledge graphs; and step 6, integrating the fused knowledge graphs into a question answering system by storing the fused knowledge graphs in a knowledge base storage accessible to the question answering system, and configuring the question answering system to query the fused knowledge graphs to generate responses to user queries; wherein the obtaining the high-confidence entity pair according to the entity similarity between the fused entities comprises: assuming that, for each entity e 1 in an original knowledge graph, an entity e 2 in an external knowledge graph has a most similarity with the entity e 1 , an entity e 2 ′ in the external knowledge has a second most similarity with the entity e 1 , and a similarity difference therebetween is Δ 1 = Δ Sim ⁡ ( e 1 , e 2 ) - Sim ⁡ ( e 1 , e 2 ′ ) ; and for the entity e 2 in the external knowledge graph, the entity e 1 in the original knowledge graph has a most similarity with the entity e 2 , the entity e 1 ′ in the original knowledge graph has a second most similarity with the entity e 2 , and a similarity difference therebetween is Δ 2 = Δ Sim ⁡ ( e 2 , e 1 ) - Sim ⁡ ( e 2 , e 1 ′ ) , and determining (e 1 , e 2 ) as the high-confidence entity pair, if the similarity differences Δ 1 , Δ 2 are each higher than a preset value; and wherein the iterative training of the knowledge graph completion comprises a plurality of rounds; during the iterative training, for each triplet in the external knowledge graph, if a head entity of the triplet and a tail entity of the triplet are in the original knowledge graph, an entity in the external knowledge graph is replaced with a corresponding entity in the original knowledge graph and added to the original knowledge graph to obtain a knowledge graph after adding; and the knowledge graph after adding is used to: re-perform the structure vector representation learning, re-calculate the entity similarity, generate a new high-confidence entity pair, and re-perform the knowledge graph completion, and the iterative training is stopped until a stop condition is satisfied.

Description

TECHNICAL FIELD The present disclosure relates to the technical field of natural language processing, and to generation and fusion of a knowledge graph, in particularly to a knowledge graph fusion method based on iterative completion. DESCRIPTION OF RELATED ART Over recent years, a large number of knowledge graphs (KGs) have emerged, such as YAGO, DBpedia, and Knowledge Vault. These large-scale knowledge graphs play an important role in intelligent services such as a question answering system and personalized recommendation. In addition, in order to meet the requirements of specific fields, more and more domain-specific knowledge graphs, such as academic knowledge graphs, are derived. However, no knowledge graph can be complete or completely correct. In order to improve the coverage and accuracy of a knowledge graph, a feasible method is introducing relevant knowledge from other knowledge graphs, the reason is that there is redundancy and complementarity of knowledge among knowledge graphs constructed in different ways. For example, a general-purpose knowledge graph extracted and constructed from web pages may only contain names of scientists, while more information can be found in academic knowledge graphs constructed based on academic data. In order to integrate knowledge of external knowledge graphs into a target knowledge graph, the most important step is to align different knowledge graphs. Therefore, a task of entity alignment (EA) has been put forward and received wide attention. This task aims to find entity pairs that express a same meaning in different knowledge graphs. These entity pairs serve as bonds for linking the different knowledge graphs, and serve the consequent tasks. At present, mainstream entity alignment methods mainly determine whether two entities point to a same thing by means of structural features of the knowledge graph. These methods assume that entities expressing a same meaning in different knowledge graphs have similar adjacency information. An entity generation structure vector proposed by Lingbing Guo et al., which brought a certain effect on the recognition of entity pairs (referring to the reference document Lingbing Guo, Zequn Sun, and Wei Hu, 2019, Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs, In Proceedings of the 36th International Conference on Ma Chinese Learning, ICML 2019, 9-15 Jun. 2019, Long Beach, California, USA. 2505-2514). Further, these methods have achieved the best experimental results on artificially constructed data sets. However, a recent work pointed out that knowledge graphs in these artificially constructed data sets are denser than those in the real world, and thus the effect of the entity alignment method based on the structural features on the knowledge graphs with normal distribution is greatly reduced. In fact, by analyzing the distribution of entities of the knowledge graphs in the real world, it can be known that more than half of the entities are each connected to only one or two other entities. These entities are called long-tail entities, which occupy most of the entities of the knowledge graphs, making the knowledge graphs show high sparseness as a whole, which is also in line with the knowledge graphs in the real world: only a few entities are frequently used and have rich adjacency information; and most entities are rarely mentioned and contain little structure information. Therefore, current entity alignment methods and knowledge graph fusion methods based on structure information are unsatisfactory in data sets in the real world. SUMMARY In view of this, the objective of the present disclosure is to propose a knowledge graph fusion method based on iterative completion. The method overcomes the shortcomings of the related art, and is used to identify and align identical or similar entities from multiple knowledge graphs, so as to realize knowledge fusion of multiple knowledge graphs and improve the coverage and accuracy of knowledge graphs. Based on the above objective, a knowledge graph fusion method based on iterative completion is provided, which includes: step 1, obtaining multiple knowledge graphs, and identifying each of entities of the multiple knowledge graphs;step 2, performing structure vector representation learning on each of entities of the multiple knowledge graphs to obtain a structure vector of each of entities of the multiple knowledge graphs, and performing entity name vector representation learning on each of entities of the multiple knowledge graphs to obtain an entity name vector of each of entities of the multiple knowledge graphs;step 3, determining a structural similarity between the entities of the multiple knowledge graphs according to the structure vector of each of entities of the multiple knowledge graphs, and determining an entity name similarity between the entities of the multiple knowledge graphs according to the entity name vector of each of entities of the multiple knowled