CN-121981226-A - Knowledge fusion method, device and equipment

Abstract

The application provides a knowledge fusion method, apparatus, and device, relates to the technical field of knowledge graphs, and aims to improve the quality of knowledge graph construction. The method comprises: separately encoding a target entity to be stored in a knowledge base and each entity in the knowledge base to obtain a first feature representation of each entity; processing the first feature representations through an entity link model to obtain a second feature representation of each entity; linking the target entity to one entity in the knowledge base according to the second feature representations; performing entity clustering on the entities in the knowledge base after entity linking; and fusing the relations corresponding to the entities in the knowledge base after entity linking and entity clustering, to obtain a knowledge base after entity linking, entity clustering, and relation fusion. The entities and relations in that knowledge base are used to construct a knowledge graph.

Inventors

NIE XIAONING
LIU BO
PENG BAOYUN
SUN XIAO
GUO PENGFEI
YANG PEIYING

Assignees

北京大数据先进技术研究院

Dates

Publication Date: 20260505
Application Date: 20260121

Claims (10)

1. A knowledge fusion method, the method comprising:
   encoding a target entity to be stored in a knowledge base and each entity in the knowledge base, respectively, to obtain respective first feature representations of the target entity and each entity in the knowledge base;
   processing the first feature representations of the target entity and each entity in the knowledge base through an entity link model to obtain respective second feature representations of the target entity and each entity in the knowledge base, wherein the entity link model is used to bring the second feature representations of entities with the same semantics close to each other and to push the second feature representations of entities with different semantics away from each other;
   linking the target entity to one entity in the knowledge base according to the respective second feature representations of the target entity and each entity in the knowledge base, to obtain a knowledge base after entity linking;
   performing entity clustering on the entities in the knowledge base after entity linking, to obtain a knowledge base after entity linking and entity clustering; and
   fusing the relations corresponding to the entities in the knowledge base after entity linking and entity clustering, to obtain a knowledge base after entity linking, entity clustering, and relation fusion, and constructing a knowledge graph using the entities and relations in the knowledge base after entity linking, entity clustering, and relation fusion.
2. The knowledge fusion method according to claim 1, wherein linking the target entity to one entity in the knowledge base according to the respective second feature representations of the target entity and each entity in the knowledge base, to obtain the knowledge base after entity linking, comprises:
   determining, for each entity in the knowledge base, a similarity between the second feature representation of the entity and the second feature representation of the target entity; and
   sorting the entities in the knowledge base in descending order of the similarity between the second feature representations, and linking the target entity to any one of the first K entities in the sorted order, to obtain the knowledge base after entity linking, wherein K is an integer greater than 0.
3. The knowledge fusion method according to claim 1, wherein performing entity clustering on the entities in the knowledge base after entity linking, to obtain the knowledge base after entity linking and entity clustering, comprises:
   generating an entity graph structure according to the first feature representations of the entities in the knowledge base after entity linking, wherein the entity graph structure represents topological dependency relations among the entities;
   processing the entity graph structure through a graph neural network to obtain a third feature representation of each entity in the knowledge base after entity linking, wherein the third feature representation is obtained using a neighborhood aggregation mechanism;
   clustering the third feature representations of the entities in the knowledge base through a clustering network; and
   merging, according to the third feature representations of a plurality of entities in the same cluster, the plurality of entities in the same cluster into one entity, to obtain the knowledge base after entity linking and entity clustering.
4. The knowledge fusion method according to claim 1, wherein fusing the relations corresponding to the entities in the knowledge base after entity linking and entity clustering, to obtain the knowledge base after entity linking, entity clustering, and relation fusion, comprises:
   obtaining a relation set according to the relations corresponding to the entities in the knowledge base after entity linking and entity clustering;
   taking any two relations in the relation set as a first relation and a second relation, and determining a first triplet corresponding to the first relation and a second triplet corresponding to the second relation as a relation triplet pair, so as to obtain a plurality of relation triplet pairs;
   for each relation triplet pair, extracting features of the relation triplet pair according to a first head entity, a first tail entity, and the first relation in the first triplet and a second head entity, a second tail entity, and the second relation in the second triplet, to obtain a feature vector of the relation triplet pair;
   for each relation triplet pair, determining a fusion probability of the relation triplet pair according to the feature vector of the relation triplet pair; and
   for each relation triplet pair, merging the first triplet and the second triplet in the relation triplet pair into one relation triplet when the fusion probability of the relation triplet pair is greater than a probability threshold, to obtain the knowledge base after entity linking, entity clustering, and relation fusion.
5. The knowledge fusion method according to claim 3, wherein generating the entity graph structure according to the first feature representations of the entities in the knowledge base after entity linking comprises:
   taking each entity in the knowledge base after entity linking as a node, and determining the similarity between the first feature representation of each entity and the first feature representations of the other entities in the knowledge base after entity linking; and
   adding an edge between the nodes corresponding to any two entities whose first feature representations have a similarity higher than a target similarity, to generate the entity graph structure;
   and wherein processing the entity graph structure through the graph neural network to obtain the third feature representation of each entity in the knowledge base after entity linking comprises:
   for each entity in the knowledge base after entity linking, determining, through the graph neural network, each entity connected to the entity in the entity graph structure as a neighborhood entity of the entity; and
   for each entity in the knowledge base after entity linking, aggregating, through the graph neural network, the first feature representations of the neighborhood entities of the entity into the first feature representation of the entity, to obtain the third feature representation of the entity.
6. The knowledge fusion method according to claim 3, wherein the graph neural network is trained according to the following steps:
   obtaining a plurality of first sample entities and a corresponding first sample entity graph structure, each first sample entity carrying a pre-labeled cluster identifier;
   processing the first sample entity graph structure through a graph neural network to be trained to obtain respective third feature representations of the plurality of first sample entities, wherein the third feature representations are obtained using a neighborhood aggregation mechanism;
   clustering the third feature representations of the plurality of first sample entities through the clustering network to obtain a clustering result; and
   when the clustering result indicates that first sample entities carrying different cluster identifiers are in the same cluster, updating the model parameters of the graph neural network to be trained, and stopping training to obtain the graph neural network once the clustering result indicates that first sample entities carrying different cluster identifiers are in different clusters and that the first sample entities in the same cluster all carry the same cluster identifier.
7. The knowledge fusion method according to claim 1, wherein the entity link model is trained according to the following steps:
   obtaining a first feature representation of each of a plurality of second sample entities and the semantics of each of the plurality of second sample entities;
   for each second sample entity, constructing a positive sample pair of the second sample entity based on another second sample entity having the same semantics as the second sample entity, and constructing a negative sample pair of the second sample entity based on another second sample entity having different semantics from the second sample entity;
   processing the first feature representation of each second sample entity through an entity link model to be trained to obtain a second feature representation of the second sample entity;
   for each second sample entity, determining a similarity between the second feature representations of the two second sample entities in the positive sample pair of the second sample entity, and determining a similarity between the second feature representations of the two second sample entities in the negative sample pair of the second sample entity; and
   updating the model parameters of the entity link model to be trained with the objectives of maximizing the similarity of the positive sample pairs of the plurality of second sample entities and minimizing the similarity of the negative sample pairs of the plurality of second sample entities, and obtaining the entity link model when training is finished.
8. The knowledge fusion method according to claim 4, wherein the feature vector of each relation triplet pair is obtained by:
   for each relation triplet pair, determining a head entity similarity according to the first head entity in the first triplet and the second head entity in the second triplet;
   determining a tail entity similarity according to the first tail entity in the first triplet and the second tail entity in the second triplet;
   determining a relation similarity according to the first relation in the first triplet and the second relation in the second triplet;
   determining a triplet structure similarity according to the head entity similarity, the tail entity similarity, and the relation similarity;
   for each relation triplet pair, determining a confidence score according to the head entity similarity, the tail entity similarity, the relation similarity, and the triplet structure similarity;
   determining a neighborhood entity similarity of the relation triplet pair according to the neighborhood entities of the first head entity and the neighborhood entities of the second head entity, and the neighborhood entities of the first tail entity and the neighborhood entities of the second tail entity; and
   for each relation triplet pair, obtaining the feature vector of the relation triplet pair according to at least the neighborhood entity similarity and the confidence score.
9. A knowledge fusion apparatus, the apparatus comprising:
   an encoding module, configured to encode a target entity to be stored in a knowledge base and each entity in the knowledge base, respectively, to obtain respective first feature representations of the target entity and each entity in the knowledge base;
   a processing module, configured to process the first feature representations of the target entity and each entity in the knowledge base through an entity link model to obtain respective second feature representations of the target entity and each entity in the knowledge base, wherein the entity link model is used to bring the second feature representations of entities with the same semantics close to each other and to push the second feature representations of entities with different semantics away from each other;
   an entity linking module, configured to link the target entity to one entity in the knowledge base according to the respective second feature representations of the target entity and each entity in the knowledge base, to obtain a knowledge base after entity linking;
   an entity clustering module, configured to perform entity clustering on the entities in the knowledge base after entity linking, to obtain a knowledge base after entity linking and entity clustering; and
   a relation fusion module, configured to fuse the relations corresponding to the entities in the knowledge base after entity linking and entity clustering, to obtain a knowledge base after entity linking, entity clustering, and relation fusion, wherein the entities and relations in the knowledge base after entity linking, entity clustering, and relation fusion are used to construct a knowledge graph.
10. An electronic device, comprising a processor, a memory, and a program or instruction stored on the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the knowledge fusion method of any one of claims 1-8.
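
As an illustration only, the graph construction and neighborhood aggregation recited in claim 5 might be sketched as follows. The sketch assumes cosine similarity as the similarity measure and an unweighted mean over an entity and its neighbors as the aggregation; the claims fix neither choice, and a trained graph neural network would apply learned parameters. The function names and the sample vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_entity_graph(
    features: Dict[str, Sequence[float]], target_similarity: float
) -> Dict[str, Set[str]]:
    """One node per entity; an edge joins two entities whose first feature
    representations are more similar than target_similarity."""
    names = list(features)
    graph: Dict[str, Set[str]] = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(features[a], features[b]) > target_similarity:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def aggregate_neighborhood(
    features: Dict[str, Sequence[float]], graph: Dict[str, Set[str]]
) -> Dict[str, List[float]]:
    """Assumed aggregation: average each entity's vector with its neighbors' vectors."""
    aggregated: Dict[str, List[float]] = {}
    for name, neighbors in graph.items():
        group = [features[name]] + [features[n] for n in sorted(neighbors)]
        dim = len(features[name])
        aggregated[name] = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
    return aggregated


# Hypothetical first feature representations, for illustration only.
first_features = {
    "NYC": [1.0, 0.0],
    "New York City": [0.9, 0.1],
    "Tokyo": [0.0, 1.0],
}
graph = build_entity_graph(first_features, target_similarity=0.9)
third_features = aggregate_neighborhood(first_features, graph)
```

In this toy input only "NYC" and "New York City" exceed the target similarity, so they share an edge and their aggregated vectors move toward each other, while "Tokyo" has no neighbors and keeps its original vector.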

Description

Knowledge fusion method, device and equipment

Technical Field

The embodiments of the present application relate to the technical field of knowledge graphs, and in particular to a knowledge fusion method, apparatus, and device.

Background

A knowledge graph is a structured semantic knowledge base. By modeling real-world entities, concepts, and the relationships among them in the form of a graph, the knowledge graph has become an indispensable infrastructure in the field of artificial intelligence and plays a core role in applications such as intelligent question answering, recommendation systems, and knowledge reasoning. Knowledge fusion is a core technology for constructing a knowledge graph. Its core goal is to systematically integrate, disambiguate, deduplicate, and normalize knowledge from different data sources and different extraction batches, so as to form a unified, consistent, and redundancy-free knowledge representation system.
However, existing knowledge fusion systems have some limitations. For example, entity clustering relies excessively on traditional dimensionality reduction methods, which may cause information loss. In addition, standard sentence vector models are not specifically designed for the entity linking task, so the representation space they generate may fail to effectively distinguish entities that differ only slightly, and different expressions of similar entities may be erroneously judged to be different entities.

Disclosure of Invention

Embodiments of the present application provide a knowledge fusion method, apparatus, and device, which aim to overcome or at least partially solve the above problems.
An embodiment of the present application provides a knowledge fusion method, where the method includes:
   encoding a target entity to be stored in a knowledge base and each entity in the knowledge base, respectively, to obtain respective first feature representations of the target entity and each entity in the knowledge base;
   processing the first feature representations of the target entity and each entity in the knowledge base through an entity link model to obtain respective second feature representations of the target entity and each entity in the knowledge base, wherein the entity link model is used to bring the second feature representations of entities with the same semantics close to each other and to push the second feature representations of entities with different semantics away from each other;
   linking the target entity to one entity in the knowledge base according to the respective second feature representations of the target entity and each entity in the knowledge base, to obtain a knowledge base after entity linking;
   performing entity clustering on the entities in the knowledge base after entity linking, to obtain a knowledge base after entity linking and entity clustering; and
   fusing the relations corresponding to the entities in the knowledge base after entity linking and entity clustering, to obtain a knowledge base after entity linking, entity clustering, and relation fusion, and constructing a knowledge graph using the entities and relations in the knowledge base after entity linking, entity clustering, and relation fusion.
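The linking step above can be sketched, under stated assumptions, as ranking knowledge base entities by the similarity of their second feature representations to that of the target entity. The sketch assumes cosine similarity and links to the highest-ranked candidate; the application names neither the similarity measure nor the rule for choosing among candidates. The function names and sample vectors are hypothetical, and the vectors stand in for the output of a trained entity link model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_candidates(
    target_vec: Sequence[float],
    kb_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k knowledge base entities most similar to the target,
    sorted in descending order of similarity."""
    if k <= 0:
        raise ValueError("k must be an integer greater than 0")
    scored = [
        (name, cosine_similarity(target_vec, vec)) for name, vec in kb_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical second feature representations, for illustration only.
kb = {
    "Beijing": [0.9, 0.1, 0.0],
    "Peking University": [0.2, 0.9, 0.1],
    "Shanghai": [0.1, 0.1, 0.9],
}
target = [0.8, 0.2, 0.0]

candidates = top_k_candidates(target, kb, k=2)
# Assumption: link to the best-ranked candidate among the k returned.
linked_entity = candidates[0][0]
```

With these toy vectors the target ranks closest to "Beijing", so that is the entity it would be linked to. A real system would likely replace the linear scan with an approximate nearest-neighbor index once the knowledge base grows large.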
In an alternative embodiment, linking the target entity to one entity in the knowledge base according to the respective second feature representations of the target entity and each entity in the knowledge base, to obtain the knowledge base after entity linking, includes:
   determining, for each entity in the knowledge base, a similarity between the second feature representation of the entity and the second feature representation of the target entity; and
   sorting the entities in the knowledge base in descending order of the similarity between the second feature representations, and linking the target entity to any one of the first K entities in the sorted order, to obtain the knowledge base after entity linking, wherein K is an integer greater than 0.
In an alternative embodiment, performing entity clustering on the entities in the knowledge base after entity linking, to obtain the knowledge base after entity linking and entity clustering, includes:
   generating an entity graph structure according to the first feature representations of the entities in the knowledge base after entity linking, wherein the entity graph structure represents topological dependency relations among the entities;
   processing the entity graph structure through a graph neural network to obtain a third feature representation of each entity in the knowledge base after entity linking, wherein the third feature representation is obtained using a neighborhood aggregation mechanism;
   clustering third feature representations of the entities