CN-121981228-A - Diffusion model-based multi-mode knowledge graph completion method and system

Abstract

The invention provides a diffusion-model-based multi-modal knowledge graph completion method and system. The method comprises: obtaining fact triples of the knowledge graph to be completed; mapping the fact triples to pure noise using a multi-modal contrastive representation fusion mechanism and the forward process of a diffusion model; in the reverse process of the diffusion model, gradually denoising the pure noise with a lightweight fact-conditioned denoiser built on a multi-layer perceptron to generate a target fact embedding; and automatically completing the knowledge graph based on the target fact embedding. By introducing the multi-modal contrastive representation fusion mechanism and a diffusion-based conditional generation framework, the invention models the complex semantic distributions in multi-modal knowledge in a unified way. The method achieves cross-modal semantic alignment and high-quality knowledge graph representation generation, and significantly improves the accuracy and generalization ability of multi-modal knowledge graph completion.
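
The forward process mentioned above, which maps a fact embedding to pure noise, can be illustrated with the standard closed-form noising step of a Gaussian diffusion model. This is a minimal sketch only: the linear variance schedule, the 1000-step horizon, the function names, and the toy four-dimensional embedding are all assumptions made for illustration and are not taken from the patent text.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise.

    t is a 0-based step index; returns the noisy vector and the noise used.
    """
    a = math.sqrt(alpha_bars[t])
    s = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [a * x + s * n for x, n in zip(x0, noise)]
    return x_t, noise

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
fact_embedding = [0.5, -1.0, 0.25, 2.0]  # hypothetical stand-in for a fact embedding
x_early, _ = q_sample(fact_embedding, 0, alpha_bars, rng)    # almost unchanged
x_late, _ = q_sample(fact_embedding, 999, alpha_bars, rng)   # almost pure noise
signal_left = math.sqrt(alpha_bars[-1])  # weight of the original signal at the last step
```

Under this assumed schedule, the weight of the original embedding at the final step is well below 1%, which is the sense in which the embedding is "mapped to pure noise".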

Inventors

LI JIE
LI JINCHENG
CHENG GAOFENG
YANG SIMIN
LUO RONG
ZHAO KE
LIU ZEYI
DENG YUQIU
CUI ZHEN
QIU ZHANHONG
ZHANG HUASEN

Assignees

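Beijing University of Posts and Telecommunications (北京邮电大学)
Unit 91054 of the Chinese People's Liberation Army (中国人民解放军91054部队)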

Dates

Publication Date: 20260505
Application Date: 20260126

Claims (10)

1. A multi-modal knowledge graph completion method based on a diffusion model, characterized by comprising the following steps: acquiring fact triples of the knowledge graph to be completed; mapping the fact triples to pure noise based on a multi-modal contrastive representation fusion mechanism and the forward process of a diffusion model; in the reverse process of the diffusion model, gradually denoising the pure noise with a lightweight fact-conditioned denoiser based on a multi-layer perceptron to generate a target fact embedding; and automatically completing the knowledge graph to be completed based on the target fact embedding.
2. The method of claim 1, wherein mapping the fact triples to pure noise based on the multi-modal contrastive representation fusion mechanism and the forward process of the diffusion model comprises: constructing a learnable fact embedding module based on the multi-modal contrastive representation fusion mechanism, and mapping the fact triples to a continuous vector space to obtain a fact embedding; and, based on the forward process of the diffusion model, gradually adding noise to the fact embedding until it is mapped to pure noise.
3. The method of claim 2, wherein the fact embedding module comprises an entity multi-modal contrastive representation layer and a relation representation layer; the entity multi-modal contrastive representation layer fuses the semantic information of an entity across different modalities; and the relation representation layer obtains a relation vector representation by a linear transformation.
4. The method of claim 1, wherein gradually denoising the pure noise, in the reverse process of the diffusion model, with the lightweight fact-conditioned denoiser based on a multi-layer perceptron to generate the target fact embedding comprises: performing a gradient update with the loss function at each denoising step according to two update formulas [formulas not reproduced in this text]; wherein the symbols of those formulas, likewise not reproduced in this text, denote, in order of appearance: a fixed learning rate; the current estimate, at the given diffusion time step, of the part of the fact embedding corresponding to the head entity and the relation, this estimate being a subset of a quantity whose symbol is also missing; the noisy fact embedding at that time step; the joint embedding obtained by concatenating the true head-entity embedding with the relation embedding; and the gradient of the loss function with respect to the corresponding variable.
5. The method of claim 1, wherein the lightweight fact-conditioned denoiser based on a multi-layer perceptron is defined by three formulas [formulas not reproduced in this text]; wherein the symbols of those formulas, likewise not reproduced in this text, denote: the embedding vectors of the head entity and of the relation, respectively; the final condition embedding generated by the condition encoder; the Hadamard product in vector space; the noisy fact embedding at the given time step; the time-step embedding; the intermediate feature representation computed by the denoising module; and the noise vector predicted by the model; ConditionEncoder denotes the condition encoder, LINEARLAYER denotes a linear transformation, a further symbol denotes the denoising operation or denoising function, and LN denotes a layer normalization operation.
6. A multi-modal knowledge graph completion system based on a diffusion model, wherein the system is used to implement the method of any one of claims 1-5 and comprises an acquisition module, a noise-adding module, a denoising module, and a completion module; the acquisition module is used to acquire fact triples of the knowledge graph to be completed; the noise-adding module is used to map the fact triples to pure noise based on the multi-modal contrastive representation fusion mechanism and the forward process of the diffusion model; the denoising module is used to gradually denoise the pure noise, in the reverse process of the diffusion model, with a lightweight fact-conditioned denoiser based on a multi-layer perceptron to generate a target fact embedding; and the completion module is used to automatically complete the knowledge graph to be completed based on the target fact embedding.
7. The system of claim 6, wherein the noise-adding module comprises a mapping unit and a noise-adding unit; the mapping unit is used to construct a learnable fact embedding module based on the multi-modal contrastive representation fusion mechanism and to map the fact triples to a continuous vector space to obtain a fact embedding; and the noise-adding unit is used to gradually add noise to the fact embedding, based on the forward process of the diffusion model, until it is mapped to pure noise.
8. The system of claim 7, wherein the fact embedding module comprises an entity multi-modal contrastive representation layer and a relation representation layer; the entity multi-modal contrastive representation layer fuses the semantic information of an entity across different modalities; and the relation representation layer obtains a relation vector representation by a linear transformation.
9. The system of claim 6, wherein gradually denoising the pure noise, in the reverse process of the diffusion model, with the lightweight fact-conditioned denoiser based on a multi-layer perceptron to generate the target fact embedding comprises: performing a gradient update with the loss function at each denoising step according to two update formulas [formulas not reproduced in this text]; wherein the symbols of those formulas, likewise not reproduced in this text, denote, in order of appearance: a fixed learning rate; the current estimate, at the given diffusion time step, of the part of the fact embedding corresponding to the head entity and the relation, this estimate being a subset of a quantity whose symbol is also missing; the noisy fact embedding at that time step; the joint embedding obtained by concatenating the true head-entity embedding with the relation embedding; and the gradient of the loss function with respect to the corresponding variable.
10. The system of claim 6, wherein the lightweight fact-conditioned denoiser based on a multi-layer perceptron is defined by three formulas [formulas not reproduced in this text]; wherein the symbols of those formulas, likewise not reproduced in this text, denote: the embedding vectors of the head entity and of the relation, respectively; the final condition embedding generated by the condition encoder; the Hadamard product in vector space; the noisy fact embedding at the given time step; the time-step embedding; the intermediate feature representation computed by the denoising module; and the noise vector predicted by the model; ConditionEncoder denotes the condition encoder, LINEARLAYER denotes a linear transformation, a further symbol denotes the denoising operation or denoising function, and LN denotes a layer normalization operation.
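
The components named in claims 5 and 10 (head and relation embeddings combined by a Hadamard product, a condition encoder, a time-step embedding, a linear layer, layer normalization, and a predicted noise vector) can be wired together in many ways. Because the claim formulas are not reproduced in this text, the sketch below is one hypothetical wiring and should not be read as the patented formulas: the class `FactConditionedDenoiser`, the sinusoidal time-step embedding, the additive combination of inputs, the ReLU activation, and the random initialization are all assumptions.

```python
import math
import random

def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]

def layer_norm(v, eps=1e-5):
    """Layer normalization without learned scale or shift."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def linear(weights, bias, v):
    """y = W v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def init_linear(n_out, n_in, rng):
    """Random weights and zero bias for a linear layer (illustrative only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def timestep_embedding(t, dim):
    """Sinusoidal time-step embedding (assumed; dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

class FactConditionedDenoiser:
    """Hypothetical MLP denoiser conditioned on (head, relation)."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.cond = init_linear(dim, dim, rng)    # condition encoder weights
        self.fc1 = init_linear(hidden, dim, rng)  # MLP input layer
        self.fc2 = init_linear(dim, hidden, rng)  # MLP output layer

    def condition(self, e_h, e_r):
        """Assumed condition encoder: LN(Linear(e_h * e_r))."""
        return layer_norm(linear(*self.cond, hadamard(e_h, e_r)))

    def predict_noise(self, x_t, t, e_h, e_r):
        """Predict the noise in x_t given the time step and the condition."""
        c = self.condition(e_h, e_r)
        t_emb = timestep_embedding(t, self.dim)
        h = layer_norm([x + te + ci for x, te, ci in zip(x_t, t_emb, c)])
        hidden = [max(0.0, z) for z in linear(*self.fc1, h)]  # ReLU
        return linear(*self.fc2, hidden)

model = FactConditionedDenoiser(dim=8, hidden=16)
e_h = [0.1 * i for i in range(8)]  # toy head-entity embedding
e_r = [1.0, -1.0] * 4              # toy relation embedding
x_t = [0.0] * 8                    # toy noisy fact embedding
eps_hat = model.predict_noise(x_t, t=10, e_h=e_h, e_r=e_r)
```

The weights here are untrained, so `eps_hat` has no meaning beyond showing the data flow; in a trained model the predicted noise would be subtracted step by step in the reverse process.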

Description

Diffusion model-based multi-mode knowledge graph completion method and system

Technical Field

The invention belongs to the technical field of multi-modal knowledge graph completion, and in particular relates to a multi-modal knowledge graph completion method and system based on a diffusion model. The technology can be applied to scenarios such as knowledge management, intelligent question answering, and recommendation systems, which need to infer latent knowledge relations automatically from structured and multi-modal information, and it is used to achieve automatic completion and semantic enhancement of knowledge.

Background

Existing knowledge graph completion methods are mainly based on knowledge graph embedding (KGE). The basic idea of these methods is to map the entities and relations of a knowledge graph into a continuous vector space and to measure the plausibility of a triple (entity, relation, entity) with a defined scoring function, thereby inferring missing facts. Representative methods include: the DistMult model, which represents each relation as a diagonal matrix and models fact scores through bilinear interaction between entity embeddings, and which is suited to modeling symmetric relations; the RotatE model, which represents entities as points in a complex space and models each relation as a rotation, and which can capture symmetry, antisymmetry, inversion, and composition relation patterns at the same time; and the GIE model, which learns interactively across Euclidean, hyperbolic, and hyperspherical spaces to improve the model's adaptability to different geometric patterns. In addition, some studies have tried to integrate multiple embedding models, or to fuse different scoring functions with an attention mechanism, in an effort to capture richer relation patterns.
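
To make the notion of an explicit scoring function concrete, the sketch below implements the DistMult and RotatE scores on toy vectors. It is illustrative only: the function names and example values are hypothetical, and the RotatE score is written as the plain negative distance, omitting the margin term used in training.

```python
import cmath
import math

def distmult_score(head, relation, tail):
    """DistMult: sum_i h_i * r_i * t_i (relation acts as a diagonal matrix)."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))

def rotate_score(head, phases, tail):
    """RotatE: negative distance between the rotated head and the tail.

    Each relation coordinate is a unit-modulus rotation exp(i * phase);
    the distance is the sum of complex moduli of (h * r - t).
    """
    total = 0.0
    for h, phase, t in zip(head, phases, tail):
        total += abs(h * cmath.exp(1j * phase) - t)
    return -total

# DistMult gives the same score when head and tail are swapped.
h, r, t = [1.0, 2.0, -0.5], [0.5, 1.0, 2.0], [2.0, -1.0, 1.0]
distmult_forward = distmult_score(h, r, t)
distmult_backward = distmult_score(t, r, h)

# RotatE with a quarter-turn rotation distinguishes the two directions.
quarter_turn = [math.pi / 2]
rotate_forward = rotate_score([1 + 0j], quarter_turn, [1j])
rotate_backward = rotate_score([1j], quarter_turn, [1 + 0j])
```

The swap invariance of DistMult is why it fits symmetric relations, while the direction-dependent RotatE score shows how a rotation can encode antisymmetry. Both remain fixed functional forms, which is the limitation the following paragraphs discuss.
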
However, such methods still rely on explicitly defined scoring functions, whose expressive power is limited by their functional form, and they struggle to cover the complex and diverse connection patterns in real knowledge graphs. Moreover, because real knowledge graphs contain many relation types with nonlinear dependencies, a single model or a limited combination of models cannot model them fully.

Specifically, the shortcomings of existing multi-modal knowledge graph completion methods are mainly the following:

1. Insufficient modality fusion: most existing methods use only simple concatenation or weighted summation of modalities and cannot effectively align the semantic spaces of different modalities (text, images, and structure), so the fused representation contains information bias and noise.
2. Fact completion still depends on explicit scoring functions and struggles to capture complex relation distributions: current multi-modal KGE models judge the plausibility of facts with scoring functions (such as TransE and DistMult), which can hardly express the high-order nonlinear associations in multi-modal knowledge.
3. Insufficient generative capability, which makes unobserved or sparse relations hard to handle: most methods are discriminative models that can only classify or match existing patterns, so their ability to complete unobserved relations or modality combinations is limited.

The causes of these shortcomings include:

1. Strong modality differences: the feature spaces of different modalities (text, images, and structure) are inconsistent, and a unified alignment and constraint mechanism is lacking, which leads to insufficient semantic fusion.
2. Limited expressiveness of scoring functions: traditional methods depend on a predefined scoring function of fixed form, which can hardly capture complex nonlinear relations among multiple modalities.
3. Lack of generative modeling capability: most models perform only discriminative reasoning and cannot generate potentially plausible facts from a distributional perspective, which limits the generalization of completion.

Disclosure of Invention

To solve the problems in the prior art, the invention provides a multi-modal knowledge graph completion method and system based on a diffusion model. By introducing a multi-modal contrastive representation fusion mechanism and a diffusion-based conditional generation framework, it aims to model the complex semantic distributions in multi-modal knowledge in a unified way, to achieve cross-modal semantic alignment and high-quality knowledge graph representation generation, and to improve the accuracy and generalization ability of multi-modal knowledge graph completion.

To achieve the above object, the invention provides the following solution: a multi-modal knowledge graph completion method based on a diffusion model, the meth