CN-122021635-A - Engineering entity alignment method based on generative semantic enhancement and structural feature reasoning
Abstract
The invention relates to an engineering entity alignment method based on generative semantic enhancement and structural feature reasoning, and belongs to the technical field of knowledge graph construction and multi-source data fusion. The method comprises: constructing a semantically enhanced entity text sequence; embedding that sequence into a prompt template containing an alignment task instruction and inputting it into a large language model to obtain entity semantic vectors containing engineering context logic; capturing structural topological features among entities with a graph neural network; fusing the entity semantic vectors with the structural topological features, calculating the preliminary similarity between entities, and screening out a candidate alignment set; and converting the entity pairs in the candidate alignment set, together with their neighborhood difference information, into a natural language reasoning task, generating a judgment rationale with the large language model, and calculating the final alignment probability. The invention can effectively bridge the semantic gap caused by sparse engineering data and markedly improve the accuracy and robustness of cross-specialty engineering entity alignment.
Inventors
- Liao Duxue
- ZHANG ZHOUMING
- WANG YI
- ZHANG HUI
- JIANG YUECHUN
- YANG WENJUN
Assignees
- 贾维斯智能科技(云南)有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-03
Claims (9)
- 1. An engineering entity alignment method based on generative semantic enhancement and structural feature reasoning, characterized by comprising the following steps: Step1, for attribute-sparse entities in the knowledge graphs to be aligned, using a large language model to infer and generate auxiliary attribute descriptions from the known context, and constructing a semantically enhanced entity text sequence; Step2, embedding the semantically enhanced entity text sequence into a prompt template containing a semantic alignment task instruction, and inputting it into a large language model to obtain entity semantic vectors containing engineering context logic; Step3, capturing structural topological features among entities with a graph neural network, fusing them with the entity semantic vectors, calculating the preliminary similarity between entities, and screening out a candidate alignment set; Step4, converting the entity pairs in the candidate alignment set and their neighborhood difference information into a natural language reasoning task, generating a judgment rationale with the large language model, and calculating the final alignment probability.
- 2. The engineering entity alignment method based on generative semantic enhancement and structural feature reasoning according to claim 1, wherein Step1 comprises: screening out sparse entities whose attribute count is below a preset threshold in the knowledge graphs to be aligned, extracting the names of the sparse entities and their first-order neighborhood structure information, and constructing an inference context containing the engineering specialty category and local topological features; inputting the inference context into a large language model, generating supplementary description text for the attributes implied by the sparse entity according to engineering common sense, and concatenating the supplementary description text with the original entity name to form a semantically enhanced entity text sequence.
- 3. The engineering entity alignment method based on generative semantic enhancement and structural feature reasoning according to claim 1, wherein Step2 comprises: constructing a prompt template containing an engineering expert role setting and a semantic alignment task instruction, filling the enhanced entity text sequence into the prompt template, and generating an input instruction containing the task context; passing the input instruction into a large language model, and extracting the output features of the model's last-layer hidden state or of its instruction fine-tuning layer as entity semantic vectors containing engineering context logic.
- 4. The engineering entity alignment method based on generative semantic enhancement and structural feature reasoning according to claim 1, wherein Step3 comprises: modeling the neighborhood nodes of each engineering entity with a graph attention network, calculating structural attention weights from the connection relations among nodes, and aggregating neighbor features by those weights to obtain a structure embedded representation; concatenating or weight-fusing the structure embedded representation with the entity semantic vector obtained in Step2 to construct a unified entity feature vector, calculating the cosine similarity between vectors, and selecting the entity pairs with the highest similarity to form a candidate alignment set.
- 5. The engineering entity alignment method based on generative semantic enhancement and structural feature reasoning according to claim 1, wherein Step4 comprises: screening out, from the candidate alignment set, entity pairs whose similarity scores fall within a preset fuzzy interval as hard cases, converting the attribute differences and neighborhood structure differences of the paired entities into natural language descriptions, and constructing an entity alignment reasoning instruction; inputting the entity alignment reasoning instruction into a large language model, generating through a chain-of-thought mechanism the logical basis and a confidence score for judging whether the two entities refer to the same object, performing weighted fusion of the confidence score with the preliminary similarity calculated in Step3, and determining the final alignment result.
- 6. The engineering entity alignment method based on generative semantic enhancement and structural feature reasoning according to claim 1, wherein the specific steps of Step1 comprise: Step1.1, defining a source engineering knowledge graph to be aligned and a target engineering knowledge graph, each graph being represented by an entity set, a relation set, an attribute-name set, and an attribute-value set; for any entity in a graph, its original attribute set and its first-order neighborhood structure set are defined; Step1.2, calculating the attribute density of each entity in the graph, the attribute density being the number of elements in the entity's original attribute set; setting a sparse threshold, and marking the entity as a sparse entity if its attribute density is below the sparse threshold; for each sparse entity, constructing the context prompt input used for generative inference, i.e. the inference context, according to formula (1), where the inference context comprises the engineering specialty category to which the entity belongs, the name of the entity, and a text description formed by serializing its neighbor node information; Step1.3, constructing an attribute generation instruction and inputting it, together with the inference context, into the large language model (LLM) to generate a virtual attribute text sequence according to formula (2), where the formula uses a text concatenation operation, the LLM is a generative model with engineering common sense, and the generated virtual attribute text sequence contains the technical parameters implied by the entity; Step1.4, fusing the original entity information with the generated virtual attribute text sequence to construct the final semantically enhanced entity text sequence according to formula (3), where a flattening operation converts the structured attribute set into a one-dimensional sequence of continuous text.
- 7. The engineering entity alignment method based on generative semantic enhancement and structural feature reasoning according to claim 1, wherein the specific steps of Step2 comprise: Step2.1, designing an alignment-task-oriented prompt template comprising a role setting, a task description, and an input slot, and filling the semantically enhanced entity text sequence into it to construct an input instruction according to formula (4); Step2.2, inputting the input instruction into the encoder of a pre-trained large language model, extracting from the model's last-layer hidden state the vector corresponding to the [CLS] token or the instruction token, and using that vector as the semantic embedded representation of the entity, i.e. the entity semantic vector, according to formula (5), where the formula specifies the dimension of the semantic vector and denotes the encoder of the pre-trained large language model.
- 8. The engineering entity alignment method based on generative semantic enhancement and structural feature reasoning according to claim 1, wherein the specific steps of Step3 comprise: Step3.1, using a graph attention network (GAT) to aggregate the neighborhood of each entity: first, the attention coefficient between the entity and each of its neighbor entities is computed according to formulas (6) and (7), which involve a learnable linear transformation matrix, an attention vector, a vector concatenation operation, the first-order neighborhood structure set of the entity, the entity semantic vector of the central entity, the entity semantic vector of the neighbor entity, and a nonlinear activation function; the formulas yield the raw attention score between the central entity and a given neighbor entity, and the corresponding score for any neighbor k denotes, in general, the strength of association between the central entity and that neighbor; neighbor features are then aggregated according to the attention weights to obtain the structure embedded representation according to formula (8), which uses a sigmoid activation function; Step3.2, performing adaptive weighted fusion of the entity's semantic vector and its structure embedded representation to obtain the final entity feature vector according to formula (9), whose parameters are a weight and a bias term; Step3.3, for an entity in the source engineering knowledge graph and an entity in the target engineering knowledge graph, calculating the cosine similarity of their entity feature vectors according to formula (10); selecting the top K entity pairs with the highest similarity to form a candidate alignment set, the definition of which refers to the source engineering knowledge graph to be aligned, the target engineering knowledge graph, a similarity threshold, and the entity feature vector of the entity in the target graph.
- 9. The engineering entity alignment method based on generative semantic enhancement and structural feature reasoning according to claim 1, wherein the specific steps of Step4 comprise: Step4.1, defining a fuzzy interval; if the cosine similarity of an entity pair falls within the fuzzy interval, marking the pair as a hard case to be verified; Step4.2, extracting the attribute difference set and the neighbor difference set of the entity pair and constructing a reasoning prompt, the construction of which is formally expressed by formula (11), where the formula uses a text concatenation operation, a task instruction text, and a chain-of-thought guiding text; Step4.3, inputting the reasoning prompt into the large model so that it outputs its reasoning steps and a judgment conclusion, and mapping the conclusion to a confidence probability; calculating the final alignment score according to formula (12), which involves a balance coefficient; if the final alignment score is greater than a set threshold, determining that the two entities are aligned.
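As an illustration of the candidate screening in Step3.3 and the hard-case fusion in Step4, the following is a minimal Python sketch, not the claimed implementation. The exact forms of formulas (10) to (12) are not reproduced in the text above, so several points are assumptions: the top K targets are kept per source entity, the final score is the convex combination `lam * confidence + (1 - lam) * similarity`, and pairs outside the fuzzy interval are scored by similarity alone. All function names, the threshold values, and the `llm_confidence` callable (a stand-in for the large language model judgment) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_candidates(source, target, k):
    """Keep the k most similar target entities for every source entity.

    Assumption: "top K" is applied per source entity, not globally.
    """
    candidates = []
    for s_id, s_vec in source.items():
        scored = sorted(
            ((cosine(s_vec, t_vec), t_id) for t_id, t_vec in target.items()),
            reverse=True,
        )
        candidates.extend((s_id, t_id, sim) for sim, t_id in scored[:k])
    return candidates


def final_score(similarity, confidence, lam):
    """Assumed convex combination of LLM confidence and preliminary similarity."""
    return lam * confidence + (1.0 - lam) * similarity


def align(candidates, llm_confidence, low=0.5, high=0.9, lam=0.5, accept=0.7):
    """Score every candidate pair and decide whether it is aligned.

    Only pairs whose similarity lies in the fuzzy interval [low, high] are
    passed to `llm_confidence` (hypothetical stand-in for the large language
    model); other pairs are scored by similarity alone (an assumption).
    All default values are illustrative.
    """
    results = {}
    for s_id, t_id, sim in candidates:
        if low <= sim <= high:
            score = final_score(sim, llm_confidence(s_id, t_id), lam)
        else:
            score = sim
        results[(s_id, t_id)] = (score, score > accept)
    return results


if __name__ == "__main__":
    source = {"pump_A": [1.0, 0.0]}
    target = {"P-101": [2.0, 0.0], "valve_7": [3.0, 4.0], "fan_2": [0.0, 1.0]}
    pairs = top_k_candidates(source, target, k=2)
    # Stub "LLM" that is doubtful about every hard case it is asked about.
    decisions = align(pairs, llm_confidence=lambda s, t: 0.2)
    for pair, (score, aligned) in decisions.items():
        print(pair, round(score, 3), aligned)
```

In this toy run, the pair with similarity 1.0 lies above the fuzzy interval and is accepted without consulting the stub, while the pair with similarity 0.6 is treated as a hard case and rejected once the low confidence is fused in. Restricting the language model call to the fuzzy interval keeps the costly reasoning step for the ambiguous pairs only.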
Description
Engineering entity alignment method based on generative semantic enhancement and structural feature reasoning
Technical Field
The invention relates to an engineering entity alignment method based on generative semantic enhancement and structural feature reasoning, and belongs to the technical field of knowledge graph construction and multi-source data fusion.
Background
With the rapid development of intelligent engineering and digital twin technology, massive multi-source heterogeneous data accumulate over the whole engineering life cycle. These data record the structural form, design parameters, and operating state of entities, and are the cornerstone for constructing knowledge graphs in the engineering field. However, engineering data are markedly cross-specialty and cross-stage: different specialties differ greatly in how they name entities and in the granularity of their descriptions, and the same entity is described with a very different focus in the design, construction, and operation-and-maintenance stages. As a result, attribute fields in cross-source knowledge bases overlap little, and the data are highly sparse and heterogeneous in expression. This makes it difficult to identify the same engineering entity accurately during cross-source alignment and severely restricts the deep fusion of engineering knowledge. Existing entity alignment techniques, which rely on string similarity or on shallow semantic features from representation learning, have significant limitations. Conventional string matching struggles with engineering abbreviations and aliases, while existing neural networks and pre-trained language models, lacking sufficient semantic context, struggle to produce discriminative feature representations for long-tail entities with severely missing attributes.
In addition, existing models based on vector-space distance lack a logical reasoning mechanism comparable to that of a human expert, and often fail to disambiguate complex entities that have similar names but different functions, or consistent topology but different names. Therefore, there is a need for an engineering entity alignment method based on generative semantic enhancement and structural feature reasoning that intelligently fills the semantic gap and introduces deep logical discrimination, so as to significantly improve the accuracy and robustness of cross-specialty entity alignment in the engineering field.
Disclosure of Invention
In order to solve the problems described in the background, the invention provides an engineering entity alignment method based on generative semantic enhancement and structural feature reasoning, which significantly improves the accuracy and robustness of cross-specialty entity alignment in the engineering field.
The technical scheme of the invention is an engineering entity alignment method based on generative semantic enhancement and structural feature reasoning, comprising the following steps: Step1, for attribute-sparse entities in the knowledge graphs to be aligned, using a large language model to infer and generate auxiliary attribute descriptions from the known context, and constructing a semantically enhanced entity text sequence; Step2, embedding the semantically enhanced entity text sequence into a prompt template containing a semantic alignment task instruction, and inputting it into a large language model to obtain entity semantic vectors containing engineering context logic; Step3, capturing structural topological features among entities with a graph neural network, fusing them with the entity semantic vectors, calculating the preliminary similarity between entities, and screening out a candidate alignment set; Step4, converting the entity pairs in the candidate alignment set and their neighborhood difference information into a natural language reasoning task, generating a judgment rationale with the large language model, and calculating the final alignment probability.
Further, Step1 includes: screening out sparse entities whose attribute count is below a preset threshold in the knowledge graphs to be aligned, extracting the names of the sparse entities and their first-order neighborhood structure information, and constructing an inference context containing the engineering specialty category and local topological features; inputting the inference context into a large language model, generating supplementary description text for the attributes implied by the sparse entity according to engineering common sense, and concatenating the supplementary description text with the original entity name to form a semantically enhanced entity text sequence.
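The Step1 procedure just described (screen sparse entities, build an inference context, generate supplementary text, concatenate) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of the invention: the dictionary layout of an entity, the separator strings, the wording of `INSTRUCTION`, the threshold value, and the `generate` callable that stands in for the large language model are all hypothetical.

```python
# Hypothetical attribute generation instruction; the actual wording is not
# given in the text.
INSTRUCTION = "List the technical parameters this engineering entity most likely has."


def attribute_density(entity):
    """Attribute density, taken here as the number of original attributes."""
    return len(entity["attributes"])


def build_inference_context(entity):
    """Serialize specialty category, name, and first-order neighbors as text."""
    neighbors = "; ".join(f"{rel} -> {name}" for rel, name in entity["neighbors"])
    return (
        f"Category: {entity['category']}\n"
        f"Name: {entity['name']}\n"
        f"Neighbors: {neighbors}"
    )


def flatten(attributes):
    """Flatten a structured attribute set into one continuous text sequence."""
    return ", ".join(f"{key}: {value}" for key, value in attributes.items())


def enhance(entity, generate, threshold=2):
    """Build the semantically enhanced text sequence for one entity.

    `generate` is a hypothetical callable (prompt -> text) standing in for
    the large language model; it is only called for sparse entities, i.e.
    those whose attribute density is below `threshold` (illustrative value).
    """
    parts = [entity["name"]]
    if entity["attributes"]:
        parts.append(flatten(entity["attributes"]))
    if attribute_density(entity) < threshold:
        prompt = INSTRUCTION + "\n" + build_inference_context(entity)
        parts.append(generate(prompt))
    return " | ".join(parts)


if __name__ == "__main__":
    sparse = {
        "name": "CWP-01",
        "category": "HVAC",
        "attributes": {},
        "neighbors": [("feeds", "Cooling tower CT-1")],
    }
    # Stub generator used in place of a real model call.
    print(enhance(sparse, generate=lambda prompt: "circulating water pump"))
```

Entities that already carry enough attributes skip the generation call entirely, so only the sparse, long-tail entities incur the cost of a model query.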
Further, Step2 includes: constructing a prompt template containing an engineering expert role setting and a semantic alignment task instruction, filling the enhanced entity text sequence i