CN-121979958-A - Semantic alignment-based multi-source heterogeneous data automatic mapping method and device
Abstract
The application relates to a method and device for automatically mapping multi-source heterogeneous data based on semantic alignment. The method comprises: determining power entities and entity feature vectors according to a power domain ontology model; determining the power entity mapped by each field to be aligned based on the similarity between the entity feature vectors and the feature vector of that field; updating a message passing mechanism by taking constructed entity constraint conditions as constraints on the message function; inputting a fully connected graph whose nodes are the fields to be aligned into a graph neural network, and obtaining an embedded vector for each node based on the updated message passing mechanism; determining candidate mapping relations according to the semantic similarity of pairs of embedded vectors; performing entity category conflict detection and field conflict detection, and correcting the candidate mapping relations according to the detection results; and generating data in a target format based on the corrected candidate mapping relations. The application enables multi-source heterogeneous data from different systems to be shared, managed and made compatible.
Inventors
- ZHANG YONGJIAN
- ZHU YUANGENG
- ZHAO ZHENXIA
- XIE YING
- Yao Huangfu
- ZHANG ZHIYUAN
- LI NAIYI
- QIAN YIHONG
- ZHAO FENG
- YUAN YUFENG
- HAN BAOLI
- GAO XIAOXIN
- Xia Baobing
- HU LIHUI
Assignees
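- State Grid Zhejiang Electric Power Co., Ltd. Shaoxing Power Supply Company (国网浙江省电力有限公司绍兴供电公司)
- State Grid Zhejiang Electric Power Co., Ltd. (国网浙江省电力有限公司)
- State Grid Information & Telecommunication Group Co., Ltd. (国网信息通信产业集团有限公司)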
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-26
Claims (10)
- 1. A semantic alignment-based multi-source heterogeneous data automatic mapping method, characterized by comprising the following steps: constructing an electric power domain ontology model based on a metadata set and electric power domain constraint rules, determining an electric power entity set and an entity feature vector of each electric power entity through the electric power domain ontology model, and determining the electric power entity mapped by each field to be aligned based on the similarity between the entity feature vectors and the feature vectors of the fields to be aligned; constructing entity constraint conditions based on the mapping consistency between fields and the electric power entities, the entity semantic relations, and the path similarity of the electric power entities, and, taking the entity constraint conditions as constraints on a message function, constructing a message passing mechanism from the constrained message function, an aggregation function and an update function; constructing a fully connected graph by taking the fields to be aligned as nodes and the similarity between fields as edge weights, inputting the fully connected graph into a graph neural network, and iteratively updating the node vector of each node by the graph neural network through the message passing mechanism to obtain the embedded vector of each node; when the semantic similarity of the embedded vectors of any two nodes is greater than a preset threshold, forming a candidate mapping relation from the two fields to be aligned corresponding to the two nodes, performing entity category conflict detection and field conflict detection on each candidate mapping relation, and correcting the candidate mapping relations according to the detection results; and processing the data contained in the fields to be aligned based on the corrected candidate mapping relations to generate data in a target format.
- 2. The semantic alignment-based multi-source heterogeneous data automatic mapping method according to claim 1, further comprising, after the entity category conflict detection and the field conflict detection are performed on each of the candidate mapping relations: generating a semantic link according to the field to be aligned, the semantic vector of the field to be aligned, the propagation weights of the graph neural network, the electric power entity and the mapping result; determining, based on the detection results, the conflict nodes of the semantic link and obtaining the confidence of each conflict node; and when the confidence is greater than or equal to a first preset threshold, determining the conflict content to be corrected according to the conflict type of the conflict node.
- 3. The semantic alignment-based multi-source heterogeneous data automatic mapping method according to claim 1, wherein the entity constraint conditions include an entity class consistency constraint, a semantic relation constraint and an ontology path similarity constraint, and constructing the entity constraint conditions based on the mapping consistency between fields and the electric power entities, the entity semantic relations, and the path similarity of the electric power entities comprises: constructing the entity class consistency constraint based on whether the electric power entities mapped by any two nodes are the same; constructing the semantic relation constraint based on the semantic relation between the electric power entities mapped by any two nodes; and constructing the ontology path similarity constraint based on the path similarity of the electric power entities mapped by any two nodes.
- 4. The semantic alignment-based multi-source heterogeneous data automatic mapping method according to claim 3, wherein taking the entity constraint conditions as constraints on the message function and constructing the message passing mechanism from the constrained message function, the aggregation function and the update function comprises: obtaining the original vectors of a node based on the entity class consistency constraint, wherein if the node and a neighbor node are mapped to the same electric power entity, a non-zero original vector between the node and the neighbor node is obtained, and if the node and the neighbor node are mapped to mutually exclusive electric power entities, the original vector between the node and the neighbor node is a zero vector; generating virtual edges of the node based on the semantic relation constraint, and updating the neighborhood of the node based on the edges and the virtual edges of the node; taking the ontology path similarity constraint as a bias term, and updating the attention weights of the message function based on the bias term; and for each node, aggregating all the original vectors of the node through the aggregation function based on the updated neighborhood to obtain an aggregated representation of the node, and updating the embedded vector of the node through the update function based on the aggregated representation and the feature vector of the current node.
- 5. The semantic alignment-based multi-source heterogeneous data automatic mapping method according to claim 1, wherein determining the electric power entity set and the entity feature vector of each electric power entity comprises: determining a plurality of electric power entities and feature information of each electric power entity based on the electric power domain ontology model; and vectorizing the feature information through a word vector model to obtain the entity feature vector of each electric power entity.
- 6. The semantic alignment-based multi-source heterogeneous data automatic mapping method according to claim 1, further comprising, after correcting the candidate mapping relations according to the detection results: determining a comprehensive confidence for each corrected candidate mapping relation based on the semantic similarity, the association weights of the nodes to be aligned in the graph neural network, and a field matching condition; when the comprehensive confidence is smaller than a preset confidence threshold, taking the corresponding candidate mapping relation as an object to be corrected; and performing the entity category conflict detection and the field conflict detection on the object to be corrected, correcting the object when a conflict is detected, and feeding the corrected object back to a mapping rule base and a semantic model.
- 7. The semantic alignment-based multi-source heterogeneous data automatic mapping method according to claim 1, wherein processing the data contained in the fields to be aligned based on the corrected candidate mapping relations to generate data in a target format comprises: determining the structure of the output data according to the corrected candidate mapping relations; and reconstructing the data contained in the fields to be aligned based on the structure, and generating the data in the target format from the reconstructed data according to the structure.
- 8. The semantic alignment-based multi-source heterogeneous data automatic mapping method according to claim 1, wherein the edge weight is determined by: obtaining the feature vector of each node, and obtaining the cosine similarity and the Jaccard similarity of any two feature vectors; determining an edit distance based on the text encodings of the two feature vectors; determining, based on the electric power entities mapped by the nodes, an enhanced similarity according to the vector similarity of the entity feature vectors of the two electric power entities and a domain dictionary; and determining the edge weight according to the cosine similarity, the Jaccard similarity, the edit distance, the enhanced similarity and preset weights.
- 9. A semantic alignment-based multi-source heterogeneous data automatic mapping device, comprising: a power entity determining module, used for constructing a power domain ontology model based on a metadata set and power domain constraint rules, determining a power entity set and an entity feature vector of each power entity through the power domain ontology model, and determining the power entity mapped by each field to be aligned based on the similarity between the entity feature vectors and the feature vectors of the fields to be aligned; a message passing mechanism updating module, used for constructing entity constraint conditions based on the mapping consistency between fields and the power entities, the entity semantic relations, and the path similarity of the power entities, and, taking the entity constraint conditions as constraints on a message function, constructing a message passing mechanism from the constrained message function, an aggregation function and an update function; a graph neural network processing module, used for constructing a fully connected graph by taking the similarity between fields as edge weights, inputting the fully connected graph into a graph neural network, and iteratively updating the node vector of each node by the graph neural network through the message passing mechanism to obtain the embedded vector of each node; a correction module, used for forming a candidate mapping relation from the two fields to be aligned corresponding to any two nodes when the semantic similarity of the embedded vectors of the two nodes is greater than a preset threshold, performing entity category conflict detection and field conflict detection on each candidate mapping relation, and correcting the candidate mapping relations according to the detection results; and a generation module, used for processing the data contained in the fields to be aligned based on the corrected candidate mapping relations and generating data in a target format.
- 10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the semantic alignment based multi-source heterogeneous data auto-mapping method according to any one of claims 1 to 8 when the computer program is executed.
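The edge weight of claim 8 combines four similarity signals with preset weights. The following is a minimal sketch of one way this could be computed, not the applicant's implementation. It assumes that the Jaccard similarity is taken over underscore-separated tokens of the field names, that the edit distance is a Levenshtein distance normalized into a similarity, that the enhanced similarity is supplied as a precomputed number, and that the weights (0.4, 0.2, 0.2, 0.2) are illustrative; all function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edit_distance(s, t):
    """Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(s, t):
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(s), len(t))
    return 1.0 - edit_distance(s, t) / longest if longest else 1.0


def edge_weight(name_a, vec_a, name_b, vec_b, enhanced,
                weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of the four signals; the weights are illustrative."""
    a, b = name_a.lower(), name_b.lower()
    parts = (
        cosine(vec_a, vec_b),
        jaccard(a.split("_"), b.split("_")),
        edit_similarity(a, b),
        enhanced,  # assumed precomputed from entity vectors and a dictionary
    )
    return sum(w * p for w, p in zip(weights, parts))


if __name__ == "__main__":
    w = edge_weight("rated_voltage", [0.9, 0.1, 0.3],
                    "voltage_rating", [0.8, 0.2, 0.3], enhanced=0.7)
    print(round(w, 3))
```

Converting the edit distance into a similarity keeps all four terms on a 0 to 1 scale, so the preset weights can be read as relative importance; the claim itself does not state how the distance is normalized.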
Description
Semantic alignment-based multi-source heterogeneous data automatic mapping method and device
Technical Field
The application relates to the field of data processing, and in particular to a semantic alignment-based multi-source heterogeneous data automatic mapping method and device.
Background
In the power industry, different power business systems handle different power services, and different business data, such as standard clauses, power equipment parameters and technical specifications, reside in different systems. The same data is expressed differently across these systems in field naming, data format, units of measurement and so on, so the data of the different power business systems forms multi-source heterogeneous data. When a power task must draw on data from several systems, data sharing, semantic alignment and data fusion are difficult to achieve. In addition, a large number of power standard clauses exist as natural-language text, which a computer cannot readily interpret directly and which lacks a unified semantic representation, so multi-source heterogeneous data is difficult to integrate and map automatically.
In the prior art, data is generally integrated and mapped using manually formulated field mapping rules or simple string matching algorithms. Manually formulated field mapping rules cannot cover massive and dynamic professional expressions, and simple string matching algorithms, which rely on surface-level character similarity, cannot capture complex semantic structures or the implicit semantics of text. The result is field-level semantic mapping errors in multi-source heterogeneous data, with unit conflicts, duplicate definitions and logical contradictions appearing after the data is merged. Multi-source heterogeneous data from different power business systems is therefore difficult to share, make compatible and manage.
Disclosure of Invention
The embodiments of the application provide a semantic alignment-based multi-source heterogeneous data automatic mapping method and device, which at least solve the problem in the related art that multi-source heterogeneous data is difficult to share, make compatible and manage.
In a first aspect, an embodiment of the present application provides a semantic alignment-based multi-source heterogeneous data automatic mapping method, including: constructing an electric power domain ontology model based on a metadata set and electric power domain constraint rules, determining an electric power entity set and an entity feature vector of each electric power entity through the electric power domain ontology model, and determining the electric power entity mapped by each field to be aligned based on the similarity between the entity feature vectors and the feature vectors of the fields to be aligned; constructing entity constraint conditions based on the mapping consistency between fields and the electric power entities, the entity semantic relations, and the path similarity of the electric power entities, and, taking the entity constraint conditions as constraints on a message function, constructing a message passing mechanism from the constrained message function, an aggregation function and an update function; constructing a fully connected graph by taking the fields to be aligned as nodes and the similarity between fields as edge weights, inputting the fully connected graph into a graph neural network, and iteratively updating the node vector of each node by the graph neural network through the message passing mechanism to obtain the embedded vector of each node; when the semantic similarity of the embedded vectors of any two nodes is greater than a preset threshold, forming a candidate mapping relation from the two fields to be aligned corresponding to the two nodes, performing entity category conflict detection and field conflict detection on each candidate mapping relation, and correcting the candidate mapping relations according to the detection results; and processing the data contained in the fields to be aligned based on the corrected candidate mapping relations to generate data in a target format.
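The constrained message passing and threshold steps above can be illustrated with a small untrained sketch. It is a simplification under stated assumptions rather than the method as filed: neighbours mapped to mutually exclusive entities are skipped (one way to realise the zero message of claim 4), the ontology path similarity is added as a bias to the attention logit, aggregation is a softmax-weighted sum, and the update is a fixed mix of the current vector and the aggregate. Virtual edges and learned parameters are omitted, and every name, vector and threshold below is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def message_passing_round(h, entity, edge_w, exclusive, path_sim, mix=0.5):
    """One round of constrained message passing over a fully connected graph.

    h         -- list of node vectors (one per field to be aligned)
    entity    -- entity label mapped to each node
    edge_w    -- matrix of edge weights between fields
    exclusive -- set of frozenset pairs of mutually exclusive entities
    path_sim  -- dict from frozenset entity pair to path similarity (bias)
    mix       -- assumed fixed share of the current vector kept on update
    """
    new_h = []
    for i in range(len(h)):
        logits, messages = [], []
        for j in range(len(h)):
            if j == i:
                continue
            pair = frozenset((entity[i], entity[j]))
            if pair in exclusive:
                continue  # constraint: no message between exclusive entities
            bias = 1.0 if entity[i] == entity[j] else path_sim.get(pair, 0.0)
            logits.append(edge_w[i][j] + bias)  # attention logit with bias
            messages.append(h[j])
        if not messages:
            new_h.append(list(h[i]))  # isolated node keeps its vector
            continue
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]  # stable softmax
        total = sum(exps)
        agg = [sum(e / total * m[k] for e, m in zip(exps, messages))
               for k in range(len(h[i]))]
        new_h.append([mix * a + (1 - mix) * b for a, b in zip(h[i], agg)])
    return new_h


def candidate_pairs(h, threshold):
    """Index pairs whose embedding similarity exceeds the threshold."""
    return [(i, j) for i in range(len(h)) for j in range(i + 1, len(h))
            if cosine(h[i], h[j]) > threshold]


if __name__ == "__main__":
    h0 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    entity = ["Transformer", "Transformer", "Meter"]
    edge_w = [[0.0, 0.8, 0.1], [0.8, 0.0, 0.1], [0.1, 0.1, 0.0]]
    exclusive = {frozenset(("Transformer", "Meter"))}
    h1 = message_passing_round(h0, entity, edge_w, exclusive, path_sim={})
    print(candidate_pairs(h1, threshold=0.9))
```

In this toy input the third field is mapped to an entity that is exclusive with the other two, so it exchanges no messages and cannot be pulled towards them, while the first two fields reinforce each other and form the only candidate mapping relation.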
In an embodiment, after the entity class conflict detection and the field conflict detection are performed on each candidate mapping relationship, the method further includes: generating a semantic link according to the field to be aligned, the semantic vector of the field to be aligned, the propagation weight of the graph neural network, the electric entity and the mapping result; Based on the detection result, determining conflict nodes of the semantic links and obtaining the confidence of the conflict nodes; and when the confidence coefficient is larger than or equal to a first preset threshold value, determining conflic