CN-122019840-A - Knowledge graph construction method, equipment and medium for power distribution network materials
Abstract
The invention provides a knowledge graph construction method, equipment and medium for power distribution network materials, belonging to the field of intelligent material management and data management of power distribution networks. The method comprises: defining a label dictionary; labeling text fragments of multi-source distribution network material data according to preset labeling rules; performing knowledge extraction according to the labels of the multi-source distribution network material data to generate knowledge triples with a subject-predicate (relation)-object structure; performing distribution network material knowledge fusion on the knowledge triples based on a rule constraint and a coding constraint; and generating a distribution network material knowledge graph from the knowledge fusion result. The invention builds a distribution network material knowledge graph construction method on natural language processing technology, and resolves material identifier conflicts and redundant expressions through a material entity alignment method with dual rule and coding constraints.
Inventors
- WANG JINYOU
- CAI ZHIHUA
- ZHANG ZHONGHAN
- LI LINGXI
Assignees
- 中国电建集团福建省电力勘测设计院有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260120
Claims (10)
- 1. A knowledge graph construction method for power distribution network materials, characterized by comprising the following steps: preprocessing multi-source distribution network material data; defining a label dictionary comprising three label types, namely entity labels, attribute labels and relation labels, and labeling text fragments of the multi-source distribution network material data according to preset labeling rules; performing knowledge extraction according to the labels of the multi-source distribution network material data to generate knowledge triples with a subject-predicate (relation)-object structure; performing distribution network material knowledge fusion on the knowledge triples based on a rule constraint and a coding constraint, wherein the rule constraint obtains standardized knowledge triples by grouping distribution network material entities by global unique identifier, checking for attribute conflicts, and fusing information within each global unique identifier group, and the coding constraint processes, through semantic encoding and semantic vector similarity matching, the distribution network material entities that have no valid global unique identifier and therefore did not enter the rule constraint, the results of the coding constraint being aggregated into standardized knowledge triples; and generating a distribution network material knowledge graph from the knowledge fusion result.
- 2. The knowledge graph construction method for power distribution network materials according to claim 1, wherein grouping the distribution network material entities by global unique identifier comprises the following steps: extracting the identifiers associated with each distribution network material entity, and distinguishing valid global unique identifiers from invalid identifiers; and aggregating the distribution network material entities associated with the same global unique identifier into one group through an identifier key-value hash aggregation algorithm; wherein the identifier key-value hash aggregation algorithm initializes a hash container that takes a valid global unique identifier as the key and a list of distribution network material entities as the value; traverses all distribution network material entities, extracts the global unique identifier field value of each entity, and verifies the validity of that value by regular-expression matching; and computes a hash value for each global unique identifier that passes the validity check, and maps the corresponding material entity to the entity list under the same key in the hash container, thereby completing the grouping.
- 3. The knowledge graph construction method for the power distribution network materials, as claimed in claim 1, is characterized in that the coding constraint generates a power distribution network material entity semantic vector through an all-MiniLM-L6-v2 model.
- 4. The knowledge graph construction method for the power distribution network materials according to claim 1, wherein the semantic vector similarity is calculated by a cosine similarity algorithm, with the calculation formula: $$\mathrm{sim}(V_a, V_b) = \frac{\sum_{i=1}^{n} V_{a,i}\, V_{b,i}}{\sqrt{\sum_{i=1}^{n} V_{a,i}^{2}}\;\sqrt{\sum_{i=1}^{n} V_{b,i}^{2}}}$$ where $V_a$ is the semantic vector of distribution network material entity a; $V_b$ is the semantic vector of distribution network material entity b; $V_{a,i}$ is the i-th dimensional component of the semantic vector $V_a$; $V_{b,i}$ is the i-th dimensional component of the semantic vector $V_b$; and $n$ is the dimension of the semantic vectors.
- 5. The knowledge graph construction method for the power distribution network materials according to claim 4, wherein processing, through semantic vector similarity matching, the distribution network material entities that have no valid identifier and did not enter the rule constraint specifically comprises: selecting the distribution network material entities whose semantic vector similarity is greater than or equal to a preset value and performing core attribute verification on them; if the core attributes do not conflict, assigning the entities to the same distribution network material entity alignment group; and if the core attributes conflict, judging them to be different distribution network material entities.
- 6. The knowledge graph construction method for power distribution network materials according to claim 1, wherein the preprocessing comprises: checking the integrity of the identifier fields of the distribution network material data and marking missing values; unifying the text format through regular-expression matching; removing redundant whitespace and repeated content; splitting unstructured data and filtering out phrases with no actual semantics; and unifying synonymous text according to a distribution network material synonym dictionary.
- 7. The knowledge graph construction method for the power distribution network materials according to claim 1, wherein the preset labeling rules comprise two types of labeling rules: the first maps fixed fields directly to their corresponding labels; the second builds regular expressions from keywords to match attribute-type text fragments.
- 8. The knowledge graph construction method for the power distribution network materials according to claim 1, wherein the knowledge graph is stored by taking a Neo4j database based on a graph data model as a carrier, and the standardized knowledge triples and the power distribution network material data are imported into the Neo4j database for storage.
- 9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and running on the processor, wherein the computer program when run by the processor performs the method of any one of claims 1 to 7.
- 10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1 to 7.
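The identifier key-value hash aggregation of claim 2 can be illustrated with a minimal Python sketch. The identifier pattern `GUID_PATTERN`, the field name `guid`, and the function name `group_by_guid` are hypothetical: the claims do not specify an identifier format or data layout, and Python's built-in dictionary stands in for the hash container.

```python
import re
from collections import defaultdict

# Hypothetical identifier format (two capital letters plus eight digits);
# the patent text does not state the actual pattern.
GUID_PATTERN = re.compile(r"[A-Z]{2}\d{8}")


def group_by_guid(entities, id_field="guid"):
    """Group material entities by valid global unique identifier.

    Returns (groups, unmatched). `groups` maps each valid identifier to the
    list of entities carrying it; `unmatched` holds the entities whose
    identifier is missing or fails the regular-expression check.
    """
    groups = defaultdict(list)  # hash container: identifier -> entity list
    unmatched = []
    for entity in entities:
        value = entity.get(id_field)
        if isinstance(value, str) and GUID_PATTERN.fullmatch(value):
            # The dict hashes the key internally and appends to its bucket.
            groups[value].append(entity)
        else:
            unmatched.append(entity)
    return dict(groups), unmatched


entities = [
    {"guid": "WZ00000001", "name": "10kV cable"},
    {"guid": "WZ00000001", "name": "cable, 10 kV"},
    {"guid": "n/a", "name": "insulator"},
    {"name": "concrete pole"},
]
groups, unmatched = group_by_guid(entities)
```

In this sketch the entities in `unmatched` are the ones that would be handed to the coding constraint, since they carry no valid global unique identifier.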
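The similarity matching and core attribute verification of claims 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the threshold `0.85`, the attribute names `voltage_level` and `model`, and the choice to treat a missing attribute as non-conflicting are all hypothetical, and the vectors are hand-written stand-ins for the output of a sentence-embedding model such as all-MiniLM-L6-v2.

```python
import math


def cosine_similarity(va, vb):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    if len(va) != len(vb):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(va, vb))
    norm_a = math.sqrt(sum(a * a for a in va))
    norm_b = math.sqrt(sum(b * b for b in vb))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_entity(a, b, threshold=0.85, core_attributes=("voltage_level", "model")):
    """Decide whether two entities belong to the same alignment group.

    Assumption: an attribute missing on either side is not counted as a
    conflict; only two present, different values conflict.
    """
    if cosine_similarity(a["vector"], b["vector"]) < threshold:
        return False
    for key in core_attributes:
        left = a["attributes"].get(key)
        right = b["attributes"].get(key)
        if left is not None and right is not None and left != right:
            return False  # core attribute conflict: different entities
    return True


cable_a = {"vector": [1.0, 0.0], "attributes": {"voltage_level": "10kV"}}
cable_b = {"vector": [0.9, 0.1], "attributes": {"voltage_level": "10kV"}}
cable_c = {"vector": [0.9, 0.1], "attributes": {"voltage_level": "35kV"}}
```

Here `same_entity(cable_a, cable_b)` passes both checks, while `same_entity(cable_a, cable_c)` is rejected at the core attribute step even though the vectors are close.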
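The preprocessing steps of claim 6 that concern whitespace, repeated content and synonyms can be sketched in a few lines. The synonym entries and the function name `preprocess` are hypothetical, and "repeated content" is interpreted here, as an assumption, to mean consecutive duplicate tokens.

```python
import re

# Hypothetical synonym dictionary; the patent does not list its entries.
SYNONYMS = {"trafo": "transformer", "xfmr": "transformer"}


def preprocess(text, synonyms=SYNONYMS):
    """Collapse whitespace, unify synonyms, and drop consecutive repeats."""
    text = re.sub(r"\s+", " ", text).strip()
    tokens = []
    for token in text.split(" "):
        canonical = synonyms.get(token.lower(), token)
        # Skip a token that repeats the previous (already unified) token.
        if not tokens or tokens[-1] != canonical:
            tokens.append(canonical)
    return " ".join(tokens)


cleaned = preprocess("  10kV   trafo xfmr  oil-immersed ")
```

Unifying synonyms before the duplicate check means that two different spellings of the same term in a row are also collapsed into one.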
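The two labeling rules of claim 7 can be sketched as a fixed-field mapping plus keyword-based regular expressions. The field names, label names and patterns below are hypothetical examples chosen for illustration; the patent does not enumerate its label dictionary.

```python
import re

# Rule 1 (hypothetical entries): fixed fields map directly to a label.
FIELD_LABELS = {"material_name": "ENTITY", "supplier": "ENTITY"}

# Rule 2 (hypothetical entries): keyword-based regular expressions
# locate attribute-type text fragments in free text.
ATTRIBUTE_PATTERNS = {
    "rated_voltage": re.compile(r"\d+(?:\.\d+)?\s*kV", re.IGNORECASE),
    "cross_section": re.compile(r"\d+(?:\.\d+)?\s*mm2", re.IGNORECASE),
}


def label_record(record):
    """Return (label_type, name, text_fragment) tuples for one record."""
    labels = []
    for field, label in FIELD_LABELS.items():
        if record.get(field):
            labels.append((label, field, record[field]))
    description = record.get("description", "")
    for attribute, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(description)
        if match:
            labels.append(("ATTRIBUTE", attribute, match.group(0)))
    return labels


labels = label_record(
    {"material_name": "power cable", "description": "copper core, 10 kV, 240mm2"}
)
```

The labeled fragments returned here are the raw material for the later knowledge extraction step, which would pair an entity with each attribute or relation to form a triple.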
Description
Knowledge graph construction method, equipment and medium for power distribution network materials
Technical Field
The invention belongs to the field of intelligent material management and data management of power distribution networks, and particularly relates to a knowledge graph construction method, equipment and medium for power distribution network materials.
Background
Distribution network materials are the various materials and supplies used for the construction, maintenance and operation of a power system, and are key to ensuring its safe, reliable and efficient operation. However, distribution network projects are large in scale and involve many types of materials, and multi-source distribution network material data commonly suffers from material identifier conflicts and redundant expressions, so such data currently has to be processed manually.
The Chinese patent application with publication number CN118657469A discloses a power grid material group management system based on large model technology. It uses large model technology to improve the management efficiency and engineering response capability for distribution network engineering materials, but does not address the material identifier conflicts and redundant expressions in distribution network material data.
Therefore, a knowledge graph that can eliminate material identifier conflicts and redundant expressions is needed to raise the level of intelligence of distribution network material management.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a knowledge graph construction method, equipment and medium for power distribution network materials. It builds a distribution network material knowledge graph construction method on natural language processing technology, and resolves material identifier conflicts and redundant expressions through a material entity alignment method with dual rule and coding constraints.
The technical scheme of the invention is as follows:
In a first aspect, the invention provides a knowledge graph construction method for power distribution network materials, comprising the following steps: preprocessing multi-source distribution network material data; defining a label dictionary comprising three label types, namely entity labels, attribute labels and relation labels, and labeling text fragments of the multi-source distribution network material data according to preset labeling rules; performing knowledge extraction according to the labels of the multi-source distribution network material data to generate knowledge triples with a subject-predicate (relation)-object structure; performing distribution network material knowledge fusion on the knowledge triples based on a rule constraint and a coding constraint, wherein the rule constraint obtains standardized knowledge triples by grouping distribution network material entities by global unique identifier, checking for attribute conflicts, and fusing information within each global unique identifier group, and the coding constraint processes, through semantic encoding and semantic vector similarity matching, the distribution network material entities that have no valid global unique identifier and therefore did not enter the rule constraint, the results of the coding constraint being aggregated into standardized knowledge triples; and generating a distribution network material knowledge graph from the knowledge fusion result.
Further, grouping the distribution network material entities by global unique identifier comprises the following steps: extracting the identifiers associated with each distribution network material entity, and distinguishing valid global unique identifiers from invalid identifiers; and aggregating the distribution network material entities associated with the same global unique identifier into one group through an identifier key-value hash aggregation algorithm; wherein the identifier key-value hash aggregation algorithm initializes a hash container that takes a valid global unique identifier as the key and a list of distribution network material entities as the value; traverses all distribution network material entities, extracts the global unique identifier field value of each entity, and verifies the validity of that value by regular-expression matching; and computes a hash value for the global unique identifi