CN-122019837-A - Method and system for constructing graph database based on multidimensional polymorphic digitized architecture

Abstract

The embodiments of the invention provide a method and a system for constructing a graph database based on a multidimensional polymorphic digitized architecture, and belong to the technical field of graph databases. The method comprises: acquiring multidimensional polymorphic digitized architecture data and preprocessing the acquired data; performing entity identification and merging on the preprocessed data while performing attribute fusion on the identified entities; fusing the relationships among the entities based on the entity identification and attribute fusion; constructing a graph database from the entities after attribute fusion and relationship fusion; and optimizing and periodically updating the constructed database. The method can collect data from a large number of architecture assets, fuse their relationships, and then construct an intuitive node graph that displays the associations among the architecture assets.

Inventors

ZHANG YU
HU DONGLIANG
ZHANG CHENGPING
JIANG HAIHUI
LI HAOCHEN
MENG DEJIAN
LONG CHANGGUI
LV ZITONG
ZHAO FENG

Assignees

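北京国网信通埃森哲信息技术有限公司 (Beijing State Grid Information & Telecommunication Accenture Information Technology Co., Ltd.)
国网思极数字科技(北京)有限公司 (State Grid Siji Digital Technology (Beijing) Co., Ltd.)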

Dates

Publication Date: 20260512
Application Date: 20251218

Claims (9)

1. A method for constructing a graph database based on a multidimensional polymorphic digitized architecture, the method comprising: acquiring multidimensional polymorphic digitized architecture data, and preprocessing the acquired data; after preprocessing is completed, performing entity identification and merging on the multidimensional polymorphic digitized architecture data; performing attribute fusion on the identified entities while the entity identification and merging are being performed; fusing the relationships among the entities based on the entity identification and attribute fusion; and constructing a graph database based on the entities after attribute fusion and relationship fusion, and optimizing and periodically updating the constructed graph database.
2. The method of claim 1, wherein acquiring the multidimensional polymorphic digitized architecture data and preprocessing the acquired data comprises: acquiring the multidimensional polymorphic digitized architecture data and cleaning it; after cleaning, converting the relational tables, XML files and JSON objects in the cleaned data into node and edge structures applicable to the graph database; and normalizing the numerical data in the multidimensional polymorphic digitized architecture data.
3. The method of claim 1, wherein performing entity identification and merging on the multidimensional polymorphic digitized architecture data after preprocessing is completed comprises: acquiring the entities to be identified and their corresponding attributes in the multidimensional polymorphic digitized architecture data; constructing a node graph based on the entities to be identified and their attributes; calculating the similarity between two nodes based on the attributes in the node graph; acquiring the neighbor set of each node, and calculating the semantic path similarity between the two nodes through formula (1), whose terms are the semantic path similarity between the two nodes, the weight of each path, the set of semantic paths connecting the two nodes, and the degree of each of the two nodes; calculating the matching similarity between the two nodes from the attribute-based similarity and the semantic path similarity through formula (2), whose terms are the matching similarity between the two nodes, a balance weight, and the attribute-based similarity between the two nodes; screening out each pair of nodes whose matching similarity is greater than a preset threshold and merging the two nodes into a new node; and, after merging is completed, connecting the edges previously attached to the original nodes to the new node.
4. The method of claim 3, wherein calculating the similarity between two nodes based on the attributes in the node graph comprises: acquiring the two nodes to be matched and the attributes of each node; and calculating the similarity between the two nodes from the acquired attributes through formula (3), whose terms are the total number of attributes, the weight of each attribute, an indicator function that equals 1 when the attribute is present in both nodes with a non-empty value and 0 otherwise, and the similarity of the two nodes on that single attribute.
5. The method of claim 1, wherein fusing the relationships among the entities based on entity identification and attribute fusion comprises: acquiring the various relationships between each identified entity and its attributes, and between that entity and the entities corresponding to those attributes; feeding the multiple relationships among the entities into a relationship level ranking model to rank the relationship levels among the entities; after ranking is completed, determining whether any of the ranked relationships conflict with each other; when a conflict exists, discarding the lower-ranked of the conflicting relationships; and weighting and then fusing the ranked relationships so that the higher-ranked relationships are emphasized.
6. The method of claim 5, wherein feeding the multiple relationships among the entities into the relationship level ranking model to rank the relationship levels among the entities comprises: calculating, by the relationship level ranking model, a ranking score for the relationships among the entities through formula (4), whose terms are the ranking score, the position of each relationship in the ranking, the number of ranked relationships, and the relevance of the i-th relationship; arranging the relationships in the ideal order and substituting the corresponding parameters into formula (4) to obtain an ideal ranking score; calculating the normalized discounted cumulative gain (NDCG) through formula (5): $\mathrm{NDCG} = \mathrm{DCG}/\mathrm{IDCG}$, where DCG is the ranking score from formula (4) and IDCG is the ideal ranking score; obtaining the trained relationship level ranking model by taking maximization of the normalized discounted cumulative gain as the objective; and feeding the entities to be ranked and the relationships among them into the trained relationship level ranking model to rank the relationship levels among the entities.
7. The method of claim 6, wherein obtaining the relevance of the relationships among the entities comprises: acquiring the confidence, source reliability, timeliness and evidence count of each relationship between entities; dividing the relevance of a relationship into the grades irrelevant, somewhat relevant, highly relevant, very relevant and perfectly relevant, thereby obtaining a grade matrix; obtaining an initial association matrix from the membership degrees of the confidence, reliability, timeliness and evidence count in each relevance grade; assembling the obtained initial association matrices into a fuzzy relation matrix; assigning weights to the confidence, reliability, timeliness and evidence count to obtain a weight matrix; multiplying the weight matrix by the fuzzy relation matrix to obtain a comprehensive association matrix; and matching the comprehensive association matrix against the grade matrix to determine the relevance grade of each relationship, thereby obtaining its relevance.
8. The method of claim 1, wherein the multidimensional polymorphic digitized architecture data comprises structured, semi-structured and unstructured data; the structured data is stored in business process systems, application servers and database management system platforms, while the semi-structured and unstructured data are dispersed across various documents, such as project requirement specifications, research reports and high-level designs, which contain rich data information.
9. A system for constructing a graph database based on a multidimensional polymorphic digitized architecture, the system comprising: a data acquisition module configured to acquire multidimensional polymorphic digitized architecture data; and a construction module configured to perform, on the acquired data, the method for constructing a graph database based on a multidimensional polymorphic digitized architecture according to any one of claims 1-8.
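Claims 3 and 4 name formulas (1) to (3), but the formula bodies are not given in this text. The Python sketch below shows one plausible reading of the entity-matching step, and every functional form in it is an assumption rather than the patented formula: formula (3) is taken as a weighted mean of per-attribute similarities over attributes present and non-empty in both nodes, formula (1) as the summed path weights divided by the geometric mean of the two node degrees, and formula (2) as a convex combination with balance weight `alpha`. The `Node` type, the character-ratio attribute similarity (`difflib.SequenceMatcher`) and all function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Node:
    """Hypothetical entity node: an identifier plus a flat attribute map."""
    name: str
    attrs: dict = field(default_factory=dict)


def attr_value_sim(a: str, b: str) -> float:
    # Per-attribute similarity; a character-level ratio is an assumed choice.
    return SequenceMatcher(None, a, b).ratio()


def attribute_similarity(u: Node, v: Node, weights: dict) -> float:
    """Formula (3), assumed form: weighted mean of per-attribute similarities,
    counting only attributes present and non-empty in both nodes (indicator = 1)."""
    num = den = 0.0
    for key, w in weights.items():
        a, b = u.attrs.get(key), v.attrs.get(key)
        if a and b:  # indicator function evaluates to 1
            num += w * attr_value_sim(str(a), str(b))
            den += w
    return num / den if den else 0.0


def path_similarity(path_weights, neighbors_u: set, neighbors_v: set) -> float:
    """Formula (1), assumed form: summed weights of the semantic paths joining the
    two nodes, divided by the geometric mean of their degrees, clamped to [0, 1]."""
    du, dv = len(neighbors_u), len(neighbors_v)  # degree = size of neighbor set
    if du == 0 or dv == 0:
        return 0.0
    return min(1.0, sum(path_weights) / math.sqrt(du * dv))


def match_similarity(s_attr: float, s_path: float, alpha: float = 0.5) -> float:
    """Formula (2), assumed convex combination with balance weight alpha."""
    return alpha * s_attr + (1.0 - alpha) * s_path


def merge_nodes(edges: set, a: str, b: str, new: str) -> set:
    """Reattach every edge of nodes a and b to the new merged node; drop self-loops."""
    merged = set()
    for src, rel, dst in edges:
        src = new if src in (a, b) else src
        dst = new if dst in (a, b) else dst
        if src != dst:
            merged.add((src, rel, dst))
    return merged
```

A caller would merge a candidate pair only when `match_similarity(...)` exceeds the preset threshold, then call `merge_nodes` so that edges of either original node point to the new node. The clamp in `path_similarity` is an added assumption that keeps both inputs of formula (2) on the same [0, 1] scale.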
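Claim 6 describes its ranking objective through a ranking score, an ideal ranking score and their normalized ratio, which corresponds to the standard discounted cumulative gain (DCG) and normalized DCG (NDCG) metrics. The sketch below computes those metrics and applies the conflict rule of claim 5. Assumptions: formula (4) is taken in its linear-gain form, rel_i / log2(i + 1) (a common alternative uses 2^rel_i - 1 as the gain); relations are plain string IDs; conflicts are supplied as unordered pairs; and the positional fusion weights are a hypothetical choice. It only evaluates a given ordering and does not train a ranking model.

```python
import math


def dcg(relevances) -> float:
    """Formula (4), assumed linear-gain form: sum over i of rel_i / log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(relevances) -> float:
    """Formula (5): DCG of the given order divided by the ideal DCG (IDCG)."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def resolve_conflicts(ranked: list, conflicts: set) -> list:
    """Claim 5: walk relations in rank order and skip any relation that conflicts
    with one already kept, so the lower-ranked side of a conflict is discarded."""
    kept = []
    for rel in ranked:
        if any(frozenset((rel, k)) in conflicts for k in kept):
            continue
        kept.append(rel)
    return kept


def fusion_weights(kept: list) -> dict:
    """Hypothetical positional weights that emphasize higher-ranked relations."""
    return {rel: 1.0 / math.log2(i + 1) for i, rel in enumerate(kept, start=1)}
```

Reusing the logarithmic discount for the fusion weights keeps the emphasis on earlier relations consistent with how the ranking objective scores them, but the claim does not fix a weighting scheme.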
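Claim 7 follows the pattern of a fuzzy comprehensive evaluation: a fuzzy relation matrix R of membership degrees (factors by grades), a weight vector W, the product B = W·R, and a match against the five relevance grades. The Python sketch below assumes each factor (confidence, source reliability, timeliness, evidence count) has already been scaled to [0, 1], which leaves open how evidence counts are scaled. It also uses hypothetical triangular membership functions centered on evenly spaced grade points, the weighted-sum operator for W·R, and the maximum-membership principle for the final match. The claim specifies none of these choices.

```python
# Relevance grades named in claim 7, in increasing order.
GRADES = ["irrelevant", "somewhat relevant", "highly relevant",
          "very relevant", "perfectly relevant"]
# Hypothetical grade centers on a [0, 1] factor scale (assumption).
CENTERS = [0.0, 0.25, 0.5, 0.75, 1.0]
WIDTH = 0.25  # half-width of each triangular membership function (assumption)


def memberships(score: float) -> list:
    """Membership degrees of one factor score in each grade (triangular, assumed)."""
    return [max(0.0, 1.0 - abs(score - c) / WIDTH) for c in CENTERS]


def fuzzy_evaluate(factor_scores: list, weights: list):
    """Return (grade name, grade index, B) for one relationship.

    factor_scores: confidence, source reliability, timeliness and evidence count,
    each pre-scaled to [0, 1]; weights: one weight per factor.
    """
    if len(factor_scores) != len(weights):
        raise ValueError("one weight per factor is required")
    R = [memberships(s) for s in factor_scores]  # fuzzy relation matrix (factors x grades)
    total = sum(weights)
    W = [w / total for w in weights]  # normalized weight vector
    # Comprehensive association vector B = W . R (weighted-sum operator).
    B = [sum(W[i] * R[i][j] for i in range(len(W))) for j in range(len(GRADES))]
    level = max(range(len(B)), key=B.__getitem__)  # maximum-membership principle
    return GRADES[level], level, B
```

The returned grade index (0 to 4) can serve as the relevance value of a relationship in a DCG-style ranking score.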

Description

Method and system for constructing graph database based on multidimensional polymorphic digitized architecture

Technical Field

The invention relates to the technical field of graph databases, and in particular to a method and a system for constructing a graph database based on a multidimensional polymorphic digitized architecture.

Background

In the prior art, architecture assets are stored and queried mainly as electronic documents scattered across different departments, units and personnel. These assets come from many sources in different formats and lack unified integration standards and specifications, which greatly increases the difficulty of using them and severely restricts the quality and efficiency of architecture design.

A large number of architecture assets now exist. Traditional relational databases process semi-structured assets inefficiently and lack the ability to deeply fuse architecture assets of different types and sources, so organic associations between the assets cannot be established. As a result, data analysis across business domains and asset types is difficult to carry out effectively, global analysis and decision-making in complex business scenarios are hard to support, and the value of the architecture assets is greatly limited. A graph database construction method based on a multidimensional polymorphic digitized architecture is therefore needed that collects data from a large number of architecture assets, fuses their relationships, and constructs an intuitive node graph that displays the associations among the assets.
Disclosure of Invention

The embodiments of the invention aim to provide a method for constructing a graph database based on a multidimensional polymorphic digitized architecture that can collect data from a large number of architecture assets, fuse their relationships, and then construct an intuitive node graph that visually displays the associations among the assets.

To achieve the above object, an embodiment of the invention provides a method for constructing a graph database based on a multidimensional polymorphic digitized architecture, the method comprising: acquiring multidimensional polymorphic digitized architecture data, and preprocessing the acquired data; after preprocessing is completed, performing entity identification and merging on the multidimensional polymorphic digitized architecture data; performing attribute fusion on the identified entities while the entity identification and merging are being performed; fusing the relationships among the entities based on the entity identification and attribute fusion; and constructing a graph database based on the entities after attribute fusion and relationship fusion, and optimizing and periodically updating the constructed graph database.

Optionally, acquiring the multidimensional polymorphic digitized architecture data and preprocessing the acquired data comprises: acquiring the multidimensional polymorphic digitized architecture data and cleaning it; after cleaning, converting the relational tables, XML files and JSON objects in the cleaned data into node and edge structures applicable to the graph database; and normalizing the numerical data in the multidimensional polymorphic digitized architecture data.
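The preprocessing step described above (cleaning, converting relational tables, XML files and JSON objects into nodes and edges, and normalizing numerical data) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the node-ID scheme (`table:key` and slash-separated paths), the cleaning rule (trim strings, drop empty fields), the `contains` edge label for XML nesting, and the choice of min-max normalization are all hypothetical. JSON text would first be parsed with `json.loads`.

```python
import xml.etree.ElementTree as ET


def clean(record: dict) -> dict:
    """Trim string values and drop empty fields (one assumed cleaning rule)."""
    out = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            out[key] = value
    return out


def rows_to_graph(rows, table, key, foreign_keys):
    """Relational rows -> nodes; each foreign-key column -> an edge to the referenced node."""
    nodes, edges = {}, []
    for row in rows:
        nid = f"{table}:{row[key]}"
        nodes[nid] = {"label": table,
                      **{c: v for c, v in row.items() if c not in foreign_keys}}
        for col, target in foreign_keys.items():
            if row.get(col) is not None:
                edges.append((nid, col, f"{target}:{row[col]}"))
    return nodes, edges


def json_to_graph(obj, node_id, label, nodes=None, edges=None):
    """Parsed JSON object -> nodes; nested objects become child nodes linked by their key."""
    nodes = {} if nodes is None else nodes
    edges = [] if edges is None else edges
    props = {}
    for k, v in obj.items():
        if isinstance(v, dict):
            child = f"{node_id}/{k}"
            json_to_graph(v, child, k, nodes, edges)
            edges.append((node_id, k, child))
        else:
            props[k] = v  # scalars and lists are kept as properties (simplification)
    nodes[node_id] = {"label": label, **props}
    return nodes, edges


def xml_to_graph(text):
    """XML elements -> nodes (tag as label, attributes as properties); nesting -> 'contains' edges."""
    nodes, edges = {}, []

    def visit(elem, path):
        nodes[path] = {"label": elem.tag, **elem.attrib}
        for i, child in enumerate(elem):
            child_path = f"{path}/{child.tag}[{i}]"
            edges.append((path, "contains", child_path))
            visit(child, child_path)

    root = ET.fromstring(text)
    visit(root, root.tag)
    return nodes, edges


def min_max(values):
    """Min-max normalization of numerical data to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Min-max scaling is only one option for the normalization step; z-score standardization would serve equally well when outliers make the observed range unrepresentative.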
Optionally, after preprocessing is completed, performing entity identification and merging on the multidimensional polymorphic digitized architecture data comprises: acquiring the entities to be identified and their corresponding attributes in the multidimensional polymorphic digitized architecture data; constructing a node graph based on the entities to be identified and their attributes; calculating the similarity between two nodes based on the attributes in the node graph; acquiring the neighbor set of each node, and calculating the semantic path similarity between the two nodes through formula (1), whose terms are the semantic path similarity between the two nodes, the weight of each path, the set of semantic paths connecting the two nodes, and the degree of each of the two nodes; calculating the matching similarity between the two nodes from the attribute-based similarity and the semantic path similarity through formula (2), whose terms are the matching similarity between the two nodes, a balance weight, and the attribute-based similarity between the two nodes; screening two node