
CN-121212291-B - Knowledge-graph-based large model expert database construction

CN121212291B

Abstract

The invention relates to the technical field of artificial intelligence and discloses a knowledge-graph-based large model expert database construction method. The method acquires multi-source data in real time, marks update points via a hash fingerprint algorithm and semantic similarity, judges updates in combination with metadata, and stores updated data in a message queue; it extracts features and constructs a fusion network to generate low-dimensional semantic vectors and establish cross-modal mappings; it builds a property graph with the semantic vectors as nodes, comprising basic facts, inference rules and temporal edges; it parses user requests into graph-traversal tasks and returns results through counterfactual reasoning; it encrypts sensitive data and relations and builds indexes; and it migrates source-domain graph features to optimize the target-domain graph, accelerating expert database construction. The invention achieves efficient data processing, semantic association analysis, secure querying and rapid expert database construction, improving data utilization and knowledge reasoning efficiency.

Inventors

  • Mei Shuhong
  • Zhang Zuyu
  • Wu Hui
  • Chen Ruibo
  • Jing Weiming
  • Yang Xiaoli
  • Zhang Xun
  • Li Qusheng

Assignees

  • 广西壮族自治区自然资源遥感院 (Remote Sensing Institute of Natural Resources, Guangxi Zhuang Autonomous Region)

Dates

Publication Date
2026-05-08
Application Date
2025-10-10

Claims (10)

  1. A knowledge-graph-based large model expert database construction method, characterized by comprising the following steps: acquiring real-time data from a multi-source data platform, establishing data-interfacing relations between the multi-source data platform and a literature database and an unstructured data platform, and synchronizing and integrating the real-time data from the literature database and the unstructured data platform; dividing the integrated full data into data blocks on the multi-source data platform, comparing the hash values of the data blocks with a hash fingerprint algorithm and marking potential update points; dividing unstructured text from the multi-source data platform into fixed-length subsequences, calculating the semantic similarity of adjacent subsequences within the same text and marking update regions; comprehensively judging data updates in combination with metadata information and storing data judged as updated into a message queue, wherein the update points and update regions are the core basis for the update judgment, and the update data in the message queue are used for subsequent multi-source feature extraction and real-time updating of the target-domain knowledge graph; extracting features from the multi-source data and constructing a fusion network, fusing the multi-modal features through the fusion network to obtain fused modal features, wherein the fusion network is a dedicated multi-modal feature fusion network built from a multi-head attention mechanism with dynamic weight assignment combined with residual connections, the fused modal features are the unified feature set of all modal features fused by the network, and the fused modal features are compressed into low-dimensional semantic vectors; the expert database comprises a basic fact layer and an inference rule layer, wherein the basic fact layer takes the semantic vectors as nodes of the knowledge graph, computes the cosine similarity between nodes, and establishes undirected association edges between nodes whose similarity exceeds a preset threshold; the inference rule layer defines fuzzy relation nodes with a cloud model, the cloud model being a general algorithmic model for handling fuzzy knowledge relations, and optimizes the expectation, entropy and hyper-entropy digital characteristic parameters of the fuzzy relation nodes with a genetic algorithm, the fuzzy relation nodes representing fuzzy knowledge relations between entities; directed fuzzy association edges are established between the fuzzy relation nodes and the basic fact nodes, and the edge labels denote the type of fuzzy relation; receiving a user request, parsing the keywords and semantic logic in the request against a preset task-parsing rule base, converting the request into graph-traversal task statements understandable by the property graph database, and determining the node types, edge relations and reasoning targets to be queried, wherein the node types comprise basic fact nodes and fuzzy relation nodes, and the edge relations comprise undirected association edges, directed fuzzy association edges and directed temporal edges; extracting the basic-fact association rules and fuzzy-relation reasoning rules relevant to the current task as constraints of the graph traversal; preprocessing the target-domain knowledge graph, extracting structural feature vectors and aligning the initial target-domain graph, migrating source-domain structural features to the target domain, applying the migrated features to the target-domain knowledge graph, adding new nodes and edges according to the association relations among the features so as to expand the scale and association range of the knowledge graph, and updating the target-domain knowledge graph in real time by combining new target-domain data acquired from the multi-source data platform, thereby completing the construction of the expert database.
  2. The knowledge-graph-based large model expert database construction method of claim 1, wherein after applying the migrated features to the target-domain knowledge graph, the method comprises: recalculating the cosine similarity between nodes in the target-domain knowledge graph based on new target-domain data acquired from the multi-source data platform; presetting a first similarity-change threshold S1 and a second similarity-change threshold S2, wherein S1 < S2; calculating the change ΔS of the cosine similarity between nodes after the new data is introduced; when ΔS ≥ S2, judging that the relation between the nodes has changed significantly, triggering the comprehensive update flow of the knowledge graph, reconstructing the association edges and updating the edge weights; when S1 ≤ ΔS < S2, judging that the relation between the nodes has changed slightly and only locally adjusting the association edges of the affected nodes; when ΔS < S1, judging that the relation between the nodes is essentially stable and keeping the existing knowledge graph structure; after the target-domain knowledge graph is updated in real time, building a knowledge retrieval index on the updated graph; presetting an index-update-count threshold F0 and counting the number N of updated nodes and edges in the knowledge graph; when N ≥ F0, regenerating the knowledge retrieval index immediately; when N < F0, updating the index at a preset periodic interval T1.
  3. The knowledge-graph-based large model expert database construction method of claim 2, wherein triggering the comprehensive update flow of the knowledge graph comprises: presetting a node-connectivity threshold D1 and an information-entropy threshold E1, and calculating the connectivity ΔD and the information entropy ΔE of each node in the knowledge graph; when a node's ΔD < D1 and ΔE < E1, judging the node to be redundant and deleting it and its related edges from the knowledge graph; recalculating the cosine similarity among the remaining nodes, presetting a new-edge-establishment threshold S3, and if the similarity between nodes is ≥ S3, establishing undirected association edges weighted by the similarity; for the inference rule layer, re-evaluating the confidence of each fuzzy relation node against a preset confidence threshold C1, C1 being the validity criterion for fuzzy relation nodes; when the confidence of a fuzzy relation node is < C1, redefining the node with the cloud model and updating its directed fuzzy association edges to the basic fact nodes; when the confidence is ≥ C1, the fuzzy relation represented by the node satisfies the reliability requirements of knowledge-graph reasoning and application under the current evaluation system, and neither the node nor its association edges need to be redefined.
  4. The knowledge-graph-based large model expert database construction method of claim 3, wherein locally adjusting the association edges of the affected nodes comprises: calculating the change ΔW of the relevant edge weight; when ΔW ≥ W1, applying the preset weight-adjustment formula new weight = original weight × (1 + k × ΔW), wherein k is a weight-adjustment coefficient; if the adjusted edge weight is less than the preset minimum weight Wmin, deleting the association edge of the affected node; if a newly appearing similarity between nodes is greater than or equal to a preset temporary threshold S4, establishing and marking a temporary association edge, which is re-evaluated in the next update round.
  5. The knowledge-graph-based large model expert database construction method of claim 4, wherein building the knowledge retrieval index comprises: presetting a text index type T1, an image index type T2 and an audio index type T3; selecting the index by node data type: for text data, adopting a T1 index, extracting keywords and building a keyword-to-node-ID mapping table; for image data, adopting a T2 index, hashing the image feature vector and building a hash-value-to-node-ID mapping; for audio data, adopting a T3 index built on mel-spectrum features; and presetting an index capacity threshold C1, performing index compression and merging duplicate index entries when the space occupied by the index is ≥ C1 × 80%.
  6. The knowledge-graph-based large model expert database construction method of claim 1, wherein comparing the hash values of the data blocks with the hash fingerprint algorithm and marking potential update points comprises: presetting a first hash-difference threshold H1 and a second hash-difference threshold H2, wherein H1 < H2; calculating the hash difference ΔH between the current and historical hash values of each data block; when ΔH ≥ H2, marking the corresponding data block as a high-priority potential update point; when H1 ≤ ΔH < H2, marking it as a low-priority potential update point; when ΔH < H1, judging the data block as not updated and keeping its original state.
  7. The knowledge-graph-based large model expert database construction method of claim 1, wherein calculating the subsequence semantic similarity of the unstructured text and marking update regions comprises: presetting a semantic-similarity threshold B0; dividing unstructured text from the multi-source data platform into fixed-length subsequences and calculating the semantic similarity ΔB of adjacent subsequences within the same text, the magnitude of ΔB characterizing the consistency of the subsequences' semantic content; when ΔB < B0, indicating that the semantic content of the region has changed and marking the corresponding subsequence region as an update region; when ΔB ≥ B0, judging that the semantic content of the region has not changed, that the region requires no update, and performing no marking.
  8. The knowledge-graph-based large model expert database construction method of claim 1, wherein comprehensively judging data updates in combination with metadata information comprises: acquiring metadata comprising the data's update timestamp and file-size change, and presetting a timestamp-difference threshold T0 and a file-size-change threshold F0; calculating the timestamp difference ΔT and the file-size difference ΔF between the current and historical metadata; when ΔT ≥ T0 and ΔF ≥ F0, judging, in combination with the marked potential update points and update regions, that the data has been updated; when ΔT < T0 and ΔF < F0, judging that no update has occurred.
  9. The knowledge-graph-based large model expert database construction method of claim 1, wherein generating hypothetical scenarios by counterfactual reasoning and verifying the deduced returned results comprises: selecting key nodes or edges from the candidate paths and modifying their attributes to generate a hypothetical scenario; performing consistency verification on the hypothetical scenario against the other nodes and edge relations in the knowledge graph, checking whether it conflicts with existing knowledge; and deducing the consequences of the hypothetical scenario to obtain the counterfactual reasoning result, then sorting and formatting the verified reasoning paths and the counterfactual results that satisfy the rule constraints and returning them to the user.
  10. The knowledge-graph-based large model expert database construction method of claim 9, wherein deducing the consequences of the hypothetical scenario comprises: feeding the hypothetical scenario into a preset deduction model through the rule engine of the knowledge graph and performing multi-hop reasoning along the candidate paths; calculating the degree of influence of the hypothetical scenario on each associated node from the node attributes and edge weights of the knowledge graph, and generating an influence propagation path; and, where fuzzy relation nodes are involved in the deduction, quantifying the uncertainty of the deduction result with the digital characteristics of the cloud model to obtain a counterfactual reasoning result containing a probability distribution.
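The tiered update-detection logic of claims 6-8 can be sketched as follows. The threshold values (H1, H2, B0, T0, F0), the use of Hamming distance between SHA-256 fingerprints as the hash difference ΔH, and bag-of-words cosine similarity as a stand-in for a real semantic model are all illustrative assumptions; the patent does not fix these choices.

```python
import hashlib
from collections import Counter
from math import sqrt

# Assumed thresholds -- the claims name them but do not fix their values.
H1, H2 = 8, 24         # hash-difference thresholds (claim 6), H1 < H2
B0 = 0.5               # semantic-similarity threshold (claim 7)
T0, F0 = 60.0, 1024.0  # timestamp (s) and file-size (bytes) thresholds (claim 8)

def hash_delta(old_block: bytes, new_block: bytes) -> int:
    """ΔH as the Hamming distance between 64-bit SHA-256 fingerprints
    (one plausible reading of 'comparing hash values')."""
    f = lambda b: int.from_bytes(hashlib.sha256(b).digest()[:8], "big")
    return bin(f(old_block) ^ f(new_block)).count("1")

def tier_update_point(delta_h: int) -> str:
    """Claim 6: tier a data block by its hash difference ΔH."""
    if delta_h >= H2:
        return "high-priority"
    if delta_h >= H1:
        return "low-priority"
    return "unchanged"

def mark_update_regions(tokens: list, sub_len: int = 8) -> list:
    """Claim 7: split text into fixed-length subsequences and mark the region
    after any boundary where adjacent-subsequence similarity ΔB < B0."""
    subs = [tokens[i:i + sub_len] for i in range(0, len(tokens), sub_len)]
    def cos(a, b):
        ca, cb = Counter(a), Counter(b)
        dot = sum(ca[t] * cb[t] for t in ca)
        na = sqrt(sum(v * v for v in ca.values()))
        nb = sqrt(sum(v * v for v in cb.values()))
        return dot / (na * nb) if na and nb else 0.0
    return [i + 1 for i in range(len(subs) - 1) if cos(subs[i], subs[i + 1]) < B0]

def metadata_confirms_update(dt: float, df: float, has_marks: bool) -> bool:
    """Claim 8: ΔT ≥ T0 and ΔF ≥ F0 confirm an update, combined with the
    marked update points/regions; ΔT < T0 and ΔF < F0 means no update.
    (The mixed case is left unspecified by the claim; False is assumed.)"""
    if dt >= T0 and df >= F0:
        return has_marks
    return False
```

Each detector feeds the same message queue, so a block is only enqueued when the metadata check agrees with at least one hash-level or semantic-level mark.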
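The basic-fact-layer construction and the tiered maintenance logic of claims 1, 2 and 4 can be sketched as below. The similarity threshold, the S1/S2 values, and k, W1 and Wmin are illustrative assumptions; treating ΔW as signed with the trigger on |ΔW| is one reading of claim 4.

```python
from math import sqrt

SIM_THRESHOLD = 0.8            # assumed edge-creation threshold (claim 1)
S1, S2 = 0.05, 0.20            # assumed similarity-change thresholds (claim 2)
K, W1, WMIN = 0.5, 0.1, 0.05   # assumed k, ΔW trigger and minimum weight (claim 4)

def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_fact_layer(vectors: dict) -> dict:
    """Claim 1: semantic vectors are nodes; an undirected association edge
    (keyed by a sorted node pair) is created when cosine similarity exceeds
    the preset threshold, with the similarity stored as the edge weight."""
    ids = sorted(vectors)
    return {(a, b): cosine(vectors[a], vectors[b])
            for i, a in enumerate(ids) for b in ids[i + 1:]
            if cosine(vectors[a], vectors[b]) > SIM_THRESHOLD}

def update_action(delta_s: float) -> str:
    """Claim 2: tier the maintenance response to a similarity change ΔS."""
    if delta_s >= S2:
        return "full rebuild"       # reconstruct edges, refresh weights
    if delta_s >= S1:
        return "local adjustment"   # touch only the affected node's edges
    return "keep structure"

def adjust_edge_weight(weight: float, delta_w: float):
    """Claim 4: new weight = original weight × (1 + k × ΔW) once the change
    exceeds W1; edges falling below Wmin are deleted (returned as None)."""
    if abs(delta_w) >= W1:
        weight = weight * (1 + K * delta_w)
    return None if weight < WMIN else weight
```

For example, `build_fact_layer({"a": (1.0, 0.0), "b": (1.0, 0.1), "c": (0.0, 1.0)})` links only `a` and `b`, since only their cosine similarity clears the 0.8 threshold.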
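Claims 1, 3 and 10 rely on the cloud model's digital characteristics (expectation Ex, entropy En, hyper-entropy He) to represent fuzzy relations and quantify deduction uncertainty. A minimal forward normal cloud generator, a standard construction from cloud model theory rather than the patent's specific implementation, might look like:

```python
import random
from math import exp

def forward_cloud(ex: float, en: float, he: float, n: int = 1000, seed: int = 0):
    """Forward normal cloud generator: from (Ex, En, He), produce n cloud
    drops (x, membership). He injects second-order randomness into the
    spread, which is what lets the model express fuzzy-relation uncertainty."""
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        en_prime = rng.gauss(en, he)              # randomized entropy via He
        if en_prime == 0:
            continue                               # degenerate draw, skip
        x = rng.gauss(ex, abs(en_prime))           # drop position
        mu = exp(-(x - ex) ** 2 / (2 * en_prime ** 2))  # membership degree
        drops.append((x, mu))
    return drops
```

The spread of the resulting membership values is what claim 10 turns into a probability distribution over counterfactual outcomes; claim 1's genetic algorithm would tune (Ex, En, He) themselves, which is out of scope for this sketch.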
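The type-routed indexing of claim 5 can be sketched as follows. The keyword extractor, image-feature hash and mel-feature key are trivial placeholders for the real extractors, and the capacity value C1 is an assumption.

```python
C1 = 10_000  # assumed index-capacity threshold (entries)

def build_index_entry(node_id: str, modality: str, payload):
    """Claim 5: route a node to a text (T1), image (T2) or audio (T3) index.
    Feature extraction here is a placeholder, not the patent's extractors."""
    if modality == "text":
        # T1: keyword -> node-ID mapping entries
        return {"type": "T1",
                "keys": [(kw, node_id) for kw in sorted(set(payload.lower().split()))]}
    if modality == "image":
        # T2: hash of the image feature vector -> node-ID mapping
        return {"type": "T2", "keys": [(hash(tuple(payload)), node_id)]}
    if modality == "audio":
        # T3: mel-spectrum feature key (placeholder) -> node-ID mapping
        return {"type": "T3", "keys": [("mel:" + str(len(payload)), node_id)]}
    raise ValueError(f"unknown modality: {modality}")

def needs_compression(index_entries: int) -> bool:
    """Claim 5: compress and merge duplicates once occupancy ≥ C1 × 80%."""
    return index_entries >= C1 * 0.8
```

Keeping one mapping table per index type lets the retrieval layer dispatch a query straight to the matching structure instead of scanning heterogeneous entries.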

Description

Knowledge-graph-based large model expert database construction

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a knowledge-graph-based large model expert database construction method.

Background

Against the backdrop of advancing digitalization and intelligence, the scale of data generated by every industry is expanding rapidly, covering multi-modal information such as text, images and audio; this heterogeneous data, scattered like loose pearls, holds enormous knowledge value. However, conventional data storage and processing methods, such as relational databases, struggle to handle complex semantic relationships and unstructured data, so knowledge mining is inefficient and information silos are widespread. A knowledge graph, as a semantic network, can describe entities and their relations in a structured way, converting discrete data into a computable and inferable knowledge network, and shows unique advantages in knowledge representation and reasoning. Large models, particularly large-scale pre-trained language models, excel at tasks such as text generation and question answering by virtue of their powerful natural language processing capabilities. Combining knowledge graphs with large models to construct an expert database has therefore become a key path to efficient knowledge utilization and intelligent decision support. In practice, however, knowledge graphs face challenges such as difficult multi-source data fusion, insufficient cross-modal knowledge association and delayed knowledge updates, while large models suffer from poor accuracy in professional domain knowledge, a lack of interpretability and difficulty with complex logical reasoning.
Therefore, it is necessary to design a knowledge-graph-based large model expert database construction method to solve the traditional database's problems of difficult multi-source data fusion, insufficient cross-modal knowledge association, delayed knowledge updates and difficulty with complex logical reasoning.

Disclosure of Invention

In view of the above, the invention provides a knowledge-graph-based large model expert database construction method, which comprises the following steps: acquiring real-time data from a multi-source data platform, wherein the multi-source data comprises images, videos, texts and audio, and the multi-source data platform is connected to a literature database and an unstructured data platform; comparing the hash values of data blocks with a hash fingerprint algorithm and marking potential update points; calculating subsequence semantic similarity for the unstructured texts and marking update regions; comprehensively judging data updates in combination with metadata information and storing them in a message queue; extracting features from the multi-source data, constructing a fusion network to fuse the modal features, and compressing them into semantic vectors; the expert database comprises a basic fact layer and an inference rule layer, wherein the basic fact layer takes the semantic vectors as nodes of the knowledge graph, computes the cosine similarity between nodes, and establishes undirected association edges between nodes whose similarity exceeds a preset threshold; embedding timestamp information into the attributes of the nodes and edges, establishing directed temporal edges in time order between nodes with a causal or evolutionary relation, storing and constructing the knowledge graph in a property graph database, and assigning corresponding attributes to every node and edge; receiving user requests, converting them into graph-traversal task statements understandable by the graph database, determining the node types, edge relations and reasoning targets to be queried, and extracting the rules relevant to the current task as constraints of the graph traversal; preprocessing the target-domain knowledge graph, extracting structural feature vectors and aligning the initial target-domain graph, migrating source-domain structural features to the target domain, applying the migrated features to the target-domain knowledge graph, adding new nodes and edges according to the association relations among the features so as to expand the scale and association range of the knowledge graph, and updating the target-domain knowledge graph in real time by combining new target-domain data acquired from the multi-source data platform.
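The timestamped property graph with directed temporal edges that the disclosure describes can be sketched minimally; the class and attribute names here are illustrative, not from the patent.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Edge:
    src: str
    dst: str
    kind: str                  # e.g. "assoc" (undirected) or "temporal" (directed)
    weight: float = 1.0
    timestamp: float = field(default_factory=time.time)

class PropertyGraph:
    """Minimal property-graph sketch: timestamps live in node/edge attributes,
    and a directed temporal edge links two causally/evolutionarily related
    nodes, always pointing from the earlier node to the later one."""
    def __init__(self):
        self.nodes = {}        # node_id -> attribute dict (incl. timestamp)
        self.edges = []

    def add_node(self, node_id: str, **attrs):
        attrs.setdefault("timestamp", time.time())
        self.nodes[node_id] = attrs

    def add_temporal_edge(self, a: str, b: str):
        # Direction follows time order, as the description prescribes.
        if self.nodes[a]["timestamp"] > self.nodes[b]["timestamp"]:
            a, b = b, a
        self.edges.append(Edge(a, b, "temporal"))
```

Keeping the timestamp as an ordinary attribute means temporal traversals reuse the same property-filtering machinery as every other graph query.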