CN-121981233-A - Knowledge graph construction and retrieval method and device, electronic equipment and storage medium
Abstract
The invention relates to the technical field of natural language processing and provides a knowledge graph construction and retrieval method and device, an electronic device and a storage medium. The method comprises: inputting a domain document and a domain ontology structure into a large language model, which extracts the target business topic to which the domain document belongs and the structured triplets it contains, under two constraints: the target business topic must be selected from the preset business topics, and each structured triplet must conform to the preset triplet structure. Then, based on the target business topic, the domain document and the structured triplets, the method constructs a knowledge graph comprising topic nodes, document nodes and entity nodes. In this graph, attribution relations run from document nodes to topic nodes and source relations run from entity nodes to document nodes. By combining strongly constrained extraction based on the domain ontology structure with three-layer heterogeneous node association, the method improves the accuracy and logical rigor of knowledge graph data in domain scenarios.
Inventors
- Shen Chengen
- Ding yangguang
- CHEN JIANBO
- HAN YUHU
Assignees
- 科大讯飞股份有限公司 (iFLYTEK Co., Ltd.)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-04-07
Claims (19)
- 1. A knowledge graph construction method, characterized by comprising the following steps: acquiring a domain ontology structure, wherein the domain ontology structure comprises preset business topics and a preset triplet structure; inputting a domain document and the domain ontology structure into a large language model, so that the large language model extracts the target business topic to which the domain document belongs and the structured triplets of the domain document, under the constraints that the target business topic is selected from the preset business topics and that the structured triplets conform to the preset triplet structure; and constructing, based on the target business topic, the domain document and the structured triplets, a knowledge graph comprising topic nodes, document nodes and entity nodes, and establishing in the knowledge graph an attribution relation from the document nodes to the topic nodes and a source relation from the entity nodes to the document nodes.
- 2. The knowledge graph construction method according to claim 1, wherein inputting the domain document and the domain ontology structure into the large language model and extracting the target business topic and the structured triplets under the above constraints comprises: converting the preset business topics and the preset triplet structure into a prompt instruction; inputting the prompt instruction, the domain document and a structured output format into the large language model, so that the large language model, under the same constraints, outputs the target business topic and initial structured triplets in the structured output format; and performing format verification and content comparison on the initial structured triplets to obtain the structured triplets.
- 3. The knowledge graph construction method according to claim 2, wherein performing format verification and content comparison on the initial structured triplets to obtain the structured triplets comprises: determining whether each initial element in the initial structured triplets belongs to a preset element set corresponding to the preset triplet structure; when a target initial element that does not match the preset element set exists, calculating a feature distance between the target initial element and each candidate element in the preset element set; and replacing the target initial element with the candidate element having the minimum feature distance to obtain the structured triplets.
- 4. The knowledge graph construction method according to claim 3, further comprising, after calculating the feature distance between the target initial element and each candidate element in the preset element set: determining whether the minimum feature distance is greater than a preset deviation threshold; when the minimum feature distance is greater than the preset deviation threshold, generating error feedback information containing the target initial element and updating the prompt instruction with the error feedback information; and re-inputting the updated prompt instruction into the large language model to trigger the large language model to output the initial structured triplets again.
- 5. The knowledge graph construction method according to any one of claims 1 to 4, further comprising, after constructing the knowledge graph comprising topic nodes, document nodes and entity nodes: acquiring a newly added domain document and the history document in the knowledge graph that is to be replaced by it; creating a new document node for the newly added domain document in the knowledge graph and setting the state attribute of the new document node to a valid state; updating the state attribute of the original document node corresponding to the history document to be replaced to an invalid state; and establishing a version-replacement relation edge between the original document node and the new document node.
- 6. The knowledge graph construction method according to claim 5, wherein acquiring the newly added domain document and the history document to be replaced comprises either of the following. Option one: calculating the vector similarity between the newly added domain document and each existing document in the knowledge graph, and determining an existing document as the history document to be replaced when the vector similarity exceeds a preset threshold and the two documents relate to the same preset business topic. Option two: extracting key metadata of the newly added domain document, identifying the corresponding existing document in the knowledge graph through fuzzy matching on the key metadata, and determining that existing document as the history document to be replaced.
- 7. The knowledge graph construction method according to claim 5, wherein updating the state attribute of the original document node corresponding to the history document to be replaced to the invalid state comprises: extracting a first publication time of the newly added domain document and a second publication time of the history document to be replaced; and, when the first publication time is later than the second publication time, updating the state attribute of the original document node to the invalid state and incrementing the version number of the new document node.
- 8. The knowledge graph construction method according to any one of claims 1 to 4, further comprising, after extracting the target business topic and the structured triplets of the domain document: identifying time-validity clause expressions contained in the domain document; and converting each time-validity clause expression into a time validity attribute and appending that attribute to the corresponding entity node or to the edge attributes of the corresponding structured triplet.
- 9. The knowledge graph construction method according to any one of claims 1 to 4, further comprising, after constructing the knowledge graph comprising topic nodes, document nodes and entity nodes: in response to a revocation instruction for a target domain document, locating the target document node corresponding to the target domain document in the knowledge graph; removing the attribution relation from the target document node to the corresponding topic node; and finding all target entity nodes that point to the target document node through the source relation, then deleting all the target entity nodes, all association relations derived among the target entity nodes, and the target document node.
- 10. A retrieval method, comprising: receiving a query question and parsing the elements of the query question based on a domain ontology structure to generate a retrieval plan comprising a plurality of logical steps; performing, according to the retrieval plan, a progressive query among the topic nodes, document nodes and entity nodes of a knowledge graph to obtain a structured evidence chain; and performing consistency verification on the structured evidence chain and, after the verification is passed, generating a retrieval answer with traceability information based on the structured evidence chain; wherein the knowledge graph is constructed by the knowledge graph construction method according to any one of claims 1 to 9.
- 11. The retrieval method according to claim 10, wherein parsing the elements of the query question based on the domain ontology structure to generate the retrieval plan comprises: identifying a target retrieval topic and a target core entity in the query question by using the domain ontology structure; and constructing, based on the target retrieval topic and the target core entity, a plurality of retrieval tasks arranged in execution order, and generating the retrieval plan so that the retrieval result of each preceding retrieval task serves as the input condition of the following retrieval task.
- 12. The retrieval method according to claim 11, wherein performing the progressive query among the topic nodes, document nodes and entity nodes of the knowledge graph according to the retrieval plan to obtain the structured evidence chain comprises: locating an initial entity node in the knowledge graph based on the input condition of the current retrieval task; searching the knowledge graph, according to the retrieval target of the current retrieval task, for associated entity nodes having an association relation with the initial entity node, and finding the target document nodes corresponding to the associated entity nodes through the source relation; combining the initial entity node, the associated entity nodes and the target document nodes into the retrieval result of the current retrieval task; using that result as the input condition of the next retrieval task and returning to the step of locating an initial entity node, until all retrieval tasks have been executed; and combining the retrieval results of all retrieval tasks into the structured evidence chain.
- 13. The retrieval method according to any one of claims 10 to 12, wherein performing consistency verification on the structured evidence chain and generating the retrieval answer with traceability information after the verification is passed comprises: detecting whether a logical consistency conflict exists among the document nodes contained in the structured evidence chain; and, when a logical consistency conflict exists, extracting conflict feature information from the structured evidence chain and feeding it back to the step of parsing the elements of the query question based on the domain ontology structure, so as to regenerate the retrieval plan and perform the progressive query again.
- 14. The retrieval method according to claim 13, wherein detecting whether a logical consistency conflict exists among the document nodes contained in the structured evidence chain comprises: detecting whether the structured triplets corresponding to document nodes from different sources express the same fact in conflicting ways; when such an expression conflict exists, extracting the effective time of each document node involved in the conflict; and resolving the conflict by effective-time priority, adopting the structured triplet of the document node with the latest effective time as the verified evidence.
- 15. The retrieval method according to claim 13, further comprising, after detecting whether a logical consistency conflict exists among the document nodes contained in the structured evidence chain: when no logical consistency conflict is detected, converting the structured evidence chain into natural language text; and extracting the document identifier and effective time of the document node corresponding to each entity node in the structured evidence chain, then inserting them into the natural language text as traceability tags to generate the retrieval answer with traceability information.
- 16. A knowledge graph construction device, characterized by comprising: an acquisition module, configured to acquire a domain ontology structure, wherein the domain ontology structure comprises preset business topics and a preset triplet structure; an input module, configured to input a domain document and the domain ontology structure into a large language model, so that the large language model extracts the target business topic to which the domain document belongs and the structured triplets of the domain document, under the constraints that the target business topic is selected from the preset business topics and that the structured triplets conform to the preset triplet structure; and a construction module, configured to construct, based on the target business topic, the domain document and the structured triplets, a knowledge graph comprising topic nodes, document nodes and entity nodes, and to establish in the knowledge graph an attribution relation from the document nodes to the topic nodes and a source relation from the entity nodes to the document nodes.
- 17. A retrieval device, comprising: a receiving module, configured to receive a query question, parse the elements of the query question based on a domain ontology structure, and generate a retrieval plan comprising a plurality of logical steps; a query module, configured to perform, according to the retrieval plan, a progressive query among the topic nodes, document nodes and entity nodes of a knowledge graph to obtain a structured evidence chain; and a verification module, configured to perform consistency verification on the structured evidence chain and, after the verification is passed, generate a retrieval answer with traceability information based on the structured evidence chain; wherein the knowledge graph is constructed by the knowledge graph construction method according to any one of claims 1 to 9.
- 18. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the knowledge graph construction method according to any one of claims 1 to 9 or the retrieval method according to any one of claims 10 to 15.
- 19. A non-transitory computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the knowledge graph construction method according to any one of claims 1 to 9 or the retrieval method according to any one of claims 10 to 15.
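Claim 2 converts the ontology into a prompt instruction and asks the model to answer in a structured output format. The Python sketch below shows that step under stated assumptions. The JSON layout, the prompt wording, and the helper names `build_prompt` and `parse_output` are hypothetical. The actual model call is omitted because the patent names no model or API.

```python
import json


def build_prompt(topics, triple_schema, document):
    """Turn the ontology constraints into a prompt instruction (hypothetical wording)."""
    schema_lines = "\n".join(f"- ({h}, {r}, {t})" for h, r, t in triple_schema)
    # Assumed structured output format: one topic plus a list of 3-element arrays
    output_format = json.dumps({"topic": "<one of the allowed topics>",
                                "triples": [["<head>", "<relation>", "<tail>"]]})
    return (
        f"Allowed business topics: {', '.join(topics)}\n"
        f"Allowed triple patterns (head type, relation, tail type):\n{schema_lines}\n"
        f"Answer only with JSON in this format: {output_format}\n\n"
        f"Document:\n{document}"
    )


def parse_output(raw, topics):
    """Parse the model's JSON reply into (topic, initial triplets)."""
    data = json.loads(raw)
    # The topic constraint is enforced here rather than trusted to the model
    if data.get("topic") not in topics:
        raise ValueError(f"topic outside ontology: {data.get('topic')!r}")
    # Malformed entries (not exactly three elements) are dropped
    triples = [tuple(t) for t in data.get("triples", []) if len(t) == 3]
    return data["topic"], triples
```

The triplets returned here correspond to the "initial structured triplets" of claim 2. They still go through the element check of claims 3 and 4.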
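Claims 3 and 4 snap an out-of-vocabulary element to its nearest ontology element. If even the nearest element is too far away, error feedback is added to the prompt instead. The patent does not define the feature distance or the element set. This sketch therefore assumes a length-normalized Levenshtein distance and a flat set of type and relation labels. `validate_triple`, `update_prompt` and the 0.4 threshold are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def feature_distance(a: str, b: str) -> float:
    # Assumed metric: edit distance scaled to [0, 1] by the longer string
    return levenshtein(a, b) / (max(len(a), len(b)) or 1)


def validate_triple(triple, allowed, threshold=0.4):
    """Return (fixed_triple, feedback) for one schema-level triplet."""
    fixed, feedback = [], []
    candidates = sorted(allowed)  # sorted so ties break deterministically
    for element in triple:
        if element in allowed:
            fixed.append(element)
            continue
        best = min(candidates, key=lambda c: feature_distance(element, c))
        if feature_distance(element, best) > threshold:
            # Claim 4: too far from every candidate, so report instead of guessing
            feedback.append(f"'{element}' does not match any ontology element")
            fixed.append(element)
        else:
            fixed.append(best)  # Claim 3: replace with the nearest candidate
    return tuple(fixed), feedback


def update_prompt(prompt, feedback):
    """Claim 4: append error feedback so the model can re-extract the triplets."""
    if not feedback:
        return prompt
    return prompt + "\nFix these elements and answer again:\n" + "\n".join(f"- {m}" for m in feedback)
```

A typo such as `isued_by` is silently corrected to `issued_by`. An unrelated label such as `located_in` produces feedback for a new extraction round.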
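Claims 5 to 7 handle document updates. The sketch below rests on three assumptions, and all names in it (`DocNode`, `find_replaced`, `apply_version_update`) are hypothetical:

- It uses cosine similarity over precomputed embeddings for the first option of claim 6. The metadata fuzzy-matching option is omitted.
- It draws a `REPLACED_BY` edge from the old node to the new one.
- It continues version numbering from the replaced node.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class DocNode:
    doc_id: str
    topic: str
    embedding: list
    published: date
    status: str = "valid"  # claim 5: a new node starts in the valid state
    version: int = 1


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_replaced(new_doc, existing, threshold=0.9):
    """Claim 6, option one: same business topic and similarity above the threshold."""
    return [d for d in existing
            if d.topic == new_doc.topic
            and cosine(new_doc.embedding, d.embedding) > threshold]


def apply_version_update(new_doc, old_doc, edges):
    """Claims 5 and 7: invalidate the older node only if the new one is published later."""
    if new_doc.published > old_doc.published:
        old_doc.status = "invalid"
        new_doc.version = old_doc.version + 1  # assumed: continue from the old number
        edges.append((old_doc.doc_id, "REPLACED_BY", new_doc.doc_id))
        return True
    return False
```

Comparing publication times before invalidating (claim 7) keeps an out-of-order ingestion of an older document from overwriting a newer one.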
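The revocation cascade of claim 9 can be sketched over a plain set-based graph. The node identifiers, the `BELONGS_TO`/`SOURCED_FROM` relation names and `revoke_document` are assumptions for illustration.

```python
def revoke_document(nodes, edges, doc_id):
    """Revoke one document in place.

    nodes: set of node ids; edges: set of (src, relation, dst) tuples.
    Returns the entity nodes that were deleted.
    """
    if doc_id not in nodes:
        raise KeyError(doc_id)
    # Step 1: detach the document from its topic (attribution relation)
    edges -= {e for e in edges if e[0] == doc_id and e[1] == "BELONGS_TO"}
    # Step 2: entities whose source relation points at the document
    targets = {src for src, rel, dst in edges if rel == "SOURCED_FROM" and dst == doc_id}
    # Step 3: delete those entities, the document, and every edge touching them
    doomed = targets | {doc_id}
    edges -= {e for e in edges if e[0] in doomed or e[2] in doomed}
    nodes -= doomed
    return targets
```

Following the claim literally, this deletes every entity sourced from the revoked document, even one that another document also supports. A deployment might prefer to delete only entities left with no remaining source relation.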
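The progressive query of claims 11 and 12 chains retrieval tasks, so that the associated entities found by one task become the starting nodes of the next. In this sketch the retrieval plan is reduced to an ordered list of relation names, and the graph to a set of `(src, relation, dst)` edges. Both representations, and the names `Step` and `progressive_query`, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Step:
    start: set    # initial entity nodes (input condition of this task)
    found: set    # associated entity nodes reached via the task's relation
    sources: set  # document nodes backing the found entities


def progressive_query(edges, core_entity, plan):
    """Run each retrieval task in order and return the structured evidence chain."""
    chain, frontier = [], {core_entity}
    for relation in plan:
        found = {dst for src, rel, dst in edges if src in frontier and rel == relation}
        # Source relation: entity -> document it was extracted from
        sources = {dst for src, rel, dst in edges if src in found and rel == "SOURCED_FROM"}
        chain.append(Step(set(frontier), found, sources))
        frontier = found  # this task's result is the next task's input condition
    return chain
```

Each `Step` keeps its document nodes, so every hop of the chain stays traceable to a source for the later verification stage.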
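Claims 14 and 15 resolve conflicting facts by effective-time priority and then attach traceability tags. The sketch below treats two evidence items as conflicting when all of the following hold:

- they share a subject and predicate;
- they disagree on the object;
- they come from different documents.

The `Evidence` record, the tag format and the function names are hypothetical. The re-planning loop of claim 13 is not modeled.

```python
from datetime import date
from typing import NamedTuple


class Evidence(NamedTuple):
    subject: str
    predicate: str
    obj: str
    doc_id: str
    effective: date


def resolve_conflicts(chain):
    """Return (retained evidence, set of conflicting (subject, predicate) keys)."""
    latest, conflicts = {}, set()
    for ev in chain:
        key = (ev.subject, ev.predicate)
        prev = latest.get(key)  # the item currently retained for this fact
        if prev is not None and prev.obj != ev.obj and prev.doc_id != ev.doc_id:
            conflicts.add(key)  # same fact, different sources, different statements
        if prev is None or ev.effective > prev.effective:
            latest[key] = ev    # effective-time priority: the later document wins
    return list(latest.values()), conflicts


def render_answer(evidence):
    """Claim 15: one sentence per triplet with a [document, effective time] tag."""
    return " ".join(
        f"{ev.subject} {ev.predicate} {ev.obj} "
        f"[source: {ev.doc_id}, effective {ev.effective.isoformat()}]."
        for ev in evidence
    )
```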
Description
Knowledge graph construction and retrieval method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular to knowledge graph construction and retrieval methods and devices, an electronic device, and a storage medium.
Background
Knowledge-graph-based retrieval-augmented generation (Retrieval-Augmented Generation, RAG) is widely used to compensate for the tendency of large language models to make factual errors on questions in professional domains. Current approaches typically use tools such as a large language model (Large Language Model, LLM) to extract triplets from original documents. The result is a flat network containing only entity nodes and relationship edges. This approach has several drawbacks:
- It easily produces erroneous base data in professional scenarios.
- The resulting flat two-dimensional graph loses much of the multidimensional context of the knowledge.
- It strips away the explicit links between each piece of knowledge, its source document and its higher-level business topic, so the knowledge cannot be authoritatively traced within the system.
- The business logic lacks organization and coherence.
As a result, such graphs struggle to meet the rigor that professional scenarios demand of knowledge.
Disclosure of Invention
The invention provides a knowledge graph construction and retrieval method and device, an electronic device, and a storage medium, to overcome the above defects of the prior art.
The invention provides a knowledge graph construction method, comprising: acquiring a domain ontology structure, wherein the domain ontology structure comprises preset business topics and a preset triplet structure; inputting a domain document and the domain ontology structure into a large language model, so that the large language model extracts the target business topic to which the domain document belongs and the structured triplets of the domain document, under the constraints that the target business topic is selected from the preset business topics and that the structured triplets conform to the preset triplet structure; and constructing, based on the target business topic, the domain document and the structured triplets, a knowledge graph comprising topic nodes, document nodes and entity nodes, and establishing in the knowledge graph an attribution relation from the document nodes to the topic nodes and a source relation from the entity nodes to the document nodes.
The invention provides a retrieval method, comprising: receiving a query question and parsing the elements of the query question based on a domain ontology structure to generate a retrieval plan comprising a plurality of logical steps; performing, according to the retrieval plan, a progressive query among the topic nodes, document nodes and entity nodes of the knowledge graph to obtain a structured evidence chain; and performing consistency verification on the structured evidence chain and, after the verification is passed, generating a retrieval answer with traceability information based on the structured evidence chain; wherein the knowledge graph is constructed by the above knowledge graph construction method.
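The three-layer structure described above can be illustrated with a small in-memory graph. This is a sketch, not the applicant's implementation. The following are all assumptions:

- the `KnowledgeGraph` class;
- the `topic:`/`doc:`/`entity:` ID prefixes;
- the `BELONGS_TO` and `SOURCED_FROM` relation names;
- the choice to store each structured triplet as an entity-to-entity edge.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> {"type": ..., **attrs}
    edges: set = field(default_factory=set)    # (src_id, relation, dst_id)

    def add_node(self, node_id, node_type, **attrs):
        # setdefault keeps an existing node (a shared topic or entity) unchanged
        self.nodes.setdefault(node_id, {"type": node_type, **attrs})

    def add_edge(self, src, relation, dst):
        self.edges.add((src, relation, dst))


def build_graph(graph, doc_id, doc_text, topic, triples):
    """Add one extracted document to the three-layer graph.

    topic: the target business topic returned by the LLM.
    triples: structured (head, relation, tail) triplets from the same document.
    """
    graph.add_node(f"topic:{topic}", "topic", name=topic)
    graph.add_node(f"doc:{doc_id}", "document", text=doc_text)
    # Attribution relation: document -> topic
    graph.add_edge(f"doc:{doc_id}", "BELONGS_TO", f"topic:{topic}")
    for head, relation, tail in triples:
        for name in (head, tail):
            graph.add_node(f"entity:{name}", "entity", name=name)
            # Source relation: entity -> document it was extracted from
            graph.add_edge(f"entity:{name}", "SOURCED_FROM", f"doc:{doc_id}")
        # The triplet itself becomes an entity-to-entity edge
        graph.add_edge(f"entity:{head}", relation, f"entity:{tail}")
    return graph
```

For a single document with one triplet, the graph ends up with four nodes: one topic, one document and two entities. Because every entity carries a `SOURCED_FROM` edge, any fact can be followed back to its document and from there to its business topic.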
The invention also provides a knowledge graph construction device, comprising: an acquisition module, configured to acquire a domain ontology structure, wherein the domain ontology structure comprises preset business topics and a preset triplet structure; an input module, configured to input a domain document and the domain ontology structure into a large language model, so that the large language model extracts the target business topic to which the domain document belongs and the structured triplets of the domain document, under the constraints that the target business topic is selected from the preset business topics and that the structured triplets conform to the preset triplet structure; and a construction module, configured to construct, based on the target business topic, the domain document and the structured triplets, a knowledge graph comprising topic nodes, document nodes and entity nodes, and to establish in the knowledge graph an attribution relation from the document nodes to the topic nodes and a source relation from the entity nodes to the document nodes.
The invention also provides a retrieval device, comprising: a receiving module, configured to receive a query question, parse the elements of the query question based on the domain ontology structure, and generate a retrieval plan comprising a plurality of logical steps; a query module, configured to perform a progressive query among the topic nodes, the document