CN-122019765-A - Intelligent retrieval method and system integrating knowledge graph and semantic vector


Abstract

The invention discloses an intelligent retrieval method and system integrating a knowledge graph and a semantic vector, and relates to the technical field of natural language processing and information retrieval. The method comprises a text preprocessing process, a document abstract generating process, a knowledge graph construction process, a vectorization process, a fusion retrieval process and a rearrangement process. In the vectorization process, the document abstract and the text segments are encoded into an abstract vector and text segment vectors, and the abstract vector is stored in a vector database in association with all text segment vectors of the document to which it belongs. In the fusion retrieval process, text segments associated with the query sentence are retrieved through the knowledge graph, related abstract vectors and text segment vectors are retrieved through the vector database, and the retrieval results are merged. The method and system provided by the invention significantly improve the comprehensiveness, accuracy and relevance of the retrieval results, and take into account both the overall theme of a document and the detailed information of its segments.

Inventors

WANG WEIDI
YUAN JIACHENG

Assignees

福建星网智慧科技有限公司
福建星网锐捷通讯股份有限公司

Dates

Publication Date: 20260512
Application Date: 20251229

Claims (8)

1. An intelligent retrieval method integrating a knowledge graph and a semantic vector, characterized by comprising the following steps: a text preprocessing process of importing a document and segmenting the document text to obtain text segments; a document abstract generating process of generating an overall abstract of the document text based on a large model; a knowledge graph construction process of extracting concept phrases from the text segments, constructing a knowledge graph, and storing the association relations between the concept phrases and the text segments; a vectorization process of encoding the document abstract and the text segments into an abstract vector and text segment vectors and storing them in a vector database, wherein the abstract vector is stored in association with all text segment vectors of the document to which it belongs; a fusion retrieval process of retrieving text segments associated with a query sentence through the knowledge graph, retrieving related abstract vectors and text segment vectors through the vector database, and merging the retrieval results to form a preliminary candidate result set; and a rearrangement process of finely sorting the preliminary candidate result set through a rearrangement model to obtain a final retrieval result.
2. The method of claim 1, further comprising, after the text preprocessing process, an integrity determination process of determining based on the large model whether the text segment is complete and, if not, returning to the text preprocessing process to re-segment the document text.
3. The method according to claim 1 or 2, wherein the knowledge graph construction process specifically comprises: for an input text segment, extracting key concept words of the text segment by using a large language model; constructing the concept words into a knowledge link according to their semantic association; assigning a unique link number to the knowledge link and establishing an association with the text segment; and fusing a plurality of knowledge links into a knowledge graph by using identical concept nodes in the knowledge links as connection points, thereby forming a semantic network.
4. The method of claim 3, wherein, in the fusion retrieval process, retrieving text segments associated with the query sentence through the knowledge graph specifically comprises: extracting concept words from the query sentence; matching the concept words in the knowledge graph to obtain related knowledge links; performing repetition rate statistics on the concept words contained in the matched knowledge links; sorting the knowledge links according to the repetition rate; selecting the Top-N most related knowledge links; and tracing back to the text segments associated with those knowledge links.
5. An intelligent retrieval system integrating a knowledge graph and a semantic vector, comprising: a text preprocessing module for importing a document and segmenting the document text to obtain text segments; a document abstract generating module for generating an overall abstract of the document text based on a large model; a knowledge graph construction module for extracting concept phrases from the text segments, constructing a knowledge graph, and storing the association relations between the concept phrases and the text segments; a vectorization module for encoding the document abstract and the text segments into an abstract vector and text segment vectors and storing them in a vector database, wherein the abstract vector is stored in association with all text segment vectors of the document to which it belongs; a fusion retrieval module for retrieving text segments associated with a query sentence through the knowledge graph, retrieving related abstract vectors and text segment vectors through the vector database, and merging the retrieval results to form a preliminary candidate result set; and a rearrangement module for finely sorting the preliminary candidate result set through a rearrangement model to obtain a final retrieval result.
6. The system of claim 5, further comprising an integrity determination module, arranged after the text preprocessing module, for determining based on the large model whether the text segment is complete, wherein, if it is not, the text preprocessing module re-segments the document text.
7. The system according to claim 5 or 6, wherein the knowledge graph construction module is specifically configured to: extract, for an input text segment, key concept words of the text segment by using a large language model; construct the concept words into a knowledge link according to their semantic association; assign a unique link number to the knowledge link and establish an association with the text segment; and fuse a plurality of knowledge links into a knowledge graph by using identical concept nodes in the knowledge links as connection points, thereby forming a semantic network.
8. The system of claim 7, wherein the fusion retrieval module is specifically configured to: extract concept words from a query sentence; match the concept words in the knowledge graph to obtain related knowledge links; perform repetition rate statistics on the concept words contained in the matched knowledge links; sort the knowledge links according to the repetition rate; select the Top-N most related knowledge links; and trace back to the text segments associated with those knowledge links.
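
The knowledge-link construction and Top-N retrieval recited in claims 3, 4, 7 and 8 can be illustrated with a minimal sketch. It is not the patented implementation. The toy data, the names `LINKS`, `build_concept_index` and `retrieve_segments`, and the concept words (which the claims obtain from a large language model) are all hypothetical. The claims do not define "repetition rate"; the sketch assumes it is the fraction of query concept words that a knowledge link contains.

```python
from collections import defaultdict

# Hypothetical toy data: link number -> (concept words, associated text segment id).
# In the claims the concept words come from a large language model; here they are hard-coded.
LINKS = {
    1: (["knowledge graph", "concept word", "text segment"], "seg-a"),
    2: (["vector database", "abstract vector", "text segment"], "seg-b"),
    3: (["rearrangement model", "candidate set"], "seg-c"),
}

def build_concept_index(links):
    """Fuse links into a graph: each concept node maps to every link that contains it."""
    index = defaultdict(set)
    for link_id, (concepts, _segment) in links.items():
        for concept in concepts:
            index[concept].add(link_id)
    return index

def retrieve_segments(query_concepts, links, index, top_n=2):
    """Rank matched links by an assumed repetition rate and trace back to their segments."""
    query = set(query_concepts)
    if not query:
        return []
    matched = defaultdict(int)  # link id -> number of query concepts it contains
    for concept in query:
        for link_id in index.get(concept, ()):
            matched[link_id] += 1
    # Assumed repetition rate: matched query concepts / total query concepts.
    # Ties are broken by link number so the ordering is deterministic.
    ranked = sorted(matched, key=lambda lid: (-matched[lid] / len(query), lid))
    segments = []
    for link_id in ranked[:top_n]:
        segment = links[link_id][1]
        if segment not in segments:
            segments.append(segment)
    return segments

INDEX = build_concept_index(LINKS)
RESULT = retrieve_segments(["text segment", "vector database"], LINKS, INDEX, top_n=2)
print(RESULT)  # ['seg-b', 'seg-a']
```

In this toy example the concept "text segment" is shared by links 1 and 2, so it acts as the connection point between them. Link 2 contains both query concepts and therefore ranks ahead of link 1.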

Description

Intelligent retrieval method and system integrating knowledge graph and semantic vector

Technical Field

The invention relates to the technical field of natural language processing and information retrieval, in particular to an intelligent retrieval method and system integrating a knowledge graph and a semantic vector.

Background

Existing retrieval-augmented generation (RAG) systems typically rely on vector similarity retrieval or keyword-based retrieval. Although vector retrieval can capture semantic information, it has limitations in handling complex conceptual associations, multi-hop reasoning, and the like. Conventional knowledge graphs, on the other hand, can represent associations between concepts but lack a direct association with, and semantic understanding of, the original text segments. In addition, the candidate result set obtained by preliminary retrieval still has room for optimization in relevance and accuracy, and a fine-grained sorting mechanism for improving the quality of the final returned results is lacking.

The disadvantages of the prior art are at least the following: pure vector retrieval has difficulty handling complex associations between concepts; keyword retrieval easily misses content that is semantically related but worded differently; knowledge graph retrieval lacks direct links to, and semantic understanding of, the original text segments; and the preliminary retrieval result set may contain noise or be inaccurately sorted, which affects the quality of the final answer, so conventional RAG systems have room for improvement in the accuracy and relevance ranking of retrieval results. Moreover, existing retrieval operates mostly on text segments and lacks a grasp of the overall subject matter and core ideas of the document, so the retrieved segments may be individually relevant while the result deviates from the subject matter of the document.
Disclosure of Invention

The technical problem to be solved by the invention is to provide an intelligent retrieval method and system integrating a knowledge graph and a semantic vector, which achieve multidimensional, high-precision retrieval and fine sorting of text data, significantly improve the comprehensiveness, accuracy and relevance of retrieval results, and take into account both the overall subject of a document and the detailed information of its segments.

In a first aspect, the present invention provides an intelligent retrieval method integrating a knowledge graph and a semantic vector, including: a text preprocessing process of importing a document and segmenting the document text to obtain text segments; a document abstract generating process of generating an overall abstract of the document text based on a large model; a knowledge graph construction process of extracting concept phrases from the text segments, constructing a knowledge graph, and storing the association relations between the concept phrases and the text segments; a vectorization process of encoding the document abstract and the text segments into an abstract vector and text segment vectors and storing them in a vector database, wherein the abstract vector is stored in association with all text segment vectors of the document to which it belongs; a fusion retrieval process of retrieving text segments associated with a query sentence through the knowledge graph, retrieving related abstract vectors and text segment vectors through the vector database, and merging the retrieval results to form a preliminary candidate result set; and a rearrangement process of finely sorting the preliminary candidate result set through a rearrangement model to obtain a final retrieval result.

Further, after the text preprocessing process, the method further comprises an integrity determination process of determining based on the large model whether the text segment is complete and, if not, returning to the text preprocessing process to re-segment the document text.
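
The vectorization and fusion retrieval processes described above can be illustrated with a minimal in-memory sketch. It is not the patented implementation. `VectorStore`, `cosine`, `merge_candidates`, the 2-dimensional toy vectors and the hard-coded `graph_hits` are all hypothetical. The sketch assumes that a matching abstract vector contributes all text segments of its document to the candidate set, which is one possible reading of the associated storage. The rearrangement model is not shown.

```python
import math
from dataclasses import dataclass, field

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

@dataclass
class VectorStore:
    """Hypothetical stand-in for a vector database with associated storage."""
    summaries: dict = field(default_factory=dict)  # doc id -> abstract vector
    segments: dict = field(default_factory=dict)   # segment id -> (doc id, segment vector)

    def add_document(self, doc_id, summary_vec, segment_vecs):
        # The shared doc id is what associates the abstract vector with its segment vectors.
        self.summaries[doc_id] = summary_vec
        for seg_id, vec in segment_vecs.items():
            self.segments[seg_id] = (doc_id, vec)

    def search(self, query_vec, k=1):
        # Segment-level hits: the k segments most similar to the query.
        seg_hits = sorted(
            self.segments,
            key=lambda s: cosine(query_vec, self.segments[s][1]),
            reverse=True,
        )[:k]
        # Document-level hit (assumption): the best abstract brings in all of its segments.
        best_doc = max(self.summaries, key=lambda d: cosine(query_vec, self.summaries[d]))
        doc_segs = [s for s, (d, _) in self.segments.items() if d == best_doc]
        return seg_hits, doc_segs

def merge_candidates(*result_lists):
    """Merge result lists into one candidate set, dropping duplicates, keeping first-seen order."""
    seen, merged = set(), []
    for results in result_lists:
        for seg_id in results:
            if seg_id not in seen:
                seen.add(seg_id)
                merged.append(seg_id)
    return merged

store = VectorStore()
store.add_document("doc1", [1.0, 0.0], {"d1-s1": [0.9, 0.1], "d1-s2": [0.6, 0.4]})
store.add_document("doc2", [0.0, 1.0], {"d2-s1": [0.1, 0.9]})

graph_hits = ["d2-s1", "d1-s1"]  # hypothetical output of the knowledge graph retrieval
seg_hits, doc_segs = store.search([1.0, 0.2], k=1)
candidates = merge_candidates(graph_hits, seg_hits, doc_segs)
print(candidates)  # ['d2-s1', 'd1-s1', 'd1-s2']
```

The merged list plays the role of the preliminary candidate result set. In the method it would then be passed to the rearrangement model for fine sorting.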
Further, the knowledge graph construction process specifically comprises: for an input text segment, extracting key concept words of the text segment by using a large language model; constructing the concept words into a knowledge link according to their semantic association; assigning a unique link number to the knowledge link and establishing an association with the text segment; and fusing a plurality of knowledge links into a knowledge graph by using identical concept nodes in the knowledge links as connection points, thereby forming a semantic network. Further, in the fusion retrieval process, retrieving text segments associated with the query sentence through the knowledge graph specifically comprises: extracting concept words from the query sentence, matching the concept words in the knowledge graph to obtain related knowledge links, performing repetition rate statistics on the concept words contained in the matched knowledge links, selecting the Top-N most related knowledge links in descending order of repetition rate, and then the text segments related to the knowledge links a