
CN-122021935-A - Knowledge graph enhanced retrieval generation question-answering method and system for postal service field


Abstract

The invention discloses a knowledge-graph-enhanced retrieval-augmented generation question-answering method and system for the postal service field, and belongs to the technical field of information retrieval. Existing large models tend to misalign conditions and produce factual hallucinations when handling multiply nested rules. To address this, the invention provides a dual-path hybrid retrieval and dynamic re-ranking framework. On top of a vector database and a lightweight knowledge graph, it introduces a dynamic hybrid scoring model that fuses multidimensional features. The system computes an initial semantic similarity score and a graph entity alignment score in parallel, applies a strong-rule one-vote veto mechanism, and computes a final re-ranking score. Rigid graph rules thereby impose hard constraints on candidates selected by flexible semantic features and eliminate those that conflict; only rule fragments that exactly match the query are retained and used to guide generation by the large model. The invention eliminates condition misalignment in complex tariff calculations and greatly improves the accuracy and compliance of service question answering.
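The scoring pipeline summarized above (parallel semantic and graph scores, a one-vote veto, and a final re-ranking score compared against a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the weighted-sum form, the coefficient values `alpha=0.3` and `beta=0.7`, the threshold `tau=0.6`, the rule that the graph score is the fraction of query conditions a fragment satisfies, and the `Fragment`/`Query` structures are all hypothetical. Embeddings are toy vectors rather than output of a real embedding model, and regions are treated as an unordered set, so origin/destination direction is ignored.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Fragment:
    """Hypothetical candidate fragment: text, embedding, and extracted graph entities."""
    text: str
    embedding: List[float]
    regions: Set[str] = field(default_factory=set)
    services: Set[str] = field(default_factory=set)
    weight_range: Optional[Tuple[float, float]] = None  # (low, high] in kg


@dataclass
class Query:
    """Hypothetical parsed user query."""
    embedding: List[float]
    regions: Set[str] = field(default_factory=set)
    services: Set[str] = field(default_factory=set)
    weight: Optional[float] = None  # kg


def cosine(u: List[float], v: List[float]) -> float:
    """Initial semantic score: cosine similarity of two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def graph_match(query: Query, frag: Fragment) -> Tuple[float, bool]:
    """Return (fraction of query conditions satisfied, whether any condition conflicts)."""
    matched, total, conflict = 0, 0, False
    # Region and service-type sets: Boolean subset matching.
    for q_set, f_set in ((query.regions, frag.regions), (query.services, frag.services)):
        if not q_set:
            continue  # the query does not constrain this dimension
        total += 1
        if not f_set:
            continue  # fragment is silent here: no match, but no conflict either
        if q_set <= f_set:
            matched += 1
        else:
            conflict = True
    # Numerical set: the weight must fall inside the fragment's (low, high] interval.
    if query.weight is not None:
        total += 1
        if frag.weight_range is not None:
            low, high = frag.weight_range
            if low < query.weight <= high:
                matched += 1
            else:
                conflict = True
    return (matched / total if total else 0.0), conflict


def hybrid_rank(query: Query, fragments: List[Fragment],
                alpha: float = 0.3, beta: float = 0.7, tau: float = 0.6) -> List[Tuple[float, str]]:
    """Score each fragment, veto conflicting ones, keep those above tau, best first."""
    kept = []
    for frag in fragments:
        s_sem = cosine(query.embedding, frag.embedding)
        s_kg, conflict = graph_match(query, frag)
        penalty = -math.inf if conflict else 0.0  # one-vote veto
        s_final = alpha * s_sem + beta * s_kg + penalty
        if s_final > tau:
            kept.append((s_final, frag.text))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept


if __name__ == "__main__":
    q = Query([1.0, 0.0], {"Beijing", "Shanghai"}, {"standard"}, weight=0.8)
    candidates = [
        Fragment("Beijing-Shanghai, 0-1 kg, standard", [1.0, 0.0], {"Beijing", "Shanghai"}, {"standard"}, (0.0, 1.0)),
        Fragment("Beijing-Guangzhou, 0-1 kg, standard", [1.0, 0.0], {"Beijing", "Guangzhou"}, {"standard"}, (0.0, 1.0)),
        Fragment("Beijing-Shanghai, 1-3 kg, standard", [0.9, 0.1], {"Beijing", "Shanghai"}, {"standard"}, (1.0, 3.0)),
        Fragment("General notes on parcel packaging", [0.8, 0.6]),
    ]
    print(hybrid_rank(q, candidates))
```

In this toy run the Guangzhou fragment is semantically identical to the query yet is vetoed by the region conflict, the 1-3 kg fragment is vetoed by the weight interval, and the unstructured note scores only about 0.24 and falls below `tau`; only the first fragment survives. The veto is modeled as an additive negative infinity so that no semantic score can compensate for a hard rule conflict.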

Inventors

  • WANG XIAOTONG
  • YU XIULI

Assignees

  • Beijing University of Posts and Telecommunications (北京邮电大学)

Dates

Publication Date
2026-05-12
Application Date
2026-03-26

Claims (9)

  1. A knowledge-graph-enhanced retrieval-augmented generation question-answering method for the postal service field, characterized by comprising the following steps: acquiring postal service documents input by a user, and preprocessing and denoising them to obtain high-purity knowledge blocks; performing dual-source extraction on the high-purity knowledge blocks to construct a vector database and a lightweight knowledge graph, respectively; receiving a user query, identifying the service intent of the user query, and performing dynamic intent routing based on the service intent; according to the routing result, performing knowledge-graph-enhanced dual-path hybrid retrieval to obtain rule fragments that exactly match the user query; and using the rule fragments as knowledge constraints, generating and outputting, with a large language model, a question-answer result with traceable references.
  2. The method according to claim 1, wherein acquiring the postal service documents input by the user and preprocessing and denoising them to obtain high-purity knowledge blocks specifically comprises: using a large language model as a structure parser to accurately identify and remove non-body structural noise from the document data, the structural noise comprising tables of contents, page numbers, and table ruling lines broken across pages; and performing semantic-level slicing of the denoised plain text with an overlapping sliding-window strategy, in which the step of the sliding window is adjusted dynamically and adaptively according to the paragraph length of the postal service rules, and finally outputting high-purity knowledge blocks with complete contextual logic.
  3. The method according to claim 1, wherein constructing the vector database and the lightweight knowledge graph respectively specifically comprises: vector database construction, namely calling an embedding model optimized for Chinese to convert each high-purity knowledge block into a high-dimensional dense vector of fixed dimension, and storing the vectors in a vector database that supports approximate nearest neighbor search; and knowledge graph construction, namely using entity recognition and relation extraction algorithms to accurately extract core postal service elements from the high-purity knowledge blocks, the core elements comprising regions, weight intervals, tariff standards, and service types, and structurally associating these elements in the form of triplets to construct the lightweight knowledge graph reflecting discrete charging rules.
  4. The method according to claim 1, wherein receiving the user query, identifying the service intent of the user query, and performing dynamic intent routing based on the service intent comprises: extracting key entity information from the user query and evaluating the completeness of the query conditions; if the user query is judged to be a general postal concept query, routing it to single-path vector retrieval; and if the user query is judged to involve a specific weight, tariff, or numerical rule calculation, routing it to the knowledge-graph-enhanced dual-path hybrid retrieval.
  5. The method according to claim 4, wherein performing the knowledge-graph-enhanced dual-path hybrid retrieval specifically comprises a dynamic hybrid scoring model based on multidimensional feature fusion, with the following score calculation and filtering mechanism: extracting high-dimensional embedding vectors of the user query and of each candidate knowledge fragment, and computing their cosine similarity to obtain an initial semantic score $S_{sem}$; dividing the extracted business entities into a region set, a numerical value set, and a service type set, and computing a graph entity matching score $S_{kg}$ using a Boolean matching function and a numerical interval judgment function; introducing a graph-conflict one-vote veto penalty term $P$, such that $P = -\infty$ if the candidate fragment contains a core condition that conflicts with the user query, and $P = 0$ otherwise; computing the final composite score $S_{final} = \alpha S_{sem} + \beta S_{kg} + P$, where $\alpha$ and $\beta$ are preset balance coefficients that satisfy a preset constraint and are set specifically for the postal tariff query scenario; and retaining, as the rule fragments that exactly match the query, the fragments whose $S_{final}$ exceeds a set threshold $\tau$.
  6. The method according to claim 1, wherein using the rule fragments as knowledge constraints and generating and outputting, with a large language model, a question-answer result with traceable references comprises the following steps: concatenating the rule fragments that exactly match the user query and assembling them into a strongly constrained prompt; inputting the prompt into a large language model deployed locally with physical isolation; and having the large language model perform logical reasoning and generation strictly according to the rule fragments in the prompt, extract the provenance information of the rule fragments, and output a final answer containing the factual basis and traceable references.
  7. A knowledge-graph-enhanced retrieval-augmented generation question-answering system for the postal service field, characterized by comprising: a data denoising module, configured to acquire postal service documents input by a user and preprocess and denoise them to obtain high-purity knowledge blocks; a knowledge base construction module, configured to perform dual-source extraction on the high-purity knowledge blocks and construct a vector database and a lightweight knowledge graph, respectively; an intent routing module, configured to receive a user query, identify the service intent of the user query, and perform dynamic intent routing based on the service intent; a dual-path hybrid retrieval module, configured to perform knowledge-graph-enhanced dual-path hybrid retrieval according to the routing result to obtain rule fragments that exactly match the user query; and a large model constrained generation module, configured to use the rule fragments as knowledge constraints and generate and output, with a large language model, a question-answer result with traceable references.
  8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the knowledge-graph-enhanced retrieval-augmented generation question-answering method for the postal service field according to any one of claims 1 to 6.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the knowledge-graph-enhanced retrieval-augmented generation question-answering method for the postal service field according to any one of claims 1 to 6.
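The dynamic intent routing of claim 4 (extract key entities, evaluate the completeness of the query conditions, then choose single-path vector retrieval or dual-path hybrid retrieval) can be sketched with a simple rule-based stand-in. The patent does not say how intent is identified; the region vocabulary, the keyword list, the weight regular expression, and the `RouteDecision` structure below are hypothetical, and a real deployment would more likely use an entity recognizer backed by the knowledge graph.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical vocabularies; a production system would draw on an NER model
# or the knowledge graph's entity table instead of hard-coded lists.
KNOWN_REGIONS = {"beijing", "shanghai", "guangzhou"}
NUMERIC_KEYWORDS = ("tariff", "postage", "price", "cost", "how much", "fee")
WEIGHT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|g)\b", re.IGNORECASE)


@dataclass
class RouteDecision:
    route: str                  # "vector_only" or "hybrid"
    regions: List[str]          # regions in order of mention (origin first, by assumption)
    weight_kg: Optional[float]  # parsed weight normalized to kilograms
    missing: List[str]          # conditions still needed for an exact tariff lookup


def route_query(query: str) -> RouteDecision:
    text = query.lower()
    # Key entity extraction: regions ordered by their first mention.
    hits = sorted((text.find(r), r) for r in KNOWN_REGIONS if r in text)
    regions = [r for _, r in hits]
    # Weight extraction, normalized to kilograms.
    weight_kg = None
    m = WEIGHT_PATTERN.search(query)
    if m:
        value = float(m.group(1))
        weight_kg = value / 1000 if m.group(2).lower() == "g" else value
    # Intent: any weight or tariff wording marks a numerical-rule query.
    numeric = weight_kg is not None or any(k in text for k in NUMERIC_KEYWORDS)
    if not numeric:
        return RouteDecision("vector_only", regions, weight_kg, [])
    # Completeness check before handing off to hybrid retrieval.
    missing = []
    if len(regions) < 2:
        missing.append("origin/destination")
    if weight_kg is None:
        missing.append("weight")
    return RouteDecision("hybrid", regions, weight_kg, missing)
```

With this sketch, "How much to send 2 kg from Beijing to Shanghai?" is routed to hybrid retrieval with no missing conditions, whereas "What is registered mail?" goes to single-path vector retrieval. The `missing` list is one possible way to act on the completeness evaluation, for example by asking the user for the origin before retrieval runs.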

Description

Knowledge graph enhanced retrieval generation question-answering method and system for postal service field

Technical Field

The invention relates to the technical fields of artificial intelligence, natural language processing, and information retrieval, and in particular to a retrieval-augmented generation (RAG) method and system that uses a knowledge graph to improve retrieval precision and to constrain the output of a large model. It is particularly suitable for intelligent question answering in postal and other vertical business domains with complex rules and extremely strict accuracy requirements.

Background

With the rapid development of large language models (LLMs), intelligent question-answering systems are widely used across industries. In vertical business domains such as postal services, however, existing intelligent question-answering schemes still face significant technical bottlenecks. In the prior art, vertical-domain question answering is mainly approached in two ways: first, zero-shot question answering directly with a pre-trained large language model; second, conventional retrieval-augmented generation (RAG), in which document blocks are stored in a vector database, context is recalled by computing the vector similarity between the user question and each document block, and the large model then generates an answer from that context. The prior art has the following serious defects:

Severe hallucination and poor compliance. Without the constraint of an external, exactly accurate knowledge base, the large model is very prone to factual hallucinations and incorrect amounts when faced with postal tariff standards, which have extremely high timeliness and accuracy requirements.

Lack of logical reasoning, leading to condition misalignment. Conventional RAG relies only on fuzzy semantic similarity retrieval, whereas postal tariffs often contain multiple nested discrete logical conditions (e.g., a specific origin province to a specific destination province + a weight above a specified threshold + a specific premium). Because pure vector retrieval cannot align these logical conditions, recall accuracy is often poor: provisions for the wrong province, or tariff rules for the wrong weight interval, are retrieved.

Heavy data noise interference. Official postal service documents usually contain a large amount of structural noise such as tables of contents, page numbers, and broken table ruling lines; vectorizing them directly causes features in the retrieval space to overlap, further reducing the system's recall.

Disclosure of Invention

To overcome these defects of the prior art, namely that conventional vector retrieval cannot handle multiple logical constraints and that large language models are prone to factual hallucination, the invention aims to provide a knowledge-graph-enhanced retrieval-augmented generation question-answering method and system for the postal service field, so as to deliver accurate, traceable, hallucination-free intelligent question answering under complex postal service rules.
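The structural-noise problem described above, together with the overlapping sliding-window slicing of claim 2, can be illustrated by a minimal preprocessing sketch. It deliberately swaps the patent's LLM structure parser for plain regular-expression heuristics, and the noise patterns, the character-based window of 200 with a 20% overlap ratio, and the rule of shrinking the step so that windows tile each paragraph evenly are hypothetical choices made for illustration; the patent does not specify them.

```python
import math
import re
from typing import List

# Regex heuristics standing in for the LLM structure parser (an assumption, not the patented method).
NOISE_PATTERNS = [
    re.compile(r"^\s*-?\s*\d+\s*-?\s*$"),  # bare page numbers such as "12" or "- 12 -"
    re.compile(r"^.*\.{4,}\s*\d+\s*$"),    # table-of-contents entries such as "Tariffs ........ 15"
    re.compile(r"^\s*[-=_|+]{3,}\s*$"),    # table ruling lines such as "+----+----+"
]


def strip_noise(lines: List[str]) -> List[str]:
    """Drop lines that look like structural noise; blank lines survive as paragraph breaks."""
    return [ln for ln in lines if not any(p.match(ln) for p in NOISE_PATTERNS)]


def chunk_paragraph(text: str, window: int = 200, overlap: float = 0.2) -> List[str]:
    """Overlapping sliding window whose step adapts to the paragraph length (overlap in [0, 1))."""
    if len(text) <= window:
        return [text]  # short rule paragraph: keep it whole
    nominal_step = window * (1 - overlap)
    n_windows = math.ceil((len(text) - window) / nominal_step) + 1
    # Shrink the step so the windows tile the paragraph evenly and the last one ends
    # exactly at its end; the actual overlap is therefore at least the requested ratio.
    step = (len(text) - window) / (n_windows - 1)
    starts = [round(i * step) for i in range(n_windows)]
    return [text[s:s + window] for s in starts]


def preprocess(raw: str, window: int = 200, overlap: float = 0.2) -> List[str]:
    """Denoise raw document text, rejoin paragraphs split by noise lines, then chunk them."""
    paragraphs: List[str] = []
    current: List[str] = []
    for ln in strip_noise(raw.splitlines()) + [""]:  # trailing "" flushes the last paragraph
        if ln.strip():
            current.append(ln.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    chunks: List[str] = []
    for para in paragraphs:
        chunks.extend(chunk_paragraph(para, window, overlap))
    return chunks
```

Because a page-number line sitting between two halves of a paragraph is removed before paragraphs are rebuilt, a rule split across a page break is rejoined into one block before it is chunked and vectorized. For a 500-character paragraph with the defaults, the nominal step of 160 would need three windows, so the step is reduced to 150 and the windows start at 0, 150, and 300.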
In order to achieve the above purpose, the present invention provides the following technical solutions:

A knowledge-graph-enhanced retrieval-augmented generation question-answering method for the postal service field comprises the following steps: acquiring postal service documents input by a user, and preprocessing and denoising them to obtain high-purity knowledge blocks; performing dual-source extraction on the high-purity knowledge blocks to construct a vector database and a lightweight knowledge graph, respectively; receiving a user query, identifying the service intent of the user query, and performing dynamic intent routing based on the service intent; according to the routing result, performing knowledge-graph-enhanced dual-path hybrid retrieval to obtain rule fragments that exactly match the user query; and using the rule fragments as knowledge constraints, generating and outputting, with a large language model, a question-answer result with traceable references.

Preferably, acquiring the postal service documents input by the user and preprocessing and denoising them to obtain high-purity knowledge blocks includes: collecting official document data from the target postal service domain, and removing structural noise from the document data with a large language model, the structural noise comprising tables of contents, page numbers, and broken table ruling lines; and slicing the denoised document data according to semantic integrity with an overlapping sliding-window strategy to obtain high-purity knowledge blocks with contextual logic.

Preferably, constructing the vector database and the lightweight knowledge graph respectively includes: converting the high-purity knowledge block into a high-dimensional vect