CN-122021623-A - Evaluation set generation method, device, equipment and storage medium

Abstract

The application discloses an evaluation set generation method, device, equipment and storage medium, and relates to the technical field of natural language processing. The method comprises: obtaining a knowledge base document and parsing it into a plurality of text blocks; for each text block, extracting a vector and entities of the text block and constructing a node based on the text block, the vector and the entities; calculating the similarity and the co-occurrence count among all nodes according to the vectors and the entities, and dividing all nodes into associated nodes and isolated nodes according to the similarity and the co-occurrence count; inputting the text blocks and entities of the isolated nodes into a large model to obtain a single-hop evaluation set output by the large model; and inputting the text blocks and entities of the associated nodes, together with the association relations among the associated nodes, into the large model to obtain a multi-hop evaluation set output by the large model.

Inventors

  • Qin Dengda
  • WEI GUIPING
  • ZOU JIAHUI
  • LUO FANG
  • LIU YONG

Assignees

  • 中科云谷科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-18

Claims (10)

  1. An evaluation set generation method, characterized in that the method comprises: acquiring a knowledge base document and parsing the knowledge base document into a plurality of text blocks; for each text block, extracting a vector and an entity of the text block, and constructing a node based on the text block, the vector and the entity; calculating the similarity and the co-occurrence count among all nodes according to the vectors and the entities, and dividing all nodes into associated nodes and isolated nodes according to the similarity and the co-occurrence count; inputting the text blocks and the entities of the isolated nodes into a large model to obtain a single-hop evaluation set output by the large model; and inputting the text blocks and the entities of the associated nodes, together with the association relations among the associated nodes, into the large model to obtain a multi-hop evaluation set output by the large model.
  2. The evaluation set generation method according to claim 1, wherein calculating the similarity and the co-occurrence count among all nodes according to the vectors and the entities comprises: for any two nodes, determining the similarity between the nodes according to the ratio of the dot product of the nodes' vectors to the lengths of those vectors; and, for any two nodes, determining the co-occurrence count between the nodes according to the number of identical entities shared by the nodes.
  3. The evaluation set generation method according to claim 1, wherein dividing all nodes into associated nodes and isolated nodes according to the similarity and the co-occurrence count comprises: for any two nodes, judging the nodes to be associated nodes when the similarity is greater than a first preset threshold and the co-occurrence count is greater than a second preset threshold; and, for any two nodes, judging the nodes to be isolated nodes when the similarity is less than or equal to the first preset threshold or the co-occurrence count is less than or equal to the second preset threshold.
  4. The evaluation set generation method according to claim 1, characterized in that the method further comprises: extracting association relations between the text blocks of the associated nodes through the large model, constructing association edges corresponding to the associated nodes based on the association relations, and aggregating all the associated nodes and association edges to generate a knowledge graph; and acquiring a sub-graph range selected from the knowledge graph, determining the associated nodes covered by the sub-graph range, and proceeding to execute the step of inputting the text blocks and the entities of the associated nodes, together with the association relations among the associated nodes, into the large model to obtain a multi-hop evaluation set output by the large model.
  5. The evaluation set generation method according to claim 1, wherein parsing the knowledge base document into a plurality of text blocks comprises: converting the knowledge base document into a plain-text format document; and identifying text-slice separators in the plain-text format document and segmenting the plain-text format document into the plurality of text blocks based on the text-slice separators.
  6. The evaluation set generation method according to claim 5, characterized in that the method further comprises: identifying adjacent table start tags and table end tags in the plain-text format document; and, when a text-slice separator is present between an adjacent table start tag and table end tag, merging the cross-page table between that table start tag and table end tag.
  7. The evaluation set generation method according to claim 1, characterized in that the method further comprises: determining all evaluation examples in the single-hop evaluation set and the multi-hop evaluation set, wherein each evaluation example comprises a question, a reference answer and a context; and judging an evaluation example to be a low-quality evaluation example and rejecting it when the answer relevance between the question and the reference answer is less than a third preset threshold or the faithfulness of the reference answer to the context is less than a fourth preset threshold.
  8. An evaluation set generation device, comprising: a memory configured to store instructions; and a processor configured to invoke the instructions from the memory and, when executing the instructions, capable of implementing the evaluation set generation method according to any one of claims 1 to 7.
  9. An evaluation set generation apparatus, characterized by comprising: the evaluation set generation device according to claim 8.
  10. A machine-readable storage medium having instructions stored thereon which, when executed by a processor, cause the processor to perform the evaluation set generation method of any one of claims 1 to 7.
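
As an illustration only, the rejection rule of claim 7 can be sketched in Python. The claims do not state how answer relevance or faithfulness are scored, so this sketch assumes both scores have already been computed and stored on each example as numbers; the names `EvalExample`, `answer_relevance`, `faithfulness` and `filter_low_quality` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class EvalExample:
    """One evaluation example: question, reference answer and context (claim 7)."""
    question: str
    reference_answer: str
    context: str
    # Assumption: both scores are precomputed by some external scorer.
    answer_relevance: float  # relevance between question and reference answer
    faithfulness: float      # faithfulness of the reference answer to the context


def filter_low_quality(examples, relevance_threshold, faithfulness_threshold):
    """Reject an example if either score is below its threshold.

    relevance_threshold plays the role of the "third preset threshold",
    faithfulness_threshold that of the "fourth preset threshold".
    """
    return [
        ex for ex in examples
        if ex.answer_relevance >= relevance_threshold
        and ex.faithfulness >= faithfulness_threshold
    ]


examples = [
    EvalExample("Q1", "A1", "C1", answer_relevance=0.9, faithfulness=0.8),
    EvalExample("Q2", "A2", "C2", answer_relevance=0.4, faithfulness=0.9),
    EvalExample("Q3", "A3", "C3", answer_relevance=0.9, faithfulness=0.3),
]
kept = filter_low_quality(examples, relevance_threshold=0.5, faithfulness_threshold=0.5)
```

With the illustrative thresholds above, only the first example is kept: the second fails on answer relevance and the third on faithfulness.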

Description

Evaluation set generation method, device, equipment and storage medium

Technical Field

The present application relates to the field of natural language processing, and in particular to an evaluation set generation method, device, equipment and storage medium.

Background

RAG (Retrieval-Augmented Generation) introduces external knowledge retrieval and thereby effectively mitigates the limitations of large models with respect to hallucination, knowledge timeliness and domain adaptability, which accelerates the practical deployment of large models in vertical domains. RAG evaluation, as an important means of assessing RAG performance, provides a key basis for establishing performance benchmarks and supports the optimization and deployment of RAG assistants. The core of RAG evaluation is constructing a high-quality RAG evaluation set, whose quality directly determines the accuracy and reliability of the evaluation results.

To evaluate RAG performance accurately, a relatively large evaluation set is often needed. However, constructing such sets usually relies on experts with deep business knowledge to ensure that the sets are reasonable, effective and truly reflect actual business scenarios. This manual, expert-driven construction requires substantial human resources and is costly, so generating RAG evaluation sets automatically has become an urgent need. Although existing methods have made some progress in automatically constructing RAG evaluation sets, end-to-end construction from knowledge base documents to RAG evaluation sets has not yet been achieved.

Disclosure of Invention

The embodiments of the present application aim to provide an evaluation set generation method, device, equipment and storage medium.
In order to achieve the above object, a first aspect of the present application provides an evaluation set generation method, comprising: acquiring a knowledge base document and parsing the knowledge base document into a plurality of text blocks; for each text block, extracting a vector and an entity of the text block, and constructing a node based on the text block, the vector and the entity; calculating the similarity and the co-occurrence count among all nodes according to the vectors and the entities, and dividing all nodes into associated nodes and isolated nodes according to the similarity and the co-occurrence count; inputting the text blocks and the entities of the isolated nodes into a large model to obtain a single-hop evaluation set output by the large model; and inputting the text blocks and the entities of the associated nodes, together with the association relations among the associated nodes, into the large model to obtain a multi-hop evaluation set output by the large model.

In an embodiment of the application, calculating the similarity and the co-occurrence count among all nodes according to the vectors and the entities comprises: for any two nodes, determining the similarity between the nodes according to the ratio of the dot product of the nodes' vectors to the lengths of those vectors; and, for any two nodes, determining the co-occurrence count between the nodes according to the number of identical entities shared by the nodes.
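
The pairwise scoring and node division described above can be sketched as follows. This is a minimal sketch, not the applicant's implementation. It assumes that the "ratio of the dot product to the vector lengths" is ordinary cosine similarity, and that a node counts as associated when at least one pair containing it exceeds both thresholds, with every other node isolated. The names `Node`, `cosine_similarity`, `co_occurrence` and `partition_nodes`, and the threshold values in the example, are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Node:
    """A node built from a text block, its vector and its entities."""
    text: str
    vector: list
    entities: set


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def co_occurrence(a, b):
    """Number of identical entities shared by two nodes."""
    return len(a & b)


def partition_nodes(nodes, sim_threshold, co_threshold):
    """Return (associated, isolated) lists of node indices.

    Assumption: a node is associated if any pair it belongs to has
    similarity > sim_threshold AND co-occurrence > co_threshold.
    """
    associated = set()
    for i, j in combinations(range(len(nodes)), 2):
        sim = cosine_similarity(nodes[i].vector, nodes[j].vector)
        shared = co_occurrence(nodes[i].entities, nodes[j].entities)
        if sim > sim_threshold and shared > co_threshold:
            associated.update((i, j))
    isolated = [i for i in range(len(nodes)) if i not in associated]
    return sorted(associated), isolated


nodes = [
    Node("block a", [1.0, 0.0], {"pump", "valve"}),
    Node("block b", [0.9, 0.1], {"pump", "valve", "motor"}),
    Node("block c", [0.0, 1.0], {"invoice"}),
]
associated, isolated = partition_nodes(nodes, sim_threshold=0.8, co_threshold=1)
```

In this toy input the first two blocks have nearly parallel vectors and share two entities, so they end up associated, while the third block shares no entity with either and stays isolated. The all-pairs loop is quadratic in the number of nodes, which is acceptable for a sketch but may need an approximate nearest-neighbour index for large knowledge bases.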
In an embodiment of the application, dividing all nodes into associated nodes and isolated nodes according to the similarity and the co-occurrence count comprises: for any two nodes, judging the nodes to be associated nodes when the similarity is greater than a first preset threshold and the co-occurrence count is greater than a second preset threshold; and, for any two nodes, judging the nodes to be isolated nodes when the similarity is less than or equal to the first preset threshold or the co-occurrence count is less than or equal to the second preset threshold.

In an embodiment of the application, the method further comprises: extracting the association relations among the text blocks of the associated nodes through the large model, constructing association edges corresponding to the associated nodes based on the association relations, and aggregating all the associated nodes and association edges to generate a knowledge graph; and obtaining a sub-graph range selected from the knowledge graph, determining the associated nodes covered by the sub-graph range, and proceeding to execute the step of inputting the text blocks and the entities of the associated nodes, together with the association relations among the associated nodes, into the large model to obtain a multi-hop evaluation set output by the large model.

In an embodiment of the application, parsing the knowledge base document into a plurality of text blocks includes converting the knowledge base document into a plain-text format document, identifying text-slice separators in the plain-text format document, and segmenting the plain-text format document