CN-119692351-B - Text generation model evaluation method and device

Abstract

The invention provides a text generation model evaluation method and device, applied to the technical field of natural language processing. The method comprises: obtaining test data; generating a plurality of test tasks according to the test data, wherein one test task corresponds to a plurality of test questions; inputting the test questions of the plurality of test tasks into a text generation model respectively to obtain corresponding test results; and determining the mastery degree of the text generation model on ontology knowledge according to the test results. The plurality of test tasks comprise an entity category-level memory test, an entity category-level application test, an entity attribute memory test, and an entity attribute application test.

Inventors

CHEN YUBO
LIU KANG
ZHAO JUN
ZHOU TONG
QIN XIAOTONG

Assignees

Institute of Automation, Chinese Academy of Sciences

Dates

Publication Date: 20260505
Application Date: 20250225

Claims (6)

1. A text generation model evaluation method, comprising:
acquiring test data, wherein the test data is constructed based on an ontology knowledge framework, and the ontology knowledge framework comprises entity categories organized according to top class-middle class-bottom class hierarchical relations and entity attributes corresponding to each entity category;
generating a plurality of test tasks according to the test data, wherein one test task corresponds to a plurality of test questions;
respectively inputting the test questions of the plurality of test tasks into a text generation model to obtain corresponding test results, and determining the mastery degree of the text generation model on the ontology knowledge according to the test results;
wherein the plurality of test tasks includes an entity category-level memory test, an entity category-level application test, an entity attribute memory test, and an entity attribute application test;
wherein determining the mastery degree of the text generation model on the ontology knowledge according to the test results comprises: evaluating the memory capability of the text generation model on the ontology knowledge through the entity category-level memory test and the entity attribute memory test; evaluating the application capability of the text generation model on the ontology knowledge through the entity category-level application test and the entity attribute application test; and determining the mastery degree of the text generation model on the ontology knowledge based on the score of the memory capability and the score of the application capability;
in the case that the test task is the entity category-level memory test, the plurality of test questions includes an instance-bottom class judgment, an instance-top class judgment, a bottom class-instance judgment, a top class-instance judgment, a bottom class-middle class judgment, a bottom class-top class judgment, a middle class-bottom class judgment, and a top class-bottom class judgment;
in the case that the test task is the entity category-level application test, the plurality of test questions includes different class judgments, different level judgments, and class level arrangements;
in the case that the test task is the entity attribute memory test, the plurality of test questions includes a class-definition judgment, a definition-class judgment, and a subject-relationship-object judgment;
in the case that the test task is the entity attribute application test, the plurality of test questions includes a negative object judgment and a negative subject judgment.
2. The text generation model evaluation method of claim 1, wherein the entity class-level memory test is used to evaluate the text generation model's ability to memorize class hierarchies, the entity class-level application test is used to evaluate the text generation model's ability to apply level knowledge in real scenes, the entity attribute memory test is used to evaluate the text generation model's ability to memorize class definitions and attributes, and the entity attribute application test is used to evaluate the text generation model's ability to apply attribute knowledge in real scenes.
3. A text generation model evaluation device, characterized by comprising an acquisition module and a processing module;
the acquisition module is used for acquiring test data, wherein the test data is constructed based on an ontology knowledge framework, and the ontology knowledge framework comprises entity categories organized according to top class-middle class-bottom class hierarchical relations and entity attributes corresponding to each entity category;
the processing module is used for generating a plurality of test tasks according to the test data, wherein one test task corresponds to a plurality of test questions; respectively inputting the test questions of the plurality of test tasks into a text generation model to obtain corresponding test results; and determining the mastery degree of the text generation model on the ontology knowledge according to the test results;
wherein the plurality of test tasks includes an entity category-level memory test, an entity category-level application test, an entity attribute memory test, and an entity attribute application test;
the processing module is further used for evaluating the memory capability of the text generation model on the ontology knowledge through the entity category-level memory test and the entity attribute memory test; evaluating the application capability of the text generation model on the ontology knowledge through the entity category-level application test and the entity attribute application test; and determining the mastery degree of the text generation model on the ontology knowledge based on the score of the memory capability and the score of the application capability;
in the case that the test task is the entity category-level memory test, the plurality of test questions includes an instance-bottom class judgment, an instance-top class judgment, a bottom class-instance judgment, a top class-instance judgment, a bottom class-middle class judgment, a bottom class-top class judgment, a middle class-bottom class judgment, and a top class-bottom class judgment;
in the case that the test task is the entity category-level application test, the plurality of test questions includes different class judgments, different level judgments, and class level arrangements;
in the case that the test task is the entity attribute memory test, the plurality of test questions includes a class-definition judgment, a definition-class judgment, and a subject-relationship-object judgment;
in the case that the test task is the entity attribute application test, the plurality of test questions includes a negative object judgment and a negative subject judgment.
4. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the text generation model evaluation method according to any one of claims 1 to 2 when executing the computer program.
5. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the text generation model evaluation method according to any one of claims 1 to 2.
6. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements a text generation model evaluation method according to any one of claims 1 to 2.

Description

Text generation model evaluation method and device

Technical Field

The present invention relates to the field of natural language processing technologies, and in particular to a method and an apparatus for evaluating a text generation model.

Background

In recent years, large language models (LLMs) pre-trained on massive data have exhibited excellent performance in tasks such as natural language understanding, text generation, and question answering. Their core advantage is that they encode rich world knowledge through a self-supervised learning mechanism and can be flexibly applied to scenarios such as dialogue interaction, information retrieval, and knowledge reasoning. However, existing research focuses on the ability of models to store and extract discrete factual knowledge (e.g., entities, dates, events), while ignoring the more complex logical associations and hierarchies in the knowledge system. This isolated mode of knowledge representation differs markedly from the concept-network-based organization of knowledge in human cognition, so models are prone to problems such as logical breaks and semantic deviation in tasks that require deep contextual reasoning or cross-domain knowledge association.
From the perspective of cognitive science, efficient use of knowledge relies on a hierarchical system of concepts (classes) and a structured expression of their attribute relationships. An ontology, as a formal knowledge representation framework, defines the concepts, attributes, and logical relationships within a domain and can thereby provide a standardized semantic model for knowledge organization. Research shows that integrating ontology knowledge into language model training can significantly improve the accuracy with which domain terms are understood and the consistency of contextual reasoning.
However, the prior art has not established a systematic evaluation system for quantifying the completeness and degree of structuring of ontology knowledge in a language model, so model optimization lacks targeted guidance.

Disclosure of Invention

The invention provides a text generation model evaluation method and device to solve the problem that the prior art has no systematic evaluation system for quantifying the completeness and degree of structuring of ontology knowledge in a language model, which leaves model optimization without targeted guidance.
The invention provides a text generation model evaluation method, comprising: obtaining test data; generating a plurality of test tasks according to the test data, wherein one test task corresponds to a plurality of test questions; respectively inputting the test questions of the plurality of test tasks into a text generation model to obtain corresponding test results; and determining the mastery degree of the text generation model on ontology knowledge according to the test results. The plurality of test tasks comprise an entity category-level memory test, an entity category-level application test, an entity attribute memory test, and an entity attribute application test.
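The steps of the method (test questions grouped by test task, queried against the model, then scored) can be illustrated with a minimal Python sketch. It is not the claimed implementation. The text says only that mastery is determined from a memory score and an application score, so several details here are assumptions made for illustration: answers are yes/no strings compared by exact match, each task is scored by accuracy, each capability score is the unweighted mean of its two task accuracies, and the mastery figure is the unweighted mean of the two capability scores. The names `Question`, `evaluate`, and `always_yes` and the sample prompts are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# The four test tasks named in the text, mapped to the capability each one probes.
TASK_CAPABILITY: Dict[str, str] = {
    "entity category-level memory test": "memory",
    "entity attribute memory test": "memory",
    "entity category-level application test": "application",
    "entity attribute application test": "application",
}


@dataclass(frozen=True)
class Question:
    task: str      # one of the keys of TASK_CAPABILITY
    prompt: str    # text fed to the text generation model
    expected: str  # reference answer (assumed to be "yes" or "no" in this sketch)


def evaluate(model: Callable[[str], str], questions: List[Question]) -> Dict[str, float]:
    """Query the model with every question and aggregate the results."""
    outcomes: Dict[str, List[float]] = {}
    for q in questions:
        if q.task not in TASK_CAPABILITY:
            raise ValueError(f"unknown test task: {q.task!r}")
        answer = model(q.prompt).strip().lower()
        # Assumption: exact-match scoring, 1.0 for a correct answer, else 0.0.
        outcomes.setdefault(q.task, []).append(float(answer == q.expected.strip().lower()))

    task_accuracy = {task: mean(values) for task, values in outcomes.items()}

    def capability_score(capability: str) -> float:
        scores = [acc for task, acc in task_accuracy.items()
                  if TASK_CAPABILITY[task] == capability]
        if not scores:
            raise ValueError(f"no questions for capability {capability!r}")
        return mean(scores)

    memory = capability_score("memory")
    application = capability_score("application")
    # Assumption: mastery is the unweighted mean of the two capability scores.
    return {"memory": memory, "application": application,
            "mastery": mean([memory, application])}


def always_yes(prompt: str) -> str:
    """Stand-in for a real text generation model; it answers "Yes" to everything."""
    return "Yes"


# Made-up sample questions, one per test task.
questions = [
    Question("entity category-level memory test",
             "Is a sparrow an instance of the class Bird? Answer yes or no.", "yes"),
    Question("entity category-level application test",
             "Do 'sparrow' and 'oak' belong to the same class? Answer yes or no.", "no"),
    Question("entity attribute memory test",
             "Is 'a feathered, egg-laying vertebrate' a definition of Bird? Answer yes or no.", "yes"),
    Question("entity attribute application test",
             "Can a Bird be the subject of the relation 'lays'? Answer yes or no.", "yes"),
]

scores = evaluate(always_yes, questions)
print(scores)
```

Returning the memory and application scores separately, alongside the combined figure, keeps visible which capability is dragging the overall result down.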
According to the text generation model evaluation method provided by the invention, the entity category-level memory test is used to evaluate the text generation model's ability to memorize the category hierarchy, the entity category-level application test is used to evaluate the text generation model's ability to apply level knowledge in real scenarios, the entity attribute memory test is used to evaluate the text generation model's ability to memorize class definitions and attributes, and the entity attribute application test is used to evaluate the text generation model's ability to apply attribute knowledge in real scenarios.
According to the text generation model evaluation method provided by the invention, in the case that the test task is the entity category-level memory test, the plurality of test questions comprise an instance-bottom class judgment, an instance-top class judgment, a bottom class-instance judgment, a top class-instance judgment, a bottom class-middle class judgment, a bottom class-top class judgment, a middle class-bottom class judgment, and a top class-bottom class judgment.
According to the text generation model evaluation method provided by the invention, in the case that the test task is the entity category-level application test, the plurality of test questions comprise different class judgments, different level judgments, and class level arrangements.
According to the text generation model evaluation method provided by the invention, in the case that the test task is the entity attribute memory test, the plurality of test questions comprise class-definition judg