CN-121979758-A - Evaluation method and device of intention understanding result


Abstract

The disclosure provides a method and a device for evaluating an intention understanding result, and relates to the field of artificial intelligence, in particular to deep learning, large language models, natural language understanding, voice assistants, and the like. The method comprises: obtaining target evaluation information, wherein the target evaluation information comprises a target question and a target result, and the target result is an intention understanding result generated by an intention understanding model for the target question; determining target reference information corresponding to the target evaluation information from candidate reference information in a knowledge base; and generating an evaluation result corresponding to the target evaluation information according to the target reference information, wherein the evaluation result indicates whether the target result correctly understands the intention of the target question. Applying the disclosed scheme can improve, among other things, the accuracy of the evaluation result.

Inventors

Lu Nijia
SHI JINSHUAI
LIU TENGLONG
ZHANG YAN

Assignees

百度时代网络技术（北京）有限公司

Dates

Publication Date: 20260505
Application Date: 20251204

Claims (15)

1. A method for evaluating an intention understanding result, comprising: acquiring target evaluation information, wherein the target evaluation information comprises a target question and a target result, and the target result is an intention understanding result generated by an intention understanding model for the target question; determining target reference information corresponding to the target evaluation information from candidate reference information in a knowledge base; and generating an evaluation result corresponding to the target evaluation information according to the target reference information, wherein the evaluation result indicates whether the target result correctly understands the intention of the target question.
2. The method of claim 1, wherein the determining, from candidate reference information in a knowledge base, target reference information corresponding to the target evaluation information comprises: performing primary screening on the candidate reference information in the knowledge base to obtain screened intermediate reference information, and performing secondary screening on the intermediate reference information to obtain the target reference information.
3. The method of claim 2, wherein the performing primary screening on the candidate reference information in the knowledge base comprises: acquiring a target similarity between each candidate reference information in the knowledge base and the target evaluation information, and screening the intermediate reference information from the knowledge base according to the target similarity.
4. The method of claim 3, wherein the target evaluation information further comprises target context information, the target context information being the question-and-answer content of at least one round of question answering conducted with a voice assistant before the target question, the voice assistant uses the intention understanding model to perform intention understanding on an input question, and the target result is an intention understanding result generated in combination with the target context information; and the candidate reference information comprises a reference question, a reference result, and reference context information, wherein the reference result is an intention understanding result corresponding to the reference question and generated in combination with the reference context information, and the reference context information is the question-and-answer content of at least one round of question answering conducted with the voice assistant before the reference question.
5. The method of claim 4, wherein the target similarity comprises at least one of a first similarity, a second similarity, and a third similarity; the first similarity is a similarity between the candidate reference information and the target evaluation information determined according to the target question and the reference question; the second similarity is a similarity between the candidate reference information and the target evaluation information determined according to the target question, the target context information, the reference question, and the reference context information; and the third similarity is a similarity between the candidate reference information and the target evaluation information determined according to the target result and the reference result.
6. The method of claim 5, wherein the screening the intermediate reference information from the knowledge base according to the target similarity comprises: in response to determining that the number of target similarities is 1, sorting the candidate reference information in the knowledge base in descending order of the target similarity, and determining the top M candidate reference information after sorting as the intermediate reference information, wherein M is a positive integer greater than 1; and in response to determining that the number of target similarities is greater than 1, determining, for each candidate reference information in the knowledge base, a comprehensive similarity by combining the target similarities, sorting the candidate reference information in descending order of the comprehensive similarity, and determining the top M candidate reference information after sorting as the intermediate reference information.
7. The method of claim 3, wherein the target similarity includes a vector similarity.
8. The method of claim 2, wherein the performing secondary screening on the intermediate reference information to obtain the target reference information comprises: generating a first prompt according to the intermediate reference information and the target evaluation information; and inputting the first prompt into a screening model to obtain N pieces of target reference information screened by the screening model from the intermediate reference information, wherein N is a positive integer smaller than M.
9. The method of claim 1, wherein the generating the evaluation result corresponding to the target evaluation information comprises: generating a second prompt according to the target evaluation information and the target reference information; and inputting the second prompt into an evaluation model to obtain an output evaluation result, wherein the evaluation result is either a first result or a second result, the first result indicates that the target result correctly understands the intention of the target question, and the second result indicates that the target result does not correctly understand the intention of the target question.
10. The method of claim 9, further comprising: in response to determining that the evaluation result is the second result, obtaining a manual verification result for the second result; in response to determining that the manual verification result indicates that the evaluation is wrong, adding the target evaluation information to the knowledge base as candidate reference information; and in response to determining that the manual verification result indicates that the evaluation is correct, acquiring a manual correction result for the target result, adding the corrected target evaluation information to the knowledge base as candidate reference information, and optimizing the intention understanding model according to the corrected target evaluation information.
11. The method of claim 9, further comprising: in response to determining that the evaluation result is the second result, determining and recording an error type corresponding to the target evaluation information; and in response to determining that a report generation condition is met, counting, according to the recorded content, the number of occurrences of each error type within a most recent preset time period, and generating a visual report from the counting result for display.
12. A device for evaluating an intention understanding result, comprising a first acquisition module, a first screening module, and a first evaluation module; the first acquisition module is configured to acquire target evaluation information, wherein the target evaluation information comprises a target question and a target result, and the target result is an intention understanding result generated by an intention understanding model for the target question; the first screening module is configured to determine target reference information corresponding to the target evaluation information from candidate reference information in a knowledge base; and the first evaluation module is configured to generate an evaluation result corresponding to the target evaluation information according to the target reference information, wherein the evaluation result indicates whether the target result correctly understands the intention of the target question.
13. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-11.
15. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 1-11.
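
The primary screening recited in claims 3 and 5 to 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Candidate` class, the pre-computed embedding vectors, the use of cosine similarity as the vector similarity, and the unweighted mean used to combine the first and third similarities are all assumptions made for illustration, since the claims state only that the similarities are combined.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Candidate:
    """One candidate reference entry (hypothetical structure).

    The vectors stand in for embeddings of the reference question and the
    reference result; how they are produced is outside this sketch.
    """
    name: str
    question_vec: Sequence[float]
    result_vec: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def primary_screen(target_question_vec: Sequence[float],
                   target_result_vec: Sequence[float],
                   candidates: List[Candidate],
                   m: int) -> List[Candidate]:
    """Return the top-m candidates by comprehensive similarity."""
    def comprehensive(c: Candidate) -> float:
        first = cosine(target_question_vec, c.question_vec)  # question vs. question
        third = cosine(target_result_vec, c.result_vec)      # result vs. result
        return (first + third) / 2  # assumed combination rule: plain mean

    # Sort in descending order of similarity and keep the first m entries.
    return sorted(candidates, key=comprehensive, reverse=True)[:m]
```

The M entries returned here would then be passed to the secondary screening of claim 8, where a screening model narrows them down to N entries.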

Description

Evaluation method and device of intention understanding result

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to deep learning, large language models, natural language understanding, voice assistants, and the like, and more particularly to a method and a device for evaluating an intention understanding result.

Background

Currently, a voice assistant is no longer a simple voice command response tool, but an Agent with complex intention understanding and execution capabilities that uses a large language model (LLM, Large Language Model) as its core engine. A large language model is a deep learning model trained on a large amount of text data, and is capable of generating natural language text, understanding the meaning of language text, and the like.

Disclosure of Invention

The present disclosure provides a method and a device for evaluating an intention understanding result.

A method for evaluating an intention understanding result comprises: acquiring target evaluation information, wherein the target evaluation information comprises a target question and a target result, and the target result is an intention understanding result generated by an intention understanding model for the target question; determining target reference information corresponding to the target evaluation information from candidate reference information in a knowledge base; and generating an evaluation result corresponding to the target evaluation information according to the target reference information, wherein the evaluation result indicates whether the target result correctly understands the intention of the target question.
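
The three steps of the method summarized above (acquire the target evaluation information, determine reference information from the knowledge base, generate the evaluation result) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the `EvaluationInfo` class, the prompt wording, the `select_refs` and `evaluation_model` callables, and the CORRECT/INCORRECT output convention are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvaluationInfo:
    """A question paired with an intention understanding result (hypothetical)."""
    question: str
    result: str


def build_prompt(target: EvaluationInfo, references: List[EvaluationInfo]) -> str:
    """Assemble an evaluation prompt from the references and the target."""
    lines = ["Reference examples with verified intention understanding results:"]
    for i, ref in enumerate(references, 1):
        lines.append(f"{i}. question: {ref.question} | result: {ref.result}")
    lines.append(f"Target question: {target.question}")
    lines.append(f"Target result: {target.result}")
    lines.append("Does the target result correctly understand the intention "
                 "of the target question? Answer CORRECT or INCORRECT.")
    return "\n".join(lines)


def evaluate(target: EvaluationInfo,
             knowledge_base: List[EvaluationInfo],
             select_refs: Callable[[EvaluationInfo, List[EvaluationInfo]],
                                   List[EvaluationInfo]],
             evaluation_model: Callable[[str], str]) -> bool:
    """Return True if the model judges the target result to be correct."""
    # Step 2: determine target reference information from the knowledge base.
    references = select_refs(target, knowledge_base)
    # Step 3: generate the evaluation result from the references.
    prompt = build_prompt(target, references)
    return evaluation_model(prompt).strip().upper() == "CORRECT"
```

Passing the retrieval step and the model as callables keeps the sketch independent of any particular vector store or LLM API; in practice `evaluation_model` would wrap a call to whichever model serves as the evaluator.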
A device for evaluating an intention understanding result comprises a first acquisition module, a first screening module, and a first evaluation module. The first acquisition module is configured to acquire target evaluation information, wherein the target evaluation information comprises a target question and a target result, and the target result is an intention understanding result generated by an intention understanding model for the target question. The first screening module is configured to determine target reference information corresponding to the target evaluation information from candidate reference information in a knowledge base. The first evaluation module is configured to generate an evaluation result corresponding to the target evaluation information according to the target reference information, wherein the evaluation result indicates whether the target result correctly understands the intention of the target question.

An electronic device comprises: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.

A non-transitory computer readable storage medium stores computer instructions for causing a computer to perform the method as described above.

A computer program product comprises computer programs/instructions which, when executed by a processor, implement the method as described above.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are provided for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a flowchart of a first embodiment of a method for evaluating an intention understanding result according to the present disclosure;
FIG. 2 is a schematic diagram of a process for determining target reference information corresponding to target evaluation information according to a knowledge base in the present disclosure;
FIG. 3 is a flowchart of a second embodiment of a method for evaluating an intention understanding result according to the present disclosure;
FIG. 4 is a schematic diagram of the composition structure of a first embodiment 400 of a device for evaluating an intention understanding result according to the present disclosure;
FIG. 5 is a schematic diagram of the composition structure of a second embodiment 500 of a device for evaluating an intention understanding result according to the present disclosure;
FIG. 6 is a schematic block diagram of an electronic device 600 that may be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in