CN-121983083-A - Training and scoring method for spoken language scoring model, electronic equipment and storage medium
Abstract
The application discloses a training method and a scoring method for a spoken language scoring model, an electronic device, and a storage medium. The training method comprises: acquiring spoken test question text information, answer voice data, and scoring information; performing voice recognition on the answer voice data to obtain answer text information corresponding to the answer voice data; inputting the spoken test question text information, the answer text information, and the scoring information into a preset language model to output scoring reason information; generating input training data from spoken test question type information, the spoken test question text information, and the answer text information; generating output training data from the scoring information and scoring interpretation information, wherein the scoring interpretation information comprises the scoring reason information; and generating spoken scoring training data from the input training data and the output training data to train a spoken scoring model. The trained spoken language scoring model is thereby able to output scoring information together with the scoring reason information corresponding to that scoring information.
Inventors
- XIA KUN
- WU KUI
- SHENG ZHICHAO
- WANG SHIJIN
- LIU CONG
- HU GUOPING
Assignees
- iFLYTEK Co., Ltd. (科大讯飞股份有限公司)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2025-12-30
Claims (10)
- 1. A method of training a spoken language scoring model, comprising: acquiring spoken test question text information, answer voice data for the spoken test question text information, and scoring information for the answer voice data; performing voice recognition on the answer voice data to obtain answer text information corresponding to the answer voice data; inputting the spoken test question text information, the answer text information, and the scoring information into a preset language model to output scoring reason information about the scoring information; generating input training data using spoken test question type information corresponding to the spoken test question text information, the spoken test question text information, and the answer text information; generating output training data using the scoring information and scoring interpretation information about the scoring information, wherein the scoring interpretation information includes the scoring reason information; and generating spoken scoring training data from the input training data and the output training data for training a spoken scoring model.
- 2. The training method of claim 1, wherein training the spoken scoring model comprises: training the spoken language scoring model using a preset loss function; wherein the preset loss function comprises a first cross entropy loss function and a second cross entropy loss function, the first cross entropy loss function representing the cross entropy loss of the scoring information in the output training data given the input training data, and the second cross entropy loss function representing the cross entropy loss of the scoring interpretation information in the output training data given the input training data and the scoring information.
- 3. The training method according to claim 2, characterized in that the preset loss function comprises a weighted first cross entropy loss function and a weighted second cross entropy loss function, wherein the weight of the first cross entropy loss function is greater than the weight of the second cross entropy loss function.
- 4. The training method of claim 1, wherein using the scoring information and the scoring interpretation information to generate output training data comprises: placing the scoring information before the scoring interpretation information to generate the output training data.
- 5. The training method according to any one of claims 1 to 4, wherein inputting the spoken test question text information, the answer text information, and the scoring information into a preset language model to output scoring reason information about the scoring information comprises: inputting the spoken test question text information, the answer text information, and the scoring information into the preset language model to output the answer text information and the scoring reason information, wherein the answer text information is positioned before the scoring reason information; and the scoring interpretation information further includes the answer text information.
- 6. A method of spoken language scoring, comprising: acquiring target spoken test question text information, target answer voice data for the target spoken test question text information, and target spoken test question type information corresponding to the target spoken test question text information; performing voice recognition on the target answer voice data to obtain target answer text information corresponding to the target answer voice data; and inputting the target spoken test question type information, the target spoken test question text information, and the target answer text information into a spoken scoring model to output target spoken scoring information for the target answer text information, wherein the target spoken scoring information comprises a target spoken score and target spoken score interpretation information for interpreting the target spoken score; wherein the spoken language scoring model is obtained by the training method of the spoken language scoring model of any one of claims 1-5.
- 7. The method of claim 6, wherein, in the target spoken scoring information, the target spoken score precedes the target spoken score interpretation information.
- 8. The method of claim 6 or 7, wherein outputting the target spoken scoring information for the target answer voice data comprises: in response to a first output instruction, determining that the first output instruction corresponds to the target spoken score, and outputting the target spoken score according to the output length corresponding to the target spoken score; and, in response to a second output instruction, determining that the second output instruction corresponds to the target spoken score and the target spoken score interpretation information, and outputting the target spoken score and the target spoken score interpretation information according to the output length corresponding to the target spoken score and the target spoken score interpretation information.
- 9. An electronic device comprising a memory and a processor coupled to each other, the processor configured to execute program instructions stored in the memory to implement the method of training the spoken scoring model of any one of claims 1-5, and to implement the method of spoken scoring of any one of claims 6-8.
- 10. A non-transitory computer readable storage medium having program instructions stored thereon, which when executed by a processor, implement the method of training the spoken scoring model of any one of claims 1-5, and implement the method of spoken scoring of any one of claims 6-8.
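As a concrete illustration of claims 1 and 4, each training pair combines question type, question text, and the recognized answer transcript on the input side, and places the score before its interpretation on the output side. The following sketch is illustrative only; the function name, the bracketed field tags, and the exact prompt layout are assumptions, not part of the claimed method.

```python
def build_training_sample(question_type, question_text, answer_text,
                          score, interpretation):
    """Assemble one spoken-scoring training pair (hypothetical format).

    Input side: question type + question text + recognized answer text.
    Output side: the score placed BEFORE the interpretation, so the
    model learns to emit the score first and explain it afterwards.
    """
    input_data = (
        f"[Question type] {question_type}\n"
        f"[Question] {question_text}\n"
        f"[Answer transcript] {answer_text}"
    )
    output_data = f"[Score] {score}\n[Interpretation] {interpretation}"
    return {"input": input_data, "output": output_data}
```

In practice the interpretation field would hold the scoring reason information produced by the preset language model (and, per claim 5, optionally the answer text as well).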
Description
Training and scoring method for spoken language scoring model, electronic equipment and storage medium

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a training method for a spoken language scoring model, a spoken language scoring method, an electronic device, and a storage medium.

Background

With the continuous deepening of education reform and the continuous progress of large language model capabilities, the spoken language scoring task is increasingly moving toward strong interpretability and strong generalization capability. Interpretability means that the automatic scoring system can feed back the basis and reason for a score while outputting the score, which improves user experience. A further benefit of interpretability is that the interpretation content helps researchers discover problems in the scoring system more quickly, thereby speeding up development. Generalization capability refers to the stronger general ability that a large language model offers compared with smaller models, allowing it to adapt to more diverse question types, questions, and scoring criteria. How to improve the interpretability and generalization ability of an automatic spoken language scoring system is therefore an important research topic for general spoken language scoring systems.

Disclosure of Invention

In order to solve the above problems, the application provides at least a training and scoring method for a spoken language scoring model, an electronic device, and a storage medium, which can output scores together with corresponding interpretations of those scores, improving user experience.
The application provides a training method for a spoken language scoring model, which comprises: acquiring spoken test question text information, answer voice data for the spoken test question text information, and scoring information for the answer voice data; performing voice recognition on the answer voice data to obtain answer text information corresponding to the answer voice data; inputting the spoken test question text information, the answer text information, and the scoring information into a preset language model to output scoring reason information about the scoring information; generating input training data using spoken test question type information corresponding to the spoken test question text information, the spoken test question text information, and the answer text information; generating output training data using the scoring information and scoring interpretation information about the scoring information, wherein the scoring interpretation information comprises the scoring reason information; and generating spoken scoring training data from the input training data and the output training data to train the spoken language scoring model. In some embodiments, training the spoken language scoring model includes training the spoken language scoring model with a preset loss function, wherein the preset loss function includes a first cross entropy loss function and a second cross entropy loss function, the first cross entropy loss function representing the cross entropy loss of the scoring information in the output training data given the input training data, and the second cross entropy loss function representing the cross entropy loss of the scoring interpretation information in the output training data given the input training data and the scoring information.
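The two-part loss above can be sketched as a per-token cross entropy split between the score tokens and the interpretation tokens, with the score term weighted more heavily (as in the embodiment where the first weight exceeds the second). This is a minimal NumPy illustration; the function names, the boolean score mask, and the default weights are assumptions for demonstration, not the patented implementation.

```python
import numpy as np

def token_cross_entropy(logits, target_ids):
    """Per-token cross entropy: -log softmax(logits)[target]."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(target_ids)), target_ids]

def spoken_scoring_loss(logits, targets, score_mask, w_score=2.0, w_interp=1.0):
    """Weighted sum of two cross-entropy terms (illustrative).

    logits: (seq_len, vocab) next-token logits for the output sequence.
    targets: (seq_len,) target token ids (score tokens, then interpretation).
    score_mask: (seq_len,) bool, True where the target token is part of the score.
    The score term gets the larger weight (w_score > w_interp), reflecting the
    embodiment where scoring accuracy is weighted above interpretation text.
    """
    token_loss = token_cross_entropy(logits, targets)
    loss_score = token_loss[score_mask].mean()    # first cross entropy term
    loss_interp = token_loss[~score_mask].mean()  # second cross entropy term
    return w_score * loss_score + w_interp * loss_interp
```

Because the score tokens appear before the interpretation tokens in the output training data, the second term is naturally conditioned on the input and the already-generated score under standard left-to-right language-model training.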
In some embodiments, the preset loss function comprises a weighted first cross entropy loss function and a weighted second cross entropy loss function, wherein the weight of the first cross entropy loss function is greater than the weight of the second cross entropy loss function. In some embodiments, using the scoring information and the scoring interpretation information to generate output training data includes placing the scoring information before the scoring interpretation information to generate the output training data. In some embodiments, inputting the spoken test question text information, the answer text information, and the scoring information into a preset language model to output scoring reason information about the scoring information includes inputting the spoken test question text information, the answer text information, and the scoring information into the preset language model to output the answer text information and the scoring reason information, wherein the answer text information precedes the scoring reason information, and the scoring interpretation information further includes the answer text information. The application provides a spoken language scoring method, which comprises: acquiring target spoken test question text information, target answer voice data for the target spoken test question text information, and target spoken test question type information corresponding to the t