CN-122021919-A - Model-based training processing method, device, equipment and readable storage medium

CN122021919A

Abstract

The application discloses a model-based training processing method, device, equipment, and readable storage medium. The method comprises: acquiring N training samples, wherein the N training samples comprise M comparison samples; invoking a task model (used, for example, to judge whether a training sample is genuine or false) to execute a reasoning task on each training sample to obtain a first prediction label and a first prediction reasoning link of each training sample; determining a classification loss of the task model according to the difference between the real label and the first prediction label of each of the N training samples; constructing a contrast loss according to the difference between the correct reasoning link and the first prediction reasoning link of each of the M comparison samples and the difference between the error reasoning link and the first prediction reasoning link of each of the M comparison samples; and training the task model by adopting the classification loss and the contrast loss, thereby constructing the task model. By adopting the application, the reasoning performance of the trained model can be optimized.

Inventors

  • ZHANG MANMAN
  • HAO YANCHAO
  • CHEN XI

Assignees

  • Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-02-10

Claims (17)

  1. A model-based training processing method, the method comprising: acquiring a training sample set, wherein the training sample set comprises N training samples and a real label of each training sample, the N training samples comprise M comparison samples, the training sample set further comprises error labels of the M comparison samples, correct reasoning links of the M comparison samples, and error reasoning links of the M comparison samples, and N and M are positive integers with M less than or equal to N; invoking a task model to execute a reasoning task on each training sample to obtain a first prediction label and a first prediction reasoning link of each training sample; determining a classification loss of the task model according to the difference between the real label and the first prediction label of each of the N training samples; constructing a contrast loss of the task model according to the difference between the correct reasoning link and the first prediction reasoning link of each of the M comparison samples and the difference between the error reasoning link and the first prediction reasoning link of each of the M comparison samples, wherein the contrast loss is used to reduce the difference between the correct reasoning link and the first prediction reasoning link of each comparison sample and to increase the difference between the error reasoning link and the first prediction reasoning link of each comparison sample; and training the task model by adopting the classification loss and the contrast loss, wherein the trained task model is used for executing a reasoning task.
  2. The method of claim 1, wherein Q training samples of the N training samples other than the M comparison samples are common samples, Q being a positive integer less than or equal to N; the real label of a common sample is determined after the common sample is audited under a specific auditing rule, and the M comparison samples are obtained based on the Q common samples; the process of obtaining the M comparison samples comprises: invoking an attribution model to perform a label attribution task on each common sample to obtain an attribution reasoning link corresponding to each common sample, wherein the attribution reasoning link corresponding to a common sample is used to represent the thinking process of classifying that common sample into its corresponding real label under the corresponding specific auditing rule; invoking an initially trained task model to execute a reasoning task on each common sample to obtain a model output label and a model output reasoning link corresponding to each common sample; and screening the M comparison samples from the Q common samples according to the attribution reasoning link, the model output label, and the model output reasoning link corresponding to each of the Q common samples.
  3. The method according to claim 2, wherein screening the M comparison samples from the Q common samples according to the attribution reasoning link, the model output label, and the model output reasoning link corresponding to each of the Q common samples comprises: performing a quality check on the attribution reasoning link corresponding to each of the Q common samples according to a link check rule, and integrating the common samples whose quality check result is a pass into a first candidate set; integrating the common samples whose model output labels differ from their corresponding real labels into a second candidate set; performing intersection processing on the first candidate set and the second candidate set to obtain a candidate intersection, wherein the candidate intersection comprises M common samples; and determining the M common samples included in the candidate intersection as the M comparison samples.
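The screening in the claim above reduces to a set intersection: the samples whose attribution reasoning link passes the quality check, intersected with the samples that the initially trained task model mislabels. A minimal sketch, assuming illustrative record fields (`id`, `real_label`, `model_label`, `link_check_passed`) that the claim does not specify:

```python
def screen_comparison_samples(samples):
    """Intersect the two candidate sets described in the claim."""
    # First candidate set: attribution reasoning link passes the quality check.
    first = {s["id"] for s in samples if s["link_check_passed"]}
    # Second candidate set: model output label differs from the real label.
    second = {s["id"] for s in samples if s["model_label"] != s["real_label"]}
    chosen = first & second  # candidate intersection
    return [s for s in samples if s["id"] in chosen]

common_samples = [
    {"id": 0, "real_label": "normal", "model_label": "abnormal", "link_check_passed": True},
    {"id": 1, "real_label": "normal", "model_label": "normal", "link_check_passed": True},
    {"id": 2, "real_label": "abnormal", "model_label": "normal", "link_check_passed": False},
]
comparison_samples = screen_comparison_samples(common_samples)
print([s["id"] for s in comparison_samples])  # → [0]
```

Only sample 0 passes the link check and is simultaneously mislabeled by the model, so it alone survives the intersection.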
  4. The method according to claim 3, wherein the attribution reasoning link corresponding to each common sample included in the candidate intersection is the correct reasoning link corresponding to that common sample; the model output label corresponding to each common sample in the candidate intersection is the error label corresponding to that common sample; and the error reasoning link corresponding to each common sample in the candidate intersection is determined based on the model output reasoning link corresponding to that common sample.
  5. The method of claim 4, wherein the candidate intersection includes a common sample i; and the process of determining the error reasoning link corresponding to the common sample i based on the model output reasoning link corresponding to the common sample i comprises: taking the model output reasoning link corresponding to the common sample i as an original error reasoning link corresponding to the common sample i; invoking a language understanding model to perform error attribution processing based on the common sample i, the error label corresponding to the common sample i, and the original error reasoning link to obtain an error reason; invoking the language understanding model to perform link expansion processing based on the error reason to obtain similar error-prone reasoning links of the original error reasoning link corresponding to the common sample i; and determining the original error reasoning link corresponding to the common sample i and the similar error-prone reasoning links as the error reasoning links corresponding to the common sample i.
  6. The method of claim 1, wherein the first prediction label of each training sample is determined based on a probability distribution output by the task model for that training sample, the probability distribution comprising prediction probabilities of the task model for a plurality of candidate labels; and determining the classification loss of the task model according to the difference between the real label and the first prediction label of each of the N training samples comprises: performing a loss calculation on the probability distribution output by the task model for each training sample and the real label corresponding to that training sample by adopting a classification loss function, to obtain a sample classification loss corresponding to that training sample; and integrating the sample classification losses corresponding to the N training samples to obtain the classification loss of the task model.
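The claim above leaves the classification loss function unspecified; cross-entropy is a common choice for a probability distribution over candidate labels. A minimal sketch under that assumption, averaging the per-sample losses as the integration step:

```python
import math

def sample_classification_loss(probs, true_index):
    """Cross-entropy between one sample's probability distribution and its real label."""
    return -math.log(probs[true_index])

def classification_loss(batch_probs, true_indices):
    """Integrate (here: average) the sample classification losses of the N samples."""
    losses = [sample_classification_loss(p, t) for p, t in zip(batch_probs, true_indices)]
    return sum(losses) / len(losses)

# Two training samples, three candidate labels each.
batch = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
real = [0, 1]
print(round(classification_loss(batch, real), 4))  # → 0.2899
```

Summing instead of averaging would also satisfy the claim's "integrating" step; the choice only rescales the loss.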
  7. The method of claim 1, wherein constructing the contrast loss of the task model according to the difference between the correct reasoning link and the first prediction reasoning link of each of the M comparison samples and the difference between the error reasoning link and the first prediction reasoning link of each of the M comparison samples comprises: determining a sample contrast loss corresponding to each comparison sample according to the difference between the correct reasoning link corresponding to that comparison sample and the first prediction reasoning link and the difference between the error reasoning link corresponding to that comparison sample and the first prediction reasoning link; and integrating the sample contrast losses corresponding to the M comparison samples to obtain the contrast loss of the task model.
  8. The method of claim 7, wherein the M comparison samples comprise a comparison sample j; and determining the sample contrast loss corresponding to the comparison sample j according to the difference between the correct reasoning link corresponding to the comparison sample j and the first prediction reasoning link and the difference between the error reasoning link corresponding to the comparison sample j and the first prediction reasoning link comprises: performing semantic vector extraction processing on the correct reasoning link, the error reasoning link, and the first prediction reasoning link corresponding to the comparison sample j, respectively, to obtain a first semantic vector of the correct reasoning link, a second semantic vector of the error reasoning link, and a third semantic vector of the first prediction reasoning link corresponding to the comparison sample j; determining a first vector difference between the third semantic vector and the first semantic vector, and a second vector difference between the third semantic vector and the second semantic vector; and performing a loss calculation on the first vector difference and the second vector difference by adopting a contrast loss function to obtain the sample contrast loss corresponding to the comparison sample j.
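The claim above does not fix the contrast loss function or the vector difference measure; a triplet-style margin loss over cosine distances is one common instantiation. In the sketch below, the margin value and the two-dimensional toy vectors are illustrative assumptions:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def sample_contrast_loss(pred_vec, correct_vec, error_vec, margin=1.0):
    """Triplet-style sample contrast loss: pull the first prediction reasoning
    link toward the correct link, push it away from the error link."""
    d_pos = 1.0 - cosine_sim(pred_vec, correct_vec)  # first vector difference
    d_neg = 1.0 - cosine_sim(pred_vec, error_vec)    # second vector difference
    return max(0.0, d_pos - d_neg + margin)

# Toy two-dimensional semantic vectors for comparison sample j.
pred, correct, error = [1.0, 0.1], [1.0, 0.0], [0.0, 1.0]
print(round(sample_contrast_loss(pred, correct, error), 4))  # → 0.1045
```

Minimizing this loss reduces the distance between the predicted and correct links while increasing the distance to the error link, matching the stated purpose of the contrast loss in claim 1.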
  9. The method of claim 1, wherein training the task model by adopting the classification loss and the contrast loss comprises: acquiring a first weight configured for the classification loss and a second weight configured for the contrast loss; performing a weighted calculation on the classification loss and the contrast loss based on the first weight and the second weight to obtain a total loss; and optimizing the task model by adopting the total loss.
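The weighted calculation in the claim above can be sketched directly; the weight values below are illustrative hyperparameters, not values given by the application:

```python
def total_loss(classification_loss, contrast_loss, w_cls=1.0, w_con=0.5):
    """Weighted calculation of the two losses; both weights are configurable."""
    return w_cls * classification_loss + w_con * contrast_loss

# 1.0 * 0.8 + 0.5 * 0.4 = 1.0; this total loss would then drive the optimizer step.
print(total_loss(0.8, 0.4))
```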
  10. The method of claim 1, wherein after training the task model by adopting the classification loss and the contrast loss, the method further comprises: invoking the trained task model to execute a reasoning task on each comparison sample to obtain a second prediction label and a second prediction reasoning link of each comparison sample; scoring the second prediction label and the second prediction reasoning link corresponding to each comparison sample according to a preset scoring rule to obtain a prediction score of each comparison sample; and performing reinforcement training on the trained task model by adopting the prediction scores of the M comparison samples.
  11. The method of claim 10, wherein the M comparison samples comprise a comparison sample j; and the process of scoring the second prediction label and the second prediction reasoning link corresponding to the comparison sample j according to the preset scoring rule to obtain the prediction score of the comparison sample j comprises: performing format detection on the second prediction reasoning link of the comparison sample j according to a standard format in the scoring rule to obtain a link format score of the comparison sample j; performing semantic detection on the second prediction reasoning link of the comparison sample j according to a link detection manner in the scoring rule to obtain a link semantic score of the comparison sample j; performing label detection on the second prediction label of the comparison sample j according to a label detection manner in the scoring rule to obtain a label score of the comparison sample j; and integrating the link format score, the link semantic score, and the label score of the comparison sample j to obtain the prediction score of the comparison sample j.
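The three-part scoring above can be sketched as follows; the "Step 1 / Step 2" standard format, the integration weights, and the binary label score are assumptions, since the claim does not fix the scoring rule's details:

```python
import re

def link_format_score(link_text):
    """1.0 if the link follows the assumed 'Step 1: ... Step 2: ...' standard format."""
    return 1.0 if re.search(r"step\s*1\s*:.*step\s*2\s*:", link_text, re.S | re.I) else 0.0

def label_score(predicted_label, real_label):
    """1.0 if the second prediction label matches the real label, else 0.0."""
    return 1.0 if predicted_label == real_label else 0.0

def prediction_score(fmt, sem, lab, weights=(0.2, 0.4, 0.4)):
    """Integrate the link format, link semantic, and label scores."""
    return sum(w * s for w, s in zip(weights, (fmt, sem, lab)))

link = "Step 1: identify which audit rule is hit. Step 2: weigh the context before deciding."
score = prediction_score(link_format_score(link), 0.75, label_score("abnormal", "abnormal"))
print(round(score, 2))  # 0.2 * 1.0 + 0.4 * 0.75 + 0.4 * 1.0 → 0.9
```

The resulting prediction score can serve as the reward signal for the reinforcement training stage described in claim 10.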
  12. The method of claim 11, wherein performing semantic detection on the second prediction reasoning link of the comparison sample j according to the link detection manner in the scoring rule to obtain the link semantic score of the comparison sample j comprises: calculating a first semantic similarity between the correct reasoning link corresponding to the comparison sample j and the second prediction reasoning link, and a second semantic similarity between the error reasoning link corresponding to the comparison sample j and the second prediction reasoning link; and determining the link semantic score of the comparison sample j according to the first semantic similarity and the second semantic similarity.
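The claim above derives the link semantic score from the two similarities without fixing the mapping; one simple choice rewards similarity to the correct link, penalizes similarity to the error link, and clamps the result to [0, 1]. The mapping itself is an illustrative assumption:

```python
def link_semantic_score(sim_correct, sim_error):
    """Map the two semantic similarities into a [0, 1] link semantic score:
    higher similarity to the correct link raises the score, higher
    similarity to the error link lowers it."""
    return max(0.0, min(1.0, 0.5 * (1.0 + sim_correct - sim_error)))

print(round(link_semantic_score(0.9, 0.3), 2))  # 0.5 * (1 + 0.9 - 0.3) → 0.8
```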
  13. The method of claim 1, wherein each training sample is a text; and after training the task model, the method further comprises: acquiring a target text to be detected; invoking the trained task model to execute a reasoning task on the target text to obtain a target prediction label of the target text; and when the target prediction label is an abnormal label, performing masking processing on the target text.
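The deployment flow above (predict, then mask abnormal text) can be sketched with a stand-in for the trained task model; the keyword rule in the stand-in and the asterisk masking are illustrative assumptions:

```python
def moderate(text, task_model, abnormal_label="abnormal"):
    """Invoke the trained task model on a target text; mask the text when the
    target prediction label is the abnormal label."""
    predicted_label = task_model(text)
    if predicted_label == abnormal_label:
        return "*" * len(text)  # masking processing
    return text

# Stand-in for the trained task model: flags a hypothetical banned word.
fake_model = lambda text: "abnormal" if "scam" in text else "normal"
print(moderate("join this scam now", fake_model))  # fully masked
print(moderate("hello world", fake_model))         # unchanged
```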
  14. A model-based training processing device, the device comprising: an acquisition module, configured to acquire a training sample set, wherein the training sample set comprises N training samples and a real label of each training sample, the N training samples comprise M comparison samples, the training sample set further comprises error labels of the M comparison samples, correct reasoning links of the M comparison samples, and error reasoning links of the M comparison samples, and N and M are positive integers with M less than or equal to N; a reasoning module, configured to invoke a task model to execute a reasoning task on each training sample to obtain a first prediction label and a first prediction reasoning link of each training sample; a loss determination module, configured to determine a classification loss of the task model according to the difference between the real label and the first prediction label of each of the N training samples; the loss determination module being further configured to construct a contrast loss of the task model according to the difference between the correct reasoning link and the first prediction reasoning link of each of the M comparison samples and the difference between the error reasoning link and the first prediction reasoning link of each of the M comparison samples, wherein the contrast loss is used to reduce the difference between the correct reasoning link and the first prediction reasoning link of each comparison sample and to increase the difference between the error reasoning link and the first prediction reasoning link of each comparison sample; and a training module, configured to train the task model by adopting the classification loss and the contrast loss, wherein the trained task model is used for executing a reasoning task.
  15. A computer device, comprising a processor, a memory, and a network interface; wherein the processor is connected to the memory and the network interface, the network interface is configured to provide a network communication function, the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method of any one of claims 1-13.
  16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor so as to perform the method of any one of claims 1-13.
  17. A computer program product, characterized in that the computer program product comprises a computer program stored in a computer-readable storage medium, the computer program being adapted to be read and executed by a processor to cause a computer device having the processor to perform the method of any one of claims 1-13.

Description

Model-based training processing method, device, equipment and readable storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular to a model-based training processing method, apparatus, device, and readable storage medium.

Background

At present, using intelligent models to perform tasks that require logical reasoning (e.g., text classification tasks, content auditing tasks) has become common practice. To improve the accuracy of the results a model outputs, the model is typically trained on a large number of samples with real labels (which can be understood as annotated labels), taking the difference between the labels predicted by the model and the real labels as the training target. However, this training method, which uses annotated labels as the single supervisory signal, has a major disadvantage: for difficult samples with fuzzy boundaries or multi-step reasoning, the model may output the correct final answer while its internal reasoning process is erroneous, unreasonable, or unreliable. For example, in a content audit scenario, a model may arrive at a "mask" decision by misjudging an ironic context or confusing rule priorities, so that its inference logic has a fundamental flaw. This "correct answer, wrong reasoning" problem leaves the model's decisions lacking interpretability, limits its generalization ability, and makes it prone to repeated mistakes on new, similar difficult samples. Therefore, a model training scheme is currently needed that can effectively improve the reasoning performance of a model without significantly increasing labeling cost, thereby optimizing the model's performance on reasoning tasks.
Disclosure of Invention

The embodiments of the present application provide a model-based training processing method, device, equipment, and readable storage medium, which can improve the training effect of a model in model training services so as to optimize the reasoning performance of the model.

In one aspect, an embodiment of the present application provides a model-based training processing method, comprising the following steps: acquiring a training sample set, wherein the training sample set comprises N training samples and a real label of each training sample, the N training samples comprise M comparison samples, the training sample set further comprises error labels of the M comparison samples, correct reasoning links of the M comparison samples, and error reasoning links of the M comparison samples, and N and M are positive integers with M less than or equal to N; invoking a task model to execute a reasoning task on each training sample to obtain a first prediction label and a first prediction reasoning link of each training sample; determining a classification loss of the task model according to the difference between the real label and the first prediction label of each of the N training samples; constructing a contrast loss of the task model according to the difference between the correct reasoning link and the first prediction reasoning link of each of the M comparison samples and the difference between the error reasoning link and the first prediction reasoning link of each of the M comparison samples; and training the task model by adopting the classification loss and the contrast loss, wherein the trained task model is used for executing the reasoning task.
In one aspect, an embodiment of the present application provides a model-based training processing device, comprising: an acquisition module, configured to acquire a training sample set, wherein the training sample set comprises N training samples and a real label of each training sample, the N training samples comprise M comparison samples, the training sample set further comprises error labels of the M comparison samples, correct reasoning links of the M comparison samples, and error reasoning links of the M comparison samples, and N and M are positive integers with M less than or equal to N; a reasoning module, configured to invoke the task model to execute the reasoning task on each training sample to obtain a first prediction label and a first prediction reasoning link of each training sample; a loss determination module, configured to determine the classification loss of the task model according to the difference between the real label and the first prediction label of each of the N training samples; the loss determination module being further configured to construct a contrast loss of the task model according to the difference between the correct reasoning link and the first prediction reasoning link of each of the M comparison samples and the difference between the error reasoning link and the first prediction reasoning link