CN-122019753-A - Method for evaluating output data of a target model and method for training a target model
Abstract
The present disclosure relates to a method of evaluating output data of a target model, a method for training a target model, an evaluation device, a training device, an electronic device, a computer-readable storage medium, and a computer program product. Embodiments of the present disclosure quantify the uncertainty in a target model's generated results based on the model's structure, enabling the target model to evaluate the inherent reliability of its generated content. This provides an interpretable confidence indicator for downstream applications (e.g., multi-turn dialogue rewriting applications), helping to identify potential hallucinations and improving the robustness of the target model.
Inventors
- GUO HUI
Assignees
- Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-26
Claims (19)
- 1. A method of evaluating output data of a target model, comprising: generating, with the target model and for input data, a token sequence, and determining a prediction probability corresponding to each of a plurality of token indexes in the token sequence; calculating an uncertainty metric for each of the plurality of token indexes in the token sequence based at least in part on at least a portion of model parameters of the target model and the prediction probability corresponding to each of the plurality of token indexes in the token sequence; determining a confidence level corresponding to the input data based at least in part on the uncertainty metric for each of the plurality of token indexes in the token sequence; and determining an evaluation result of the output data of the target model based at least in part on the confidence level corresponding to the input data.
- 2. The method of claim 1, wherein the calculating an uncertainty metric for each of the plurality of token indexes in the token sequence comprises: determining a loss function for the input data that takes the prediction probability corresponding to each of the plurality of token indexes in the token sequence as an argument; and performing a gradient back-propagation operation on at least a portion of the model parameters of the target model to obtain a gradient of the loss function with respect to the prediction probability, the gradient serving as the uncertainty metric of the corresponding token index.
- 3. The method of claim 2, wherein the at least a portion of the model parameters of the target model are parameters of a last layer of the target model, and the gradient back-propagation operation is performed while the target model is in an inference state, the obtained gradient not being used to update the model parameters of the target model.
- 4. The method of claim 1, wherein the determining a confidence level corresponding to the input data comprises: for each of the plurality of token indexes, calculating a variance of the uncertainty metric corresponding to the token index in a gradient space, and determining a corresponding standard deviation based on the variance; for each of the plurality of token indexes, determining a confidence interval of the prediction probability based on the prediction probability corresponding to the token index and the standard deviation, and mapping the confidence interval to confidence data of the token index; and determining the confidence level corresponding to the input data based on the confidence data of each of the plurality of token indexes.
- 5. The method of claim 1, wherein the uncertainty metric is based on an approximation of a posterior distribution of the at least a portion of the model parameters of the target model, the posterior distribution being determined by a Bayesian Laplace approximation based on a negative log-likelihood function.
- 6. The method of claim 1, wherein the method further comprises: outputting the token sequence as the output data of the target model while also outputting the confidence level corresponding to the output data.
- 7. The method of claim 1, wherein the method further comprises: using the uncertainty metric or the confidence level as a supervision signal to train an auxiliary model for predicting confidence information of the output of the target model.
- 8. The method of claim 1, wherein the method further comprises: determining whether the confidence level is below a preset quality threshold; and in response to the confidence level being below the preset quality threshold, marking the corresponding output data as low-quality output data or as factually inconsistent output data.
- 9. The method of claim 1, wherein the input data includes a current question and its corresponding multi-turn historical dialogue information, and the output data is a rewrite result obtained after the current question is semantically rewritten.
- 10. The method of claim 9, wherein the method further comprises: determining the confidence level corresponding to the rewrite result; and triggering a rewrite prompt, a result-labeling operation, or a regeneration operation for the rewrite result when the confidence level is below a preset acceptability threshold.
- 11. A method for training a target model, comprising: processing, with the target model, each input sample in an input sample set to generate a corresponding token sequence, and determining a prediction probability corresponding to each of a plurality of token indexes in the token sequence; calculating an uncertainty metric corresponding to each of the plurality of token indexes in the token sequence based at least in part on at least a portion of model parameters of the target model and the prediction probability; determining a confidence level corresponding to each input sample in the input sample set based at least in part on the uncertainty metric corresponding to each of the plurality of token indexes in the token sequence; and controlling a training process of the target model based at least in part on the confidence level corresponding to each input sample in the input sample set.
- 12. The method of claim 11, wherein the controlling a training process of the target model comprises: during the training process of the target model, obtaining confidence levels corresponding to a plurality of input samples in the input sample set; performing a statistical analysis, by training step, on the confidence levels corresponding to the plurality of input samples to obtain confidence distribution parameters; and determining that the target model has reached a training convergence state when a variation amplitude of the confidence distribution parameters between adjacent training steps is smaller than a preset fluctuation threshold.
- 13. The method of claim 11, wherein the controlling a training process of the target model further comprises: during the training process of the target model, obtaining the confidence level corresponding to each input sample in the input sample set; screening input samples in the input sample set whose confidence level is below a preset threshold as data to be iterated; and performing iterative training on the target model using the data to be iterated.
- 14. The method of claim 13, wherein the iterative training comprises: for the data to be iterated, aligning the token sequence generated by the target model with the corresponding reference output token sequence position by position; calculating a cross-entropy loss between the prediction probability corresponding to each token index in the token sequence and the corresponding reference output token; and updating model parameters of the target model based at least in part on the cross-entropy loss.
- 15. An evaluation device, comprising: a first module configured to generate, with a target model and for input data, a token sequence and determine a prediction probability corresponding to each of a plurality of token indexes in the token sequence; a second module configured to calculate an uncertainty metric for each of the plurality of token indexes in the token sequence based at least in part on at least a portion of model parameters of the target model and the prediction probability for each of the plurality of token indexes in the token sequence; a third module configured to determine a confidence level corresponding to the input data based at least in part on the uncertainty metric for each of the plurality of token indexes in the token sequence; and a fourth module configured to determine an evaluation result of output data of the target model based at least in part on the confidence level corresponding to the input data.
- 16. A training device, comprising: a first module configured to process, with a target model, each input sample in an input sample set to generate a corresponding token sequence, and determine a prediction probability corresponding to each of a plurality of token indexes in the token sequence; a second module configured to calculate an uncertainty metric for each of the plurality of token indexes in the token sequence based at least in part on at least a portion of model parameters of the target model and the prediction probability; a third module configured to determine a confidence level corresponding to each input sample in the input sample set based at least in part on the uncertainty metric for each of the plurality of token indexes in the token sequence; and a fourth module configured to control a training process of the target model based at least in part on the confidence level corresponding to each input sample in the input sample set.
- 17. An electronic device, comprising: one or more processors; and one or more memories having stored therein computer-executable instructions that, when executed by the one or more processors, perform the method of any of claims 1-14.
- 18. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the method of any of claims 1-14.
- 19. A computer program product comprising computer-executable instructions which, when executed by a processor, implement the method of any one of claims 1-14.
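Claims 2 and 3 describe computing a per-token uncertainty metric as a gradient obtained by back-propagation while the model is in an inference state, without updating any parameters. A minimal sketch of that idea, assuming a softmax output layer and using the gradient's norm as the scalar metric (the function names and the norm-based reduction are illustrative assumptions, not the claimed procedure):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def token_uncertainty(logits, token_ids):
    """Per-token uncertainty metric: norm of the gradient of the
    negative log-likelihood loss w.r.t. the last-layer logits.

    For softmax + NLL this gradient has the closed form
    (probs - one_hot(target)), so it can be computed at inference
    time and no model parameters are updated.
    """
    probs = softmax(logits)                       # (T, V)
    grad = probs.copy()
    grad[np.arange(len(token_ids)), token_ids] -= 1.0
    return np.linalg.norm(grad, axis=-1)          # one score per token

# A confident token (one dominant logit) yields a smaller gradient
# norm than an uncertain token (nearly flat logits).
logits = np.array([[9.0, 0.0, 0.0],    # confident
                   [0.1, 0.0, 0.0]])   # uncertain
u = token_uncertainty(logits, np.array([0, 0]))
```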
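Claim 4 maps the variance of the per-token uncertainty in gradient space, via a confidence interval around the prediction probability, to a per-token confidence and then to one confidence level for the input. A sketch under stated assumptions: the interval is the usual mean ± z·std form clipped to [0, 1], interval width is mapped to confidence as 1 − width, and per-token confidences are aggregated by their mean (the claim only says the sequence confidence is "based on" the per-token values); all names are hypothetical.

```python
import numpy as np

def token_confidence(prob, grad_samples, z=1.96):
    """Map one token's prediction probability plus the variance of its
    gradient-space uncertainty samples to a confidence score in [0, 1].

    A wider confidence interval around the prediction probability
    means less confidence; here width is mapped as 1 - width.
    """
    std = np.sqrt(np.var(grad_samples))
    lo = max(0.0, prob - z * std)
    hi = min(1.0, prob + z * std)
    return 1.0 - (hi - lo)

def sequence_confidence(probs, grad_samples_per_token, z=1.96):
    """Aggregate per-token confidence into one score for the input
    (mean aggregation is an illustrative assumption)."""
    scores = [token_confidence(p, g, z)
              for p, g in zip(probs, grad_samples_per_token)]
    return float(np.mean(scores))

conf = sequence_confidence(
    probs=[0.95, 0.40],
    grad_samples_per_token=[np.array([0.01, 0.012, 0.011]),   # stable
                            np.array([0.1, 0.6, 0.9])])       # noisy
```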
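Claim 5's Laplace approximation fits a Gaussian to the parameter posterior at the MAP estimate, with covariance given by the inverse Hessian of the negative log-likelihood (plus the prior's precision). A common practical variant, sketched below, keeps only the diagonal of the Hessian and approximates it with the Fisher term, which for a softmax output is p·(1 − p) per logit; this variant is an assumption, not necessarily the patent's exact construction.

```python
import numpy as np

def diagonal_laplace_variance(probs, prior_precision=1.0):
    """Diagonal Laplace approximation of the posterior variance of the
    last-layer logit parameters.

    posterior ~ N(theta_MAP, H^{-1}); the Hessian H of the softmax
    negative log-likelihood is approximated by its diagonal Fisher
    term p * (1 - p), plus an isotropic Gaussian prior precision.
    """
    fisher_diag = probs * (1.0 - probs)
    return 1.0 / (fisher_diag + prior_precision)

var = diagonal_laplace_variance(np.array([0.99, 0.5]))
```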
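Claim 12's convergence criterion (stop when the confidence-distribution parameters barely change between adjacent training steps) can be sketched as a simple check over a history of per-step confidence arrays. Using the mean and standard deviation as the "distribution parameters" is an illustrative choice; the claim does not fix which parameters are tracked.

```python
import numpy as np

def has_converged(conf_history, window=2, tol=0.01):
    """Claim 12 style convergence check: the model is considered
    converged when the confidence-distribution parameters (here mean
    and std over the sample set) change by less than a fluctuation
    threshold between adjacent training steps.

    conf_history: list of per-step arrays of sample confidences.
    """
    if len(conf_history) < window:
        return False
    stats = [(np.mean(c), np.std(c)) for c in conf_history[-window:]]
    (m0, s0), (m1, s1) = stats[-2], stats[-1]
    return bool(abs(m1 - m0) < tol and abs(s1 - s0) < tol)

steps = [np.array([0.2, 0.4, 0.3]),
         np.array([0.6, 0.7, 0.65]),
         np.array([0.61, 0.7, 0.66])]
```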
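Claim 14's loss for the low-confidence samples is standard token-level cross-entropy against a position-aligned reference sequence. A minimal sketch (assuming the generated and reference sequences have already been aligned to equal length; names are hypothetical):

```python
import numpy as np

def rewrite_cross_entropy(pred_probs, ref_token_ids):
    """Claim 14 sketch: align the generated token sequence with the
    reference output position by position and average the per-token
    cross-entropy -log p(reference token).

    pred_probs: (T, V) per-position probability distributions.
    ref_token_ids: (T,) reference token ids.
    """
    eps = 1e-12  # guard against log(0)
    picked = pred_probs[np.arange(len(ref_token_ids)), ref_token_ids]
    return float(-np.mean(np.log(picked + eps)))

loss = rewrite_cross_entropy(
    np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1]]),
    np.array([0, 1]))
```

The loss here is -(ln 0.8 + ln 0.7) / 2 ≈ 0.29; in claim 14 this quantity would then drive the parameter update.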
Description
Method for evaluating output data of a target model and method for training a target model
Technical Field
The present disclosure relates to the field of computers, and more particularly to a method of evaluating output data of a target model, a method for training a target model, an evaluation device, a training device, an electronic device, a computer-readable storage medium, and a computer program product.
Background
In the field of natural language processing, artificial intelligence models (e.g., large language models) are increasingly widely used, but the content they generate can contain "hallucinations", i.e., content that does not conform to real-world data or deviates from user instructions. This phenomenon reduces the reliability of such models and may have serious consequences in high-risk areas such as medical and legal consultation. Therefore, uncertainty estimation needs to be performed on the output of a large language model to measure how confident the model is that its own output is correct; this is considered an important direction for improving the reliability of large language models. In conventional machine learning tasks, uncertainty in model outputs has been studied qualitatively. It is generally classified into two categories: aleatoric uncertainty and epistemic uncertainty. Aleatoric uncertainty results from inherent randomness in the data, while epistemic uncertainty relates to limitations in the model's own knowledge. However, directly applying these concepts to the free-text generation task of large language models is difficult, mainly because the natural-language output space is highly complex and both kinds of uncertainty are naturally interleaved in the generation process, complicating accurate estimation and differentiation.
Accordingly, improvements to existing large language models are needed to increase their reliability and certainty.
Disclosure of Invention
Embodiments of the present disclosure provide a method of evaluating output data of a target model, a method for training a target model, an evaluation device, a training device, an electronic device, a computer-readable storage medium, and a computer program product. An embodiment of the disclosure provides a method for evaluating output data of a target model, comprising: generating, with the target model and for input data, a token sequence, and determining a prediction probability corresponding to each of a plurality of token indexes in the token sequence; calculating an uncertainty metric corresponding to each of the plurality of token indexes based at least in part on at least a portion of model parameters of the target model and the prediction probability corresponding to each of the plurality of token indexes; determining a confidence level corresponding to the input data based at least in part on the uncertainty metric corresponding to each of the plurality of token indexes; and determining an evaluation result of the output data of the target model based at least in part on the confidence level corresponding to the input data.
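The evaluation method just summarized can be sketched end to end. This is a compact illustration under explicit assumptions: a softmax output layer, the gradient norm of the token-level negative log-likelihood as the uncertainty metric, and a simple monotone mapping from uncertainty to confidence; every name and the specific mapping are illustrative, not the disclosed implementation.

```python
import numpy as np

def evaluate_output(logits, token_ids, quality_threshold=0.5):
    """End-to-end sketch of the evaluation method: per-token prediction
    probabilities -> gradient-based uncertainty -> confidence level ->
    evaluation result (accept, or mark as low quality)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

    # Uncertainty: norm of d(NLL)/d(logits) = probs - one_hot(target).
    grad = probs.copy()
    grad[np.arange(len(token_ids)), token_ids] -= 1.0
    uncertainty = np.linalg.norm(grad, axis=-1)

    # Confidence: a monotone decreasing map of uncertainty (assumption).
    token_conf = 1.0 / (1.0 + uncertainty)
    confidence = float(np.mean(token_conf))

    label = "ok" if confidence >= quality_threshold else "low_quality"
    return confidence, label

conf, label = evaluate_output(
    np.array([[9.0, 0.0, 0.0], [8.0, 0.0, 0.0]]), np.array([0, 0]))
```

With near-flat logits the same routine produces a lower confidence and, under a stricter threshold, the "low_quality" marking of claim 8.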
An embodiment of the disclosure provides a method for training a target model, comprising: processing each input sample in an input sample set with the target model to generate a corresponding token sequence, and determining a prediction probability corresponding to each of a plurality of token indexes in the token sequence; calculating an uncertainty metric corresponding to each of the plurality of token indexes based at least in part on at least a portion of model parameters of the target model and the prediction probability; determining a confidence level corresponding to each input sample in the input sample set based at least in part on the uncertainty metric corresponding to each of the plurality of token indexes; and controlling the training process of the target model based at least in part on the confidence level corresponding to each input sample in the input sample set. An embodiment of the disclosure provides an evaluation apparatus comprising: a first module configured to generate, with a target model and for input data, a token sequence and determine a prediction probability for each of a plurality of token indexes in the token sequence; a second module configured to calculate an uncertainty metric for each of the plurality of token indexes based at least in part on at least a portion of model parameters of the target model and the prediction probability for each of the plurality of token indexes; a third module configured to determine a confidence level for the input data based at least in part on the uncertainty metric for each of the plurality of token indexes in the sequen