CN-121981077-A - Model evaluation method and device, electronic equipment and readable storage medium
Abstract
The application discloses a model evaluation method, a model evaluation device, an electronic device, and a readable storage medium. The method comprises: obtaining a text to be analyzed and encoding it to obtain a text vector; obtaining a reference vector corresponding to the text vector; generating an interpolation sequence based on the text vector and the reference vector, wherein the interpolation sequence comprises a plurality of interpolation points from the reference vector to the text vector; determining an attribution vector of the text to be analyzed based on the gradient vectors corresponding to the interpolation points in the interpolation sequence and the difference vector between the text vector and the reference vector; and evaluating a language model to be evaluated based on the attribution vector to obtain an evaluation result, wherein the evaluation result represents the degree of reliability and/or stability of the language model to be evaluated. The application addresses the technical problem of low model evaluation accuracy in the related art.
Inventors
- LI ANG
- MEN XIN
- CHEN YUN
- JIANG HANQING
- WU PEIDUO
- WEI XU
- YU ZIHAN
Assignees
- 中国第一汽车股份有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260203
Claims (14)
- 1. A method for evaluating a model, comprising: obtaining a text to be analyzed, and encoding the text to be analyzed to obtain a text vector, wherein the text vector comprises semantic information of a plurality of phrases in the text to be analyzed; obtaining a reference vector corresponding to the text vector, wherein the semantic distance between the reference vector and the text vector is greater than a preset distance threshold; generating an interpolation sequence based on the text vector and the reference vector, wherein the interpolation sequence comprises a plurality of interpolation points from the reference vector to the text vector; determining an attribution vector of the text to be analyzed based on gradient vectors respectively corresponding to the plurality of interpolation points in the interpolation sequence and a difference vector between the text vector and the reference vector, wherein the gradient vectors represent output characteristics of a language model to be evaluated at the plurality of interpolation points, and the attribution vector represents the degree of influence of the plurality of phrases in the text to be analyzed on the output characteristics of the language model to be evaluated, relative to the degree of change of the input characteristics of the language model to be evaluated; and evaluating the language model to be evaluated based on the attribution vector to obtain an evaluation result, wherein the evaluation result represents the degree of reliability and/or stability of the language model to be evaluated.
- 2. The method of claim 1, wherein obtaining the reference vector corresponding to the text vector comprises: retrieving a plurality of candidate text vectors that are similar to the text vector in semantic space, wherein the candidate text vectors satisfy natural language logic rules; and determining the reference vector corresponding to the text vector from the plurality of candidate text vectors based on the semantic distances between the plurality of candidate text vectors and the text vector respectively.
- 3. The method of claim 2, wherein determining the reference vector corresponding to the text vector from the plurality of candidate text vectors based on the semantic distances between the plurality of candidate text vectors and the text vector respectively comprises: extracting, from the plurality of candidate text vectors, candidate text vectors whose semantic distance is greater than the preset distance threshold, based on the semantic distances between the plurality of candidate text vectors and the text vector respectively, wherein the preset distance threshold is determined according to the largest of the semantic distances respectively corresponding to the plurality of candidate text vectors; and determining a candidate text vector whose semantic distance is greater than the preset distance threshold as the reference vector.
- 4. The method of claim 1, wherein generating an interpolation sequence based on the text vector and the reference vector comprises: generating a plurality of initial interpolation points between the reference vector and the text vector based on a preset step size; determining a gradient change rate between each two adjacent initial interpolation points among the plurality of initial interpolation points, wherein the gradient change rate represents the degree of gradient change between the two adjacent initial interpolation points; determining a sampling density between the two adjacent initial interpolation points based on the gradient change rate between them; and performing interpolation sampling among the plurality of initial interpolation points according to the sampling density to obtain the interpolation sequence.
- 5. The method of claim 4, wherein determining the gradient change rate between two adjacent initial interpolation points among the plurality of initial interpolation points comprises: taking the interpolation vectors corresponding to the plurality of initial interpolation points as input vectors, and inputting the input vectors into the language model to be evaluated for processing to obtain output vectors; determining gradient vectors respectively corresponding to the plurality of initial interpolation points based on the output vectors and the input vectors, wherein the gradient vectors represent the degree of change between the output vector and the input vector; and determining the gradient change rate between two adjacent initial interpolation points based on the difference vector of the gradient vectors corresponding to the two adjacent initial interpolation points.
- 6. The method of claim 4, wherein determining the sampling density between two adjacent initial interpolation points based on the gradient change rate between the two adjacent initial interpolation points comprises: determining the sampling density between the two adjacent initial interpolation points to be a first sampling density in response to the gradient change rate between them being greater than a first preset gradient change rate threshold; and determining the sampling density between the two adjacent initial interpolation points to be a second sampling density in response to the gradient change rate between them being smaller than a second preset gradient change rate threshold, wherein the second preset gradient change rate threshold is smaller than the first preset gradient change rate threshold, and the second sampling density is smaller than the first sampling density.
- 7. The method of claim 1, wherein determining the attribution vector of the text to be analyzed based on the gradient vectors respectively corresponding to the plurality of interpolation points in the interpolation sequence and the difference vector between the text vector and the reference vector comprises: averaging the gradient vectors corresponding to the plurality of interpolation points in the interpolation sequence to obtain an average vector; and multiplying the average vector by the difference vector between the text vector and the reference vector to obtain the attribution vector of the text to be analyzed.
- 8. The method of claim 1, wherein encoding the text to be analyzed to obtain a text vector comprises: performing word segmentation on the text to be analyzed to obtain a plurality of phrases corresponding to the text to be analyzed; encoding the plurality of phrases to obtain phrase vectors respectively corresponding to the plurality of phrases; and concatenating the phrase vectors to obtain the text vector.
- 9. The method according to any one of claims 1 to 8, further comprising: visually displaying the evaluation result.
- 10. An evaluation device for a model, comprising: an encoding unit configured to obtain a text to be analyzed and encode the text to be analyzed to obtain a text vector, wherein the text vector comprises semantic information of a plurality of phrases in the text to be analyzed; an obtaining unit configured to obtain a reference vector corresponding to the text vector, wherein the semantic distance between the reference vector and the text vector is greater than a preset distance threshold; a generation unit configured to generate an interpolation sequence based on the text vector and the reference vector, wherein the interpolation sequence comprises a plurality of interpolation points from the reference vector to the text vector; a determining unit configured to determine an attribution vector of the text to be analyzed based on gradient vectors respectively corresponding to the plurality of interpolation points in the interpolation sequence and a difference vector between the text vector and the reference vector, wherein the gradient vectors represent output characteristics of a language model to be evaluated at the plurality of interpolation points, and the attribution vector represents the degree of influence of the plurality of phrases in the text to be analyzed on the output characteristics of the language model to be evaluated, relative to the degree of change of the input characteristics of the language model to be evaluated; and an evaluation unit configured to evaluate the language model to be evaluated based on the attribution vector to obtain an evaluation result, wherein the evaluation result represents the degree of reliability and/or stability of the language model to be evaluated.
- 11. An electronic device, comprising: a memory storing an executable program; and a processor for executing the program, wherein the program, when run, performs the method of any one of claims 1 to 9.
- 12. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored executable program, wherein the executable program when run controls a device in which the storage medium is located to perform the method of any one of claims 1 to 9.
- 13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
- 14. A vehicle, characterized by comprising: a memory storing an executable program; and a processor for executing the program, wherein the program, when run, performs the method of any one of claims 1 to 9.
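The adaptive interpolation recited in claims 4 to 6 can be illustrated with a minimal NumPy sketch. It is not the applicant's implementation: the stand-in model gradient `toy_gradient`, the function name `adaptive_interpolation`, the threshold values, the use of the Euclidean norm as the gradient change rate, and the `default` density for rates that fall between the two thresholds (a case the claims do not specify) are all assumptions made for illustration.

```python
import numpy as np

def toy_gradient(x):
    """Gradient of a stand-in scalar model f(x) = sum(tanh(4 * x)) (hypothetical)."""
    return 4.0 * (1.0 - np.tanh(4.0 * x) ** 2)

def adaptive_interpolation(reference, text, grad_fn, step=0.25,
                           high_thresh=1.0, low_thresh=0.1,
                           dense=4, sparse=1, default=2):
    """Return interpolation coefficients in [0, 1] from reference to text.

    dense, sparse and default are the numbers of sub-intervals used per
    initial segment; all values here are illustrative assumptions.
    """
    # Initial interpolation points generated with a preset step size (claim 4).
    n_segments = int(round(1.0 / step))
    alphas = np.linspace(0.0, 1.0, n_segments + 1)
    points = reference + alphas[:, None] * (text - reference)
    grads = np.stack([grad_fn(p) for p in points])

    # Gradient change rate between adjacent points, taken here as the norm of
    # the difference of their gradient vectors (claim 5; the norm is assumed).
    rates = np.linalg.norm(np.diff(grads, axis=0), axis=1)

    refined = [alphas[0]]
    for left, right, rate in zip(alphas[:-1], alphas[1:], rates):
        if rate > high_thresh:      # fast-changing gradient: first (higher) density
            pieces = dense
        elif rate < low_thresh:     # slowly changing gradient: second (lower) density
            pieces = sparse
        else:                       # in-between case, not covered by claim 6
            pieces = default
        refined.extend(np.linspace(left, right, pieces + 1)[1:])
    return np.array(refined)

reference = np.zeros(3)
text = np.ones(3)
alphas = adaptive_interpolation(reference, text, toy_gradient)
print(alphas)
```

In this toy setting the gradient changes quickly near the reference vector and flattens near the text vector, so the resulting sequence is denser at the start of the path and sparser at the end.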
Description
Model evaluation method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular to a method and apparatus for evaluating a model, an electronic device, and a readable storage medium.
Background
With the deepening application of artificial intelligence in the vehicle industry, large language models (LLMs) are becoming a core technology in scenarios such as intelligent interaction, fault diagnosis, and technical document analysis. However, the complexity and "black box" nature of these models pose serious challenges to model evaluation; in particular, when a model exhibits capabilities beyond the conventional in a professional field, parsing its decision logic becomes critical.
In the related art, model evaluation mainly depends on post-hoc interpretation methods, which rely on the output of the model: the decision logic of the model is interpreted from its output result in order to determine the input features that influence that output. This approach depends excessively on the output result and ignores the dynamic changes of the model during the decision process. The interpretation result is therefore easily affected by accidental or abnormal factors in the model output, which may not represent the actual decision logic of the model, so the evaluation accuracy of the model is low.
No effective solution to the technical problem of low model evaluation accuracy has yet been proposed.
Disclosure of Invention
Embodiments of the application provide a method and a device for evaluating a model, an electronic device, and a readable storage medium, to at least solve the technical problem of low model evaluation accuracy.
According to one aspect of the embodiments of the application, a method for evaluating a model is provided. The method comprises: obtaining a text to be analyzed and encoding it to obtain a text vector, wherein the text vector comprises semantic information of a plurality of phrases in the text to be analyzed; obtaining a reference vector corresponding to the text vector, wherein the semantic distance between the reference vector and the text vector is greater than a preset distance threshold; generating an interpolation sequence based on the text vector and the reference vector, wherein the interpolation sequence comprises a plurality of interpolation points from the reference vector to the text vector; determining an attribution vector of the text to be analyzed based on gradient vectors corresponding to the plurality of interpolation points in the interpolation sequence and a difference vector between the text vector and the reference vector, wherein the gradient vectors represent output characteristics of a language model to be evaluated at the plurality of interpolation points, and the attribution vector represents the degree of influence of the plurality of phrases in the text to be analyzed on the output characteristics of the language model to be evaluated, relative to the degree of change of the input characteristics of the language model to be evaluated; and evaluating the language model to be evaluated based on the attribution vector to obtain an evaluation result, wherein the evaluation result represents the degree of reliability and/or stability of the language model to be evaluated.
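The steps of this aspect can be sketched end to end in NumPy as follows. This is an illustrative sketch under stated assumptions, not the applicant's implementation: the quadratic stand-in model with weights `w`, the helper names `select_reference` and `attribution`, the Euclidean distance as "semantic distance", the `ratio` used to derive the threshold from the largest candidate distance, the choice of the farthest eligible candidate, the uniform interpolation, and the reading of "multiplying" as an element-wise product are all hypothetical.

```python
import numpy as np

def select_reference(text_vec, candidates, ratio=0.8):
    """Pick a reference vector whose distance to text_vec exceeds a threshold.

    The threshold is ratio * (largest candidate distance); ratio is an assumed
    parameter, and Euclidean distance stands in for semantic distance.
    """
    dists = np.linalg.norm(candidates - text_vec, axis=1)
    threshold = ratio * dists.max()
    eligible = np.flatnonzero(dists > threshold)
    # Among eligible candidates, take the farthest one (assumed selection rule).
    return candidates[eligible[np.argmax(dists[eligible])]]

def attribution(text_vec, reference, grad_fn, n_points=50):
    """Average the gradients along the path and scale by the difference vector."""
    alphas = np.linspace(0.0, 1.0, n_points)
    points = reference + alphas[:, None] * (text_vec - reference)
    grads = np.stack([grad_fn(p) for p in points])
    return grads.mean(axis=0) * (text_vec - reference)

# Stand-in model f(x) = sum(w * x^2) with an analytic gradient (hypothetical).
w = np.array([1.0, -2.0, 0.5, 3.0])
model = lambda x: float(np.sum(w * x ** 2))
grad = lambda x: 2.0 * w * x

# Text vector assumed to be two concatenated 2-dimensional phrase vectors.
text_vec = np.array([1.0, 0.5, -1.0, 2.0])
candidates = np.array([[0.9, 0.4, -1.1, 1.8],
                       [0.0, 0.0, 0.0, 0.0],
                       [-1.0, 1.0, 1.0, -1.0]])

reference = select_reference(text_vec, candidates)
attr = attribution(text_vec, reference, grad)
phrase_scores = attr.reshape(2, 2).sum(axis=1)  # one score per phrase
print(attr, phrase_scores)
```

For this stand-in model the attributions sum to the difference between the model output at the text vector and at the reference vector, which is one simple consistency check that an evaluation built on the attribution vector could apply.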
Optionally, obtaining the reference vector corresponding to the text vector comprises: retrieving a plurality of candidate text vectors similar to the text vector in a semantic space, wherein the candidate text vectors satisfy natural language logic rules; and determining the reference vector corresponding to the text vector from the plurality of candidate text vectors based on the semantic distances between the plurality of candidate text vectors and the text vector respectively.
Optionally, determining the reference vector corresponding to the text vector from the plurality of candidate text vectors based on the semantic distances between the plurality of candidate text vectors and the text vector respectively comprises: extracting candidate text vectors with semantic distances greater than the preset distance threshold from the plurality of candidate text vectors based on the semantic distances between the plurality of candidate text vectors and the text vector respectively, wherein the preset distance threshold is determined according to the largest of the semantic distances respectively corresponding to the plurality of candidate text vectors; and determining the candidate text vectors with semantic distances greater than the preset distance threshold as