CN-121996517-A - Optimization method and device of large model evaluation system, electronic equipment and storage medium

Abstract

The invention discloses an optimization method and device of a large model evaluation system, electronic equipment and a storage medium, and relates to the technical field of artificial intelligence and related fields. The method comprises: collecting an error sample; generating evaluation information based on first input information, first output information of a large model, and feedback information identifying errors in the first output information; determining, based on the evaluation information, that the large model evaluation system needs to be optimized; and optimizing the large model evaluation system based on the first input information, the first output information, the feedback information and the evaluation information.

Inventors

  • LI BOWEN

Assignees

  • 北京百度网讯科技有限公司

Dates

Publication Date
20260508
Application Date
20251219

Claims (16)

  1. An optimization method of a large model evaluation system, comprising: collecting an error sample, wherein the error sample comprises first input information, first output information of a large model, and feedback information identifying an error in the first output information; generating evaluation information based on the first output information and the feedback information of the error sample; determining, based on the evaluation information, that a large model evaluation system needs to be optimized; and optimizing the large model evaluation system based on the first input information, the first output information, the feedback information, and the evaluation information.
  2. The method of claim 1, wherein the generating evaluation information based on the first output information and the feedback information of the error sample comprises: performing statistical analysis on the first output information and the feedback information of the error sample to obtain an error rate for each type of error.
  3. The method of claim 2, wherein the determining that a large model evaluation system needs to be optimized based on the evaluation information comprises: detecting whether the error rate of each type of error is greater than or equal to a preset trigger threshold; and if the error rate is greater than or equal to the preset trigger threshold, determining that the large model evaluation system needs to be optimized.
  4. The method of claim 1, wherein the generating evaluation information based on the first output information and the feedback information of the error sample comprises: using a data analysis agent to analyze, based on the first output information and the feedback information of the error sample, the data features that cause errors.
  5. The method of claim 4, wherein the determining that a large model evaluation system needs to be optimized based on the evaluation information comprises: detecting whether the large model evaluation system includes an analysis of the data features; and in response to the large model evaluation system not including an analysis of the data features, determining that the large model evaluation system needs to be optimized.
  6. The method of claim 4, wherein the optimizing the large model evaluation system based on the first input information, the first output information, the feedback information, and the evaluation information comprises: generating modification requirement specification information based on the first output information and the feedback information; and optimizing the large model evaluation system based on the first input information, the first output information, and the modification requirement specification information.
  7. The method of claim 6, wherein the optimizing the large model evaluation system based on the first input information, the first output information, and the modification requirement specification information comprises: using a problem locating agent to locate a problem node in the large model evaluation system based on the first input information, the first output information, and the modification requirement specification information; and using a coding agent to optimize the problem node based on the first input information, the first output information, and the modification requirement specification information.
  8. The method of claim 7, wherein the using a coding agent to optimize the problem node based on the first input information, the first output information, and the modification requirement specification information comprises: using the coding agent, based on the first input information, the first output information, and the modification requirement specification information, to rewrite the node code of the problem node, adjust the processing logic of the problem node, adjust the configuration of the problem node, or add a new node based on the problem node.
  9. The method of any one of claims 1-8, wherein after determining that a large model evaluation system needs to be optimized based on the evaluation information and before optimizing the large model evaluation system based on the first input information, the first output information, the feedback information, and the evaluation information, the method further comprises: running the large model evaluation system on the first input information and the first output information to obtain an operation result; and detecting and determining, based on the operation result, that the feedback information is authentic.
  10. The method of any one of claims 1-8, wherein after optimizing the large model evaluation system based on the first input information, the first output information, the feedback information, and the evaluation information, the method further comprises: testing the optimized large model evaluation system to obtain a test result; if the test result indicates that the large model evaluation system was successfully optimized, determining that the optimized large model evaluation system takes effect; and if the test result indicates that optimization of the large model evaluation system failed, optimizing the large model evaluation system again.
  11. The method of claim 10, wherein the testing the optimized large model evaluation system to obtain a test result comprises: testing whether the optimized large model evaluation system is executable; testing whether the optimized large model evaluation system can repair the errors in the error sample; testing whether the optimized large model evaluation system introduces new errors; and if the optimized large model evaluation system is confirmed to be executable, to repair the errors in the error sample, and not to introduce new errors, determining that optimization of the large model evaluation system succeeded, and otherwise determining that optimization of the large model evaluation system failed.
  12. The method of any one of claims 1-8, wherein the method further comprises: obtaining an example sample of the large model, the example sample including second input information and second output information; and using a system construction agent to construct the large model evaluation system based on the second input information and the second output information, wherein the large model evaluation system comprises an analysis node, at least one check node, and a summary node that are arranged in a sequential relationship.
  13. An optimization apparatus of a large model evaluation system, comprising: an acquisition module configured to collect an error sample, wherein the error sample comprises first input information, first output information of a large model, and feedback information identifying an error in the first output information; a generation module configured to generate evaluation information based on the first output information and the feedback information of the error sample; a determining module configured to determine, based on the evaluation information, that the large model evaluation system needs to be optimized; and an optimization module configured to optimize the large model evaluation system based on the first input information, the first output information, the feedback information, and the evaluation information.
  14. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-12.
  15. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-12.
  16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-12.
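
The error-rate trigger of claims 2 and 3 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `ErrorSample`, `error_type`, `error_rates`, `needs_optimization` and `total_evaluated` are hypothetical, and two assumptions are made that the claims leave open, namely that an error rate is the number of error samples of one type divided by the total number of evaluated outputs, and that a single error type reaching the threshold is enough to trigger optimization.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorSample:
    """One collected error sample (field names are illustrative only)."""
    first_input: str    # input given to the large model
    first_output: str   # output produced by the large model
    feedback: str       # feedback identifying the error in the output
    error_type: str     # assumed label derived from the feedback


def error_rates(samples, total_evaluated):
    """Return {error_type: rate}, where rate = samples of that type / total_evaluated.

    The denominator is an assumption; the claims do not define it.
    """
    if total_evaluated <= 0:
        raise ValueError("total_evaluated must be positive")
    counts = Counter(sample.error_type for sample in samples)
    return {etype: count / total_evaluated for etype, count in counts.items()}


def needs_optimization(rates, trigger_threshold):
    """True if any error type's rate is greater than or equal to the threshold."""
    return any(rate >= trigger_threshold for rate in rates.values())


samples = [
    ErrorSample("q1", "a1", "wrong output format accepted", "format"),
    ErrorSample("q2", "a2", "wrong output format accepted", "format"),
    ErrorSample("q3", "a3", "wrong output format accepted", "format"),
    ErrorSample("q4", "a4", "policy violation not flagged", "missed_violation"),
]
rates = error_rates(samples, total_evaluated=20)
trigger = needs_optimization(rates, trigger_threshold=0.1)
```

With 20 evaluated outputs, the "format" rate is 3/20 and the "missed_violation" rate is 1/20, so a threshold of 0.1 triggers optimization while a threshold of 0.2 does not.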

Description

Optimization method and device of large model evaluation system, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of computers, in particular to artificial intelligence and related fields, and more particularly to an optimization method and device of a large model evaluation system, electronic equipment and a storage medium.

Background

With the rapid development of large-scale artificial intelligence models, model evaluation has become an important step in ensuring the quality, reliability and controllability of models. In the prior art, the evaluation tasks of a model may be supported by manually coded evaluation logic. However, manual coding is costly, inflexible and difficult to maintain, and it lacks scalability and automation when faced with changing business requirements, different model types and different format specifications. If the current manually coded evaluation logic has a loophole, more rigorous evaluation logic must be recoded by hand before the model can be evaluated.

Disclosure of Invention

The disclosure provides an optimization method and device of a large model evaluation system, electronic equipment and a storage medium.

According to an aspect of the present disclosure, there is provided an optimization method of a large model evaluation system, including: collecting an error sample, wherein the error sample comprises first input information, first output information of a large model, and feedback information identifying an error in the first output information; generating evaluation information based on the first output information and the feedback information of the error sample; determining, based on the evaluation information, that a large model evaluation system needs to be optimized; and optimizing the large model evaluation system based on the first input information, the first output information, the feedback information, and the evaluation information.

According to another aspect of the present disclosure, there is provided an optimization apparatus of a large model evaluation system, including: an acquisition module configured to collect an error sample, wherein the error sample comprises first input information, first output information of a large model, and feedback information identifying an error in the first output information; a generation module configured to generate evaluation information based on the first output information and the feedback information of the error sample; a determining module configured to determine, based on the evaluation information, that the large model evaluation system needs to be optimized; and an optimization module configured to optimize the large model evaluation system based on the first input information, the first output information, the feedback information, and the evaluation information.

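The four steps of the method aspect (collect, generate, determine, optimize) can be sketched as one round of a loop, with the retest-and-retry behavior of claims 10 and 11 added as a gate. This is a sketch under stated assumptions, not the disclosed implementation: `optimization_round`, `AcceptanceResult` and every callable parameter are hypothetical names, the agents are passed in as plain functions, and the `max_attempts` bound and the choice to keep the original system when no candidate passes are assumptions that the text does not state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class ErrorSample:
    """One collected error sample (field names are illustrative only)."""
    first_input: str
    first_output: str
    feedback: str


@dataclass(frozen=True)
class AcceptanceResult:
    """The three checks of claim 11, as booleans."""
    executable: bool       # the optimized evaluation system runs
    repairs_errors: bool   # errors in the error samples are repaired
    no_new_errors: bool    # no new errors are introduced

    @property
    def success(self) -> bool:
        return self.executable and self.repairs_errors and self.no_new_errors


def optimization_round(
    samples: Sequence[ErrorSample],
    system: Any,
    generate: Callable[[Sequence[ErrorSample]], Any],
    needs_optimization: Callable[[Any], bool],
    optimize: Callable[[Any, Sequence[ErrorSample], Any], Any],
    test: Callable[[Any, Sequence[ErrorSample]], AcceptanceResult],
    max_attempts: int = 3,  # assumed bound; the text only says "optimized again"
) -> Any:
    """Run one collect/generate/determine/optimize round and return the system to use."""
    evaluation = generate(samples)            # generate evaluation information
    if not needs_optimization(evaluation):    # determine whether to optimize
        return system
    for _ in range(max_attempts):
        candidate = optimize(system, samples, evaluation)
        if test(candidate, samples).success:  # optimized system takes effect
            return candidate
    return system  # assumption: keep the original if every attempt fails


samples = [ErrorSample("q", "a", "policy violation not flagged")]
attempts = []


def fake_optimize(system, samples, evaluation):
    # Stand-in for the problem locating and coding agents.
    attempts.append(system)
    return f"{system}+patch{len(attempts)}"


def fake_test(candidate, samples):
    # Stand-in test harness: only the second patch repairs the error.
    return AcceptanceResult(True, candidate.endswith("patch2"), True)


result = optimization_round(
    samples, "v1",
    generate=lambda s: {"n": len(s)},
    needs_optimization=lambda e: e["n"] > 0,
    optimize=fake_optimize,
    test=fake_test,
)
```

In this toy run the first candidate fails the repair check, so the system is optimized again and the second candidate is accepted. Passing the agents in as callables keeps the control flow separate from however the evaluation system itself is represented.
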
According to still another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the aspect described above and of any possible implementation thereof.

According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the aspect described above and of any possible implementation thereof.

According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the aspect described above and of any possible implementation thereof.

The technology of the present disclosure can effectively improve the efficiency of optimizing a large model evaluation system.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Detailed Description

The drawings are provided for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic dev