CN-122020091-A - Double-track evaluation drive optimization processing method, device, equipment and medium

Abstract

The invention relates to the technical field of model construction, is applicable to business scenarios such as financial technology, and discloses a double-track evaluation drive optimization processing method, device, equipment and medium. The method comprises: constructing a double-track evaluation system, analyzing training data to generate optimized training data, and establishing a target model; generating capability indexes by using the double-track evaluation system and a multi-role agent simulation; generating an alignment analysis result from the model's performance in the reinforcement learning stage; obtaining real business feedback indexes, forming a training evaluation index set, and determining an attribution link; generating training adjustment information based on the attribution link and driving the target model to retrain to obtain an optimized target model; and processing business input data to output a business processing result. By linking evaluation results with business feedback, the invention closes the training loop, so that model capability gaps can be identified and corrected, improving the compliance, reliability and application effect of the model in actual business.

Inventors

WANG JIANZONG
ZHANG NAN
QU XIAOYANG

Assignees

平安科技（深圳）有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-06

Claims (10)

1. A double-track evaluation drive optimization processing method, characterized by comprising the following steps: constructing a double-track evaluation system comprising a generated evaluation set and a business scenario evaluation set; analyzing training data by using the double-track evaluation system to form a data health analysis result, executing a data optimization strategy according to the data health analysis result to generate optimized training data, and establishing a target model by using the optimized training data; analyzing the target model by using the double-track evaluation system and a multi-role agent simulation environment to generate a base capability index and an interactive task execution index; analyzing the balance relationship between the compliance safety performance and the business availability performance of the target model in a reinforcement learning stage to generate an alignment analysis result; acquiring a returned real business feedback index, and determining an attribution link between a training evaluation index set and the real business feedback index, wherein the training evaluation index set consists of the base capability index, the interactive task execution index and the alignment analysis result; determining a capability gap of the target model based on the attribution link, generating training adjustment information comprising a parameter configuration, a data proportion and a reward weight by using the capability gap, and driving the target model to retrain by using the training adjustment information and the optimized training data to obtain an optimized target model; and processing business input data by using the optimized target model to generate a business processing result.
2. The double-track evaluation drive optimization processing method of claim 1, wherein constructing a double-track evaluation system comprising a generated evaluation set and a business scenario evaluation set comprises: analyzing a knowledge graph, a regulatory text library and product description documents, and extracting core entities and constraint logic to form a basic knowledge domain set; generating evaluation data for stability testing based on the basic knowledge domain set, and compiling the evaluation data to form the generated evaluation set; acquiring historical business interaction data, and removing sensitive information from the historical business interaction data by using a privacy processing module while retaining the business context logic, to obtain desensitized business interaction data; extracting features from the desensitized business interaction data by using preset business scenario screening conditions, identifying records containing high-risk tags and complex-logic tags, and reconstructing those records to form the business scenario evaluation set; and integrating the generated evaluation set with the business scenario evaluation set to construct the double-track evaluation system.
3. The double-track evaluation drive optimization processing method of claim 1, wherein analyzing training data by using the double-track evaluation system to form a data health analysis result, executing a data optimization strategy according to the data health analysis result to generate optimized training data, and establishing a target model by using the optimized training data comprises: acquiring training data, and performing entity linking and rule scanning on the training data by using a knowledge graph and a compliance detection model in the double-track evaluation system to determine a knowledge coverage rate and a compliant sample proportion; performing a fact consistency check on the training data to determine a noise sample proportion, and forming the data health analysis result based on the knowledge coverage rate, the compliant sample proportion and the noise sample proportion; comparing the data health analysis result with a preset threshold, and determining, according to the comparison result, a data optimization strategy comprising data supplementation, noise removal and compliance enhancement; executing the data optimization strategy to clean and enhance the training data to generate the optimized training data; and inputting the optimized training data into a base network architecture to be trained for parameter initialization and pre-training, to establish the target model.
4. The double-track evaluation drive optimization processing method of claim 1, wherein analyzing the target model by using the double-track evaluation system and the multi-role agent simulation environment to generate a base capability index and an interactive task execution index comprises: extracting definition-type test data, comparison-type test data and counterfactual-type test data from the generated evaluation set of the double-track evaluation system; inputting the definition-type test data, the comparison-type test data and the counterfactual-type test data into the target model; analyzing the output responses of the target model to the definition-type test data, the comparison-type test data and the counterfactual-type test data, and determining the accuracy of concept understanding and the consistency of logical reasoning to obtain the base capability index; in the multi-role agent simulation environment, configuring a simulation evaluation group comprising a simulated user agent, an adversarial attack agent, a compliance review agent and a judge agent; establishing a multi-round dialogue channel between the simulation evaluation group and the target model, and recording interaction trajectory data; and analyzing the interaction trajectory data by using the compliance review agent and the judge agent, and determining a task execution success rate, multi-round dialogue consistency, logical integrity and a risk exposure degree to obtain the interactive task execution index.
5. The double-track evaluation drive optimization processing method of claim 1, wherein analyzing the balance relationship between the compliance safety performance and the business availability performance of the target model in the reinforcement learning stage to generate an alignment analysis result comprises: monitoring real-time response data of the target model in the reinforcement learning stage, and counting a compliance safety hit rate, a business completion rate, a refusal rate and a hallucination occurrence rate; quantifying the compliance safety performance by using the compliance safety hit rate, and quantifying the business availability performance by using the business completion rate, the refusal rate and the hallucination occurrence rate; establishing a balance analysis relation comprising a positive gain factor and a negative penalty factor to define the balance relationship between the compliance safety performance and the business availability performance; analyzing the compliance safety hit rate, the business completion rate, the refusal rate and the hallucination occurrence rate by using the balance analysis relation to determine a balance state, and tracking the change trend of the balance state; and judging, according to the change trend, whether the current training strategy carries a risk of over-defense or reward collapse, and generating the alignment analysis result.
6. The double-track evaluation drive optimization processing method of claim 1, wherein acquiring a returned real business feedback index, and determining an attribution link between a training evaluation index set and the real business feedback index, wherein the training evaluation index set consists of the base capability index, the interactive task execution index and the alignment analysis result, comprises: acquiring, in real time through a business monitoring interface, a satisfaction index, a conversion rate index and a manual takeover rate index of the target model after it goes online, to form the real business feedback index; extracting the base capability index, the interactive task execution index and the alignment analysis result corresponding to the target model, and performing time-series alignment on the extracted data to construct the training evaluation index set; analyzing the association mapping relationship between the training evaluation index set and the real business feedback index by using a causal association analysis method, and identifying key evaluation indexes that have a dominant influence on the real business feedback index; and locating, based on the association mapping relationship between the key evaluation indexes and the real business feedback index, the specific training capability dimension causing business performance fluctuation, and determining the attribution link.
7. The double-track evaluation drive optimization processing method of claim 1, wherein determining a capability gap of the target model based on the attribution link, generating training adjustment information comprising a parameter configuration, a data proportion and a reward weight by using the capability gap, and driving the target model to retrain by using the training adjustment information and the optimized training data to obtain an optimized target model comprises: analyzing the attribution link to locate the training capability dimension causing business index deviation, quantifying the deficiency of the target model in that training capability dimension, and determining the capability gap; determining, according to the capability gap, a parameter configuration for adjusting model hyperparameters, a data proportion for sample sampling and a reward weight for adjusting alignment strength; integrating the parameter configuration, the data proportion and the reward weight to generate executable training adjustment information; loading the training adjustment information into a training controller, and screening retraining samples from the optimized training data according to the data proportion; and driving the target model to perform parameter updating and retraining by using the screened retraining samples to obtain the optimized target model.
8. A double-track evaluation drive optimization processing device, characterized by comprising: a double-track evaluation construction module, used for constructing a double-track evaluation system comprising a generated evaluation set and a business scenario evaluation set; a training data optimization module, used for analyzing training data by using the double-track evaluation system to form a data health analysis result, executing a data optimization strategy according to the data health analysis result to generate optimized training data, and establishing a target model by using the optimized training data; a model capability evaluation module, used for analyzing the target model by using the double-track evaluation system and a multi-role agent simulation environment to generate a base capability index and an interactive task execution index; an alignment relation analysis module, used for analyzing the balance relationship between the compliance safety performance and the business availability performance of the target model in a reinforcement learning stage to generate an alignment analysis result; a business feedback attribution module, used for acquiring a returned real business feedback index and determining an attribution link between a training evaluation index set and the real business feedback index, wherein the training evaluation index set consists of the base capability index, the interactive task execution index and the alignment analysis result; a model retraining module, used for determining a capability gap of the target model based on the attribution link, generating training adjustment information comprising a parameter configuration, a data proportion and a reward weight by using the capability gap, and driving the target model to retrain by using the training adjustment information and the optimized training data to obtain an optimized target model; and a business processing execution module, used for processing business input data by using the optimized target model to generate a business processing result.
9. A computer device comprising a memory, a processor and a double-track evaluation drive optimization processing program stored on the memory and executable on the processor, wherein the double-track evaluation drive optimization processing program, when executed by the processor, implements the steps of the double-track evaluation drive optimization processing method of any one of claims 1-7.
10. A computer-readable storage medium, wherein a double-track evaluation drive optimization processing program is stored on the storage medium, and the double-track evaluation drive optimization processing program, when executed by a processor, implements the steps of the double-track evaluation drive optimization processing method of any one of claims 1-7.
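
Illustrative sketch (not part of the claims): the balance analysis recited in claim 5 could, under assumptions, be expressed as a score that rewards the compliance safety hit rate and the business completion rate through a positive gain factor and penalizes the refusal and hallucination rates through a negative penalty factor. The claims do not give the formula, the thresholds or the trend test, so everything below is hypothetical: the names `AlignmentSnapshot`, `balance_score` and `diagnose`, the equal-weight averaging, the `tolerance` value, and the two heuristics used to flag over-defense and reward collapse.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlignmentSnapshot:
    """The four rates counted at one point of the reinforcement learning stage."""
    compliance_hit_rate: float   # compliance safety hit rate
    completion_rate: float       # business completion rate
    refusal_rate: float          # share of requests the model refused
    hallucination_rate: float    # share of responses with fabricated content


def balance_score(s: AlignmentSnapshot, gain: float = 1.0, penalty: float = 1.0) -> float:
    """Assumed balance relation: gain-weighted benefits minus penalty-weighted costs."""
    positive = gain * (s.compliance_hit_rate + s.completion_rate) / 2
    negative = penalty * (s.refusal_rate + s.hallucination_rate) / 2
    return positive - negative


def diagnose(history: List[AlignmentSnapshot], gain: float = 1.0,
             penalty: float = 1.0, tolerance: float = 0.05) -> Dict[str, object]:
    """Compare the first and last snapshots and flag two assumed risk patterns."""
    if len(history) < 2:
        raise ValueError("need at least two snapshots to track a trend")
    first, last = history[0], history[-1]
    trend = balance_score(last, gain, penalty) - balance_score(first, gain, penalty)
    # Assumed heuristic: refusals climb while completed business drops.
    over_defense = (last.refusal_rate - first.refusal_rate > tolerance
                    and last.completion_rate < first.completion_rate)
    # Assumed heuristic: the balance score falls while hallucinations climb.
    reward_collapse = (trend < -tolerance
                       and last.hallucination_rate - first.hallucination_rate > tolerance)
    return {"balance_trend": trend,
            "over_defense": over_defense,
            "reward_collapse": reward_collapse}


if __name__ == "__main__":
    history = [
        AlignmentSnapshot(0.90, 0.80, 0.05, 0.02),
        AlignmentSnapshot(0.97, 0.60, 0.25, 0.02),
    ]
    print(diagnose(history))
```

In this example the compliance hit rate improves while refusals rise and completions fall, which the sketch flags as over-defense. A production system would more plausibly track the trend over a sliding window than compare only two endpoints.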

Description

Double-track evaluation drive optimization processing method, device, equipment and medium

Technical Field

The invention relates to the technical field of model construction, and in particular to a double-track evaluation drive optimization processing method, device, equipment and medium.

Background

As large language models are gradually introduced into financial business scenarios such as banking, securities, insurance and advisory services, the industry has begun to explore using these models for tasks such as customer service question answering, business guidance, marketing outreach, risk analysis and compliance assistance. To ensure that such models operate reliably in environments with high regulatory requirements, the industry generally adopts some form of model evaluation to verify knowledge coverage and task response capability. However, most of these evaluations follow the testing methods used for general-purpose models and do not fully meet the systematic requirements of financial scenarios.

Existing evaluation approaches generally suffer from a disconnect between training and evaluation: evaluation usually takes place after model training and serves only as an acceptance check on the result, so it can hardly help optimize training data or training strategies. Meanwhile, the evaluation dimensions currently used in the industry are relatively narrow, concentrating on knowledge question-answering accuracy or a single business index. Systematic evaluation of the model across multiple capability dimensions, such as risk control, compliance constraints, business reasoning and multi-round interaction, is lacking, so evaluation results hardly reflect the model's real business capability.

In addition, existing evaluation generally uses fixed question sets and static scoring mechanisms, which differ significantly from real business interaction. For example, when facing real user input, complex decision chains and regulatory constraints, a model's actual performance often deviates from its evaluated performance. Meanwhile, evaluation results lack a clear association with online operating indexes, so the relationship between model capability and the conversion rate, manual intervention rate or compliance hit performance is hard to interpret, and no effective basis can be provided for training adjustment. Together, these problems mean that evaluation cannot cover the full life cycle of the model and can hardly support the need for continuous optimization against real business.

Disclosure of Invention

The main purpose of the invention is to provide a double-track evaluation drive optimization processing method, device, equipment and storage medium, aiming to solve the technical problem in the prior art that evaluation results cannot be fed back continuously, in a closed loop, into training data optimization, model capability diagnosis and retraining adjustment, which leaves evaluation disconnected from model performance improvement.

In order to achieve the above object, the present invention provides a double-track evaluation drive optimization processing method, including: constructing a double-track evaluation system comprising a generated evaluation set and a business scenario evaluation set; analyzing training data by using the double-track evaluation system to form a data health analysis result, executing a data optimization strategy according to the data health analysis result to generate optimized training data, and establishing a target model by using the optimized training data; analyzing the target model by using the double-track evaluation system and a multi-role agent simulation environment to generate a base capability index and an interactive task execution index; analyzing the balance relationship between the compliance safety performance and the business availability performance of the target model in a reinforcement learning stage to generate an alignment analysis result; acquiring a returned real business feedback index, and determining an attribution link between a training evaluation index set and the real business feedback index, wherein the training evaluation index set consists of the base capability index, the interactive task execution index and the alignment analysis result; determining a capability gap of the target model based on the attribution link, generating training adjustment information comprising a parameter configuration, a data proportion and a reward weight by using the capability gap, and driving the target model to retrain by using the training adjustment information and the optimized training data to obtain an optimized target model; and processing business input data by using the optimized target model to generate a business processing result. Further, in order to achieve the above object, the present invention provides a double-track evaluation drive optimization processing device, including: the