CN-122021728-A - Automatic end-to-end data recipe generation method and system for large language model adaptation

CN122021728A

Abstract

The invention provides an automatic end-to-end data recipe generation method and system for large language model adaptation. The method comprises: providing a training task set; modeling the target adaptation task as a triplet; providing a cold-start data set and initializing a policy model with it; generating, with the policy model, a data recipe containing a data processing pipeline and training data according to the target adaptation task and the training task set; providing a data verifier that evaluates the quality of the training data to obtain instant proxy rewards; and updating the policy model parameters through a reinforcement learning algorithm. The system comprises a task modeling module, a policy model module, a quality evaluation module, a supervised fine-tuning module, and an online optimization module. The invention constructs a multi-domain comprehensive data set for end-to-end automatic generation of data recipes, and generates high-quality data processing pipelines and training data end to end in a low-cost, automated manner, thereby improving the performance of large language models on specific target tasks.

Inventors

  • CHEN YICHENG
  • MA ZERUN
  • XIE XINCHEN
  • LI YINING
  • CHEN KAI

Assignees

  • 上海人工智能创新中心 (Shanghai Artificial Intelligence Innovation Center)

Dates

Publication Date
2026-05-12
Application Date
2026-02-02

Claims (12)

  1. An automatic end-to-end data recipe generation method for large language model adaptation, comprising: providing a training task set and a policy model; providing a cold-start data set and initializing the policy model using the cold-start data set; generating a data recipe according to the target adaptation task and the training task set using the initialized policy model; evaluating the quality of the training data in the data recipe; and updating the parameters of the policy model through a reinforcement learning algorithm according to the evaluation result.
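The loop in claim 1 can be sketched in miniature. This is a toy illustration, not the patented implementation: `toy_policy` and `proxy_reward` are hypothetical stand-ins for the policy LLM and the data verifier, and the update is a simple REINFORCE-flavoured scalar step.

```python
def proxy_reward(sample):
    # Stand-in data verifier: rewards well-formed instruction/output pairs.
    return 1.0 if sample.get("instruction") and sample.get("output") else 0.0

def toy_policy(theta, task):
    # Placeholder for the policy model: emits max(1, int(theta)) samples.
    n = max(1, int(theta))
    return [{"instruction": f"{task} q{i}", "output": f"a{i}"} for i in range(n)]

def rl_step(theta, task, lr=0.5):
    data = toy_policy(theta, task)                           # generate the recipe's training data
    reward = sum(proxy_reward(s) for s in data) / len(data)  # instant proxy reward
    return theta + lr * reward                               # move parameters toward higher reward

theta = 1.0
for task in ["medical-qa", "code-generation"]:               # a tiny training task set
    theta = rl_step(theta, task)
```

Each iteration generates a recipe for a sampled task, scores its training data with the proxy reward, and nudges the policy parameters accordingly, which is the structure of the claimed method without any of its model-specific machinery.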
  2. The method according to claim 1, further comprising: modeling the target adaptation task as a triplet t = (I, τ, D), wherein I is the natural-language task instruction together with data-source and/or evaluation-protocol meta-information, D is the set of available raw data sources, and τ is the downstream reference task.
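The triplet of claim 2 maps naturally onto a small record type. The field names and example values below are illustrative, not from the patent:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdaptationTask:
    """Claim-2 triplet t = (I, tau, D)."""
    instruction: str     # I: natural-language instruction plus data-source / evaluation-protocol meta-info
    reference_task: str  # tau: downstream reference task
    data_sources: tuple  # D: set of available raw data sources

t = AdaptationTask(
    instruction="Adapt the model to medical QA; evaluate with exact-match accuracy.",
    reference_task="medical-qa",
    data_sources=("pubmed_abstracts", "clinical_notes"),
)
```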
  3. The method of claim 1, wherein providing the training task set comprises: providing a set of raw data sources corresponding to the domain of the target adaptation task; and constructing the training task set from the raw data source set.
  4. The method of claim 3, wherein providing the set of raw data sources corresponding to the domain of the target adaptation task comprises: partitioning the domain of the target adaptation task into fields; selecting representative target adaptation tasks as reference tasks in each field, all selected reference tasks constituting a total reference set; and, for each reference task, providing data sets semantically related to that reference task and screening them to obtain the raw data source set corresponding to the reference task.
  5. The method of claim 3, wherein constructing the training task set from the raw data source set comprises: dividing all the reference tasks into training tasks and held-out evaluation tasks; probability-sampling a training task and uniformly sampling a subset from its corresponding raw data source set to form a task instance; and de-duplicating the task instances to obtain the training task set.
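Claim 5's construction (split, sample, de-duplicate) can be sketched as follows. The split ratio, sampling scheme, and the example tasks and sources are assumptions for illustration only:

```python
import random

def build_training_tasks(reference_tasks, sources_by_task, n_instances=20, seed=0):
    rng = random.Random(seed)
    tasks = sorted(reference_tasks)
    rng.shuffle(tasks)
    cut = max(1, len(tasks) // 2)
    train, held_out = tasks[:cut], tasks[cut:]   # split into training and held-out evaluation tasks
    seen, instances = set(), []
    for _ in range(n_instances):
        task = rng.choice(train)                 # probability-sample a training task
        pool = sorted(sources_by_task[task])
        k = rng.randint(1, len(pool))
        subset = tuple(sorted(rng.sample(pool, k)))  # uniformly sample a subset of its sources
        if (task, subset) not in seen:               # de-duplicate task instances
            seen.add((task, subset))
            instances.append((task, subset))
    return train, held_out, instances

train, held_out, instances = build_training_tasks(
    ["medical-qa", "code-generation", "legal-summarization", "math-reasoning"],
    {
        "medical-qa": ["pubmed", "clinical_notes"],
        "code-generation": ["github", "stackoverflow"],
        "legal-summarization": ["case_law", "contracts"],
        "math-reasoning": ["textbooks", "olympiad"],
    },
)
```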
  6. The method of claim 1, wherein providing the cold-start data set comprises: providing example candidate recipes, each comprising: a data recipe plan generated by a reasoning model from a given target adaptation task, the plan including the selected data sources, the reasons for their selection, the way each type of data is processed, the order of operations, and/or key parameters; and a data recipe code script obtained by converting the data recipe plan with a code generation model; and screening the example candidate recipes, the screening comprising: executing the data recipe code script of each example candidate recipe and rejecting the candidate if code execution fails, an empty set is output, or the training format is violated; and scoring the example candidate recipes with a data verifier to obtain proxy reward scores, eliminating candidates whose proxy reward score is below a preset threshold, the retained candidates forming the cold-start data set.
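The four rejection criteria of claim 6 (execution failure, empty output, format violation, low proxy reward) translate into a simple filter. `run_script` and `verify` below are hypothetical stand-ins for the script executor and data verifier; the 0.5 threshold is an assumed example value:

```python
def screen_candidates(candidates, run_script, verify, threshold=0.5):
    kept = []
    for cand in candidates:
        try:
            data = run_script(cand["script"])   # execute the recipe's code script
        except Exception:
            continue                            # reject: code execution failed
        if not data:
            continue                            # reject: empty output set
        if not all({"instruction", "output"} <= set(s) for s in data):
            continue                            # reject: training-format violation
        if verify(data) < threshold:
            continue                            # reject: proxy reward below threshold
        kept.append(cand)                       # retained candidates form the cold-start set
    return kept

# Toy stand-ins for demonstration.
def run_script(script):
    if script == "boom":
        raise RuntimeError("execution failed")
    return [{"instruction": "q", "output": "a"}] if script == "ok" else []

cold_start = screen_candidates(
    [{"script": "ok"}, {"script": "boom"}, {"script": "empty"}],
    run_script,
    verify=lambda data: 1.0,
)
```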
  7. The method of claim 1, wherein the data recipe comprises a data processing pipeline and training data.
  8. The method of claim 1, wherein evaluating the quality of the training data in the data recipe comprises: providing a data verifier; sampling the training data set output by the policy model to obtain a sampled subset; and inputting the sampled subset to the data verifier, which determines the category and quality of each sample in the subset and outputs a corresponding instant proxy reward score.
  9. The method of claim 6 or 8, wherein the data verifier uses a large language model to assign scores to samples in the example candidate recipes and/or the sampled subset as instant proxy reward scores according to the following categories: invalid samples are assigned a score of 0; format-error samples are assigned a score of 0; erroneous samples are assigned a score of 0; task-mismatch samples are assigned a score of 0.4; and qualified samples are assigned a score of 1.
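The claim-9 rubric is a fixed category-to-score mapping. The category names below are paraphrased from the claim (the final category is garbled in the machine translation and is assumed here to mean qualified samples); averaging over a sampled subset follows claim 8:

```python
# Claim-9 rubric: the verifier's category label determines the instant proxy reward.
CATEGORY_SCORES = {
    "invalid": 0.0,
    "format_error": 0.0,
    "error": 0.0,
    "task_mismatch": 0.4,
    "qualified": 1.0,  # label name assumed; this category is garbled in the translation
}

def proxy_reward(categories):
    """Mean category score over a sampled subset of training data."""
    return sum(CATEGORY_SCORES[c] for c in categories) / len(categories)
```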
  10. An automatic end-to-end data recipe generation system for large language model adaptation, comprising: a task modeling module configured to model a target adaptation task; a policy model module configured to output a corresponding data recipe according to the modeled target adaptation task, the data recipe comprising a data processing pipeline and training data; a quality evaluation module configured to judge the quality of the training data set output by the policy model module and to issue an instant proxy reward; a supervised fine-tuning module configured to provide a cold-start data set and to perform supervised fine-tuning initialization of the policy model module using the cold-start data set; and an online optimization module configured to iteratively optimize the policy model module through a reinforcement learning algorithm according to the instant proxy reward score.
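The five modules of claim 10 compose as below. Every attribute is a hypothetical callable standing in for the corresponding module; the lambdas in the usage example are placeholders, not the patented components:

```python
class RecipeSystem:
    """Sketch of the claim-10 module layout."""
    def __init__(self, model_task, policy, evaluate, warm_start, optimize):
        self.model_task = model_task    # task modeling module
        self.policy = policy            # policy model module
        self.evaluate = evaluate        # quality evaluation module
        self.warm_start = warm_start    # supervised fine-tuning module
        self.optimize = optimize        # online optimization module

    def run(self, raw_task):
        task = self.model_task(raw_task)
        self.warm_start()                                # cold-start initialization
        recipe = self.policy(task)                       # pipeline + training data
        reward = self.evaluate(recipe["training_data"])  # instant proxy reward
        self.optimize(reward)                            # reinforcement-learning update
        return recipe, reward

system = RecipeSystem(
    model_task=lambda t: (t, "tau", ("src",)),
    policy=lambda task: {"pipeline": ["filter", "dedup"],
                         "training_data": [{"instruction": "q", "output": "a"}]},
    evaluate=lambda data: 1.0,
    warm_start=lambda: None,
    optimize=lambda r: None,
)
recipe, reward = system.run("medical-qa")
```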
  11. A computer terminal comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the method of any one of claims 1-9 or operates the system of claim 10.
  12. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-9 or operates the system of claim 10.

Description

Automatic end-to-end data recipe generation method and system for large language model adaptation

Technical Field

The invention relates to the technical field of large language model adaptation, and in particular to an automatic end-to-end data recipe generation method and system for large language model adaptation.

Background

With the development of artificial intelligence technology, the demand for adapting large language models to various domains keeps growing. The composition and quality of the training data are key factors determining the performance of the downstream model. In practice, building effective training data is not a simple matter of data collection; it requires designing a complex multi-stage processing pipeline. Such a pipeline typically applies a variety of operations to heterogeneous raw data, including data conversion, filtering, blending, synthesis, and refinement. The combination of the data processing pipeline specification and the training data it finally yields is referred to as a "data recipe". The data recipe is the core link connecting raw data to model training, and its quality directly determines the adaptation effect of the large language model. Designing efficient data recipes is therefore of high practical value, but the prior art still relies primarily on manual design and the extensive experience of human experts. Although large language models have been applied to individual data processing steps within recipe generation (e.g., data filtering, selection, or synthesis), these operations still follow human-designed prompts or patterns and lack end-to-end generation capability. Furthermore, existing automated data recipe generation methods rely on expensive model training feedback.
Existing automated schemes, represented by Data-Juicer Sandbox, generally depend on the training performance of a downstream model as the feedback signal; as data scale, model parameters, and the complexity of data processing operations keep growing, this incurs high cost and long feedback delay. Meanwhile, the prior art lacks standardized evaluation benchmarks and data sets dedicated to end-to-end data recipe generation, which limits the development of related techniques. An automatic end-to-end data recipe generation scheme is therefore needed to solve these problems: that data recipe generation in the prior art depends on manual design and expert experience, that automated schemes lack end-to-end generation capability and rely on expensive downstream model training feedback, and that standardized benchmarks and data support are missing. Such a scheme would achieve fully automatic, efficient, and high-quality generation of data recipes, improve the model's automatic recipe generation capability, and enable large language models to rapidly adapt to tasks in various domains.

Disclosure of Invention

Starting from the prior art, the task of the invention is to provide an automatic end-to-end data recipe generation method and system for large language model adaptation that defines and realizes fully automatic data recipe generation, solving the technical problems that, in the prior-art large language model adaptation process, data recipe design depends heavily on manual expert experience and has a low degree of automation, and that existing automatic search schemes suffer extremely high computational cost and low efficiency due to the huge search space and the dependence on model training feedback.
In a first aspect of the present invention, there is provided an automatic end-to-end data recipe generation method for large language model adaptation, comprising: providing a training task set; providing a cold-start data set and initializing a policy model using the cold-start data set; generating a data recipe according to the target adaptation task and the training task set using the initialized policy model; evaluating the quality of the training data in the data recipe; and updating the parameters of the policy model through a reinforcement learning algorithm according to the evaluation result. Further, the method comprises: modeling the target adaptation task as a triplet t = (I, τ, D), wherein I is the natural-language task instruction together with data-source and/or evaluation-protocol meta-information, D is the set of available raw data sources, and τ is the downstream reference task. Further, providing the training task set includes: providing a set of raw data sources corresponding to the domain of the target adaptation task; and constructing the training task set from the raw data source set. Further, providing the raw data source set corresponding to the domain of the target adaptation task includes: partitioning the domain of the target adaptation task into fields; selecting representative target adaptation tasks as reference tasks in each field