
CN-121998136-A - Sub-model weight optimization method for large language model ensemble learning

CN121998136A

Abstract

The invention provides a sub-model weight optimization method for large language model ensemble learning, relating to the technical field of data processing. The method comprises: generating an initial sub-model weight set and combining and encoding it into binary gene strings to serve as the initial population for evolutionary computation; constructing a proxy experiment environment, generating a proxy corpus by randomly sampling the original pre-training corpus, and selecting at least three representative natural language processing tasks as proxy tasks for evaluating the fitness values of the gene strings; selecting gene strings with high fitness values by roulette-wheel selection based on the proxy-task evaluation results, and generating the next generation of gene strings through crossover and mutation operations; and iteratively executing S2 and S3 until convergence to obtain an optimal sub-model weight combination, then performing task-independent meta-learner fine-tuning training on the original full corpus to obtain the final ensemble model.
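The binary gene encoding described in the abstract (and detailed in claim 2) can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: the names `init_population` and `decode_weights`, the assumption that each gene string concatenates K fixed-width fields of B bits, and the `+1` normalization offset are all assumptions, since the patent text elides the exact formulas.

```python
import random

def init_population(pop_size, k_models, bits_per_weight, p=0.5):
    """S1/S11: sample every bit of every gene string independently
    from a Bernoulli(p) distribution."""
    length = k_models * bits_per_weight  # assumed string length: K fields of B bits
    return [[1 if random.random() < p else 0 for _ in range(length)]
            for _ in range(pop_size)]

def decode_weights(gene, k_models, bits_per_weight):
    """Read each B-bit field as an integer and normalize so the
    sub-model weights sum to 1 (the +1 offset avoids an all-zero sum)."""
    raw = []
    for i in range(k_models):
        field = gene[i * bits_per_weight:(i + 1) * bits_per_weight]
        raw.append(int("".join(map(str, field)), 2) + 1)
    total = sum(raw)
    return [value / total for value in raw]

population = init_population(pop_size=8, k_models=3, bits_per_weight=4)
weights = decode_weights(population[0], k_models=3, bits_per_weight=4)
```

Any discretization that maps fixed-width bit fields to normalized weights would fit the claim language; the roulette-wheel selection and crossover/mutation steps then operate directly on these bit strings.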

Inventors

  • TAO JIANG

Assignees

  • 人保信息科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-02-25

Claims (10)

  1. A sub-model weight optimization method for large language model ensemble learning, characterized by comprising the following steps: S1, generating an initial sub-model weight set, combining and encoding it into binary gene strings, and taking the binary gene strings as the initial population for evolutionary computation; S2, constructing a proxy experiment environment, generating a proxy corpus by randomly sampling the original pre-training corpus, and selecting at least three representative natural language processing tasks as proxy tasks for evaluating the fitness values of the gene strings; S3, selecting gene strings with high fitness values by roulette-wheel selection based on the evaluation results of the proxy tasks, and generating the next generation of gene strings through crossover and mutation operations; and S4, iteratively executing steps S2 and S3 until convergence, obtaining an optimal sub-model weight combination, and performing task-independent meta-learner fine-tuning training on the original full corpus to obtain the final ensemble model.
  2. The method of claim 1, wherein the step of generating the initial sub-model weight set and encoding it into binary gene strings comprises: S11, independently sampling each bit of the binary string as '0' or '1' from a Bernoulli distribution; S12, encoding each sub-model weight combination into a binary string of length K × B, where K is the number of sub-models and B is the number of weight discretization steps.
  3. The method of claim 1, wherein the step of constructing the proxy experiment environment comprises: S21, selecting SST-2, MNLI and SQuAD v1.1 as proxy tasks; S22, calculating the fitness value with a weighted evaluation formula F = Σᵢ wᵢ·sᵢ, where sᵢ is the score on the i-th proxy task and the wᵢ are preset task weight coefficients.
  4. The method of claim 1, wherein the step of selecting based on the evaluation results of the proxy tasks comprises: S31, setting a crossover probability with which the sub-model weight codes at corresponding positions in two genomes are interchanged; S32, configuring a mutation probability with which each gene bit in the binary string is randomly flipped.
  5. The method of claim 1, wherein the step of iteratively performing S2 and S3 until convergence comprises: S41, limiting the number of iterations to no more than 50; S42, setting the proxy-corpus sampling rate to no lower than 10% and no higher than 20%.
  6. The method of claim 1, further comprising: S5, adopting GENFUSER as the meta-learner, inserting <extra_id_i> separators and adding weight marks in the input sequence to encode a fused input, where wᵢ denotes the weight parameter of the i-th sub-model.
  7. A sub-model weight optimization device for large language model ensemble learning, comprising: an initial weight encoding module for generating an initial sub-model weight set, combining and encoding it into binary gene strings, and taking the binary gene strings as the initial population for evolutionary computation; a proxy environment construction module for constructing a proxy experiment environment, generating a proxy corpus by randomly sampling the original pre-training corpus, and selecting at least three representative natural language processing tasks as proxy tasks for evaluating the fitness values of the gene strings; an evolution operation module for selecting gene strings with high fitness values by roulette-wheel selection based on the evaluation results of the proxy tasks, and generating the next generation of gene strings through crossover and mutation operations; and an iterative optimization and fine-tuning module for iteratively executing S2 and S3 until convergence, obtaining an optimal sub-model weight combination, and performing task-independent meta-learner fine-tuning training on the original full corpus to obtain the final ensemble model.
  8. An electronic device comprising a processor and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory to implement the method of any one of claims 1-6.
  9. A computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the method of any one of claims 1-6.
  10. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-6.
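Claims 3 and 4 describe standard genetic-algorithm operators. A minimal sketch of roulette-wheel selection, single-point crossover, and per-bit mutation follows; the default probabilities `p_c=0.8` and `p_m=0.01` and the single-point crossover variant are assumptions, since the claims preset these parameters without publishing their values.

```python
import random

def roulette_select(population, fitnesses):
    """Roulette-wheel selection: each individual is chosen with
    probability proportional to its fitness value."""
    total = sum(fitnesses)
    pick = random.uniform(0.0, total)
    acc = 0.0
    for individual, fitness in zip(population, fitnesses):
        acc += fitness
        if acc >= pick:
            return individual
    return population[-1]  # guard against floating-point rounding

def crossover(parent_a, parent_b, p_c=0.8):
    """Single-point crossover with probability p_c: the weight codes
    after a random cut point are interchanged between the two genomes."""
    if random.random() < p_c and len(parent_a) > 1:
        cut = random.randint(1, len(parent_a) - 1)
        return (parent_a[:cut] + parent_b[cut:],
                parent_b[:cut] + parent_a[cut:])
    return parent_a[:], parent_b[:]

def mutate(gene, p_m=0.01):
    """Flip each gene bit independently with mutation probability p_m."""
    return [1 - bit if random.random() < p_m else bit for bit in gene]
```

One generation of the claimed loop would apply `roulette_select` twice to pick parents, `crossover` to produce offspring, and `mutate` to each child, repeating until the new population is full.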

Description

Sub-model weight optimization method for large language model ensemble learning

Technical Field

The invention relates to the technical field of data processing, in particular to a sub-model weight optimization method for large language model ensemble learning.

Background

Large language models, as a core technology in the field of natural language processing, are widely applied in scenarios such as text generation, semantic understanding, and multi-modal interaction. As the parameter scale of ChatGLM, LLaMA, and other open-source models continues to expand, ensemble learning techniques construct better solution systems through multi-model collaboration. Specifically, these techniques cover the whole process from model selection and weight distribution to output fusion, and include key links such as a genetic-algorithm-based stacked ensemble framework and task-correlation analysis. In the related art, CN111553117B adopts a genetic algorithm to optimize a primary-learner combination and realizes intrinsic viscosity control of the polyester production process through iterative search; its technical system has verified the feasibility of ensemble learning in industrial scenarios. However, when existing ensemble learning methods search sub-model weights directly on a full corpus, the computational complexity is as high as O(N^K) (where K is the number of sub-models), and there is no optimization mechanism tailored to the characteristics of large language models, which may waste computational resources or degrade model performance.
In particular, in the training of a task-independent meta-learner, a traditional genetic algorithm must complete a single optimization iteration on a PB-scale corpus, taking hours to days of GPU time. Meanwhile, the existing proxy-task evaluation system (such as the SST-2/MNLI/SQuAD three-task combination) establishes neither a weight selection strategy nor a mechanism for verifying correlation with full-corpus performance, and in roulette-wheel selection strategies, the fitness-value normalization formula is not co-optimized with the proxy-environment construction parameters (such as sampling rate and task combination). These technical limitations not only affect ensemble learning efficiency but may also degrade the final model's performance on critical NLP tasks.

Disclosure of Invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art. Therefore, a first object of the present invention is to provide a sub-model weight optimization method for large language model ensemble learning. A second object is to provide a sub-model weight optimization device for large language model ensemble learning. A third object is to propose an electronic device. A fourth object is to propose a computer-readable storage medium. A fifth object is to propose a computer program product.
To achieve the above objects, an embodiment of the first aspect of the present invention provides a sub-model weight optimization method for large language model ensemble learning, including: S1, generating an initial sub-model weight set, combining and encoding it into binary gene strings, and taking the binary gene strings as the initial population for evolutionary computation; S2, constructing a proxy experiment environment, generating a proxy corpus by randomly sampling the original pre-training corpus, and selecting at least three representative natural language processing tasks as proxy tasks for evaluating the fitness values of the gene strings; S3, selecting gene strings with high fitness values by roulette-wheel selection based on the evaluation results of the proxy tasks, and generating the next generation of gene strings through crossover and mutation operations; and S4, iteratively executing steps S2 and S3 until convergence, obtaining an optimal sub-model weight combination, and performing task-independent meta-learner fine-tuning training on the original full corpus to obtain the final ensemble model. In one embodiment of the present invention, the step of generating the initial sub-model weight set and encoding it into binary gene strings includes: S11, independently sampling each bit of the binary string as '0' or '1' from a Bernoulli distribution; S12, encoding each sub-model weight combination into a binary string of length K × B, where K is the number of sub-models and B is the number of weight discretization steps. In one embodiment of the present invention, the step of constructing the proxy experiment environment includes: S21, selecting SST-2, MNLI and SQuAD v1.1 as proxy