CN-120893528-B - Method and device for determining model hyperparameters, electronic device, and medium
Abstract
The application discloses a method, a device, an electronic device, and a medium for determining model hyperparameters. The method comprises: obtaining a pre-built initial sequence model and a target large model; iteratively searching a hyperparameter space of the target large model with a specified optimization algorithm so as to construct a label data sequence based on the hyperparameter space; performing supervised learning training on the initial sequence model with the label data sequence to obtain a target sequence model; and determining target hyperparameters of the target large model with the target sequence model. A label data sequence containing hyperparameter labels is thus obtained from the specified optimization algorithm, and supervised training on that sequence lets the initial sequence model learn the hyperparameter recommendation strategy of the specified optimization algorithm, so that the hyperparameters of the target large model can be determined quickly and accurately by using the target sequence model directly.
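The data-collection step summarized above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the application: the search space `BOUNDS`, the toy `evaluate` function that stands in for instruction fine-tuning plus accuracy measurement, the fixed iteration budget, and the way `to_supervised_examples` turns the collected sequence into (history prefix, next hyperparameters) pairs. The `suggest` function is a simple perturb-the-best heuristic standing in for the specified optimization algorithm; it is not Bayesian optimization.

```python
import random

# Hypothetical search space; the names and ranges are illustrative assumptions.
BOUNDS = {"learning_rate": (1e-5, 1e-3), "warmup_ratio": (0.0, 0.2)}


def evaluate(hparams):
    """Toy stand-in for: fine-tune the target model with hparams, return accuracy."""
    lr_term = 1.0 - abs(hparams["learning_rate"] - 3e-4) / 1e-3
    warm_term = 1.0 - abs(hparams["warmup_ratio"] - 0.05) / 0.2
    return 0.5 * lr_term + 0.5 * warm_term


def suggest(history, rng):
    """Stand-in optimizer: perturb the best point seen so far, clipped to BOUNDS."""
    best_hparams, _ = max(history, key=lambda pair: pair[1])
    proposal = {}
    for name, (low, high) in BOUNDS.items():
        step = rng.gauss(0.0, 0.1 * (high - low))
        proposal[name] = min(high, max(low, best_hparams[name] + step))
    return proposal


def collect_sequence(initial_hparams, n_iterations, seed=0):
    """Build the data sequence [(hparams, accuracy), ...] by iterative search."""
    rng = random.Random(seed)
    # The current data sequence starts from the preset initial hyperparameters.
    history = [(initial_hparams, evaluate(initial_hparams))]
    for _ in range(n_iterations):  # assumed iteration condition: a fixed budget
        hparams = suggest(history, rng)
        history.append((hparams, evaluate(hparams)))
    return history


def to_supervised_examples(history):
    """Pair each history prefix with the hyperparameters the optimizer chose next."""
    return [(history[:t], history[t][0]) for t in range(1, len(history))]


history = collect_sequence({"learning_rate": 1e-5, "warmup_ratio": 0.2}, n_iterations=8)
examples = to_supervised_examples(history)
```

Each element of `examples` is an (input, label) pair that a sequence model could be trained on to imitate the optimizer's next recommendation. In practice `evaluate` would be by far the most expensive call, which is why the learned recommender is meant to be reused without rerunning the search.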
Inventors
- XIA YOUHUA
- WANG JUN
- ZHANG BAOLONG
Assignees
- Zhejiang Lab (之江实验室)
Dates
- Publication Date
- 20260512
- Application Date
- 20250723
Claims (8)
- 1. A control method for intelligence analysis, the method comprising: acquiring a pre-constructed initial sequence model and a target large model; iteratively searching a hyperparameter space of the target large model through a specified optimization algorithm so as to construct a label data sequence based on the hyperparameter space; performing supervised learning training on the initial sequence model through the label data sequence to obtain a target sequence model; and determining target hyperparameters of the target large model through the target sequence model so as to carry out intelligence analysis through the target large model, wherein the sequence model is a model based on a Transformer architecture; wherein iteratively searching the hyperparameter space of the target large model through the specified optimization algorithm so as to construct the label data sequence based on the hyperparameter space comprises: acquiring a test data set of the target large model, preset initial hyperparameters, and an initial model accuracy corresponding to the initial hyperparameters, wherein the test data set is text data comprising questions and answers; taking the initial hyperparameters and the initial model accuracy as a current data sequence; inputting the current data sequence into the specified optimization algorithm to obtain algorithm-recommended hyperparameters; after the hyperparameters of the target large model are set to the algorithm-recommended hyperparameters, performing instruction fine-tuning on the target large model through the test data set so as to determine a current model accuracy of the target large model; adding a sequence formed by the algorithm-recommended hyperparameters and the current model accuracy to the current data sequence, and returning to the step of inputting the current data sequence into the specified optimization algorithm to obtain algorithm-recommended hyperparameters, so as to perform the iterative search until a preset iteration condition is reached; and constructing the label data sequence based on the current data sequence after the preset iteration condition is reached.
- 2. The control method for intelligence analysis according to claim 1, wherein the specified optimization algorithm is a Bayesian optimization algorithm.
- 3. The control method for intelligence analysis according to claim 1, wherein determining the target hyperparameters of the target large model through the target sequence model comprises: acquiring a historical data sequence and a test data set of the target large model, wherein the historical data sequence consists of historical hyperparameters and historical model accuracies; inputting the historical data sequence into the target sequence model to obtain model-recommended hyperparameters; after the hyperparameters of the target large model are set to the model-recommended hyperparameters, performing instruction fine-tuning on the target large model through the test data set so as to determine a target model accuracy of the target large model; and adding a sequence formed by the model-recommended hyperparameters and the target model accuracy to the historical data sequence, and returning to the step of inputting the historical data sequence into the target sequence model to obtain model-recommended hyperparameters, until the target model accuracy reaches an accuracy threshold.
- 4. The control method for intelligence analysis according to claim 3, wherein the target sequence model comprises a first multi-layer perceptron, a second multi-layer perceptron, and a multi-layer attention mechanism, and wherein inputting the historical data sequence into the target sequence model to obtain the model-recommended hyperparameters comprises: encoding the historical data sequence into a target vector through the first multi-layer perceptron; inputting the target vector into the multi-layer attention mechanism to generate a global aggregate representation; and inputting the global aggregate representation into the second multi-layer perceptron to obtain the model-recommended hyperparameters.
- 5. The control method for intelligence analysis according to claim 1, characterized by further comprising, before the supervised learning training of the initial sequence model through the label data sequence: acquiring expert-recommended hyperparameters and a test data set of the target large model, wherein there are a plurality of expert-recommended hyperparameters; after the hyperparameters of the target large model are set to the expert-recommended hyperparameters, performing instruction fine-tuning on the target large model through the test data set so as to determine the target model accuracy corresponding to the expert-recommended hyperparameters; and constructing the label data sequence based on the expert-recommended hyperparameters and the target model accuracy.
- 6. A control device for intelligence analysis, the device comprising: a first acquisition module, used for acquiring a pre-constructed initial sequence model and a target large model; a label data sequence generation module, used for iteratively searching a hyperparameter space of the target large model through a specified optimization algorithm so as to construct a label data sequence based on the hyperparameter space; a supervised learning module, used for performing supervised learning training on the initial sequence model through the label data sequence to obtain a target sequence model; a target hyperparameter determination module, used for determining target hyperparameters of the target large model through the target sequence model so as to carry out intelligence analysis through the target large model, wherein the sequence model is a model based on a Transformer architecture; a second acquisition module, used for acquiring a test data set of the target large model, preset initial hyperparameters, and an initial model accuracy corresponding to the initial hyperparameters, wherein the test data set is text data comprising questions and answers; a current sequence determination module, used for taking the initial hyperparameters and the initial model accuracy as a current data sequence; an algorithm-recommended hyperparameter acquisition module, used for inputting the current data sequence into the specified optimization algorithm to obtain algorithm-recommended hyperparameters; a first training module, used for performing instruction fine-tuning on the target large model through the test data set after the hyperparameters of the target large model are set to the algorithm-recommended hyperparameters, so as to determine a current model accuracy of the target large model; a current data sequence updating module, used for adding a sequence formed by the algorithm-recommended hyperparameters and the current model accuracy to the current data sequence, and returning to the step of inputting the current data sequence into the specified optimization algorithm to obtain algorithm-recommended hyperparameters, so as to perform the iterative search until a preset iteration condition is reached; and a first sequence construction module, used for constructing the label data sequence based on the current data sequence after the preset iteration condition is reached.
- 7. An electronic device comprising a memory and a processor, said memory having stored thereon a computer program executable on said processor, characterized in that the processor, when executing said computer program, carries out the steps of the control method of intelligence analysis according to any one of claims 1 to 6.
- 8. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, carries out the steps of the control method of intelligence analysis according to any one of claims 1 to 6.
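The encode, attend, and decode path recited in claim 4 can be illustrated with a small pure-Python forward pass. This is only a sketch under stated assumptions, not the claimed implementation: the weights are random and untrained, attention is single-head with identity projections, residual connections and mean pooling are assumed choices for producing the global aggregate representation, and the names `recommend`, `N_HPARAMS`, and `D_MODEL` are hypothetical.

```python
import math
import random


def linear(x, weight, bias):
    """Affine map y = W x + b for a single input vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(rng, n_in, n_out):
    """Random (untrained) weights and zero biases for one linear layer."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def mlp(x, layers):
    """Two-layer perceptron with a ReLU between the layers."""
    (w1, b1), (w2, b2) = layers
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    scale = math.sqrt(len(tokens[0]))
    out = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * tok[d] for w, tok in zip(weights, tokens))
                    for d in range(len(query))])
    return out


def recommend(history, params, n_attention_layers=2):
    """Map a history of (hyperparameters, accuracy) steps to new hyperparameters."""
    # First MLP: encode each step of the history as a vector.
    tokens = [mlp(list(hp) + [acc], params["encoder"]) for hp, acc in history]
    # Stacked attention layers with residual connections (assumed design).
    for _ in range(n_attention_layers):
        attended = self_attention(tokens)
        tokens = [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]
    # Mean pooling as the global aggregate representation (assumed design).
    pooled = [sum(column) / len(tokens) for column in zip(*tokens)]
    # Second MLP: decode the aggregate into recommended hyperparameters.
    return mlp(pooled, params["head"])


rng = random.Random(0)
N_HPARAMS, D_MODEL = 2, 8
params = {
    "encoder": [init_linear(rng, N_HPARAMS + 1, D_MODEL), init_linear(rng, D_MODEL, D_MODEL)],
    "head": [init_linear(rng, D_MODEL, D_MODEL), init_linear(rng, D_MODEL, N_HPARAMS)],
}
history = [((0.1, 0.5), 0.62), ((0.3, 0.4), 0.71), ((0.2, 0.45), 0.69)]
recommendation = recommend(history, params)
```

Because the weights are untrained, `recommendation` carries no meaning beyond having one value per hyperparameter; the sketch only shows how a variable-length history is reduced to a fixed-size output. A practical version would add learned query, key, and value projections and train the parameters on the label data sequence.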
Description
Method and device for determining model hyperparameters, electronic device, and medium
Technical Field
The present application relates to the field of computer technologies, and in particular to a method and device for determining model hyperparameters, an electronic device, and a medium.
Background
With the development of artificial intelligence technology, large language models are being applied ever more widely to complex information processing tasks in various fields. However, current general-purpose large language models perform poorly on domain-specific tasks such as intelligence analysis, which place high demands on accuracy, real-time performance, and security, and they struggle to meet users' actual needs. To improve the performance of a large model in a specific task scenario, the model can be further trained through instruction fine-tuning so that it better understands the user's intent. In practice, selecting the hyperparameters of the large model reasonably, efficiently, and accurately during training is critical to the final performance of the model.
At present, the hyperparameters of a large model are recommended by reinforcement learning algorithms. However, as historical data accumulates, the inference complexity of such algorithms grows, so both efficiency and accuracy tend to decline on large-scale problems or high-frequency recommendation tasks.
Therefore, how to obtain the hyperparameters of a large model accurately and efficiently, so as to meet the requirements of different downstream tasks, is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, an aspect of the present application provides a method for determining model hyperparameters, the method comprising: acquiring a pre-constructed initial sequence model and a target large model; iteratively searching a hyperparameter space of the target large model through a specified optimization algorithm so as to construct a label data sequence based on the hyperparameter space; performing supervised learning training on the initial sequence model through the label data sequence to obtain a target sequence model; and determining the target hyperparameters of the target large model through the target sequence model.
Optionally, iteratively searching the hyperparameter space of the target large model through the specified optimization algorithm so as to construct the label data sequence based on the hyperparameter space comprises: acquiring a test data set of the target large model, preset initial hyperparameters, and an initial model accuracy corresponding to the initial hyperparameters; taking the initial hyperparameters and the initial model accuracy as a current data sequence; inputting the current data sequence into the specified optimization algorithm to obtain algorithm-recommended hyperparameters; after the hyperparameters of the target large model are set to the algorithm-recommended hyperparameters, performing instruction fine-tuning on the target large model through the test data set so as to determine a current model accuracy of the target large model; adding a sequence formed by the algorithm-recommended hyperparameters and the current model accuracy to the current data sequence, and returning to the step of inputting the current data sequence into the specified optimization algorithm to obtain algorithm-recommended hyperparameters, so as to perform the iterative search until a preset iteration condition is reached; and constructing the label data sequence based on the current data sequence after the preset iteration condition is reached.
Optionally, the specified optimization algorithm is a Bayesian optimization algorithm.
Optionally, determining the target hyperparameters of the target large model through the target sequence model comprises: acquiring a historical data sequence and a test data set of the target large model, wherein the historical data sequence consists of historical hyperparameters and historical model accuracies; inputting the historical data sequence into the target sequence model to obtain model-recommended hyperparameters; after the hyperparameters of the target large model are set to the model-recommended hyperparameters, performing instruction fine-tuning on the target large model through the test data set so as to determine a target model accuracy of the target large model; and adding a sequence formed by the model-recommended hyperparameters and the target model accuracy to the historical data sequence, and returning to the step of inputting the historical data sequence into the target sequence model to obtain model-recommended hyperparameters, until the target model accuracy reaches an accuracy threshold.
Optionally, the targ