
CN-122021813-A - Model training method, device, storage medium and equipment

Abstract

The application discloses a model training method, device, storage medium and equipment in the field of artificial intelligence. The method first sets initial values for all hyperparameters and determines which hyperparameters are to be adjusted. It then runs multi-batch training tests with these hyperparameters and, during the tests, automatically adjusts the hyperparameters to be adjusted according to the actual resource occupation and a set training resource threshold, until it obtains a group of hyperparameters that meets the threshold. Finally, formal training is carried out with the obtained group of hyperparameters. The application automatically updates the hyperparameters until they meet the set conditions and makes reasonable use of the allocated training resources, avoiding idle computation or low occupancy of those resources, so that model results are trained more quickly and the resources are occupied for less time.
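The overall loop in the abstract (trial runs, compare occupation with a threshold, adjust, then formal training) can be sketched as follows. This is a minimal sketch under stated assumptions, not the application's implementation: it tunes one integer hyperparameter (for example batch size), expresses occupation as a single fraction of allocated capacity returned by a caller-supplied `trial_run` callback (hypothetical; in practice it would run a few training batches and read peak device memory), and substitutes a doubling-then-bisection rule for the adjustment algorithm, which the abstract does not specify.

```python
from typing import Callable, Dict

HParams = Dict[str, int]


def find_hparams(
    hparams: HParams,
    tune_key: str,
    trial_run: Callable[[HParams], float],
    low: float,
    high: float,
    max_trials: int = 20,
) -> HParams:
    """Adjust hparams[tune_key] until trial_run's occupancy lies in [low, high].

    trial_run is a hypothetical callback: it runs a short multi-batch training
    test with the given hyperparameters and returns resource occupation as a
    fraction of the allocated capacity (e.g. peak memory / available memory).
    """
    current = dict(hparams)  # initial values of all hyperparameters
    below = None  # largest value seen that left resources under-used
    above = None  # smallest value seen that exceeded the threshold
    for _ in range(max_trials):
        value = current[tune_key]
        occupancy = trial_run(current)
        if low <= occupancy <= high:
            return current  # this group of values meets the threshold
        if occupancy < low:
            below = value
        else:
            above = value
        if below is not None and above is not None:
            if above - below <= 1:
                break  # no integer value left between the two bounds
            current[tune_key] = (below + above) // 2  # bisect between bounds
        elif above is None:
            current[tune_key] = value * 2  # still under-used: grow geometrically
        elif value == 1:
            break  # even the smallest value exceeds the threshold
        else:
            current[tune_key] = value // 2  # over the limit: shrink geometrically
    raise RuntimeError(f"no value of {tune_key!r} met the resource threshold")


if __name__ == "__main__":
    # Stand-in trial whose occupancy grows linearly with batch size.
    def fake_trial(hp: HParams) -> float:
        return hp["batch_size"] / 100

    found = find_hparams({"batch_size": 16, "epochs": 10}, "batch_size",
                         fake_trial, low=0.7, high=0.75)
    print(found)  # {'batch_size': 72, 'epochs': 10}
```

The returned dictionary is what would then be handed to the formal training run. Bisection is used only because it guarantees termination for an occupancy that grows with the tuned value; any adjustment rule could be plugged in at the same point.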

Inventors

  • ZHANG ZHENGUO
  • LI QIANG
  • LI JIE
  • LI SHENG
  • ZHOU JUN

Assignees

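  • 北京眼神智能科技有限公司 (Beijing Eyecool Intelligent Technology Co., Ltd.)
  • 北京眼神科技有限公司 (Beijing Eyecool Technology Co., Ltd.)
  • 深圳爱酷智能科技有限公司 (Shenzhen Aiku Intelligent Technology Co., Ltd.)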

Dates

Publication Date
2026-05-12
Application Date
2024-11-11

Claims (10)

  1. A model training method, comprising: setting initial values of all hyperparameters and determining the hyperparameters to be adjusted; setting a training resource threshold; performing a training test with the current values of the hyperparameters and acquiring the actual resource occupation in the training test; judging whether the actual resource occupation meets the training resource threshold, and if so, executing the next step, otherwise automatically adjusting the hyperparameters to be adjusted and returning to the step of performing the training test with the current values of the hyperparameters; and training with the current values of the hyperparameters.
  2. The model training method according to claim 1, wherein automatically adjusting the hyperparameters to be adjusted comprises automatically adjusting them using a weight algorithm and an adaptive algorithm.
  3. The model training method according to claim 2, wherein the hyperparameters to be adjusted include one or more of batch size, gradient accumulation parameters, regularization parameters, and optimizer parameters.
  4. The model training method according to any one of claims 1-3, wherein judging whether the actual resource occupation meets the training resource threshold comprises: collecting hardware specification information used by the training task; and judging whether the actual resource occupation meets the training resource threshold according to the hardware specification information, the actual resource occupation, and the training resource threshold.
  5. A model training apparatus, comprising: an initialization module, configured to set initial values of all hyperparameters and determine the hyperparameters to be adjusted; a threshold setting module, configured to set a training resource threshold; a training test module, configured to perform a training test with the current values of the hyperparameters and obtain the actual resource occupation in the training test; a judging module, configured to judge whether the actual resource occupation meets the training resource threshold and, if so, invoke the training module, otherwise automatically adjust the hyperparameters to be adjusted and return to the training test module; and a training module, configured to train with the current values of the hyperparameters.
  6. The model training apparatus according to claim 5, wherein the judging module automatically adjusts the hyperparameters to be adjusted using a weight algorithm and an adaptive algorithm.
  7. The model training apparatus according to claim 6, wherein the hyperparameters to be adjusted include one or more of batch size, gradient accumulation parameters, regularization parameters, and optimizer parameters.
  8. The model training apparatus according to any one of claims 5-7, wherein the judging module comprises: an information collection unit, configured to collect hardware specification information used by the training task; and a judging unit, configured to judge whether the actual resource occupation meets the training resource threshold according to the hardware specification information, the actual resource occupation, and the training resource threshold.
  9. A computer-readable storage medium for model training, comprising a memory storing processor-executable instructions which, when executed by a processor, implement the steps of the model training method according to any one of claims 1-4.
  10. Equipment for model training, comprising at least one processor and a memory storing computer-executable instructions which, when executed by the processor, perform the steps of the model training method according to any one of claims 1-4.
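The judgment in claims 4 and 8 combines hardware specification information with the measured occupation and the threshold. A minimal sketch of one way to do that, under assumptions not stated in the claims: the hardware specification is reduced to a hypothetical `HardwareSpec` (device count and memory per device), the threshold is a hypothetical `ResourceThreshold` band of memory-occupancy fractions, and the measured occupation is one peak-memory reading per device in GiB.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HardwareSpec:
    """Hypothetical hardware specification collected for a training task."""
    device_count: int
    memory_per_device_gib: float


@dataclass
class ResourceThreshold:
    """Assumed form of the threshold: a band of memory-occupancy fractions."""
    low: float   # below this, the allocated devices are under-used
    high: float  # above this, the run is too close to running out of memory


def meets_threshold(spec: HardwareSpec, peak_used_gib: Sequence[float],
                    threshold: ResourceThreshold) -> bool:
    """Judge occupancy relative to the hardware spec rather than in absolute GiB."""
    if len(peak_used_gib) != spec.device_count:
        raise ValueError("expected one peak-memory measurement per device")
    fractions = [used / spec.memory_per_device_gib for used in peak_used_gib]
    # The busiest device decides over-use; the least busy one decides under-use.
    return max(fractions) <= threshold.high and min(fractions) >= threshold.low


if __name__ == "__main__":
    spec = HardwareSpec(device_count=2, memory_per_device_gib=24.0)
    band = ResourceThreshold(low=0.7, high=0.9)
    print(meets_threshold(spec, [20.0, 19.0], band))  # True
    print(meets_threshold(spec, [23.5, 20.0], band))  # False: one device above 90%
```

Normalizing by the device capacity is what lets one threshold setting carry over when the task moves to hardware of a different specification, which is the situation the background identifies as requiring repeated manual tests.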

Description

Technical Field

The application relates to the field of artificial intelligence, and in particular to a model training method, device, storage medium and equipment.

Background

In recent years, deep learning techniques have been widely used in computer vision, natural language processing, speech, recommendation systems, and other fields. A deep learning model must be trained before use: a model structure, or deep neural network structure, is constructed according to the characteristics of the data, and the model's parameters are then trained with the data as input.

A model training platform is a system that trains deep neural networks on clustered resources. It is a management platform that lets users customize and develop models with a high degree of freedom through an easy-to-use development environment, custom tasks, and parameter tuning configured through an interface. Model training platforms typically provide model training and verification, model management, model deployment, and similar functions.

In a conventional training platform, the user generally selects a model scheme, determines the network structure, chooses a pre-trained model or trains from scratch, and sets the hyperparameters before training the model. These choices strongly affect how much of the training resources is occupied: the selected network, the hyperparameter values, and the data size (such as the size of the image data) all have a large influence. Once the network and the data have been chosen, there is currently no clear method for automatically adjusting the hyperparameters so that the training hardware is used fully and reasonably.
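To make the dependence on batch size and data size concrete, the memory taken by one batch of input images alone can be computed directly. This is an illustrative calculation only, assuming float32 inputs in an N×C×H×W layout; actual occupation also includes weights, activations, gradients and optimizer state, and activation memory typically grows with batch size as well, while weight memory does not.

```python
def input_batch_bytes(batch_size: int, channels: int, height: int, width: int,
                      bytes_per_value: int = 4) -> int:
    """Bytes needed to hold one batch of input images (float32 by default)."""
    return batch_size * channels * height * width * bytes_per_value


if __name__ == "__main__":
    base = input_batch_bytes(32, 3, 224, 224)
    print(base)                                       # 19267584 bytes, about 18.4 MiB
    print(input_batch_bytes(64, 3, 224, 224) / base)  # 2.0: linear in batch size
    print(input_batch_bytes(32, 3, 448, 448) / base)  # 4.0: quadratic in image side
```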
In the prior art, training generally uses empirical values, or the user repeatedly modifies the parameters by hand on hardware of the same specification, checks the occupation of the training hardware resources, and then selects suitable hyperparameter values for training. However, when training moves to hardware resources of a different specification, the occupation tests must again be performed manually many times before suitable hyperparameter values can be selected.

Disclosure of Invention

To overcome these shortcomings of the prior art, the application provides a model training method, device, storage medium and equipment that make reasonable use of the allocated training resources, so that model results can be trained more quickly.

The technical solution provided by the application is as follows.

In a first aspect, the application provides a model training method, comprising: setting initial values of all hyperparameters and determining the hyperparameters to be adjusted; setting a training resource threshold; performing a training test with the current values of the hyperparameters and acquiring the actual resource occupation in the training test; judging whether the actual resource occupation meets the training resource threshold, and if so, executing the next step, otherwise automatically adjusting the hyperparameters to be adjusted and returning to the step of performing the training test with the current values of the hyperparameters; and training with the current values of the hyperparameters.

Further, automatically adjusting the hyperparameters to be adjusted comprises automatically adjusting them using a weight algorithm and an adaptive algorithm.
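The text names a weight algorithm and an adaptive algorithm without defining them at this point. The sketch below shows one plausible reading, offered purely as an assumption and not as the application's method: weights combine several occupancy measurements (the metric names `gpu_memory` and `gpu_util` are hypothetical) into one score, and an adaptive step scales the batch size by the ratio of a target score to the measured score, clamped per trial. It further assumes occupation grows roughly in proportion to batch size, and it changes the gradient accumulation steps so that the effective batch size (batch size × accumulation steps) stays approximately unchanged.

```python
from typing import Dict, Tuple


def weighted_occupancy(usage: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of occupancy fractions (0..1) over the weighted metrics."""
    total = sum(weights.values())
    return sum(w * usage[name] for name, w in weights.items()) / total


def adapt_batch(batch_size: int, accum_steps: int, score: float, target: float,
                max_factor: float = 2.0) -> Tuple[int, int]:
    """Scale batch size toward the target score, compensating with accumulation."""
    if score <= 0:
        raise ValueError("occupancy score must be positive")
    # Adaptive step: the ratio target/score sets the size of the change,
    # clamped so one trial never more than doubles or halves the batch.
    factor = min(max(target / score, 1.0 / max_factor), max_factor)
    new_batch = max(1, int(batch_size * factor))
    # Keep batch_size * accum_steps (the effective batch) approximately constant.
    new_accum = max(1, round(batch_size * accum_steps / new_batch))
    return new_batch, new_accum


if __name__ == "__main__":
    score = weighted_occupancy({"gpu_memory": 0.4, "gpu_util": 0.6},
                               {"gpu_memory": 3.0, "gpu_util": 1.0})
    print(round(score, 2))                          # 0.45
    print(adapt_batch(32, 4, score, target=0.85))   # (60, 2)
```

Because occupation is only roughly proportional to batch size, each adjusted value would still go back through another training test before formal training, as in the method's loop.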
Further, the hyperparameters to be adjusted include one or more of batch size, gradient accumulation parameters, regularization parameters, and optimizer parameters.

Further, judging whether the actual resource occupation meets the training resource threshold comprises: collecting hardware specification information used by the training task; and judging whether the actual resource occupation meets the training resource threshold according to the hardware specification information, the actual resource occupation, and the training resource threshold.

In a second aspect, the application provides a model training apparatus, comprising: an initialization module, configured to set initial values of all hyperparameters and determine the hyperparameters to be adjusted; a threshold setting module, configured to set a training resource threshold; a training test module, configured to perform a training test with the current values of the hyperparameters and obtain the actual resource occupation in the training test; a judging module, configured to judge whether the actual resource occupation meets the training r