CN-116935158-B - Model training duration determining method and device
Abstract
Embodiments of the application provide a method and a device for determining model training duration, relating to the technical field of training platforms. A training server acquires a target training task for training a target image processing model based on target sample images. The server determines the acquisition duration required to acquire the target sample images, based on the data amount of the target sample images and a preset first correspondence between data amount and acquisition duration, and takes it as the data preparation duration of the target training task. The server determines the task execution duration of the target training task based on the target training parameters of the task and a preset second correspondence between training parameter sets and execution durations. The server then calculates the total training duration of the target training task from the data preparation duration and the task execution duration, and sends the total training duration to a terminal. The terminal displays the total training duration, so that the user knows when training of the target image processing model will complete, which improves user experience.
Inventors
- SHI ZHIPING
Assignees
- 杭州海康机器人股份有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20230714
Claims (14)
- 1. A method for determining a model training duration, wherein the method is applied to a training server in a model training platform, the model training platform further comprises a terminal, and the method comprises: obtaining a target training task, wherein the target training task is used for training a target image processing model based on target sample images; determining the acquisition duration required for acquiring the target sample images, based on the data amount of the target sample images and a preset first correspondence between data amount and acquisition duration, to obtain the data preparation duration of the target training task, wherein the first correspondence is determined based on the data amount of historical sample images corresponding to historical training tasks executed by the training server and the acquisition duration for acquiring the historical sample images; determining the task execution duration of the target training task based on target training parameters of the target training task and a preset second correspondence between training parameter sets and execution durations, wherein the target training parameters comprise algorithm parameters of the target image processing model, image parameters of the target sample images, and execution parameters of the target training task; calculating the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task; and sending the total training duration of the target training task to the terminal, so that the terminal displays the total training duration of the target training task; wherein the determining the task execution duration of the target training task based on the target training parameters of the target training task and the preset second correspondence between training parameter sets and execution durations comprises: determining, in the preset second correspondence between training parameter sets and execution durations, the execution duration corresponding to the training parameter set to which the target training parameters belong, as the estimated single execution duration for the target training task to execute one iteration; calculating the total number of iterations of the target training task based on the number of target sample images, the batch size of the target training task, and the batch iteration number of the target training task; and calculating the product of the total number of iterations of the target training task and the estimated single execution duration of the target training task, to obtain the task execution duration of the target training task.
- 2. The method of claim 1, wherein before the calculating the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task, the method further comprises: determining, based on a preset task scheduling strategy, the display card in the training server that is used for executing the target training task, as a target display card; determining the training tasks that the target display card needs to execute before executing the target training task, as first training tasks; determining, from the first training tasks, the first training task currently being executed by the target display card as a second training task, and the first training task still to be executed by the target display card as a third training task; and calculating the sum of the remaining execution duration of the second training task and the data preparation duration and task execution duration of the third training task, to obtain the waiting duration of the target training task; and wherein the calculating the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task comprises: calculating the sum of the waiting duration, the data preparation duration, and the task execution duration of the target training task, to obtain the total training duration of the target training task.
- 3. The method of claim 2, wherein before the calculating the sum of the remaining execution duration of the second training task, the data preparation duration of the third training task, and the task execution duration, the method further comprises: calculating the total number of iterations of the second training task based on the number of first sample images corresponding to the second training task, the batch size of the second training task, and the batch iteration number of the second training task; calculating the difference between the total number of iterations of the second training task and the number of iterations that the second training task has already executed, to obtain the remaining number of iterations of the second training task; and calculating the remaining execution duration of the second training task based on the single execution duration of the iterations already executed by the second training task and the remaining number of iterations of the second training task.
- 4. The method of claim 1, wherein the algorithm parameters of the target image processing model are the algorithm type of the target image processing model, the image parameters of the target sample image comprise the image type, width and height of the target sample image, and the execution parameters of the target training task comprise the batch size of the target training task; One training parameter set in the second corresponding relation comprises an algorithm type, an image type, a preset image width interval, a preset image height interval and a batch size; The target training parameters belong to a training parameter set, wherein the algorithm type in the training parameter set is the same as the algorithm type of the target image processing model, the image type is the same as the image type of the target sample image, the batch size is the same as the batch size of the target training task, a preset image width interval comprises the image width of the target sample image, and a preset image height interval comprises the image height of the target sample image.
- 5. The method of claim 1, wherein before the determining the task execution duration of the target training task based on the target training parameters of the target training task and the preset second correspondence between training parameter sets and execution durations, the method further comprises: for each historical training task, calculating the ratio of the task execution duration of the historical training task to the total number of iterations of the historical training task, to obtain the single execution duration for the historical training task to execute one iteration; dividing the historical training tasks whose training parameters belong to the same training parameter set into one group, to obtain a plurality of first training task groups, wherein one training parameter set comprises an algorithm type, an image type, a preset image width interval, a preset image height interval, and a batch size; for each first training task group, calculating the average of the single execution durations of the historical training tasks in the group, to obtain the single execution duration corresponding to the training parameter set of that first training task group; and recording the correspondence between each training parameter set and its single execution duration, to obtain the second correspondence between training parameter sets and execution durations.
- 6. The method of claim 1, wherein after the calculating the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task, the method further comprises: after the target training task has been executed, calculating the ratio of the task execution duration of the target training task to the total number of iterations of the target training task, to obtain the actual single execution duration for the target training task to execute one iteration; and updating the second correspondence based on the target training parameters and the actual single execution duration for the target training task to execute one iteration.
- 7. The method according to claim 1, wherein the determining the acquisition duration required for acquiring the target sample images based on the data amount of the target sample images and the preset first correspondence between data amount and acquisition duration, to obtain the data preparation duration of the target training task, comprises: calculating the ratio of the total data amount of the target sample images to the number of target sample images, to obtain the average data amount of the target sample images; determining, in the preset first correspondence between data amount and acquisition duration, the acquisition duration corresponding to the data amount that differs least from the calculated average data amount, as the estimated acquisition duration for acquiring one target sample image; and calculating the product of the estimated acquisition duration for one target sample image and the number of target sample images, to obtain the total acquisition duration required for acquiring the target sample images, which is used as the data preparation duration of the target training task.
- 8. The method of claim 7, wherein before the determining the acquisition duration required for acquiring the target sample images based on the data amount of the target sample images and the preset first correspondence between data amount and acquisition duration, to obtain the data preparation duration of the target training task, the method further comprises: for each historical training task, calculating the ratio of the total data amount of the historical sample images corresponding to the historical training task to the number of those historical sample images, to obtain the average data amount of the historical sample images; calculating the ratio of the total acquisition duration for acquiring the historical sample images when the historical training task was executed to the number of those historical sample images, to obtain the average acquisition duration of the historical sample images; dividing the historical training tasks whose historical sample images have the same average data amount into one group, to obtain a plurality of second training task groups; for each second training task group, calculating the average of the average acquisition durations of the historical training tasks in the group, to obtain the average acquisition duration corresponding to the average data amount of that second training task group; and recording the correspondence between each average data amount and its average acquisition duration, to obtain the first correspondence between data amount and acquisition duration.
- 9. The method of claim 7, wherein after the calculating the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task, the method further comprises: after the target training task has been executed, calculating the ratio of the total acquisition duration for acquiring the target sample images when the target training task was executed to the number of target sample images, to obtain the average acquisition duration of the target sample images; and updating the first correspondence based on the average data amount and the average acquisition duration of the target sample images.
- 10. A model training platform, comprising a training server and a terminal, wherein: the training server is used for acquiring a target training task, wherein the target training task is used for training a target image processing model based on target sample images; determining the acquisition duration required for acquiring the target sample images based on a first correspondence between the data amount of the target sample images and the acquisition duration, to obtain the data preparation duration of the target training task, wherein the first correspondence is determined based on the data amount of historical sample images corresponding to historical training tasks executed by the training server and the acquisition duration of the historical sample images; and determining the task execution duration of the target training task based on target training parameters of the target training task and a preset second correspondence between training parameter sets and execution durations, wherein the target training parameters comprise algorithm parameters of the target image processing model, image parameters of the target sample images, and execution parameters of the target training task; the terminal is used for displaying the received total training duration of the target training task; and the training server is specifically configured to: determine, in the preset second correspondence between training parameter sets and execution durations, the execution duration corresponding to the training parameter set to which the target training parameters belong, as the estimated single execution duration for the target training task to execute one iteration; calculate the total number of iterations of the target training task based on the number of target sample images, the batch size of the target training task, and the batch iteration number of the target training task; and calculate the product of the total number of iterations of the target training task and the estimated single execution duration of the target training task, to obtain the task execution duration of the target training task.
- 11. A model training duration determining apparatus, wherein the apparatus is applied to a training server in a model training platform, the model training platform further comprises a terminal, and the apparatus comprises: a target training task acquisition module, used for acquiring a target training task, wherein the target training task is used for training a target image processing model based on target sample images; a data preparation duration determining module, used for determining the acquisition duration required for acquiring the target sample images, based on the data amount of the target sample images and a preset first correspondence between data amount and acquisition duration, to obtain the data preparation duration of the target training task, wherein the first correspondence is determined based on the data amount of historical sample images corresponding to historical training tasks executed by the training server and the acquisition duration for acquiring the historical sample images; a task execution duration determining module, used for determining the task execution duration of the target training task based on target training parameters of the target training task and a preset second correspondence between training parameter sets and execution durations, wherein the target training parameters comprise algorithm parameters of the target image processing model, image parameters of the target sample images, and execution parameters of the target training task; a total training duration determining module, used for calculating the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task; and a total training duration display module, used for sending the total training duration of the target training task to the terminal, so that the terminal displays the total training duration of the target training task; wherein the task execution duration determining module is specifically configured to: determine, in the preset second correspondence between training parameter sets and execution durations, the execution duration corresponding to the training parameter set to which the target training parameters belong, as the estimated single execution duration for the target training task to execute one iteration; calculate the total number of iterations of the target training task based on the number of target sample images, the batch size of the target training task, and the batch iteration number of the target training task; and calculate the product of the total number of iterations of the target training task and the estimated single execution duration of the target training task, to obtain the task execution duration of the target training task.
- 12. The apparatus of claim 11, wherein the apparatus further comprises: a target display card determining module, used for determining, based on a preset task scheduling strategy and before the total training duration determining module calculates the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task, the display card in the training server that is used for executing the target training task, as a target display card; a first training task determining module, used for determining the training tasks that the target display card needs to execute before executing the target training task, as first training tasks; a second training task determining module, used for determining, from the first training tasks, the first training task currently being executed by the target display card as a second training task, and the first training task still to be executed by the target display card as a third training task; and a waiting duration determining module, used for calculating the sum of the remaining execution duration of the second training task and the data preparation duration and task execution duration of the third training task, to obtain the waiting duration of the target training task; the total training duration determining module is specifically configured to calculate the sum of the waiting duration, the data preparation duration, and the task execution duration of the target training task, to obtain the total training duration of the target training task; the apparatus further comprises: a total iteration number determining module, used for calculating, before the waiting duration determining module calculates the sum of the remaining execution duration of the second training task, the data preparation duration of the third training task, and the task execution duration to obtain the waiting duration of the target training task, the total number of iterations of the second training task based on the number of first sample images corresponding to the second training task, the batch size of the second training task, and the batch iteration number of the second training task; a remaining iteration number determining module, used for calculating the difference between the total number of iterations of the second training task and the number of iterations that the second training task has already executed, to obtain the remaining number of iterations of the second training task; and a remaining execution duration determining module, used for calculating the remaining execution duration of the second training task based on the single execution duration of the iterations already executed by the second training task and the remaining number of iterations of the second training task; the algorithm parameters of the target image processing model are the algorithm type of the target image processing model, the image parameters of the target sample images comprise the image type, width, and height of the target sample images, and the execution parameters of the target training task comprise the batch size of the target training task; one training parameter set in the second correspondence comprises an algorithm type, an image type, a preset image width interval, a preset image height interval, and a batch size; the target training parameters belong to a training parameter set in which the algorithm type is the same as the algorithm type of the target image processing model, the image type is the same as the image type of the target sample images, the batch size is the same as the batch size of the target training task, the preset image width interval contains the image width of the target sample images, and the preset image height interval contains the image height of the target sample images; the apparatus further comprises: a first single execution duration determining module, used for calculating, for each historical training task and before the task execution duration determining module determines the task execution duration of the target training task based on the target training parameters of the target training task and the preset second correspondence between training parameter sets and execution durations, the ratio of the task execution duration of the historical training task to the total number of iterations of the historical training task, to obtain the single execution duration for the historical training task to execute one iteration; a first training task grouping module, used for dividing the historical training tasks whose training parameters belong to the same training parameter set into one group, to obtain a plurality of first training task groups, wherein one training parameter set comprises an algorithm type, an image type, a preset image width interval, a preset image height interval, and a batch size; a second single execution duration determining module, used for calculating, for each first training task group, the average of the single execution durations of the historical training tasks in the group, to obtain the single execution duration corresponding to the training parameter set of that first training task group; and a second correspondence determining module, used for recording the correspondence between each training parameter set and its single execution duration, to obtain the second correspondence between training parameter sets and execution durations; the apparatus further comprises: an actual single execution duration determining module, used for calculating, after the total training duration determining module has calculated the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task, and after the target training task has been executed, the ratio of the task execution duration of the target training task to the total number of iterations of the target training task, to obtain the actual single execution duration for the target training task to execute one iteration; and a second correspondence updating module, used for updating the second correspondence based on the target training parameters and the actual single execution duration for the target training task to execute one iteration; the data preparation duration determining module is specifically configured to: calculate the ratio of the total data amount of the target sample images to the number of target sample images, to obtain the average data amount of the target sample images; determine, in the preset first correspondence between data amount and acquisition duration, the acquisition duration corresponding to the data amount that differs least from the calculated average data amount, as the estimated acquisition duration for acquiring one target sample image; and calculate the product of the estimated acquisition duration for one target sample image and the number of target sample images, to obtain the total acquisition duration required for acquiring the target sample images, which is used as the data preparation duration of the target training task; the apparatus further comprises: an average data amount determining module, used for calculating, for each historical training task and before the data preparation duration determining module determines the acquisition duration required for acquiring the target sample images based on the data amount of the target sample images and the preset first correspondence between data amount and acquisition duration, the ratio of the total data amount of the historical sample images corresponding to the historical training task to the number of those historical sample images, to obtain the average data amount of the historical sample images; a first average acquisition duration determining module, used for calculating the ratio of the total acquisition duration for acquiring the historical sample images when the historical training task was executed to the number of those historical sample images, to obtain the average acquisition duration of the historical sample images; a second training task grouping module, used for dividing the historical training tasks whose historical sample images have the same average data amount into one group, to obtain a plurality of second training task groups; a second average acquisition duration determining module, used for calculating, for each second training task group, the average of the average acquisition durations of the historical training tasks in the group, to obtain the average acquisition duration corresponding to the average data amount of that second training task group; and a first correspondence determining module, used for recording the correspondence between each average data amount and its average acquisition duration, to obtain the first correspondence between data amount and acquisition duration; and the apparatus further comprises: a third average acquisition duration determining module, used for calculating, after the total training duration determining module has calculated the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task, the ratio of the total acquisition duration for acquiring the target sample images when the target training task was executed to the number of target sample images, to obtain the average acquisition duration of the target sample images; and a first correspondence updating module, used for updating the first correspondence based on the average data amount and the average acquisition duration of the target sample images.
- 13. A training server, comprising: a memory for storing a computer program; a processor for implementing the method of any of claims 1-9 when executing a program stored on a memory.
- 14. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-9.
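The grouping-and-averaging steps recited in claims 5 and 8 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names, the tuple layout of the history records, the millisecond unit, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_iteration_table(history):
    """Second correspondence: training parameter set -> mean ms per iteration.

    `history` yields (param_set_key, execution_ms, total_iterations) tuples,
    a hypothetical record layout for finished training tasks.
    """
    per_iteration = defaultdict(list)
    for key, execution_ms, total_iterations in history:
        # Single execution duration of one iteration for this historical task.
        per_iteration[key].append(execution_ms / total_iterations)
    # Average over the tasks that share the same training parameter set.
    return {key: mean(values) for key, values in per_iteration.items()}


def build_fetch_table(history):
    """First correspondence: average bytes per image -> mean ms per image.

    `history` yields (total_bytes, image_count, total_fetch_ms) tuples,
    again a hypothetical record layout.
    """
    per_image = defaultdict(list)
    for total_bytes, image_count, total_fetch_ms in history:
        average_bytes = total_bytes / image_count
        # Tasks with the same average data amount fall into the same group.
        per_image[average_bytes].append(total_fetch_ms / image_count)
    return {size: mean(values) for size, values in per_image.items()}


# Hypothetical history: two tasks per table, values invented for illustration.
key = ("detection", "color", (0, 1024), (0, 1024), 8)
iteration_table = build_iteration_table([(key, 1000, 10), (key, 2400, 20)])
fetch_table = build_fetch_table([(200_000, 2, 20), (300_000, 3, 36)])
print(iteration_table[key], fetch_table[100_000.0])
```

Claims 6 and 9 recite updating the two correspondences after a task finishes but, as quoted here, do not state the update rule. Under the same assumptions, one simple option is to append the finished task's record to the history and rebuild the tables.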
Description
Model training duration determining method and device
Technical Field
The application relates to the technical field of model training platforms, and in particular to a method and a device for determining model training duration.
Background
Before an image processing model is used for processing such as image recognition or object detection, the model may be trained. A user may submit a training task for the image processing model to an AI (Artificial Intelligence) training platform through a terminal, and the AI training platform executes the training task to complete training of the image processing model. Because the AI training platform has a limited number of display cards (graphics cards), and one display card can execute only one training task at a time, training tasks must queue when their number exceeds the number of display cards on the platform.
In the related art, after a training task is submitted to the AI training platform, the terminal may display a list page of training tasks that shows the current state of each training task. A training task has four states: queuing, data preparation, training, and converting. However, the terminal can display only the state of the training task, and the related art provides no method for determining the training duration of a training task, so the user cannot know when training of the image processing model will complete, which degrades user experience.
Disclosure of Invention
The embodiments of the application aim to provide a method and a device for determining model training duration, so as to determine the training duration of a training task and display it to the user, so that the user knows when the image processing model will complete training, which improves user experience.
The specific technical scheme is as follows: In order to achieve the above object, an embodiment of the present application provides a method for determining a model training duration, where the method is applied to a training server in a model training platform, the model training platform further includes a terminal, and the method includes: obtaining a target training task, wherein the target training task is used for training a target image processing model based on target sample images; determining the acquisition duration required for acquiring the target sample images, based on the data amount of the target sample images and a preset first correspondence between data amount and acquisition duration, to obtain the data preparation duration of the target training task, wherein the first correspondence is determined based on the data amount of historical sample images corresponding to historical training tasks executed by the training server and the acquisition duration for acquiring the historical sample images; determining the task execution duration of the target training task based on target training parameters of the target training task and a preset second correspondence between training parameter sets and execution durations, wherein the target training parameters comprise algorithm parameters of the target image processing model, image parameters of the target sample images, and execution parameters of the target training task; calculating the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task; and sending the total training duration of the target training task to the terminal, so that the terminal displays the total training duration of the target training task.
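The estimation steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the patented implementation. The names `ParamSet`, `data_preparation_ms`, `task_execution_ms`, and `total_training_ms` are hypothetical, and the table values are invented. The total is assumed to be a plain sum of the two durations. The nearest-data-amount lookup and the per-iteration lookup follow claims 7, 1, and 4. The "batch iteration number" is read here as a number of epochs, so the total number of iterations is assumed to be ceil(images / batch size) × epochs.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ParamSet:
    """Hypothetical key of the second correspondence (one training parameter set)."""
    algorithm_type: str
    image_type: str
    width_range: Tuple[int, int]   # inclusive preset image width interval
    height_range: Tuple[int, int]  # inclusive preset image height interval
    batch_size: int

    def contains(self, algorithm_type, image_type, width, height, batch_size):
        """True if the given target training parameters belong to this set."""
        return (
            self.algorithm_type == algorithm_type
            and self.image_type == image_type
            and self.batch_size == batch_size
            and self.width_range[0] <= width <= self.width_range[1]
            and self.height_range[0] <= height <= self.height_range[1]
        )


def data_preparation_ms(total_bytes: int, image_count: int,
                        fetch_table: Dict[int, int]) -> int:
    """Look up the table entry nearest to the average image size, then scale."""
    average_bytes = total_bytes / image_count
    nearest = min(fetch_table, key=lambda size: abs(size - average_bytes))
    return fetch_table[nearest] * image_count


def task_execution_ms(algorithm_type: str, image_type: str, width: int,
                      height: int, batch_size: int, image_count: int,
                      epochs: int, iteration_table: Dict[ParamSet, int]) -> int:
    """Per-iteration duration of the matching set times the total iterations."""
    for param_set, ms_per_iteration in iteration_table.items():
        if param_set.contains(algorithm_type, image_type, width, height, batch_size):
            # Assumption: iterations per epoch = ceil(images / batch size).
            total_iterations = math.ceil(image_count / batch_size) * epochs
            return total_iterations * ms_per_iteration
    raise LookupError("no training parameter set matches the task")


def total_training_ms(prep_ms: int, exec_ms: int) -> int:
    """Assumed combination rule: a plain sum of the two durations."""
    return prep_ms + exec_ms


# Invented tables: bytes per image -> ms per image, parameter set -> ms per iteration.
fetch_table = {100_000: 10, 500_000: 40}
iteration_table = {ParamSet("detection", "color", (0, 1024), (0, 1024), 8): 200}

prep = data_preparation_ms(120_000_000, 1000, fetch_table)
run = task_execution_ms("detection", "color", 640, 480, 8, 1000, 10, iteration_table)
print(total_training_ms(prep, run))  # 260000 ms
```

Integer milliseconds are used so the arithmetic stays exact. Raising `LookupError` when no training parameter set matches is a choice made for this sketch, since the text above does not say what happens in that case.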
Optionally, before the calculating the total training duration of the target training task based on the data preparation duration and the task execution duration of the target training task, the method further includes: based on a preset task scheduling strategy, determining a display card used for executing the target training task in the training server as a target display card; determining a training task which is required to be executed by the target display card before the target training task is executed as a first training task; from the first training tasks, determining a first training task currently executed by the target display card as a second training task, and determining a first training task to be executed by the target display card as a third training task; Calculating the sum of the residual execution time of the second training task, the data preparation time of the third training task and the task execution time to obtain the waiting time of the target training task; the calculating the total training duration of the target training task based on the data preparation duration and the task execution duration of the target t