CN-122018870-A - Model development management method and device and computing equipment

Abstract

The invention discloses a model development management method, an apparatus, and a computing device. The method comprises: receiving a training task creation operation triggered at a client, wherein the training task creation operation comprises an operation of inputting training basic information, and the training basic information comprises a training data set, a training code version, and training parameters; creating a training task for a model based on the training basic information; in response to a start request from the client for the training task, starting a training container through a trainer; and executing the training task through the training container, which comprises executing, through the training container, the code of the training code version based on the training parameters, so as to train the model based on the training data set and obtain a trained model corresponding to the training task. The method enables efficient and convenient model development and management.

Inventors

  • CHEN JIAN
  • QIAO NAN
  • Zhai Xiaogeng

Assignees

  • 北京并行科技股份有限公司 (Beijing Paratera Technology Co., Ltd.)
  • 北京北龙超级云计算有限责任公司 (Beijing Beilong Super Cloud Computing Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2026-01-29

Claims (11)

  1. A model development management method, executed at a server side, comprising: receiving a training task creation operation triggered at a client, wherein the training task creation operation comprises an operation of inputting training basic information, and the training basic information comprises a training data set, a training code version, and training parameters; creating a training task for a model based on the training basic information; in response to a start request from the client for the training task, starting a training container through a trainer; and executing the training task through the training container, comprising: executing, through the training container, code of the training code version based on the training parameters, so as to train the model based on the training data set and obtain a trained model corresponding to the training task.
  2. The method of claim 1, further comprising: receiving an inference task creation operation triggered at the client, wherein the inference task creation operation comprises an operation of inputting inference basic information, and the inference basic information comprises an inference data set, an inference code version, and inference parameters; creating an inference task for the model based on the inference basic information; in response to a start request from the client for the inference task, starting an inference container through an inference engine; and executing the inference task through the inference container, comprising: executing, through the inference container, code of the inference code version based on the inference parameters, so as to perform inference with the model based on the inference data set and obtain an inference result.
  3. The method of claim 1 or 2, further comprising: receiving a management operation, triggered on an interface of the client, on a data set, code, a training task, an inference task, a model, or a running environment, and executing the management operation; wherein the management operation includes one or more of a list viewing operation, a creation operation, a deletion operation, a modification operation, a query operation, and a file downloading operation.
  4. The method of any one of claims 1-3, further comprising: periodically monitoring, through a task monitor, the state of the training task and the state of the inference task, so as to update the state of the training task or the state of the inference task when a change in that state is detected.
  5. The method of any one of claims 1-4, wherein obtaining the trained model corresponding to the training task comprises: after the model is trained, acquiring a checkpoint from the output path of the training task, and obtaining and storing the trained model based on the checkpoint.
  6. The method of any one of claims 1-5, wherein starting the training container through the trainer comprises: creating a trainer; starting, through the trainer, the training container based on the running environment corresponding to the training task; and passing the training parameters corresponding to the training task to the training container.
  7. The method of any one of claims 1-6, wherein creating the training task for the model based on the training basic information comprises: generating a training ID uniquely corresponding to the training task, and storing the training basic information in association with the training ID in a training table of a database system; and creating, in a training directory, a folder corresponding to the training ID as the output path of the training task.
  8. The method of any one of claims 1-7, wherein the client comprises a browser, and the server is adapted to exchange data with the client through an OpenAPI.
  9. A model development management apparatus, deployed at a server and adapted to perform the method of any one of claims 1-8, the apparatus comprising: a receiving module, adapted to receive a training task creation operation triggered at a client, wherein the training task creation operation comprises an operation of inputting training basic information, and the training basic information comprises a training data set, a training code version, and training parameters; a creating module, adapted to create a training task for a model based on the training basic information; a starting module, adapted to start a training container through a trainer in response to a start request from the client for the training task; and an execution module, adapted to execute the training task through the training container, comprising executing, through the training container, code of the training code version based on the training parameters, so as to train the model based on the training data set and obtain a trained model corresponding to the training task.
  10. A computing device, comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor and comprise instructions for performing the method of any of claims 1-9.
  11. A computer program product comprising computer program instructions which, when executed by a processor, implement the method of any of claims 1-9.
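
As a rough illustration of claims 4 and 7, the sketch below generates a training ID, stores the training basic information under that ID in a training table, creates a matching output folder, and updates a task's state only when an observed state differs from the stored one. It is a minimal sketch under stated assumptions, not the implementation disclosed in this application: SQLite stands in for the database system, and the table layout, column names, status strings, UUID-based ID, and the function names `open_demo_database`, `create_training_task`, and `update_status_if_changed` are all hypothetical.

```python
import json
import sqlite3
import tempfile
import uuid
from pathlib import Path


def open_demo_database():
    """Create an in-memory database with a hypothetical training table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE training ("
        "training_id TEXT PRIMARY KEY, dataset TEXT, code_version TEXT, "
        "params TEXT, status TEXT, output_path TEXT)"
    )
    return conn


def create_training_task(conn, training_root, dataset, code_version, params):
    """Store the training basic information and create the output folder."""
    training_id = uuid.uuid4().hex  # assumption: any unique ID scheme would do
    output_path = Path(training_root) / training_id
    output_path.mkdir(parents=True)  # folder named after the training ID
    conn.execute(
        "INSERT INTO training VALUES (?, ?, ?, ?, ?, ?)",
        (training_id, dataset, code_version,
         json.dumps(params, sort_keys=True), "created", str(output_path)),
    )
    conn.commit()
    return training_id


def update_status_if_changed(conn, training_id, observed_status):
    """Write the observed status only if it differs from the stored one."""
    row = conn.execute(
        "SELECT status FROM training WHERE training_id = ?", (training_id,)
    ).fetchone()
    if row is None or row[0] == observed_status:
        return False  # unknown task or no change: nothing to update
    conn.execute(
        "UPDATE training SET status = ? WHERE training_id = ?",
        (observed_status, training_id),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        db = open_demo_database()
        task_id = create_training_task(db, root, "demo-dataset", "v1", {"epochs": 3})
        print("output folder exists:", (Path(root) / task_id).is_dir())
        print("status updated:", update_status_if_changed(db, task_id, "running"))
```

A periodic monitor could call `update_status_if_changed` on a timer for each unfinished task; how the observed state is obtained from the container runtime is outside this sketch.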

Description

Model development management method and device and computing equipment

Technical Field

The present invention relates to the field of artificial intelligence technologies, and in particular to a model development management method, a model development management apparatus, and a computing device.

Background

With the rapid development of artificial intelligence technology, the development process of machine learning models has become increasingly complex. Conventional model development processes typically have the following pain points:

1) Complex environment configuration: deep learning frameworks and their dependent libraries come in many versions, so configuring an environment is time-consuming and labor-intensive. It is difficult to keep the training and inference environments consistent, which easily leads to compatibility problems.
2) Inconvenient task management: training and inference tasks lack an effective means of recording, so historical run records and results cannot be reviewed.
3) Disordered resource management: data sets, model versions, and code versions are managed haphazardly, so when a team collaborates it is difficult to trace and reproduce the experimental results of a specific version.
4) Lack of process monitoring: training is typically monitored by printing logs, without an intuitive, visual, real-time monitoring interface. Developers cannot view key indicators such as training progress and the loss curve in real time, so problems are discovered late.
5) Inconvenient access to outputs: the model files and logs produced by training, and the result files (such as pictures) produced by inference, must be located and downloaded manually, which is cumbersome.

In view of this, a model development management method is needed to solve the problems described above.
Disclosure of Invention

Accordingly, the present invention provides a model development management method and a model development management apparatus to solve or at least alleviate the above-mentioned problems.

According to one aspect of the invention, a model development management method is provided. The method is executed at a server side and comprises: receiving a training task creation operation triggered at a client, wherein the training task creation operation comprises an operation of inputting training basic information, and the training basic information comprises a training data set, a training code version, and training parameters; creating a training task for a model based on the training basic information; in response to a start request from the client for the training task, starting a training container through a trainer; and executing the training task through the training container, which comprises executing, through the training container, the code of the training code version based on the training parameters, so as to train the model based on the training data set and obtain a trained model corresponding to the training task.
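
The container start step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the trainer disclosed here: the text does not name a container runtime, so Docker's `docker run` command line is assumed, and the function name `build_training_command`, the image name, the `/workspace/...` mount points, and the `--data` and `--output` script arguments are hypothetical. The function only builds the command; it does not launch anything.

```python
import shlex


def build_training_command(image, code_dir, dataset_dir, output_dir,
                           entrypoint, params):
    """Return an argv list that would run the training code in a container.

    image       -- container image holding the running environment (assumed)
    code_dir    -- host folder with the checked-out training code version
    dataset_dir -- host folder with the training data set
    output_dir  -- host folder used as the training task's output path
    entrypoint  -- training script inside code_dir (hypothetical name)
    params      -- training parameters, passed as --name value pairs
    """
    argv = [
        "docker", "run", "--rm",
        "-v", f"{code_dir}:/workspace/code:ro",      # code, read-only
        "-v", f"{dataset_dir}:/workspace/data:ro",   # data set, read-only
        "-v", f"{output_dir}:/workspace/output",     # checkpoints land here
        image,
        "python", f"/workspace/code/{entrypoint}",
        "--data", "/workspace/data",
        "--output", "/workspace/output",
    ]
    # Sort the parameters so the generated command is deterministic.
    for name, value in sorted(params.items()):
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    command = build_training_command(
        image="trainer-env:latest",
        code_dir="/srv/code/v1",
        dataset_dir="/srv/data/demo",
        output_dir="/srv/training/task-1",
        entrypoint="train.py",
        params={"lr": 0.01, "epochs": 3},
    )
    print(shlex.join(command))
```

Returning an argument list rather than a shell string avoids quoting problems; a caller could hand the list to `subprocess.run(command, check=True)` once a suitable image and paths exist.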
Optionally, the model development management method further comprises: receiving an inference task creation operation triggered at the client, wherein the inference task creation operation comprises an operation of inputting inference basic information, and the inference basic information comprises an inference data set, an inference code version, and inference parameters; creating an inference task for the model based on the inference basic information; in response to a start request from the client for the inference task, starting an inference container through an inference engine; and executing the inference task through the inference container, which comprises executing, through the inference container, the code of the inference code version based on the inference parameters, so as to perform inference with the model based on the inference data set and obtain an inference result.

Optionally, the model development management method further comprises: receiving a management operation, triggered on the interface of the client, on a data set, code, a training task, an inference task, a model, or a running environment, and executing the management operation, wherein the management operation comprises one or more of a list viewing operation, a creation operation, a deletion operation, a modification operation, a query operation, and a file downloading operation.

Optionally, the model development management method according to the invention further comprises: periodically monitoring, through the task monitor, the states of the training tasks and the inference tasks, so as to update the states of the training tasks or the inference tasks when the states of the training tasks or the inference tasks are monitored to chang