
CN-117114057-B - Model generation method, storage medium and electronic device


Abstract

The application discloses a model generation method, a storage medium and an electronic device. The method comprises: obtaining model requirement information; selecting, based on the model requirement information, a plurality of callback components to be used and a plurality of processing components to be extended, wherein the callback components are used for splitting the implementation process of a target model, the processing components are used for extending the implementation process of the target model to the implementation process of an initial model, and the implementation process of the target model comprises at least one of a training process and an inference process of the target model; and generating the target model by using the callback components and the processing components. The method solves the technical problems in the related art that training frameworks are difficult to use and costly to switch during model training.
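The central idea of the abstract, splitting an implementation process into callback components, can be illustrated with a toy training loop. The Python sketch below is a minimal illustration under stated assumptions: the class names (`Callback`, `LossLogger`, `ToyTrainer`) and the hook names are hypothetical and are not taken from the patent, which does not disclose its interfaces. The loop fits y = w * x by gradient descent and only hands control to the registered callbacks at fixed points.

```python
from typing import List


# Hypothetical callback interface; not the patent's actual API.
class Callback:
    """Base callback: every hook is a no-op unless a subclass overrides it."""

    def on_train_begin(self, state: dict) -> None: ...
    def on_step_end(self, state: dict) -> None: ...
    def on_train_end(self, state: dict) -> None: ...


class LossLogger(Callback):
    """Records the loss reported at the end of every step."""

    def __init__(self) -> None:
        self.losses: List[float] = []

    def on_step_end(self, state: dict) -> None:
        self.losses.append(state["loss"])


class ToyTrainer:
    """Fits y = w * x by gradient descent; the loop itself only fires hooks."""

    def __init__(self, callbacks: List[Callback], lr: float = 0.1) -> None:
        self.callbacks = callbacks
        self.lr = lr

    def _fire(self, hook: str, state: dict) -> None:
        # Dispatch one named hook to every registered callback, in order.
        for cb in self.callbacks:
            getattr(cb, hook)(state)

    def train(self, data: List[tuple], steps: int) -> float:
        state = {"w": 0.0, "loss": None, "step": 0}
        self._fire("on_train_begin", state)
        for step in range(steps):
            # Mean-squared-error gradient with respect to the single weight w.
            grad = sum(2 * (state["w"] * x - y) * x for x, y in data) / len(data)
            state["w"] -= self.lr * grad
            state["loss"] = sum((state["w"] * x - y) ** 2 for x, y in data) / len(data)
            state["step"] = step + 1
            self._fire("on_step_end", state)
        self._fire("on_train_end", state)
        return state["w"]
```

For example, `ToyTrainer([LossLogger()]).train([(1.0, 2.0), (2.0, 4.0)], steps=50)` drives `w` toward 2.0 while the logger collects one loss value per step; adding or swapping callbacks changes what happens at each hook without touching the loop.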

Inventors

  • ZHAO YUZE
  • ZHOU WENMENG

Assignees

  • Hangzhou Alibaba Feitian Information Technology Co., Ltd. (杭州阿里巴巴飞天信息技术有限公司)

Dates

Publication Date
2026-05-12
Application Date
2023-08-17

Claims (12)

  1. A model generation method, characterized by comprising: obtaining model requirement information, wherein the model requirement information is used for determining, according to the implementation process of a pre-built initial model, the implementation requirement of a target model corresponding to a target application scenario, the implementation process of the initial model comprises at least one of a training process of the initial model and an inference process of the initial model, and the implementation requirement of the target model comprises at least one of a training requirement of the target model and an inference requirement of the target model; selecting a plurality of callback components to be used and a plurality of processing components to be extended based on the model requirement information, wherein the callback components are used for splitting the implementation process of the target model, the processing components are used for extending the implementation process of the target model to the implementation process of the initial model, and the implementation process of the target model comprises at least one of a training process of the target model and an inference process of the target model; and generating the target model by using the plurality of callback components and the plurality of processing components; wherein a graphical user interface is provided by the electronic device, the content displayed by the graphical user interface at least partially comprises a model generation scenario, and the model generation method further comprises: determining an implementation mode of the target model in response to a first touch operation acting on the graphical user interface, wherein the implementation mode comprises a training mode and an inference mode; in response to a second touch operation acting on the graphical user interface, acquiring the model requirement information corresponding to the implementation mode and a data set corresponding to the implementation mode, wherein the data set comprises a training data set and an inference data set; and in response to a third touch operation acting on the graphical user interface, selecting the callback components and the processing components based on the model requirement information, splitting the implementation process of the target model into the callback components, inheriting the processing components in a multi-machine, multi-card mode, and extending the implementation processes corresponding to the respective callback components to the implementation process of the initial model, so as to generate the target model by using the data set.
  2. The model generation method according to claim 1, wherein obtaining the model requirement information comprises: determining an implementation mode of the target model, wherein the implementation mode is used for selecting an implementation category of the target model, and the implementation category comprises at least one of a training category and an inference category; and obtaining the model requirement information based on the implementation mode.
  3. The model generation method according to claim 2, wherein selecting the plurality of callback components to be used and the plurality of processing components to be extended based on the model requirement information comprises: determining, according to the model requirement information, an implementation device corresponding to the implementation mode; registering the plurality of callback components through the implementation device; and determining the plurality of processing components corresponding to the plurality of callback components based on the number and types of the plurality of callback components.
  4. The model generation method according to claim 3, wherein the implementation mode comprises a training mode and an inference mode, and determining the implementation device corresponding to the implementation mode according to the model requirement information comprises: in response to the model requirement information being model training requirement information, determining a trainer corresponding to the training mode; and in response to the model requirement information being model inference requirement information, determining an inference engine corresponding to the inference mode.
  5. The model generation method according to claim 4, wherein registering the plurality of callback components through the implementation device comprises: in response to the model requirement information being the model training requirement information, registering a plurality of training callback components through the trainer; and in response to the model requirement information being the model inference requirement information, registering a plurality of inference callback components through the inference engine.
  6. The model generation method according to claim 5, wherein determining the plurality of processing components corresponding to the plurality of callback components based on the number and types of the plurality of callback components comprises: in response to the model requirement information being the model training requirement information, determining a plurality of training processing components corresponding to the plurality of training callback components based on the number and types of the plurality of training callback components; and in response to the model requirement information being the model inference requirement information, determining a plurality of inference processing components corresponding to the plurality of inference callback components based on the number and types of the plurality of inference callback components.
  7. The model generation method according to claim 1, wherein generating the target model by using the plurality of callback components and the plurality of processing components comprises: splitting the implementation process of the target model into the plurality of callback components; and inheriting the plurality of processing components in a multi-machine, multi-card mode, and extending the implementation processes corresponding to the respective callback components to the implementation process of the initial model, to generate the target model.
  8. The model generation method according to claim 1, wherein a graphical user interface is provided by the electronic device, the content displayed by the graphical user interface at least partially comprises an instruction response scenario, and the model generation method further comprises: displaying, in the graphical user interface, an instruction input box together with a question category and a question example corresponding to the target application scenario, wherein the question category and the question example are used for prompting how to ask a question in the instruction input box; in response to a fourth touch operation acting on the graphical user interface, acquiring question information entered in the instruction input box based on the question category and the question example; and in response to a fifth touch operation acting on the graphical user interface, feeding back answer information corresponding to the question information in the graphical user interface.
  9. A model generation method, characterized by comprising: obtaining model requirement information through a first processing component, wherein the first processing component is used for providing the implementation requirement of a target model, the model requirement information is determined according to the implementation process of an initial model, the implementation requirement of the target model comprises at least one of a training requirement of the target model and an inference requirement of the target model, and the implementation process of the initial model comprises at least one of a training process of the initial model and an inference process of the initial model; selecting a plurality of callback components to be used and a plurality of second processing components to be extended based on the model requirement information, wherein the callback components are used for splitting the implementation process of the target model, the second processing components are used for extending the implementation process of the target model to the implementation process of the initial model, and the implementation process of the target model comprises at least one of a training process of the target model and an inference process of the target model; and inheriting the plurality of second processing components in a multi-machine, multi-card mode, and extending the implementation processes corresponding to the respective callback components to the implementation process of the initial model, to generate the target model; wherein a graphical user interface is provided by the electronic device, the content displayed by the graphical user interface at least partially comprises a model generation scenario, and the model generation method further comprises: determining an implementation mode of the target model in response to a first touch operation acting on the graphical user interface, wherein the implementation mode comprises a training mode and an inference mode; in response to a second touch operation acting on the graphical user interface, acquiring the model requirement information corresponding to the implementation mode and a data set corresponding to the implementation mode, wherein the data set comprises a training data set and an inference data set; and in response to a third touch operation acting on the graphical user interface, selecting the callback components and the second processing components based on the model requirement information, splitting the implementation process of the target model into the callback components, inheriting the second processing components in a multi-machine, multi-card mode, and extending the implementation processes corresponding to the respective callback components to the implementation process of the initial model, so as to generate the target model by using the data set.
  10. A model generation method, characterized by comprising: obtaining model requirement information, wherein the model requirement information is used for determining, according to the implementation process of a general large model, the implementation requirement of an e-commerce large model corresponding to an e-commerce application scenario, the general large model is a general base model for a plurality of application scenarios, the implementation process of the general large model comprises at least one of a training process of the general large model and an inference process of the general large model, and the implementation requirement of the e-commerce large model comprises at least one of a training requirement of the e-commerce large model and an inference requirement of the e-commerce large model; selecting a plurality of callback components to be used and a plurality of processing components to be extended based on the model requirement information, wherein the callback components are used for splitting the implementation process of the e-commerce large model, and the processing components are used for extending the implementation process of the e-commerce large model to the implementation process of the general large model; and generating the e-commerce large model by using the callback components and the processing components; wherein a graphical user interface is provided by the electronic device, the content displayed by the graphical user interface at least partially comprises a model generation scenario, and the model generation method further comprises: determining an implementation mode of the e-commerce large model in response to a first touch operation acting on the graphical user interface, wherein the implementation mode comprises a training mode and an inference mode; in response to a second touch operation acting on the graphical user interface, acquiring the model requirement information corresponding to the implementation mode and a data set corresponding to the implementation mode, wherein the data set comprises a training data set and an inference data set; and in response to a third touch operation acting on the graphical user interface, selecting the callback components and the processing components based on the model requirement information, splitting the implementation process of the e-commerce large model into the callback components, inheriting the processing components in a multi-machine, multi-card mode, and extending the implementation processes corresponding to the respective callback components to the implementation process of the general large model, so as to generate the e-commerce large model by using the data set.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored executable program, wherein the executable program, when run, controls a device in which the storage medium is located to perform the model generation method of any one of claims 1 to 10.
  12. A model generation system, comprising: a processor; and a memory, coupled to the processor, for providing the processor with instructions for the following processing steps: obtaining model requirement information, wherein the model requirement information is used for determining, according to the implementation process of a pre-built initial model, the implementation requirement of a target model corresponding to a target application scenario, the implementation process of the initial model comprises at least one of a training process of the initial model and an inference process of the initial model, and the implementation requirement of the target model comprises at least one of a training requirement of the target model and an inference requirement of the target model; selecting a plurality of callback components to be used and a plurality of processing components to be extended based on the model requirement information, wherein the callback components are used for splitting the implementation process of the target model, the processing components are used for extending the implementation process of the target model to the implementation process of the initial model, and the implementation process of the target model comprises at least one of a training process of the target model and an inference process of the target model; generating the target model by using the plurality of callback components and the plurality of processing components; determining an implementation mode of the target model in response to a first touch operation acting on a graphical user interface, wherein the implementation mode comprises a training mode and an inference mode; in response to a second touch operation acting on the graphical user interface, acquiring the model requirement information corresponding to the implementation mode and a data set corresponding to the implementation mode, wherein the data set comprises a training data set and an inference data set; and in response to a third touch operation acting on the graphical user interface, selecting the callback components and the processing components based on the model requirement information, splitting the implementation process of the target model into the callback components, inheriting the processing components in a multi-machine, multi-card mode, and extending the implementation processes corresponding to the respective callback components to the implementation process of the initial model, so as to generate the target model by using the data set.
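Claims 3 to 6 describe a dispatch chain: the requirement information selects an implementation device (a trainer for training, an inference engine for inference), that device registers the matching callback components, and the processing components are then determined from the number and types of the registered callbacks. The Python sketch below models that chain with plain data structures. All names, the callback types, and the rule "one processing component per callback type, counted by occurrences" are assumptions made for illustration; the claims do not specify them.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Requirement:
    # Hypothetical stand-in for "model requirement information".
    kind: str  # "training" or "inference"
    callback_types: List[str] = field(default_factory=list)


@dataclass
class ImplementationDevice:
    # Hypothetical stand-in for a trainer or an inference engine.
    name: str
    registered: List[str] = field(default_factory=list)

    def register(self, callback_type: str) -> None:
        self.registered.append(callback_type)


# Assumed catalogue: which (hypothetical) processing component extends each callback type.
PROCESSING_CATALOGUE: Dict[str, str] = {
    "data_loading": "DataProcessing",
    "forward": "ForwardProcessing",
    "backward": "OptimizerProcessing",
    "checkpoint": "CheckpointProcessing",
    "generation": "DecodingProcessing",
}


def select_device(req: Requirement) -> ImplementationDevice:
    """Claim 4: training requirements map to a trainer, inference requirements to an inference engine."""
    if req.kind == "training":
        return ImplementationDevice("trainer")
    if req.kind == "inference":
        return ImplementationDevice("inference_engine")
    raise ValueError(f"unknown requirement kind: {req.kind!r}")


def plan(req: Requirement) -> Tuple[ImplementationDevice, Dict[str, int]]:
    device = select_device(req)
    # Claim 5: register the callback components through the selected device.
    for cb_type in req.callback_types:
        device.register(cb_type)
    # Claim 6: derive processing components from the number and types of callbacks.
    counts = Counter(device.registered)
    components = {PROCESSING_CATALOGUE[t]: n for t, n in counts.items()}
    return device, components
```

A training requirement listing two checkpoint callbacks, for instance, yields a trainer plus a plan with `CheckpointProcessing` counted twice; an inference requirement yields an inference engine with its own, smaller set of components.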

Description

Technical Field

The application relates to the technical field of computers and artificial intelligence, and in particular to a model generation method, a storage medium and an electronic device.

Background

Training a large language model (LLM) usually requires a plurality of graphics cards working together. This makes the training process more complex than training on a single graphics card, and training is more prone to stalls and interruptions. To support multi-machine, multi-card training, the related art provides several underlying training frameworks for GPU memory optimization, model partitioning, training acceleration and similar tasks, such as DeepSpeed, Megatron, Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP). However, each of these frameworks is used in a different way, so they are difficult to use and switching between frameworks during model training is costly. No effective solution to these problems has yet been proposed.

Disclosure of Invention

The embodiments of the application provide a model generation method, a storage medium and an electronic device, which at least solve the technical problems in the related art that training frameworks are difficult to use and costly to switch during model training.
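The background notes that LLM training needs several graphics cards to cooperate. In data-parallel training, the scheme DDP implements, each card computes gradients on its own shard of the batch and the gradients are averaged with an all-reduce so that every replica applies the same update. The pure-Python simulation below checks that averaging the gradients of equal-size shards reproduces the full-batch gradient. It is an illustrative sketch only: the cards are simulated in one process, the function names are hypothetical, and no real framework API is called.

```python
from typing import List, Tuple

Sample = Tuple[float, float]


def mse_grad(w: float, batch: List[Sample]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(data: List[Sample], world_size: int) -> List[List[Sample]]:
    """Strided split: card r gets samples r, r + world_size, ...
    Equal shard sizes (len(data) divisible by world_size) are assumed."""
    return [data[r::world_size] for r in range(world_size)]


def all_reduce_mean(values: List[float]) -> float:
    """Simulated all-reduce: every card ends up with the mean of all local values."""
    return sum(values) / len(values)


def data_parallel_step(w: float, data: List[Sample], world_size: int, lr: float) -> float:
    # Each simulated card computes a local gradient on its shard ...
    local_grads = [mse_grad(w, s) for s in shard(data, world_size)]
    # ... and all cards apply the same averaged gradient.
    return w - lr * all_reduce_mean(local_grads)
```

The equal-shard assumption matters: with uneven shards, the plain mean of local gradients would weight small shards too heavily and no longer match the full-batch gradient.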
According to one aspect of the embodiments of the application, a model generation method is provided, which comprises: obtaining model requirement information, wherein the model requirement information is used for determining, according to the implementation process of a pre-built initial model, the implementation requirement of a target model corresponding to a target application scenario, the implementation process of the initial model comprises at least one of a training process and an inference process of the initial model, and the implementation requirement of the target model comprises at least one of a training requirement and an inference requirement of the target model; selecting, based on the model requirement information, a plurality of callback components to be used and a plurality of processing components to be extended, wherein the callback components are used for splitting the implementation process of the target model, the processing components are used for extending the implementation process of the target model to the implementation process of the initial model, and the implementation process of the target model comprises at least one of a training process and an inference process of the target model; and generating the target model by using the callback components and the processing components.
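In this aspect the processing components extend the target model's implementation process onto the pre-built initial model's process, so the target model reuses the initial pipeline and changes only selected steps. One way to picture that, assuming the initial process can be expressed as an ordered set of named steps and a processing component as a function that wraps one step, is the Python sketch below. The step names, `initial_process`, `domain_prompt` and `build_target` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a step maps a state dict to a new state dict.
Step = Callable[[dict], dict]


def initial_process() -> Dict[str, Step]:
    """Pre-built implementation process of the initial model, as ordered named steps."""
    return {
        "preprocess": lambda s: {**s, "text": s["text"].strip()},
        "forward": lambda s: {**s, "output": s["text"].upper()},  # stand-in for a model call
        "postprocess": lambda s: s,
    }


def domain_prompt(step: Step) -> Step:
    """Example processing component: adds a scenario-specific prefix, then runs the original step."""
    def wrapped(s: dict) -> dict:
        return step({**s, "text": "[shop] " + s["text"]})
    return wrapped


def build_target(process: Dict[str, Step],
                 extensions: Dict[str, List[Callable[[Step], Step]]]) -> Step:
    """Wrap the named steps of a copy of the initial process, then chain all steps in order."""
    steps = dict(process)  # copy, so the initial process stays reusable
    for name, wrappers in extensions.items():
        for wrap in wrappers:
            steps[name] = wrap(steps[name])
    order = list(steps)  # dicts keep insertion order

    def run(state: dict) -> dict:
        for name in order:
            state = steps[name](state)
        return state

    return run
```

Because `build_target` copies the step table before wrapping, the initial model's process stays unchanged and can serve as the base for other target scenarios.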
According to another aspect of the embodiments of the application, a model generation method is further provided, which comprises: obtaining model requirement information through a first processing component, wherein the first processing component is used for providing the implementation requirement of a target model, the model requirement information is determined according to the implementation process of an initial model, the implementation requirement of the target model comprises at least one of a training requirement and an inference requirement of the target model, and the implementation process of the initial model comprises at least one of a training process and an inference process of the initial model; selecting, based on the model requirement information, a plurality of callback components to be used and a plurality of second processing components to be extended, wherein the callback components are used for splitting the implementation process of the target model, the second processing components are used for extending the implementation process of the target model to the implementation process of the initial model, and the implementation process of the target model comprises at least one of a training process and an inference process of the target model; and inheriting the plurality of second processing components in a multi-machine, multi-card mode to extend the implementation processes corresponding to the respective callback components to the implementation process of the initial model, thereby generating the target model.
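In this aspect the second processing components are inherited in a multi-machine, multi-card mode. A minimal reading of "inheriting" is ordinary class inheritance: the multi-card component keeps the single-card compute step of its parent and overrides only how each card selects its share of the data. The sketch below assumes the common convention that global rank = node rank × cards per node + local rank and world size = nodes × cards per node, as used by launchers such as `torchrun` when every machine runs the same number of processes. The class names are hypothetical, and the "compute step" is a stand-in sum rather than real model code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Topology:
    num_nodes: int      # number of machines
    gpus_per_node: int  # cards per machine (assumed equal on every machine)

    @property
    def world_size(self) -> int:
        return self.num_nodes * self.gpus_per_node

    def global_rank(self, node_rank: int, local_rank: int) -> int:
        if not (0 <= node_rank < self.num_nodes and 0 <= local_rank < self.gpus_per_node):
            raise ValueError("rank out of range")
        return node_rank * self.gpus_per_node + local_rank


class ProcessingComponent:
    """Hypothetical single-card component standing in for the initial model's process."""

    def select_batch(self, data: List[int]) -> List[int]:
        return list(data)

    def run(self, data: List[int]) -> int:
        return sum(self.select_batch(data))  # stand-in for a real compute step


class MultiCardProcessingComponent(ProcessingComponent):
    """Inherits run() unchanged; overrides only how each card picks its share of the data."""

    def __init__(self, topo: Topology, node_rank: int, local_rank: int) -> None:
        self.world_size = topo.world_size
        self.rank = topo.global_rank(node_rank, local_rank)

    def select_batch(self, data: List[int]) -> List[int]:
        # Strided sharding: card r processes items r, r + world_size, ...
        return data[self.rank::self.world_size]
```

With two machines of four cards each, the eight components cover every item exactly once, so their combined results match the single-card run; the compute logic itself is written once, in the parent class.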
According to another aspect of the embodiments of the application, a model generation method is further provided, which comprises: obtaining model requirement information, wherein the model requirement information is used for determining, according to the implementation process of a general large model, the implementation requirement of an e-commerce large model corresponding to an e-commerce application scenario, and the general large model is a general base model of a plurality