
CN-121979693-A - Model training method, system, medium and product based on artificial intelligence

CN121979693A

Abstract

A model training method, system, medium and product based on artificial intelligence, relating to the field of data processing systems or methods specially adapted for administrative, commercial, financial, managerial, supervisory or forecasting purposes. The ordered logic execution sequences of a plurality of model training tasks are parsed and extracted; tasks whose sequences share the same leading contiguous logic units are aggregated into a training task group; the shared leading units are separated from each task as a common baseline workflow, and the remaining independent training logic of each task is converted into a personalized derivative workflow. A first computing resource is allocated to execute the common baseline workflow once, in a centralized manner, producing a reference context slice. A second computing resource is allocated to each derivative workflow; the reference context slice is distributed to each of them for environment reconstruction, and the personalized derivative workflows are executed in parallel, thereby completing the concurrent training of multiple tasks. The application reduces the redundant computation caused by highly similar training tasks and improves the overall efficiency of task scheduling.
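The prefix-grouping idea in the abstract can be illustrated with a minimal sketch. Everything here (task dictionaries, unit names such as `load_data`, the `group_tasks` helper) is a hypothetical illustration, not code from the patent: tasks whose ordered logic sequences begin with the same units are grouped, the shared prefix becomes the baseline executed once, and the remainders become per-task derivatives.

```python
# Illustrative sketch of grouping training tasks by their common prefix of
# logic units; all names and structures are hypothetical, not from the patent.
from itertools import groupby

def common_prefix(seqs):
    """Length of the longest prefix of logic-unit names shared by all sequences."""
    n = 0
    for units in zip(*seqs):
        if len(set(units)) != 1:
            break
        n += 1
    return n

def group_tasks(tasks):
    """Group tasks whose sequences start with the same first unit, then split
    each group into one shared baseline and per-task derivative workflows."""
    groups = []
    tasks = sorted(tasks, key=lambda t: t["sequence"][0])  # groupby needs sorted input
    for _, members in groupby(tasks, key=lambda t: t["sequence"][0]):
        members = list(members)
        k = common_prefix([t["sequence"] for t in members])
        groups.append({
            "baseline": members[0]["sequence"][:k],        # executed once, centrally
            "derivatives": [(t["id"], t["sequence"][k:])   # executed in parallel
                            for t in members],
        })
    return groups

tasks = [
    {"id": "A", "sequence": ["load_data", "init_model", "tune_lr"]},
    {"id": "B", "sequence": ["load_data", "init_model", "tune_layers"]},
]
groups = group_tasks(tasks)
```

With these two tasks, the shared `["load_data", "init_model"]` prefix runs once as the baseline, while only the differing final units run separately.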

Inventors

  • WANG YIRU

Assignees

  • 四川边缘算力科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-04-09

Claims (10)

  1. A model training method based on artificial intelligence, comprising: acquiring a plurality of artificial intelligence model training tasks submitted concurrently within a preset time window, parsing each artificial intelligence model training task, and extracting, for each task, an ordered logic execution sequence formed by a plurality of network construction logic units and model training logic units; aggregating the artificial intelligence model training tasks whose ordered logic execution sequences share the same leading contiguous logic units into a training task group; defining the shared leading contiguous logic units as a common baseline workflow of the training task group, wherein the common baseline workflow characterizes the common artificial intelligence business logic of unified training sample set loading and pre-trained neural network model initialization; for each artificial intelligence model training task in the training task group, separating the common baseline workflow from the corresponding ordered logic execution sequence, and converting the remaining logic units of the sequence into a personalized derivative workflow, wherein the personalized derivative workflow is the independent training logic comprising a personalized hyperparameter configuration and network-layer parameter fine-tuning; allocating a first computing resource to execute the common baseline workflow once in a centralized manner, and, when the execution of the common baseline workflow finishes, determining the intermediate-state operating data it produces as a reference context slice, wherein the reference context slice is a persisted data packet of the network topology and initial weight matrix of the pre-trained neural network model; respectively allocating a second computing resource to each personalized derivative workflow in the training task group; and distributing the reference context slice to each second computing resource for model loading and environment reconstruction, and executing each personalized derivative workflow in parallel in the reconstructed environment based on its personalized hyperparameter configuration and network-layer parameter fine-tuning, so as to complete each artificial intelligence model training task.
  2. The method according to claim 1, wherein aggregating the artificial intelligence model training tasks whose ordered logic execution sequences share the same leading contiguous logic units into a training task group specifically comprises: comparing all the ordered logic execution sequences unit by unit, and outputting a sequence comparison result; analyzing the sequence comparison result, and extracting the bifurcation position where a logic-unit difference first occurs; truncating the ordered logic execution sequences at the bifurcation position to generate a common prefix segment; determining the common prefix segment as the same leading contiguous logic units; and aggregating the artificial intelligence model training tasks containing the same leading contiguous logic units into the training task group.
  3. The method according to claim 1, wherein aggregating the artificial intelligence model training tasks whose ordered logic execution sequences share the same leading contiguous logic units into a training task group specifically comprises: extracting the head execution sequence of each ordered logic execution sequence to generate an initial prefix segment set; calculating the logic-feature similarity between each pair of head execution sequences in the initial prefix segment set to obtain a similarity matrix; filtering the similarity matrix according to a preset similarity threshold, and screening out target equivalent segments; determining the target equivalent segments as the same leading contiguous logic units; and aggregating the artificial intelligence model training tasks containing the same leading contiguous logic units into the training task group.
  4. The method of claim 1, wherein, after completing each of the artificial intelligence model training tasks, the method further comprises: extracting the trained network-layer parameters obtained after each artificial intelligence model training task is completed; calculating the difference between the trained network-layer parameters and the initial weight matrix in the reference context slice, to obtain the personalized parameter difference set corresponding to each artificial intelligence model training task; loading the network topology and initial weight matrix in the reference context slice into memory to obtain a shared base model instance; establishing an association between the task identifier of each artificial intelligence model training task and the corresponding personalized parameter difference set, and generating a task-parameter mapping table; receiving a business data processing request carrying a target task identifier, querying the task-parameter mapping table according to the target task identifier, and determining the corresponding target parameter difference set; and dynamically loading the target parameter difference set into the shared base model instance, processing the business data processing request through the loaded shared base model instance, and outputting a business processing result.
  5. The method according to claim 4, wherein dynamically loading the target parameter difference set into the shared base model instance, processing the business data processing request through the loaded shared base model instance, and outputting a business processing result specifically comprises: parsing the target parameter difference set, and determining the target network layer to be updated in the shared base model instance and the corresponding weight difference matrix; inputting the business data processing request into the shared base model instance; when the shared base model instance runs to the target network layer, performing parallel matrix computation on the input data of the current layer based on the initial weight matrix and on the weight difference matrix respectively, to obtain base feature data and incremental feature data; and adding and fusing the base feature data and the incremental feature data to obtain the output data of the target network layer, so that the business processing result is output through the shared base model instance.
  6. The method of claim 4, wherein, after processing the business data processing request through the loaded shared base model instance and outputting the business processing result, the method further comprises: acquiring real business feedback data for the business processing result, and calculating a business loss value based on the business processing result and the real business feedback data; performing back-propagation based on the business loss value while the in-memory data of the shared base model instance is kept in a read-only locked state, and computing an incremental gradient matrix; updating the target parameter difference set in the task-parameter mapping table with the incremental gradient matrix to obtain an updated target parameter difference set; and, when the next business data processing request carrying the target task identifier is received, dynamically loading the updated target parameter difference set into the shared base model instance for processing.
  7. The method according to claim 6, wherein updating the target parameter difference set in the task-parameter mapping table with the incremental gradient matrix to obtain an updated target parameter difference set specifically comprises: writing the computed incremental gradient matrix into a gradient accumulation buffer independent of the current business processing thread; detecting whether the number of gradient accumulations for the target task identifier in the gradient accumulation buffer has reached a preset update-period threshold; if the preset update-period threshold has been reached, merging the accumulated gradient matrices in the gradient accumulation buffer into the target parameter difference set according to a preset optimizer algorithm, and emptying the gradient accumulation buffer; and if the preset update-period threshold has not been reached, keeping the target parameter difference set unchanged.
  8. A model training system based on artificial intelligence, the system comprising: one or more processors and a memory coupled to the one or more processors, the memory storing computer program code comprising computer instructions which, when invoked by the one or more processors, cause the system to perform the method of any one of claims 1-7.
  9. A computer-readable storage medium comprising instructions which, when run on a system, cause the system to perform the method of any one of claims 1-7.
  10. A computer program product, characterized in that the computer program product, when run on a system, causes the system to perform the method according to any one of claims 1-7.
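Claims 4-7 describe serving many tasks from one shared base model by storing only each task's parameter differences, fusing base and delta outputs at the target layer, and merging buffered gradients into the delta on a fixed period. The following is a minimal plain-Python sketch of that idea under illustrative assumptions (a single layer treated as an element-wise weight vector, a fixed SGD-style merge standing in for the "preset optimizer algorithm"); none of the class or method names come from the patent.

```python
# Hedged sketch of the delta-loading scheme in claims 4-7; shapes, names,
# and the optimizer step are illustrative assumptions, not the patent's code.

UPDATE_PERIOD = 2   # stands in for the preset update-period threshold (claim 7)
LEARNING_RATE = 0.1 # stands in for the preset optimizer algorithm's step size

class SharedBaseModel:
    """One read-only base weight vector shared by all tasks; each task keeps
    only its parameter difference (delta) in a task-parameter mapping table."""

    def __init__(self, base_weights):
        self.base = list(base_weights)  # read-only initial weights (claim 6)
        self.deltas = {}                # task-parameter mapping table (claim 4)
        self.grad_buffer = {}           # gradient accumulation buffer (claim 7)

    def register_task(self, task_id, trained_weights):
        # claim 4: store (trained - base) as the personalized difference set
        self.deltas[task_id] = [w - b for w, b in zip(trained_weights, self.base)]

    def forward(self, task_id, x):
        # claim 5: compute base and delta paths separately, then fuse by addition
        base_out = [w * x for w in self.base]
        delta_out = [d * x for d in self.deltas[task_id]]
        return [b + d for b, d in zip(base_out, delta_out)]

    def accumulate_gradient(self, task_id, grad):
        # claim 7: buffer gradients; merge into the delta only every UPDATE_PERIOD calls
        buf = self.grad_buffer.setdefault(task_id, [])
        buf.append(list(grad))
        if len(buf) >= UPDATE_PERIOD:
            for g in buf:
                self.deltas[task_id] = [d - LEARNING_RATE * gi
                                        for d, gi in zip(self.deltas[task_id], g)]
            buf.clear()

model = SharedBaseModel([1.0, 2.0])
model.register_task("task_A", [1.5, 2.5])   # delta becomes [0.5, 0.5]
out = model.forward("task_A", 2.0)          # base [2,4] + delta [1,1]
```

The point of the buffered merge is that inference stays read-only on the shared base weights; only the small per-task delta is ever rewritten, and only once per update period.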

Description

Model training method, system, medium and product based on artificial intelligence

Technical Field

The application belongs to the field of data processing systems or methods specially adapted for administrative, commercial, financial, managerial, supervisory or forecasting purposes, and in particular relates to a model training method, system, medium and product based on artificial intelligence.

Background

With the rapid development of artificial intelligence technology, model training platforms based on artificial intelligence are widely used in educational practical-training and scientific-research scenarios to help learners master model development skills. The related art generally adopts an integrated artificial intelligence education and training platform built on a cloud architecture. In this technology, several mainstream algorithm frameworks are pre-deployed on a cloud server and public data sets that have undergone basic cleaning are stored centrally; a learner only needs to log in to the platform through a browser, select the required data set and algorithm framework, and submit a training task, whereupon the platform allocates corresponding cloud computing resources (such as a GPU) to the task for independent model training, and provides a basic model version recording function. This technology frees learners from the limitations of local hardware, lets them concentrate on the design of model logic, lowers the threshold of model development, and improves basic practical-training efficiency. However, in a university artificial intelligence training course with a hundred students, many students often train models around the same topic, so the training tasks they submit tend to be highly similar.
When these students submit training tasks intensively within the same time period, the platform management system of the related art generally treats each student's task as an independent individual, allocating computing resources to each task in order of submission and performing a complete training process, which increases the redundant consumption of computing power in the underlying GPU cluster. As the number of concurrent tasks grows, this redundant computation reduces the turnover rate of computing resources, lengthening the queuing time of a large number of student training tasks and reducing both the overall teaching efficiency of the course and the utilization of the platform's computing resources.

Disclosure of the Invention

The application provides a model training method, system, medium and product based on artificial intelligence, which reduce the redundant computation caused by highly similar training tasks and improve the overall efficiency of task scheduling.
In a first aspect, the application provides an artificial-intelligence-based model training method, comprising: acquiring a plurality of artificial intelligence model training tasks submitted concurrently within a preset time window, parsing each artificial intelligence model training task, and extracting, for each task, an ordered logic execution sequence consisting of a plurality of network construction logic units and model training logic units; aggregating the artificial intelligence model training tasks whose ordered logic execution sequences share the same leading contiguous logic units into a training task group; defining the shared leading contiguous logic units as a common baseline workflow of the training task group, the common baseline workflow characterizing the common artificial intelligence business logic of unified training sample set loading and pre-trained neural network model initialization; for each artificial intelligence model training task in the training task group, separating the common baseline workflow from the corresponding ordered logic execution sequence, and converting the remaining logic units of the sequence into a personalized derivative workflow, the personalized derivative workflow being the independent training logic comprising a personalized hyperparameter configuration and network-layer parameter fine-tuning; allocating a first computing resource to execute the common baseline workflow once in a centralized manner, and, when the execution of the common baseline workflow finishes, determining the intermediate-state operating data it produces as a reference context slice, the reference context slice being a persisted data packet of a network topology and an initial weight matrix that encapsulate the pre-trained neural network model; respectively allocating second computing resources for each personalized derivative workflow in the
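The "reference context slice" step described in the first aspect (run the baseline once, persist its intermediate state, and let each derivative workflow reconstruct its environment from that snapshot) can be sketched as follows. This is a hedged illustration: `run_baseline`, the dictionary layout, and the use of `pickle` for the "persistence data packet" are all assumptions, not details from the patent.

```python
# Illustrative sketch: one centralized baseline execution produces a persisted
# snapshot (the "reference context slice"); every derivative workflow rebuilds
# its environment from the snapshot instead of re-running the baseline.
import pickle

def run_baseline():
    """Stand-in for the single centralized baseline execution: load the shared
    sample set and initialize the pre-trained model once (names are hypothetical)."""
    topology = ["input", "dense", "output"]
    init_weights = {"dense": [0.1, 0.2, 0.3]}
    return {"topology": topology, "weights": init_weights}

def make_context_slice(state):
    # persist the intermediate-state data as a byte package
    return pickle.dumps(state)

def reconstruct_env(slice_bytes):
    # each derivative workflow loads the slice for model loading / env reconstruction
    return pickle.loads(slice_bytes)

baseline_state = run_baseline()
slice_bytes = make_context_slice(baseline_state)
env_a = reconstruct_env(slice_bytes)  # worker for derivative workflow A
env_b = reconstruct_env(slice_bytes)  # worker for derivative workflow B
```

Each reconstructed environment is an independent copy, so parallel derivative workflows can fine-tune their own weights without interfering with one another or with the persisted slice.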