
CN-122019193-A - Model dynamic migration decision-making method based on parallel discrete event scheduling


Abstract

The application relates to a model dynamic migration decision method based on parallel discrete event scheduling. The method comprises: obtaining multidimensional load data of each model in a distributed system; constructing a hybrid prediction model and inputting the multidimensional load data into the trained hybrid prediction model; building, from the output load prediction results, a mapping model between task quantity and running time for each simulation event scheduler, and dynamically updating the parameters of the mapping model with an incremental learning strategy to obtain the predicted running time of each simulation event scheduler; constructing a dynamic migration model whose objective is to minimize the variance of scheduler running time during engine operation, subject to a model unique mounting constraint, a self-migration inhibition constraint and a migration logic coherence constraint; and solving the dynamic migration model with the predicted running time of each simulation event scheduler as the decision basis to output an optimal model migration scheme. The method can improve the operating efficiency and simulation precision of the simulation system.

Inventors

  • AI CHUAN
  • LIU MENGZHU
  • YIN QUANJUN
  • DAI ZHONGXIANG
  • LI XIN
  • WU HONGPING
  • ZHANG BIN
  • YE TENGFEI

Assignees

  • National University of Defense Technology of the Chinese People's Liberation Army (中国人民解放军国防科技大学)

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (10)

  1. A model dynamic migration decision method based on parallel discrete event scheduling, the method comprising: acquiring multidimensional load data of each model in a distributed system, wherein the distributed system comprises a plurality of simulation event schedulers and each model is mounted on one simulation event scheduler; constructing a hybrid prediction model, training the hybrid prediction model on historical multidimensional load data, and optimizing the model parameters to convergence through a loss function, wherein the hybrid prediction model comprises an input layer, a plurality of long short-term memory (LSTM) networks and an output layer, and the input layer comprises a plurality of graph convolution networks; inputting the multidimensional load data into the trained hybrid prediction model, constructing a load sequence through the input layer, standardizing it to obtain a standard load sequence, establishing graph structure data representing the association relations among the models, extracting spatial topological features among the models through the graph convolution networks, mining the temporal dependencies of the load data through the LSTM networks, outputting a plurality of spatio-temporal joint features, and performing weighted fusion of the spatio-temporal joint features through an attention mechanism of the output layer to obtain a load prediction result; establishing, according to the load prediction result, a mapping model between task quantity and running time for each simulation event scheduler, and dynamically updating the parameters of the mapping model with an incremental learning strategy to obtain the predicted running time of each simulation event scheduler; constructing a dynamic migration model with the objective of minimizing the variance of scheduler running time during engine operation, subject to a model unique mounting constraint, a self-migration inhibition constraint and a migration logic coherence constraint; and solving the dynamic migration model with a simulated annealing algorithm, taking the predicted running time of each simulation event scheduler as the decision basis, and outputting an optimal model migration scheme.
  2. The method of claim 1, wherein constructing the load sequence comprises: based on a sliding time window mechanism, intercepting step by step, from the acquired multidimensional load data of each model, the continuous data within a preset continuous time range to construct the load sequence, wherein the length of the sliding time window is set according to the fluctuation characteristics of the system load, the load sequence comprises the heterogeneous task quantity data of each model at the corresponding time step, and all heterogeneous task quantity data are non-negative.
  3. The method of claim 1, wherein establishing the graph structure data representing the association relations among the models comprises: constructing a node feature matrix by taking the models in the distributed simulation system as nodes, the association relations among the models as edges, and the task quantities of each model in the standard load sequence at the corresponding time step as node features; and constructing the graph structure data from the node feature matrix and a node correlation matrix, wherein the node correlation matrix is used to determine the adjacency relations among the models.
  4. The method of claim 1, wherein the number of graph convolution networks and the number of long short-term memory networks each correspond to the number of time steps of the load sequence.
  5. The method of claim 1, wherein extracting the spatial topological features among the models through the graph convolution network comprises: receiving the graph structure data through the graph convolution network, and adding a self-loop structure to the adjacency matrix of the graph structure data to obtain an optimized adjacency matrix; and performing a convolution operation according to the optimized adjacency matrix, the degree matrix of the graph structure data and the node feature matrix to extract the spatial topological features among the models.
  6. The method of claim 1, wherein establishing the mapping model between task quantity and running time for each simulation event scheduler according to the load prediction result comprises: the mapping model is a linear regression model that takes the predicted task quantities of each type for each simulation event scheduler as the input feature vector and the running time of the simulation event scheduler as the output scalar, with the training objective of minimizing the deviation between the predicted running time and the actual running time; the linear regression model characterizes the degree of influence of different task types on the running time through the weight parameters of the attention mechanism, characterizes the base running time of the simulation event scheduler through the bias term, and dynamically adapts to the temporal variation of the task quantity according to the long- and short-term temporal dependencies of the load data output by the hybrid prediction model.
  7. The method of claim 1, wherein the model unique mounting constraint comprises that, at any time step, each model must be mounted on exactly one simulation event scheduler; the self-migration inhibition constraint comprises that, in a migration operation of a model, the source (migrate-out) scheduler and the destination (migrate-in) scheduler cannot be the same; and the migration logic coherence constraint comprises that the mounting state of a model at the current time step is obtained by updating the mounting state at the previous time step with the migration operations of the previous time step.
  8. The method of claim 1, wherein solving the dynamic migration model with the simulated annealing algorithm, taking the predicted running time of each simulation event scheduler as the decision basis, and outputting the optimal model migration scheme comprises: setting the initial temperature, the cooling rate and the number of iterations at each temperature of the simulated annealing algorithm, taking the current mounting scheme of the models on the simulation event schedulers as the initial solution, calculating the corresponding scheduler running time variance from the initial solution as the initial objective function value, and taking the initial solution as the current optimal solution; dynamically adjusting the probability of using a greedy strategy according to the current temperature, generating a neighborhood solution of the initial solution by mixing the greedy strategy with a random strategy, and limiting the number of models migrated in a single move when generating the neighborhood solution; calculating the scheduler running time variance corresponding to the neighborhood solution from the predicted running time of each simulation event scheduler, and taking it as the objective function value of the neighborhood solution; judging whether the neighborhood solution is accepted according to a preset criterion, and if so, taking the neighborhood solution as the new current optimal solution; and judging whether the current algorithm temperature has fallen to a preset termination temperature, and if not, reducing the temperature according to the preset cooling rate and continuing to update the optimal solution iteratively, and if so, outputting the current optimal solution as the optimal model migration scheme.
  9. The method of claim 8, wherein the preset criterion is the Metropolis criterion: if the objective function value of the neighborhood solution is smaller than that of the current optimal solution, the neighborhood solution is accepted directly; if the objective function value of the neighborhood solution is greater than or equal to that of the current optimal solution, an acceptance probability is calculated from the current algorithm temperature, and whether to accept the neighborhood solution is determined by a random judgment; and if the neighborhood solution is accepted, its objective function value is synchronously taken as the objective function value of the new current optimal solution.
  10. The method of claim 9, wherein determining whether to accept the neighborhood solution by random judgment comprises: generating a random number and comparing it with the calculated acceptance probability; and if the random number is greater than or equal to the acceptance probability, rejecting the neighborhood solution.
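
The self-loop convolution step of claim 5 can be illustrated with a minimal Python sketch. The claims do not give the exact propagation formula, so the symmetric normalization used here, ReLU(D^-1/2 (A + I) D^-1/2 X W) with the degree matrix taken from the self-looped adjacency matrix, is an assumption borrowed from the common graph convolution formulation; the names `gcn_propagate` and `matmul` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_propagate(adj, features, weight):
    """One graph-convolution step over the model association graph.

    adj      : n x n non-negative adjacency matrix between models
    features : n x f node feature matrix (task quantities per model)
    weight   : f x h weight matrix (learned elsewhere; assumed given)
    """
    n = len(adj)
    # Add self-loops so each model keeps its own load features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Degree of each node in the self-looped graph (always >= 1).
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric normalization (assumed form, not stated in the claims).
    norm = [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]
```

With two mutually associated models whose loads are 1 and 3, each normalized row averages the model's own load with its neighbour's, so both outputs are approximately 2.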
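
The linear mapping of claim 6, combined with the incremental learning strategy of claim 1, can be sketched as an online regression. The claims do not specify the update rule; the per-sample gradient step on squared error used here is an assumption, and the class name `RuntimeMapper` and the learning rate `lr` are hypothetical.

```python
class RuntimeMapper:
    """Linear map from per-type task quantities to scheduler running time."""

    def __init__(self, n_task_types, lr=0.05):
        self.w = [0.0] * n_task_types  # influence of each task type
        self.b = 0.0                   # base running time of the scheduler
        self.lr = lr                   # step size (assumed, not from the source)

    def predict(self, tasks):
        """Predicted running time for one vector of task quantities."""
        return self.b + sum(w * x for w, x in zip(self.w, tasks))

    def update(self, tasks, actual_runtime):
        """Incremental step: nudge parameters toward the observed runtime."""
        err = self.predict(tasks) - actual_runtime
        self.w = [w - self.lr * err * x for w, x in zip(self.w, tasks)]
        self.b -= self.lr * err
        return err
```

Each new observation of (task quantities, measured running time) calls `update` once, so the mapping follows drift in the scheduler's behaviour without retraining from scratch.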
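
The solver described in claims 8 to 10 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: each scheduler's running time is taken as the sum of hypothetical per-model predicted times `model_time` (the source derives it from the regression mapping), the greedy probability rises linearly as the temperature falls, and the best solution seen so far is tracked separately from the current one. The assignment list maps each model to exactly one scheduler, so the unique mounting constraint holds by construction, and random moves always pick a destination different from the source. All function names and default parameters are hypothetical.

```python
import math
import random
from statistics import pvariance


def scheduler_times(assign, model_time, n_sched):
    """Total predicted running time per scheduler for an assignment."""
    times = [0.0] * n_sched
    for model, sched in enumerate(assign):
        times[sched] += model_time[model]
    return times


def neighbour(assign, model_time, n_sched, greedy_prob, max_moves, rng):
    """Migrate at most max_moves models, mixing greedy and random moves."""
    new = list(assign)
    for _ in range(rng.randint(1, max_moves)):
        times = scheduler_times(new, model_time, n_sched)
        if rng.random() < greedy_prob:
            # Greedy: move a model from the busiest to the idlest scheduler.
            src = max(range(n_sched), key=times.__getitem__)
            dst = min(range(n_sched), key=times.__getitem__)
            candidates = [m for m, s in enumerate(new) if s == src]
            if src == dst or not candidates:
                continue  # self-migration is forbidden
            model = rng.choice(candidates)
        else:
            # Random: any model, any scheduler other than its current one.
            model = rng.randrange(len(new))
            dst = rng.choice([s for s in range(n_sched) if s != new[model]])
        new[model] = dst
    return new


def anneal(assign, model_time, n_sched, t0=1.0, t_end=1e-3, cooling=0.95,
           iters_per_temp=50, max_moves=2, seed=0):
    """Minimize the variance of scheduler running time."""
    def cost(a):
        return pvariance(scheduler_times(a, model_time, n_sched))

    if n_sched < 2:  # nothing can migrate with a single scheduler
        return list(assign), cost(assign)
    rng = random.Random(seed)
    current, current_cost = list(assign), cost(assign)
    best, best_cost = current, current_cost
    temp = t0
    while temp > t_end:
        greedy_prob = 1.0 - temp / t0  # assumed schedule: greedier when cold
        for _ in range(iters_per_temp):
            cand = neighbour(current, model_time, n_sched,
                             greedy_prob, max_moves, rng)
            cand_cost = cost(cand)
            delta = cand_cost - current_cost
            # Metropolis criterion: reject when the random number is
            # greater than or equal to the acceptance probability.
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = cand, cand_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost
```

A call such as `anneal([0] * 8, [5.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0, 1.0], 2)` starts with every model mounted on scheduler 0 and returns a rebalanced assignment together with its running time variance.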

Description

Model dynamic migration decision-making method based on parallel discrete event scheduling

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a model dynamic migration decision method based on parallel discrete event scheduling.

Background

With the development of complex-scenario simulation technology, multi-threaded parallel simulation has been widely adopted. It supports running multiple models simultaneously, adapts to dynamic task changes within a scenario and to the data interaction requirements among models, and provides basic support for simulating and analyzing complex scenarios.

However, in existing multi-threaded parallel simulation, the load of each running unit exhibits complex spatio-temporal characteristics. On the one hand, the load fluctuates non-stationarily over time: the computation involved can increase significantly in specific scenarios, and such dynamics are difficult to capture accurately with a traditional time-series model. On the other hand, the models in a simulation system are tightly coupled through logical association and data interaction and form a complex spatial topology, so the loads of different running units influence one another. Existing load prediction methods focus only on feature extraction in the time dimension and ignore the topological relations among models; they cannot cope effectively with the spatio-temporal coupling of the load and struggle to output accurate load predictions, which leaves subsequent model migration decisions without a reliable basis.
In addition, the logical dependencies among models further increase the difficulty of the decision. Data consistency and logical consistency among models must be ensured during migration; an improper decision easily causes deviation in the simulation results. Conventional algorithms find it difficult to balance these factors comprehensively in a short time and make a globally optimal migration decision, which limits the operating efficiency and simulation precision of the simulation system.

Disclosure of Invention

In view of the above technical problems, it is necessary to provide a model dynamic migration decision method based on parallel discrete event scheduling.

A model dynamic migration decision method based on parallel discrete event scheduling, the method comprising:

acquiring multidimensional load data of each model in a distributed system, wherein the distributed system comprises a plurality of simulation event schedulers and each model is mounted on one simulation event scheduler;

constructing a hybrid prediction model, training the hybrid prediction model on historical multidimensional load data, and optimizing the model parameters to convergence through a loss function, wherein the hybrid prediction model comprises an input layer, a plurality of long short-term memory (LSTM) networks and an output layer, and the input layer comprises a plurality of graph convolution networks;

inputting the multidimensional load data into the trained hybrid prediction model, constructing a load sequence through the input layer, standardizing it to obtain a standard load sequence, establishing graph structure data representing the association relations among the models, extracting spatial topological features among the models through the graph convolution networks, mining the temporal dependencies of the load data through the LSTM networks, outputting a plurality of spatio-temporal joint features, and performing weighted fusion of the spatio-temporal joint features through an attention mechanism of the output layer to obtain a load prediction result;

establishing, according to the load prediction result, a mapping model between task quantity and running time for each simulation event scheduler, and dynamically updating the parameters of the mapping model with an incremental learning strategy to obtain the predicted running time of each simulation event scheduler;

constructing a dynamic migration model with the objective of minimizing the variance of scheduler running time during engine operation, subject to a model unique mounting constraint, a self-migration inhibition constraint and a migration logic coherence constraint; and

solving the dynamic migration model with a simulated annealing algorithm, taking the predicted running time of each simulation event scheduler as the decision basis, and outputting an optimal model migration scheme.

According to the model dynamic migration decision-making method based on parallel discrete event scheduling, through establishing graph structure data representing the ass