CN-116306897-B - Neural network distributed automatic parallel training method based on AC reinforcement learning
Abstract
The invention discloses a neural network distributed automatic parallel training method based on Actor-Critic (AC) reinforcement learning. First, performance data of the neural network model are sampled and analyzed with a profiling method, per-operator performance data are extracted, and operators are grouped and fused. Second, a message-passing mechanism produces a global feature-vector representation of the computation graph, which constitutes the state search space for reinforcement learning. Then, an AC algorithm based on temporal-difference sampling iteratively optimizes the reinforcement learning model to search for an optimal distributed parallel strategy. Finally, an event-driven multithreaded simulation executor is constructed to simulate the computation process of the neural network model. The invention reduces the size of the state search space of AC reinforcement learning, improves the search performance and the generality of the parallel strategies, shortens the iterative execution time of the strategies, and lowers the hardware execution cost of the iterative search process.
Inventors
- LI XIANGGAN
- ZENG YAN
- REN YONGJIAN
- ZHANG JILIN
- WAN JIAN
Assignees
- Hangzhou Dianzi University
Dates
- Publication Date: 2026-05-08
- Application Date: 2023-02-06
Claims (9)
- 1. A neural network distributed automatic parallel training method based on AC reinforcement learning, characterized by comprising the following steps: step 1, performing performance data sampling and analysis on a neural network model with a profiling method, extracting multi-dimensional performance data of the model operators, fitting a communication cost function for the real execution environment, and realizing computation graph operator grouping and fusion under a computation-communication cost constraint; step 2, realizing computation graph feature encoding based on message passing according to the raw features of the fused computation graph, thereby forming the state search space; step 3, for the state search space, iteratively optimizing the reinforcement learning model with an AC algorithm based on temporal-difference sampling, and outputting an optimal scheduling strategy; step 4, constructing an event-driven multithreaded simulation executor that simulates the execution process of the neural network model in a real environment and provides an efficient execution and optimization environment; the multi-dimensional performance data in step 1 comprise computation cost, memory cost, and tensor transmission size.
- 2. The neural network distributed automatic parallel training method based on AC reinforcement learning according to claim 1, wherein in step 1 the grouping rule for fusion is: fuse $v_j$ into the group of $v_i$ when $v_j \in \mathrm{succ}(v_i)$, $\mathrm{indeg}(v_j) = 1$, and $\mathrm{comm}(v_i, v_j) \ge \overline{\mathrm{comp}}$, wherein $v_i$ and $v_j$ represent different operator nodes in the computational graph; $\mathrm{succ}(v_i)$ denotes the successor nodes of $v_i$; $\mathrm{out}(\mathrm{pred}(v_j))$ denotes the output of the predecessor node of $v_j$; $\mathrm{indeg}(v_j)$ denotes the in-degree of the successor $v_j$; $\mathrm{comm}(v_i, v_j)$ denotes the fitted communication cost between $v_i$ and $v_j$; and $\overline{\mathrm{comp}}$ denotes the average computation cost (an illustrative sketch of this rule appears after the claims).
- 3. The neural network distributed automatic parallel training method based on AC reinforcement learning according to claim 2, wherein the specific process of step 2 is as follows: 2.1, extracting raw features from the computation graph after operator grouping and fusion; 2.2, constructing a raw feature vector of the computation graph from those raw features; 2.3, based on the raw feature vector, obtaining each operator's neighbor information through a message-passing mechanism and capturing local information to realize a global feature representation of the computation graph, which constitutes the state search space for reinforcement learning, i.e., the computation graph feature encoding (see the encoder sketch after the claims).
- 4. The neural network distributed automatic parallel training method based on AC reinforcement learning according to claim 3, wherein the raw features in 2.1 include the computation cost, the in-degree and out-degree, and the size of the operator output tensor.
- 5. The neural network distributed automatic parallel training method based on AC reinforcement learning according to claim 4, wherein the specific process of step 3 is as follows: 3.1, constructing the state search space for reinforcement learning: during the reinforcement learning iterations, the Agent interacts with the environment and changes the state of the current computation graph, i.e., the raw feature vector of step 2; the different raw feature vectors are graph-feature-encoded to form the reinforcement learning state search space; 3.2, realizing a Markov decision process: within each single-step iteration, the Agent is divided into an Actor Agent and a Critic Agent based on k-step temporal-difference learning; the Actor Agent samples several groups of adjacent states and the rewards of their transitions to realize a temporal-difference parameter update algorithm, with the parameter iteration formula $\theta \leftarrow \theta + \alpha \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A(s_t, a_t) + \beta \nabla_\theta H\big(\pi_\theta(\cdot \mid s_t)\big)$, wherein $s_t$ denotes the state of the environment at time $t$; $\pi_\theta$ denotes the policy network of the current Actor Agent; $a_t$ is the action taken by the Actor Agent at time $t$ based on the current policy; $\theta$ is the parameter vector of the policy network; $H(\cdot)$ is the entropy regularization term used to improve the exploration ability of the Actor Agent; $A(s_t, a_t)$ is the advantage function, indicating how advantageous action $a_t$ is in the current state $s_t$; and $w$ is the parameter vector of the Critic Agent's value network; the Critic itself iterates its network parameters by bootstrapping, using the mean square error of two adjacent state values as the loss function; and 3.3, inputting the parallel strategy into the simulation execution engine for simulated execution and outputting an execution reward value for the iterative optimization of the Actor and the Critic, thereby realizing the automatic search for an optimal parallel strategy (see the update sketch after the claims).
- 6. The neural network distributed automatic parallel training method based on AC reinforcement learning according to claim 5, wherein in 3.2 the Actor is a feed-forward neural network with a SoftMax layer, responsible for iterative policy optimization and for outputting an action $a_t$ according to the current action probability distribution $\pi_\theta(\cdot \mid s_t)$; the Critic is a multi-layer perceptron (MLP) network which, for the two adjacent states before and after an action, outputs the value estimates $V_w(s_t)$ and $V_w(s_{t+1})$; the value estimate represents the value of the current state, and a higher value indicates a better current action.
- 7. The method of claim 6, wherein in 3.2 the advantage function is constructed from the state value estimated by the Critic Agent's value network and the reward value of the state transition, the reward value being defined by $R(a_t) = -T - \lambda \cdot \max\big(0,\, M - M_{\max}\big)$, wherein $R(a_t)$ represents the reward obtained by the current action $a_t$; $T$ represents the execution time of the current strategy; $M_{\max}$ represents the memory upper limit of the hardware device; $M$ represents the memory consumption of the current strategy; and $\lambda$ represents a penalty factor; the current strategy incurs the memory penalty term only if its memory consumption exceeds the memory upper limit of the device (see the reward sketch after the claims).
- 8. The neural network distributed automatic parallel training method based on AC reinforcement learning according to claim 7, wherein the specific process of step 4 is as follows: 4.1, constructing a device execution queue and a device communication queue for the computation cost and communication cost of the executing entities during neural network model execution, thereby simulating the computation and communication processes of the operators in the neural network model; 4.2, constructing an event queue with several event mechanisms to realize event-driven interaction between the device execution queue and the device communication queue; and 4.3, traversing the states of the three queues, i.e., the device execution queue, the device communication queue, and the event queue; when all queue states are empty, the simulated execution of the neural network model is finished (see the simulator sketch after the claims).
- 9. The neural network distributed automatic parallel training method based on AC reinforcement learning according to claim 8, wherein the event mechanisms in 4.2 comprise computation events, communication events, and topology refresh events.
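The grouping rule reconstructed in claim 2 can be illustrated with a short sketch. This is a minimal, illustrative reading of the rule, not the patent's exact implementation: the `networkx` graph type, the node attribute `comp`, and the edge attribute `comm` are assumed names for the profiled computation and communication costs.

```python
import networkx as nx

def fuse_operators(g: nx.DiGraph) -> nx.DiGraph:
    """Greedy operator grouping fusion per the rule reconstructed in claim 2:
    fuse a successor v_j into v_i when v_i is its only predecessor and the
    fitted communication cost of the edge (v_i, v_j) is at least the graph's
    average computation cost."""
    avg_comp = sum(d["comp"] for _, d in g.nodes(data=True)) / g.number_of_nodes()
    changed = True
    while changed:
        changed = False
        for v_i, v_j in list(g.edges()):
            if g.in_degree(v_j) == 1 and g.edges[v_i, v_j]["comm"] >= avg_comp:
                g.nodes[v_i]["comp"] += g.nodes[v_j]["comp"]  # fold compute cost into the group
                g = nx.contracted_nodes(g, v_i, v_j, self_loops=False)
                changed = True
                break  # the edge set changed; restart the scan
    return g
```

The scan restarts after each contraction because fusion changes the edge set; a worklist would be more efficient for large graphs, but the greedy loop keeps the sketch short.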
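The message-passing feature encoding of claims 3 and 4 can be sketched as a single neighbor-aggregation layer. This is a minimal sketch in PyTorch assuming a dense adjacency matrix; the layer widths and the concatenate-then-project update are illustrative choices, not the patent's exact network.

```python
import torch
import torch.nn as nn

class MessagePassingEncoder(nn.Module):
    """One round of message passing over the fused computation graph.
    Raw features per operator (computation cost, in/out degree, output
    tensor size) follow claim 4."""
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.msg = nn.Linear(feat_dim, hidden_dim)
        self.upd = nn.Linear(feat_dim + hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_ops, feat_dim) raw feature vectors; adj: (num_ops, num_ops).
        messages = adj @ torch.relu(self.msg(x))       # aggregate neighbor information
        h = torch.relu(self.upd(torch.cat([x, messages], dim=-1)))
        return h  # per-operator encoding that captures local graph structure
```

Stacking several such rounds lets each operator's encoding absorb progressively larger neighborhoods, which is how local message passing yields the global feature representation the claims describe.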
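The Actor-Critic update of claims 5 and 6 can be sketched as follows. This is a one-step bootstrapped variant standing in for the k-step temporal-difference update; the trajectory layout, optimizers, and coefficients `gamma` and `entropy_coef` are assumptions for illustration.

```python
import torch

def ac_update(actor, critic, trajectory, opt_actor, opt_critic,
              gamma=0.99, entropy_coef=0.01):
    """One temporal-difference update in the style of claim 5.
    `trajectory` is a list of (state, action, reward, next_state) tuples;
    `actor` ends in a SoftMax layer (claim 6) and outputs action probabilities,
    `critic` is an MLP outputting a scalar state value."""
    states, actions, rewards, next_states = map(list, zip(*trajectory))
    states, next_states = torch.stack(states), torch.stack(next_states)
    rewards = torch.tensor(rewards)

    values = critic(states).squeeze(-1)
    with torch.no_grad():
        next_values = critic(next_states).squeeze(-1)
        targets = rewards + gamma * next_values        # bootstrapped TD target

    advantage = targets - values                       # A(s_t, a_t)
    dist = torch.distributions.Categorical(actor(states))
    log_probs = dist.log_prob(torch.tensor(actions))
    actor_loss = -(log_probs * advantage.detach()).mean() \
                 - entropy_coef * dist.entropy().mean()  # entropy regularization term
    critic_loss = advantage.pow(2).mean()              # MSE of adjacent state values

    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()
```

The advantage is detached in the actor loss so the policy gradient does not back-propagate through the Critic, matching the claim's separation of the two agents.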
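The reward shaping reconstructed in claim 7 reduces to a few lines; the function name and the default penalty coefficient are illustrative placeholders.

```python
def reward(exec_time: float, mem_used: float, mem_limit: float,
           penalty: float = 1.0) -> float:
    """Reward per claim 7: the negative execution time of the current strategy,
    with a memory penalty term added only when the strategy's memory
    consumption exceeds the device's memory upper limit."""
    r = -exec_time
    if mem_used > mem_limit:
        r -= penalty * (mem_used - mem_limit)
    return r
```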
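The event-driven simulator of claims 8 and 9 can be sketched as a time-ordered event loop over per-device queues. The queue and cost layouts are assumed for illustration, and dependency tracking plus topology refresh events are omitted to keep the sketch short.

```python
import heapq
import itertools

def simulate(exec_queues, comm_costs):
    """Toy event-driven simulation loop in the spirit of claims 8-9.
    exec_queues: {device: [(op_name, comp_cost), ...]} device execution queues;
    comm_costs:  {op_name: [(dst_device, cost), ...]} outgoing transfers.
    Event names mirror the mechanisms of claim 9."""
    events, tie = [], itertools.count()  # (timestamp, tiebreaker, type, device, op)
    clock = 0.0

    def start_next(dev, now):
        if exec_queues[dev]:
            op, cost = exec_queues[dev].pop(0)
            heapq.heappush(events, (now + cost, next(tie), "compute", dev, op))

    for dev in exec_queues:
        start_next(dev, clock)
    while events:  # run until all queues drain, per claim 8 step 4.3
        clock, _, etype, dev, op = heapq.heappop(events)
        if etype == "compute":
            # a finished operator emits communication events to downstream devices
            for dst, cost in comm_costs.get(op, []):
                heapq.heappush(events, (clock + cost, next(tie), "communicate", dst, op))
            start_next(dev, clock)
        # "communicate" events would release dependencies on the destination device
    return clock  # makespan: the strategy's simulated execution time
```

For example, `simulate({"gpu0": [("matmul", 2.0)]}, {"matmul": [("gpu1", 0.5)]})` returns 2.5: one compute event followed by one communication event. The returned makespan is exactly the execution-time signal the reward function above consumes.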
Description
Neural network distributed automatic parallel training method based on AC reinforcement learning

Technical Field

The invention belongs to the field of large-scale complex neural network parallel training, and particularly relates to a neural network distributed automatic parallel training method based on AC reinforcement learning.

Background Art

With the continuing acceleration of deep learning research and innovation, deep learning models are widely applied in computer vision (CV), natural language processing (NLP), search recommendation, and other scenarios. Deep learning models are hierarchically structured neural networks built from composable modules: the BERT network (Bidirectional Encoder Representations from Transformers), for example, is constructed from encoders, decoders, attention mechanisms, and similar components, while CNN networks are built from convolution layers, pooling layers, and the like. Neural networks have proven to scale well compositionally, and prediction accuracy can be improved by training larger-scale model parameters on larger-scale data sets. However, single-device resources are limited and cannot handle large-batch input data and complex model parameters. To train such complex neural networks effectively, either the large-scale data set must be partitioned and scheduled, i.e., data parallelism (Data Parallel), or the neural network model must be partitioned, scheduled, and executed across multiple devices, i.e., model parallelism (Model Parallel), so that the performance of the computing devices is fully utilized. Currently, many frameworks such as TensorFlow, PyTorch, and MindSpore can be used for distributed training. However, existing methods rely mainly on expert experience to manually design parallel strategies, requiring developers to master AI, distributed computing, computer architecture, and other specialized knowledge and to make expert choices within those fields; manually finding the optimal parallel strategy is therefore very difficult. To simplify the design and implementation of parallel methods for neural network models and to improve the generality of parallel strategy design, the industry has begun to research automatic parallel training methods for neural networks, aiming at automatic search and optimization of distributed parallel strategies. In recent years, reinforcement learning has excelled in complex decision problems such as games and autonomous driving, reaching or even exceeding human decision-making levels, and has therefore become the research focus of automatic parallel strategy search. Google first proposed the Hierarchical method, which extracts neural network model and cluster features and uses reinforcement learning (RL) to guide model-parallel strategy search; however, the method requires frequent sampling, has a large search space, and makes the strategy search process expensive, so its performance improvement over model parallelism based on expert experience is limited. Gao et al. proposed Spotlight, the first to model the neural network operator scheduling problem as a Markov decision process (MDP).
However, Spotlight is effective only on specific network models: when a new network model is encountered, the parallel strategy must be searched again, the method cannot be transplanted to other similar networks, and the design and implementation cost of parallel strategies for different network models therefore remains high. To address this problem, Addanki et al. proposed Placeto, which introduces a graph-embedding encoding method to make parallel strategies portable and avoids repeated training on similar unknown networks. AutoMap, proposed by Wang Siyu et al., performs automatic parallel strategy search on the finer-grained XLA-IR graph, but the DQN it uses samples inefficiently and must store a large amount of historical experience, so its overall execution efficiency is low. A Baidu laboratory combined pipelining with reinforcement learning to realize coarse-grained layer-level scheduling, improving training throughput and reducing model training cost, but its reinforcement learning is still based on whole-episode Monte Carlo sampling; this sampling method is inefficient, and when the complexity of the neural network model doubles, the single-episode sampling cost doubles as well, the model converges more slowly and falls into locally optimal solutions, making ever-larger neural network models difficult to handle.

Disclosure of the Invention