CN-121995933-A - Multi-unmanned aerial vehicle path planning obstacle avoidance method and system based on Rainbow DQN algorithm
Abstract
The invention provides a multi-unmanned aerial vehicle path planning and obstacle avoidance method and system based on the Rainbow DQN algorithm. In the adopted technical scheme, a multi-unmanned aerial vehicle system model is first constructed, and a training data set is initialized using trajectory fragments generated by a reciprocal collision avoidance algorithm together with multi-step temporal-difference target values. During training, priority sampling is adopted to apply importance weighting to the samples, an attention mechanism is introduced to enhance the state features, a local value network is built for each unmanned aerial vehicle by combining an improved deep Q network, the local values are fused into a global value through a mixing value network, and finally joint training is carried out based on a global loss function with correction weights, thereby realizing collaborative path planning and obstacle avoidance for multiple unmanned aerial vehicles. The invention can solve the technical problems in the prior art of low collaborative obstacle avoidance efficiency, large value estimation deviation and unstable training of multiple unmanned aerial vehicles in complex dynamic environments.
Inventors
- CAO HAILIN
- Deng Haoning
- YANG LISHENG
Assignees
- Chongqing University (重庆大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-28
Claims (10)
- 1. A multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm, characterized by comprising the following steps: S1, constructing a system model containing a plurality of unmanned aerial vehicles and environmental information; S2, defining a local state vector and a global state vector for each unmanned aerial vehicle; S3, generating collision-free running trajectories for the multiple unmanned aerial vehicles, calculating the target value of each trajectory segment using a multi-step temporal-difference formula, and storing training samples containing the local state vector, the global state vector and the target value into a training data set; S4, sampling training samples from the training data set, and respectively carrying out sample importance weighting and state feature enhancement on the training samples to obtain correction weights and enhanced state vectors; S5, building a local value network for each unmanned aerial vehicle, which outputs local action values from the enhanced state vectors; S6, constructing a mixing value network that receives the local action values of all unmanned aerial vehicles and, in combination with the global state vector, fuses them into a global value; and S7, calculating a global loss function using the correction weights, and jointly training the local value networks and the mixing value network.
- 2. The multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm according to claim 1, wherein step S1 specifically comprises the following steps: establishing a three-dimensional coordinate system as the environment space, and defining the spatial attributes of terrain, static obstacles and dynamic obstacles; defining a collision avoidance region for each unmanned aerial vehicle, the collision avoidance region being a spherical region with the unmanned aerial vehicle as its origin and a preset collision avoidance threshold as its radius; and defining a local perception range for each unmanned aerial vehicle, within which the relative information between the unmanned aerial vehicle and its surrounding environment is acquired.
- 3. The multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm according to claim 1, wherein step S2 specifically comprises the following steps: the local state vector comprises the position, speed and flight azimuth angle of the unmanned aerial vehicle, together with the relative distance, relative velocity angle and collision risk coefficient between the unmanned aerial vehicle and each adjacent unmanned aerial vehicle and obstacle within its local perception range; the global state vector comprises the three-dimensional position coordinates of all unmanned aerial vehicles in the environment space and the distance of each unmanned aerial vehicle from its target point; the collision risk coefficient κ is calculated as a function of the relative distance d and the relative velocity angle θ.
- 4. The multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm according to claim 1, wherein step S3 specifically comprises the following steps: defining the number of unmanned aerial vehicles and target points in the environment space, calling a reciprocal collision avoidance algorithm to generate collision-free running trajectories for the multiple unmanned aerial vehicles, and taking these trajectories as initial training samples; defining the step length n of multi-step temporal-difference learning, and decomposing the initial training samples into continuous trajectory segments of n time steps; constructing an experience replay pool, storing the continuous trajectory segments, and parsing each trajectory segment into a state vector sequence over n time steps; calculating the target value of each trajectory segment using the multi-step temporal-difference formula, expressed as: y_t = Σ_{k=0}^{n−1} γ^k · r_{t+k} + γ^n · max_a Q′(s_{t+n}, a), where y_t represents the target value at time step t, r_{t+k} (k = 0, …, n−1) represents the real-time rewards from time step t to t+n−1, γ represents the discount factor, and Q′(s_{t+n}, a) represents the target network's value assessment of the state at time step t+n; and forming a training sample from the local state vector, the global state vector and the target value corresponding to the trajectory segment, and storing the training sample into the training data set.
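The multi-step temporal-difference target of claim 4 can be sketched as a short Python function; this is a minimal illustration assuming scalar rewards and a precomputed target-network estimate (the function and variable names are hypothetical, not from the patent).

```python
def n_step_td_target(rewards, gamma, q_target_next):
    """Compute y_t = sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * max_a Q'(s_{t+n}, a).

    rewards       -- the n real-time rewards r_t .. r_{t+n-1} of a trajectory segment
    gamma         -- discount factor
    q_target_next -- the target network's value estimate for the state n steps ahead
    """
    n = len(rewards)
    discounted_rewards = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return discounted_rewards + (gamma ** n) * q_target_next

# A 3-step trajectory fragment with gamma = 0.9
y = n_step_td_target([1.0, 0.5, 0.25], gamma=0.9, q_target_next=2.0)
```

Using n rewards at once propagates reward information n steps back per update, which is what lets the ORCA-style initial trajectories bootstrap training.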
- 5. The multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm according to claim 1, wherein in step S4 the training samples are weighted for sample importance, the specific steps comprising: constructing an experience replay pool supporting priorities, and storing the training samples together with their temporal-difference errors and priorities, wherein the temporal-difference errors of the initial training samples are set to a preset maximum value; calculating the sampling priority of each sample from its temporal-difference error, and extracting batches of samples according to the sampling priorities, the sampling priority being expressed as: P(i) = |δ_i|^α / Σ_k |δ_k|^α, where P(i) represents the sampling probability, δ_i represents the temporal-difference error of the i-th training sample, and α represents the priority weight coefficient; and calculating importance sampling weights for the batch of samples, used to correct the gradient update step size in subsequent steps, the importance sampling weight being expressed as: w_i = (N · P(i))^(−β) / max_j w_j, where w_i represents the importance sampling weight of the i-th sample, N represents the total capacity of the experience replay pool, P(i) represents the sampling probability (normalized priority), β represents the parameter balancing the priority bias, and max_j w_j represents the largest weight in the current batch.
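The two formulas in claim 5 (priority-proportional sampling and importance-sampling correction) can be sketched as follows; a minimal NumPy illustration with hypothetical function names, not the patent's implementation.

```python
import numpy as np

def sampling_probs(td_errors, alpha):
    # P(i) = |delta_i|^alpha / sum_k |delta_k|^alpha
    prios = np.abs(td_errors) ** alpha
    return prios / prios.sum()

def importance_weights(probs, pool_size, beta):
    # w_i = (N * P(i))^(-beta), normalized by the largest weight in the batch
    w = (pool_size * probs) ** (-beta)
    return w / w.max()

# Samples with large TD error are drawn more often but down-weighted in the loss
probs = sampling_probs(np.array([0.5, 2.0, 0.1]), alpha=0.6)
weights = importance_weights(probs, pool_size=3, beta=0.4)
```

Note the complementary roles: the sample with the largest TD error gets the highest sampling probability, while the rarest-sampled one receives the largest (normalized to 1) correction weight.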
- 6. The multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm according to claim 1, wherein in step S4 the training samples are subjected to state feature enhancement, the specific steps comprising: calculating attention weights among the unmanned aerial vehicle, its neighboring unmanned aerial vehicles and the obstacles, wherein the attention weights are calculated using exponential decay based on distance together with weight decay based on the collision risk coefficient; and weighting the environment interaction information within the local perception range by the attention weights, and splicing the weighted features with the motion state of the unmanned aerial vehicle to obtain the enhanced state vector.
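The feature-enhancement step of claim 6 can be sketched as below. The exact scoring function is not reproduced in this text, so the form `exp(-lam * d_j) * (1 + kappa_j)` — distance decay boosted by the collision risk coefficient — is an illustrative assumption, as are all names.

```python
import numpy as np

def enhance_state(own_state, neighbor_feats, distances, risks, lam=0.5):
    """Attention-weighted state enhancement (assumed scoring form).

    own_state      -- the UAV's own motion features (position, speed, azimuth)
    neighbor_feats -- one feature row per perceived neighbor/obstacle
    distances      -- relative distance d_j to each perceived object
    risks          -- collision risk coefficient kappa_j of each object
    """
    scores = np.exp(-lam * distances) * (1.0 + risks)   # assumed form
    attn = scores / scores.sum()                        # normalized attention weights
    weighted = (attn[:, None] * neighbor_feats).sum(axis=0)
    return np.concatenate([own_state, weighted])        # splice with own motion state

own = np.array([0.0, 1.0, 0.5])                 # own motion features
neighbors = np.array([[1.0, 0.0],               # close, high-risk object
                      [0.0, 1.0]])              # distant, low-risk object
s_enh = enhance_state(own, neighbors, np.array([1.0, 5.0]), np.array([0.8, 0.1]))
```

The close, high-risk object dominates the weighted feature, which is the "focus on the greatest threat" behavior motivated in the background section.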
- 7. The multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm according to claim 1, wherein in step S5 the local value network adopts an improved deep Q network architecture comprising a main network and a target network of identical structure, each configured with the following components: a dueling network, which splits the output of the fully connected layer into a state value branch V(s) and an advantage branch A(s, a), and calculates the action value according to: Q(s, a) = V(s) + A(s, a) − (1/|A|) Σ_{a′} A(s, a′), where Q(s, a) represents the action value, V(s) represents the state value, A(s, a) represents the advantage of taking action a in state s, and (1/|A|) Σ_{a′} A(s, a′) represents the mean of all action advantages; a noisy network, which introduces Gaussian noise parameters into the fully connected layer so that the randomness of the network parameters realizes the exploration strategy in place of an ε-greedy strategy; and distributional value estimation, which discretizes the action value output into a probability distribution over a number of value atoms. During training, the main network selects the optimal action in the current state, and the target network evaluates the value distribution of that optimal action.
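The dueling aggregation of claim 7 can be illustrated in a few lines; a minimal sketch of the mean-subtracted combination only (the noisy layers and distributional atoms are omitted), with hypothetical names.

```python
import numpy as np

def dueling_q(state_value, advantages):
    # Q(s, a) = V(s) + A(s, a) - mean_{a'} A(s, a')
    # Subtracting the mean advantage makes the V/A decomposition identifiable.
    return state_value + advantages - advantages.mean()

q = dueling_q(1.5, np.array([1.0, 2.0, 3.0]))
```

Because the mean advantage is subtracted, the average of the resulting Q-values equals V(s), so the state-value and advantage branches cannot drift apart arbitrarily.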
- 8. The multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm according to claim 1, wherein step S6 specifically comprises: constructing a QMIX mixing value network, the network comprising a mixing layer that takes as input the local action value of each unmanned aerial vehicle and a hypernetwork that takes as input the global state vector; generating, through the hypernetwork and according to the global state vector s_global, non-negative weights w(s_global) and a bias term b(s_global); fusing the local action values [Q_1, Q_2, …, Q_n] of all unmanned aerial vehicles into a global value through the mixing layer, according to: Q_tot = w(s_global) · [Q_1, Q_2, …, Q_n] + b(s_global), where Q_tot represents the global value, Q_i represents the local value of the i-th unmanned aerial vehicle, and s_global represents the global state; and constructing a global target value network with the same structure as the QMIX mixing value network for calculating the global target value, according to: y_tot = Σ_{k=0}^{n−1} γ^k · r^global_{t+k} + γ^n · Q′_tot(s_{t+n}, u*), where y_tot represents the global target value, r^global represents the global reward, γ represents the discount factor, n represents the multi-step learning step length, and Q′_tot(s_{t+n}, u*) represents the evaluation value calculated by the global target value network for the optimal joint action u* in the next state s_{t+n}.
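The mixing step of claim 8 can be sketched as follows. In the patent the hypernetwork is learned; here a fixed random linear map stands in for it, and `abs()` enforces the non-negative weight constraint (all names are illustrative).

```python
import numpy as np

def hyper_params(s_global, W_h, b_h):
    # Hypernetwork: mixing weights generated from the global state;
    # abs() enforces non-negativity, which guarantees monotonicity of Q_tot.
    return np.abs(W_h @ s_global + b_h)

def mix(local_qs, weights, bias):
    # Q_tot = w(s_global) . [Q_1 .. Q_n] + b(s_global)
    return float(np.dot(weights, local_qs) + bias)

rng = np.random.default_rng(0)
s_global = rng.normal(size=4)                       # toy global state
W_h, b_h = rng.normal(size=(3, 4)), rng.normal(size=3)

w = hyper_params(s_global, W_h, b_h)
q_tot = mix(np.array([0.2, -0.1, 0.5]), w, bias=0.1)
q_tot_up = mix(np.array([0.2, -0.1, 0.6]), w, bias=0.1)   # raise one local Q
```

The non-negative weights are the point of the construction: raising any single UAV's local value can never lower the global value, so per-UAV greedy actions remain consistent with the joint optimum.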
- 9. The multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm according to claim 1, wherein step S7 specifically comprises: constructing a global loss function in which the prediction error of the global value is weighted by the correction weights; and simultaneously updating the parameters of the local value networks and the mixing value network by minimizing the global loss function, the global loss function being expressed as: L = (1/B) Σ_{i=1}^{B} w_i · (y^tot_i − Q^tot_i)², where B represents the number of samples in the training batch, i represents the sample index, w_i represents the importance sampling weight of the i-th sample, (y^tot_i − Q^tot_i)² represents the mean square error term, y^tot_i represents the global target value of the i-th sample, and Q^tot_i represents the global value of the i-th sample.
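The weighted loss of claim 9 reduces to one line of NumPy; a minimal sketch with hypothetical names, tying together the importance weights of claim 5 and the global values of claim 8.

```python
import numpy as np

def global_loss(is_weights, targets, q_tot):
    # L = (1/B) * sum_i w_i * (y_i^tot - Q_i^tot)^2
    # is_weights are the importance-sampling correction weights from
    # prioritized replay; they scale each sample's squared TD error.
    return float(np.mean(is_weights * (targets - q_tot) ** 2))

loss = global_loss(np.array([1.0, 0.5]),   # correction weights
                   np.array([2.0, 1.0]),   # global target values y^tot
                   np.array([1.0, 0.0]))   # predicted global values Q^tot
```

Down-weighting the over-sampled high-priority transitions keeps the gradient an unbiased estimate despite the non-uniform sampling of claim 5.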
- 10. A multi-unmanned aerial vehicle path planning obstacle avoidance system based on the Rainbow DQN algorithm, characterized by comprising: an environment modeling module for constructing a system model containing a plurality of unmanned aerial vehicles and environmental information and for defining the local state vectors and global state vectors of the unmanned aerial vehicles; a trajectory generation and sample initialization module for generating collision-free running trajectories for the multiple unmanned aerial vehicles, calculating the target value of each trajectory segment using a multi-step temporal-difference formula, and storing training samples containing the local state vector, the global state vector and the target value into a training data set; a priority sampling and feature enhancement module for sampling training samples from the training data set and respectively carrying out sample importance weighting and state feature enhancement on them to obtain correction weights and enhanced state vectors; a local value network construction module for constructing a local value network for each unmanned aerial vehicle and outputting local action values from the enhanced state vectors; and a global value mixing training module for constructing a mixing value network that receives the local action values of all unmanned aerial vehicles and, in combination with the global state vector, fuses them into a global value, calculating a global loss function using the correction weights, and jointly training the local value networks and the mixing value network.
Description
Multi-unmanned aerial vehicle path planning obstacle avoidance method and system based on Rainbow DQN algorithm

Technical Field

The invention relates to the technical field of unmanned aerial vehicle path planning, and in particular to a multi-unmanned aerial vehicle path planning obstacle avoidance method and system based on the Rainbow DQN algorithm.

Background

With the rapid development of unmanned aerial vehicle technology, multi-unmanned aerial vehicle cooperative systems are increasingly widely applied in fields such as agricultural plant protection, post-disaster rescue, logistics distribution and military reconnaissance. In these complex task scenarios, multiple unmanned aerial vehicles need to avoid static terrain and dynamic obstacles in real time while meeting the requirements of collaborative operation, and must also avoid mutual collisions within the cluster. An efficient and robust path planning and obstacle avoidance algorithm is therefore key to realizing autonomous cooperation among multiple unmanned aerial vehicles. Existing multi-unmanned aerial vehicle path planning methods mainly comprise traditional planning algorithms (such as the artificial potential field method and the ORCA algorithm) and methods based on deep reinforcement learning. Although traditional algorithms are computationally light, in complex dynamic environments they tend to fall into local optima, causing path oscillation or deadlock, and they struggle to handle the complex cooperative relations among multiple vehicles. In recent years, methods based on deep reinforcement learning (such as DQN, MADDPG, etc.)
have attracted attention due to their strong perception and decision-making capabilities, but many challenges remain in practical applications. First, a multi-unmanned aerial vehicle system presents a typical high-dimensional state space and a non-stationary environment; traditional independent reinforcement learning ignores the mutual influence among agents, making training unstable and hard to converge. Second, conventional reinforcement learning algorithms generally adopt random exploration strategies, which suffer from high blindness, high trial-and-error cost and slow convergence in the early stage of training, and make poor use of sparse but critical samples such as near-collision and short-distance obstacle avoidance events, resulting in an insufficient obstacle avoidance success rate. Furthermore, in dense cluster flight each unmanned aerial vehicle must process a large amount of neighbor information; existing network structures tend to weight all perceived information evenly and cannot automatically focus, within the redundant information, on the targets posing the greatest threat, which causes decision delay. Finally, the basic DQN algorithm suffers from value over-estimation, and a single scalar value cannot accurately reflect the stochastic distribution of the environment, which limits the quality of the learned strategy. Therefore, how to design a path planning method that guarantees individual obstacle avoidance safety, realizes efficient group cooperation, and offers high training efficiency and value estimation accuracy is a problem that currently needs to be solved.
Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm, which aims to solve the technical problems in the prior art of low collaborative obstacle avoidance efficiency, large value estimation deviation and unstable training of multiple unmanned aerial vehicles in complex dynamic environments. The method first constructs a multi-unmanned aerial vehicle system model and initializes a training data set using trajectory fragments generated by a reciprocal collision avoidance algorithm together with multi-step temporal-difference target values. During training, priority sampling is adopted to apply importance weighting to the samples, an attention mechanism is introduced to enhance the state features, a local value network is built for each unmanned aerial vehicle by combining an improved deep Q network, the local values are fused into a global value through the mixing value network, and finally joint training is carried out based on a global loss function with correction weights, realizing collaborative path planning and obstacle avoidance for multiple unmanned aerial vehicles. In a first implementation manner, a multi-unmanned aerial vehicle path planning obstacle avoidance method based on the Rainbow DQN algorithm is provided, comprising: S1, constructing a system model containing a plurality of unmanned aerial vehicles and environmental information; S2, defining a local state vector and a global state vector