CN-121977554-A - Unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and double-depth Q network
Abstract
The invention discloses an unmanned aerial vehicle (UAV) autonomous path planning method based on gray wolf optimization (GWO) and a double-depth Q network (DDQN), and belongs to the technical field of unmanned aerial vehicle control. First, for an unknown complex environment, a Markov decision process (MDP) model of UAV path planning is established and a composite reward function fusing boundary constraint, obstacle avoidance constraint and target guidance is designed. Second, an improved DDQN is constructed that fuses the GWO algorithm with an adaptive ε-greedy strategy. Finally, autonomous UAV path planning is realized through simulation experiments and real-environment experiments of different complexity, based on a MATLAB simulation platform and an indoor optical-positioning UAV experiment platform. The method achieves safe and efficient UAV path planning in unknown complex environments, reduces path redundancy, improves environmental adaptability, and is suitable for autonomous UAV operation in scenarios such as topographic mapping, ocean monitoring and security inspection.
Inventors
- TANG XIAOMING
- LI HAOLAN
- ZHANG KAIBI
- CAI LINQIN
- LI ZHEXING
- HU TAO
- QIN JIALE
Assignees
- 重庆邮电大学 (Chongqing University of Posts and Telecommunications)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-06
Claims (9)
- 1. An unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and a double-depth Q network, characterized by comprising the following steps: Step one, establish a Markov decision process (MDP) model of UAV path planning, defining the model's state space S, action space A, reward function R and state transition probability P. Step two, design a composite reward function R = R_b + R_o + R_g, where R_b is the boundary reward, R_o is the obstacle reward and R_g is the target reward, which respectively restrain the UAV from crossing the boundary, make it avoid obstacles, and guide it to fly toward the target point. Step three, construct an improved double-depth Q network (DDQN): introduce the dual mechanism of a current network and a target network, where the current network selects actions in real time and outputs the Q value of every action based on the current environment state, the target network evaluates the target Q value of the action selected by the current network, and the gray wolf optimization (GWO) algorithm is fused with an adaptive ε-greedy strategy in the action selection stage. Step four, construct a 3D environment data set containing random obstacles of different complexity, train the improved DDQN model with this data set, and update the network parameters by gradient descent to obtain the optimal path planning strategy. Step five, deploy the trained strategy on a UAV with autonomous positioning and environment sensing functions; the UAV autonomously executes path planning with the task starting point as the initial position and the task end point as the target position, generating a safe and feasible optimal flight path.
- 2. The unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and the double-depth Q network according to claim 1, wherein the MDP model in step one is defined as follows: the state space S comprises the current three-dimensional coordinates (x, y, z) of the UAV and the distribution information of environmental obstacles, with a task area of 20 m × 20 m and x, y, z ∈ [0, 20]; the action space A comprises 8 motion directions (east, north, west, south, northeast, northwest, southeast and southwest), with the UAV flight speed kept constant; the state transition probability P(s' | s, a) describes the probability of transitioning from state s to s' after performing action a, where P(·) denotes the probability of the event in brackets and s' is the environment state at the next moment; the reward function R is the linear superposition of the boundary reward R_b, the obstacle reward R_o and the target reward R_g.
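As a minimal sketch, the 8-direction action space of claim 2 can be encoded as displacement vectors. Normalizing the diagonal moves is an assumption made here so that every action covers the same distance per step, which is one way to satisfy the claim's constant-flight-speed condition:

```python
import math

# Hypothetical encoding of the 8 compass-direction actions from claim 2.
# Diagonal steps are normalized so every action moves the UAV the same
# distance per time step (constant flight speed).
RAW_DIRECTIONS = {
    "E":  (1, 0),  "N":  (0, 1),  "W":  (-1, 0), "S":  (0, -1),
    "NE": (1, 1),  "NW": (-1, 1), "SE": (1, -1), "SW": (-1, -1),
}

def action_vectors(speed=1.0):
    """Return a (dx, dy) displacement of constant magnitude per action."""
    out = {}
    for name, (dx, dy) in RAW_DIRECTIONS.items():
        norm = math.hypot(dx, dy)
        out[name] = (speed * dx / norm, speed * dy / norm)
    return out

ACTIONS = action_vectors()
```

Any of these 8 actions then advances the UAV by exactly `speed` per step, e.g. `ACTIONS["NE"]` moves it by (0.707, 0.707) at unit speed.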
- 3. The unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and the double-depth Q network according to claim 1, wherein the composite reward function in step two takes the form R = λ1·R_b + λ2·R_o + λ3·R_g, where λ1 and λ2 are the boundary reward coefficient and the obstacle reward coefficient respectively, λ3 is the target reward coefficient, and each coefficient takes values within a preset range.
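A minimal sketch of the composite reward in claim 3, assuming hypothetical coefficient values and simple geometric penalty/guidance terms; the patent's exact reward expressions and coefficient ranges are not reproduced here:

```python
import math

# Hypothetical coefficients: the claim only states that the boundary,
# obstacle and target rewards are weighted and linearly superposed.
LAMBDA_B, LAMBDA_O, LAMBDA_G = 1.0, 1.0, 1.0
BOUND = 20.0  # 20 m x 20 m task area from claim 2

def composite_reward(pos, prev_pos, goal, obstacles, safe_dist=1.0):
    """R = lambda_1*R_b + lambda_2*R_o + lambda_3*R_g (illustrative terms)."""
    x, y, z = pos
    # Boundary reward: penalize leaving the task area.
    inside = 0 <= x <= BOUND and 0 <= y <= BOUND and 0 <= z <= BOUND
    r_b = 0.0 if inside else -10.0
    # Obstacle reward: penalize coming closer than safe_dist to any obstacle.
    r_o = 0.0
    for ob in obstacles:
        if math.dist(pos, ob) < safe_dist:
            r_o = -10.0
            break
    # Target reward: reward reducing the distance to the goal.
    r_g = math.dist(prev_pos, goal) - math.dist(pos, goal)
    return LAMBDA_B * r_b + LAMBDA_O * r_o + LAMBDA_G * r_g
```

With this shaping, a step that moves the UAV closer to the goal while staying inside the boundary and clear of obstacles earns a positive reward, and any violation dominates with a large penalty.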
- 4. The unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and the double-depth Q network according to claim 1, wherein constructing the improved DDQN in step three comprises: the network structure is a deep neural network (DNN) composed of an input layer, several hidden layers and an output layer, where the input is the environment state information and the output is the Q value corresponding to each action; in the dual-network mechanism, the current network with parameters θ is used for action selection and the target network with parameters θ⁻ is used for action evaluation, the target network parameters being updated as θ⁻ ← τθ + (1 − τ)θ⁻, where τ is the soft-update coefficient, θ the updated parameters of the current network, and θ⁻ the current parameters of the target network; the loss function is the mean square error L(θ) = E[(r + γ·Q(s', argmax_{a'} Q(s', a'; θ); θ⁻) − Q(s, a; θ))²], where r is the instant reward, γ is the discount factor, Q(s', ·; θ⁻) is the output of the target network at the next moment, E[·] denotes the expectation of the squared difference between the outputs of the current network and the target network, and argmax_{a'} Q(s', a'; θ) is the action that maximizes the current network's Q value.
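The double-Q target and the soft update of claim 4 can be sketched with NumPy; the `GAMMA` and `TAU` values here are illustrative, not taken from the patent:

```python
import numpy as np

GAMMA = 0.99   # discount factor (illustrative value)
TAU = 0.01     # soft-update coefficient (illustrative value)

def ddqn_target(rewards, next_q_online, next_q_target, dones):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    The online (current) network picks the action and the target network
    evaluates it; this decoupling mitigates Q-value overestimation.
    """
    best_actions = np.argmax(next_q_online, axis=1)
    q_eval = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + GAMMA * q_eval * (1.0 - dones)

def soft_update(theta_target, theta_online):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return TAU * theta_online + (1.0 - TAU) * theta_target
```

In a full training loop, the squared difference between `ddqn_target(...)` and the current network's `Q(s, a; θ)` is the mean-square loss minimized by gradient descent, matching the loss expression in the claim.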
- 5. The unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and the double-depth Q network according to claim 1, wherein the GWO algorithm in step three specifically comprises: the greedy factor ε is updated dynamically as ε(t) = ε_min + (ε_0 − ε_min)·e^(−kt), where ε_0 is the initial greedy factor, ε_min is the final greedy factor, and k is the decay rate; in the action selection logic, when the sampled random number falls in the random-exploration interval an action is explored randomly, when it falls in the GWO interval the action is generated through GWO, and otherwise the action corresponding to the maximum Q value of the current network is selected; in the GWO algorithm, the ω wolf's action is updated under the guidance of the optimal positions of the α, β and δ wolves, with the position update X(t+1) = (X1 + X2 + X3)/3, where X1 = X_α − A1·D_α, X2 = X_β − A2·D_β, X3 = X_δ − A3·D_δ, X_α, X_β and X_δ are the action positions corresponding to the α, β and δ wolves, A1, A2, A3, C1, C2, C3 are random coefficients in the interval [0, 1], and D_α, D_β, D_δ respectively represent the step sizes when moving toward the α, β and δ wolves.
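A sketch of the adaptive epsilon schedule and the leader-guided position update referenced in claim 5. The exponential decay form and the averaging of the three guided positions follow the common GWO formulation; the concrete constants are assumptions:

```python
import math
import random

def adaptive_epsilon(t, eps_init=1.0, eps_final=0.05, decay=0.001):
    """Exponential decay from eps_init toward eps_final (assumed form)."""
    return eps_final + (eps_init - eps_final) * math.exp(-decay * t)

def gwo_step(x, x_alpha, x_beta, x_delta):
    """Move a candidate x toward the alpha/beta/delta leaders.

    X_i = X_leader - A_i * D_i with step size D_i = |C_i * X_leader - X|,
    then average the three guided positions, as in claim 5's update rule.
    """
    guided = []
    for leader in (x_alpha, x_beta, x_delta):
        a = random.uniform(0, 1)  # random coefficient A_i in [0, 1]
        c = random.uniform(0, 1)  # random coefficient C_i in [0, 1]
        guided.append(leader - a * abs(c * leader - x))
    return sum(guided) / 3.0
```

Early in training `adaptive_epsilon` keeps exploration high, and it decays toward `eps_final` so the greedy branch (maximum Q value) dominates in later episodes.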
- 6. The unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and the double-depth Q network according to claim 1, wherein the model training process in step four comprises: the experience replay pool capacity is 10000, the batch size is 64, and the learning rate is set to a preset value; training runs for 3000 rounds, each round terminating when the UAV crashes, goes out of range, or reaches the target point; the data set contains environments of 4 complexity levels with a successively increasing number of obstacles, and the start and end points are randomly generated.
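The experience-replay setup of claim 6 can be sketched as follows; the capacity and batch size come from the claim, while the transition format is illustrative:

```python
import random
from collections import deque

CAPACITY = 10000   # replay pool capacity from claim 6
BATCH_SIZE = 64    # minibatch size from claim 6

class ReplayBuffer:
    """Fixed-capacity FIFO pool of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity=CAPACITY):
        self.pool = deque(maxlen=capacity)  # oldest entries evicted first

    def push(self, transition):
        self.pool.append(transition)

    def sample(self, batch_size=BATCH_SIZE):
        return random.sample(self.pool, batch_size)

    def __len__(self):
        return len(self.pool)

buf = ReplayBuffer()
for i in range(12000):  # overfill to exercise the FIFO capacity bound
    buf.push((i, 0, 0.0, i + 1, False))
```

Sampling uniformly from this pool breaks the temporal correlation of consecutive transitions, which is what allows stable minibatch gradient-descent updates of the DDQN.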
- 7. The unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and the double-depth Q network according to claim 1, wherein the UAV deployment requirements in step five include: the UAV is configured with an autonomous positioning function (supporting GPS/BeiDou positioning), an environment sensing function (carrying a vision sensor) and a flight control module; for path execution, the UAV loads the trained path planning strategy, senses the environment state in real time, outputs the optimal action through the improved DDQN, and generates a continuous flight path, avoiding obstacles and boundaries throughout until it reaches the target point.
- 8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and a double-depth Q network according to any one of claims 1 to 7.
- 9. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and a double-depth Q network according to any one of claims 1 to 7.
Description
Unmanned aerial vehicle autonomous path planning method based on gray wolf optimization and double-depth Q network Technical Field The invention relates to the technical field of unmanned aerial vehicle path planning, in particular to a UAV autonomous path planning method integrating gray wolf optimization (GWO) and a double-depth Q network (DDQN), suitable for unknown complex environments containing random obstacles. Background With the rapid development of UAV technology, UAVs have been widely applied in fields such as topographic mapping, ocean monitoring, security inspection and agricultural production by virtue of their excellent maneuverability, flexibility and autonomous flight capability. Path planning is a core technology for autonomous UAV operation; its core aim is to generate an optimal collision-free flight path on the premise of ensuring flight safety, thereby maximizing task efficiency and minimizing energy consumption. Currently, mainstream UAV path planning algorithms can be divided into traditional algorithms and deep learning algorithms. Traditional algorithms include the Dijkstra algorithm, the A* algorithm and the RRT algorithm: the Dijkstra algorithm has high search cost and low search efficiency in complex environments; the A* algorithm depends on heuristic function design and adapts poorly to unknown environments; and the RRT algorithm suffers from poor path smoothness and insufficient search precision.
Among deep learning algorithms, the DQN algorithm based on deep reinforcement learning has attracted attention for its ability to handle high-dimensional state spaces, but the traditional DQN algorithm suffers from Q-value overestimation, slow convergence, redundant planned paths and insufficient safety, and can hardly meet the requirement of efficient path planning in unknown obstacle-laden environments. The gray wolf optimization (GWO) algorithm, a swarm intelligence optimization algorithm, features strong convergence, few adjustable parameters and prominent global search capability, but is limited in local optimization accuracy. The double-depth Q network (DDQN) alleviates the Q-value overestimation problem by introducing a dual-network mechanism, but still faces the challenge of balancing exploration and exploitation in complex environments. Because no single algorithm can resolve the core pain points of path planning in unknown obstacle-laden environments, such as safety and efficiency, the method combines the global search advantage of GWO with the decision-making capability of DDQN, which is widely used in intelligent path planning and autonomous control: GWO compensates for DDQN's weak exploration, and DDQN compensates for GWO's limited local optimization, thereby improving the model's convergence speed and path quality. The combination fuses GWO with an adaptive ε-greedy strategy in DDQN's action selection stage, optimizing action selection during training and addressing the shortcomings of existing algorithms.
Disclosure of Invention The invention aims to solve the technical problems of weak global search capability, slow convergence, insufficient path safety and high path redundancy exhibited by existing UAV path planning algorithms in unknown environments containing random obstacles, and provides a UAV autonomous path planning method based on gray wolf optimization and a double-depth Q network. The technical scheme of the invention is as follows: a UAV autonomous path planning method based on gray wolf optimization and a double-depth Q network comprises the following steps. Step one, establish a Markov decision process (MDP) model of UAV path planning and define its state space S, action space A, reward function R and state transition probability P. Step two, design a composite reward function R = R_b + R_o + R_g, where R_b is the boundary reward, R_o is the obstacle reward and R_g is the target reward, which respectively restrain the UAV from crossing the boundary, make it avoid obstacles, and guide it to fly toward the target point. Step three, construct an improved double-depth Q network (DDQN): introduce the dual mechanism of a current network and a target network, where the current network selects actions in real time and outputs the Q value of each action based on the current environment state, the target network evaluates the target Q value of the action selected by the current network, and the GWO gray wolf optimization algorithm is fused with an adaptive ε-greedy strategy in the action selection stage.