
CN-121977578-A - UAV swarm path planning method based on reinforcement learning in a complex environment

CN121977578A

Abstract

The invention discloses a reinforcement-learning-based path planning method for an unmanned aerial vehicle (UAV) swarm in a complex environment, comprising: step 1, obtaining a converged proximal policy network; step 2, obtaining a desired virtual-leader path formed by a plurality of waypoints; step 3, calculating, at each waypoint, the probability of the UAV swarm being detected by a radar cluster system and the number of newly failed UAVs in the swarm, and judging whether the swarm should split into two equal formation subgroups at that waypoint; and step 4, obtaining the desired path of each UAV. The method builds a reinforcement learning path planning framework on the proximal policy optimization (PPO) algorithm and, by using a composite reward function that accounts for path length, radar detection probability, and failure probability, obtains a stable, high-quality policy in a complex three-dimensional environment with a high-dimensional state space, while reducing the risk of falling into local optima that affects methods based on manual rules or a single cost function.
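Step 1 of the method trains the policy and value networks with proximal policy optimization (PPO). As a minimal sketch of what such a training setup could look like, using the off-the-shelf stable-baselines3 PPO implementation (the VirtualLeaderEnv class, its state and action dimensions, and the placeholder dynamics and reward below are hypothetical stand-ins, not the patent's):

    import gymnasium as gym
    import numpy as np
    from stable_baselines3 import PPO

    class VirtualLeaderEnv(gym.Env):
        # Hypothetical stand-in: the observation is the virtual leader's
        # flight state plus environment state (cf. claim 2); the action is
        # the control quantity that a dynamics model integrates into the
        # next waypoint (cf. step 2). Dimensions chosen arbitrarily.
        def __init__(self):
            self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(12,), dtype=np.float32)
            self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self.state = np.zeros(12, dtype=np.float32)
            return self.state, {}

        def step(self, action):
            # Placeholder dynamics and reward; the patent's composite
            # reward (goal, approach, height, detection, failure terms)
            # would be computed here instead.
            self.state[:3] += 0.1 * action
            dist = float(np.linalg.norm(self.state[:3] - 1.0))
            return self.state, -dist, dist < 0.05, False, {}

    model = PPO("MlpPolicy", VirtualLeaderEnv(), verbose=0)
    model.learn(total_timesteps=10_000)  # train until the policy converges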

Inventors

  • FAN LIYUAN
  • XU ZHAO
  • LV MINGWEI
  • LEI YIFEI
  • HU JINWEN
  • MA XIAONING

Assignees

  • Northwestern Polytechnical University (西北工业大学)
  • Shenyang Aircraft Design and Research Institute of Aviation Industry Corporation of China (中国航空工业集团公司沈阳飞机设计研究所)

Dates

Publication Date
2026-05-05
Application Date
2026-04-03

Claims (5)

  1. A path planning method for an unmanned aerial vehicle (UAV) swarm in a complex environment based on reinforcement learning, characterized by comprising the following steps: Step 1, defining the centroid of a UAV swarm consisting of a plurality of UAV formation groups as a virtual leader, taking the current state of the virtual leader as the state input of reinforcement learning, and training a policy network and a value network with the proximal policy optimization (PPO) algorithm to obtain a converged proximal policy network; Step 2, performing forward rollout with a dynamics model on the virtual-leader control quantities output by the converged proximal policy network, so as to obtain a desired virtual-leader path formed by a plurality of waypoints; Step 3, at each waypoint on the desired virtual-leader path, calculating the probability of the UAV swarm being detected by the radar cluster system and the number of newly failed UAVs in the swarm; when the detection probability is larger than a first preset threshold or the number of failed UAVs is larger than a second preset threshold, equally dividing each UAV formation group of the swarm at the previous waypoint into two formation subgroups for flight, and adjusting the position of each subgroup so that the distance between the centroids of the two subgroups is larger than or equal to the maximum distance between two adjacent radar resolution units; and Step 4, obtaining the desired path of each UAV from the desired virtual-leader path, the offset vector between the centroid of each UAV formation group and the virtual leader, and the formation offset vectors between the centroid of each UAV formation group and the UAVs it leads.
  2. The path planning method for a UAV swarm in a complex environment based on reinforcement learning according to claim 1, wherein the current state of the virtual leader in step 1 comprises a flight state and an environment state; the flight state comprises the position, speed, inclination angle, azimuth angle, and roll angle, and the environment state comprises the current terrain-obstacle space, the relative distance and azimuth between the virtual leader and the target point, and the relative distance between the virtual leader and each radar.
  3. The path planning method for a UAV swarm in a complex environment based on reinforcement learning according to claim 2, wherein the constraint imposed when training the policy network and the value network in step 1 bounds the probability of the UAV swarm being detected by the radar cluster system, which is computed from: the total number of radars; the performance parameter of each radar; the distance between the UAV centroid within a resolution unit and the corresponding radar; the maximum number of UAVs in the resolution unit corresponding to each radar of the radar cluster system; the actual number of UAVs in the resolution unit corresponding to each radar; and the radar cross section of a single UAV.
  4. The path planning method for a UAV swarm based on reinforcement learning according to claim 3, wherein the reward function used when training the policy network and the value network in step 1 is the sum of the following terms (see the sketch after the claims): a terminal reward granted after the UAV swarm reaches the target position, which depends on the virtual leader's position, the target position, and the proportion of failed UAVs in the swarm (the number of failed UAVs divided by the total number of UAVs); an approach reward for the swarm closing on the target position, computed from constants and the difference between the virtual leader's distance to the target at the current moment and at the previous moment; a height reward for the swarm flying at a preset desired flight height, computed from an influence coefficient of the height deviation on the reward and the deviation between the virtual leader's height and the desired height; a detection penalty with one penalty coefficient for the swarm entering the detection range of the radar cluster system and another for entering the preset danger range of the radars, computed from the radius of the maximum radar detection range and the distance between the virtual leader's position and each radar; and a failure penalty on the total number of failed UAVs in the swarm, computed from the number of non-failed UAVs, the number of failed UAVs, and a constant greater than zero that adjusts the extent to which failures affect the overall swarm; two weight coefficients on these terms balance whether the swarm tends to reach the target directly or to avoid radar detection.
  5. The path planning method for a UAV swarm in a complex environment based on reinforcement learning according to claim 4, wherein the number of newly failed UAVs in the swarm in step 3 is calculated as follows: for each UAV, judging whether its failure probability is larger than a third preset threshold, and if so, judging that the UAV has failed; the failure probability of a UAV is computed from the failure range radius, the position of the UAV, the time for which the UAV swarm has been continuously tracked by the radar, and the current time.
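The formulas in claims 3 to 5 appear as images in the source and did not survive text extraction. Purely as a hedged illustration of forms consistent with the listed symbol definitions, and emphatically not the patent's actual expressions, such quantities might be written as:

    % Hedged sketch only: plausible forms matching the symbol definitions
    % in claims 3-5; the patent's own formulas are not reproduced here.
    \begin{align}
      % Claim 3: swarm detection probability over N_R radars, with C_i the
      % performance parameter of radar i, R_i the distance from the UAV
      % centroid in its resolution unit to radar i, n_i the UAV count in
      % that unit (capped at n_max), and \sigma a single UAV's radar
      % cross section.
      P_d &= 1 - \prod_{i=1}^{N_R}
             \Bigl(1 - \frac{C_i \, \min(n_i, n_{\max}) \, \sigma}{R_i^{4}}\Bigr) \\
      % Claim 4: composite reward balancing goal progress against radar
      % avoidance through the weights w_1 and w_2.
      r &= r_{\mathrm{goal}} + w_1 \, r_{\mathrm{dis}} + r_h
           + w_2 \, r_{\mathrm{det}} + r_{\mathrm{fail}} \\
      % Claim 5: per-UAV failure probability growing with the time t - t_0
      % for which the swarm has been continuously tracked by radar.
      p_{\mathrm{fail}} &= 1 - e^{-\lambda (t - t_0)}
    \end{align}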

Description

UAV swarm path planning method based on reinforcement learning in a complex environment

Technical Field

The invention belongs to the technical field of UAV path planning, and particularly relates to a reinforcement-learning-based path planning method for a UAV swarm in a complex environment.

Background

UAVs show wide application potential by virtue of their excellent maneuverability, high autonomy, and versatility. However, a single UAV has significant limitations in task execution. In contrast, a UAV swarm flying in formation can improve task efficiency, enhance system survivability, and share single-vehicle risk to a certain extent. In complex environments, such as scenes combining mountainous terrain with a networked system of multiple ground radars, a UAV swarm needs to reduce the UAV failure risk caused by radar detection and interference as much as possible while still guaranteeing completion of the overall task, thereby improving the swarm survival rate and the task success rate. Traditional path planning methods based on rules or cost functions generally depend on manually designed heuristic strategies, find it difficult to simultaneously account for factors such as terrain shielding, radar detection probability, and single-vehicle failure probability, and easily fall into local optima. Various UAV formation and obstacle-avoidance control methods exist in the prior art; they can maintain formation and avoid local obstacles while preserving communication-topology connectivity, and some also consider the avoidance of threat areas. However, these methods target static obstacles or local conflicts, generally simplify the radar threat area into a non-traversable obstacle, do not incorporate the detection probability of the radar cluster system and the failure probability of individual UAVs into a unified path optimization framework, and do not adaptively split and merge the swarm according to the characteristics of the radar resolution units, so it is difficult for them to simultaneously achieve rapid arrival at the target and a reduced number of UAV failures in a complex threat environment.

Disclosure of Invention

The invention aims to provide a reinforcement-learning-based path planning method for a UAV swarm in a complex environment, so as to solve the problem that existing methods struggle to integrate detection risk and failure risk into unified planning under the threat of complex terrain and radar cluster systems, and therefore struggle to balance rapid arrival at the target against overall survivability.
The invention adopts the following technical scheme: a reinforcement-learning-based path planning method for a UAV swarm in a complex environment comprises the following steps. Step 1, the centroid of a UAV swarm consisting of a plurality of UAV formation groups is defined as a virtual leader, the current state of the virtual leader is used as the state input of reinforcement learning, and a policy network and a value network are trained with the proximal policy optimization (PPO) algorithm to obtain a converged proximal policy network. Step 2, forward rollout is carried out with a dynamics model on the virtual-leader control quantities output by the converged proximal policy network, so as to obtain a desired virtual-leader path formed by a plurality of waypoints. Step 3, at each waypoint on the desired virtual-leader path, the probability of the UAV swarm being detected by the radar cluster system and the number of newly failed UAVs in the swarm are calculated; when the detection probability is larger than a first preset threshold or the number of failed UAVs is larger than a second preset threshold, each UAV formation group of the swarm at the previous waypoint is equally divided into two formation subgroups, and the position of each subgroup is adjusted so that the distance between the centroids of the two subgroups is larger than or equal to the maximum distance between two adjacent radar resolution units. Step 4, the desired path of each UAV is obtained from the desired virtual-leader path, the offset vector between the centroid of each UAV formation group and the virtual leader, and the formation offset vectors between the centroid of each UAV formation group and the UAVs it leads.
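As a minimal illustrative sketch of steps 3 and 4 (all names and the splitting heuristic here are hypothetical, not the patent's implementation; offsets and waypoints are assumed to be NumPy arrays of 3-D vectors, and an even group size is assumed):

    import numpy as np

    def split_group(offsets, min_centroid_gap):
        # Step 3 (hypothetical sketch): split one formation group's offset
        # vectors into two equal subgroups and push their centroids apart
        # until the gap is at least the maximum distance between two
        # adjacent radar resolution units.
        half = len(offsets) // 2
        a, b = offsets[:half].copy(), offsets[half:].copy()
        sep = b.mean(axis=0) - a.mean(axis=0)
        gap = np.linalg.norm(sep)
        if gap < min_centroid_gap:
            direction = sep / (gap + 1e-9)
            shift = 0.5 * (min_centroid_gap - gap) * direction
            a -= shift  # move subgroup A away from subgroup B
            b += shift  # and subgroup B away from subgroup A
        return a, b

    def uav_paths(leader_path, group_offsets, uav_offsets):
        # Step 4: desired path of each UAV = virtual-leader waypoint
        # + offset of its group centroid from the virtual leader
        # + formation offset of the UAV from its group centroid.
        paths = {}
        for g, g_off in enumerate(group_offsets):
            for u, u_off in enumerate(uav_offsets[g]):
                paths[(g, u)] = [wp + g_off + u_off for wp in leader_path]
        return paths

Here split_group would be invoked for each formation group whenever the detection probability or the new-failure count at a waypoint exceeds its threshold, and uav_paths composes each UAV's waypoint list by pure vector addition, mirroring the leader-offset decomposition of claim 1.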