CN-121979245-A - Urban multi-unmanned-aerial-vehicle collaborative route planning method based on deep reinforcement learning
Abstract
The invention discloses a deep-reinforcement-learning-based urban multi-unmanned-aerial-vehicle collaborative route planning method. The method constructs a flight environment model from urban three-dimensional point cloud data and uses an R-Tree index for efficient obstacle queries. Each of the multiple unmanned aerial vehicles acquires surrounding obstacle distance information through ray-based environment perception, and a state space is constructed containing the current unmanned aerial vehicle state, the relative position of the target, the relative positions of the other unmanned aerial vehicles, and the surrounding obstacle distances. A parameter-sharing deep reinforcement learning policy outputs continuous flight control actions, and a multi-objective reward function is designed to guide the unmanned aerial vehicles toward safe and efficient collaborative route planning. In the execution phase, each unmanned aerial vehicle decides independently according to the shared policy network and its own local observations, realizing distributed collaborative flight. The method effectively reduces the environment perception and decision computation complexity of traditional methods and is suitable for multi-unmanned-aerial-vehicle planning tasks in complex urban airspace.
Inventors
- MENG HUAN
- LI XIANG
Assignees
- 上海擎狮智能科技有限公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-02-02
Claims (8)
- 1. An urban multi-unmanned aerial vehicle collaborative route planning method based on deep reinforcement learning, characterized by comprising the following steps (illustrative sketches for the individual steps follow the claims): Step 1, cleaning urban three-dimensional point cloud data to remove noise, constructing a three-dimensional flight environment model based on R-Tree spatial index technology, and querying the spatial relationship between the unmanned aerial vehicles and the obstacles in real time; Step 2, constructing a state space based on ray perception, namely emitting a plurality of rays centered on the unmanned aerial vehicle, obtaining the cut-off distance of each ray, and constructing a state vector by combining the unmanned aerial vehicle's current motion state, the relative position of the target point, and the relative positions of the other unmanned aerial vehicles; Step 3, constructing an action space, namely defining the action space of the unmanned aerial vehicle as a normalized speed control vector along the forward, lateral and vertical directions; Step 4, setting a multi-objective reward function, namely designing a comprehensive reward function comprising a target guidance reward, a collision penalty, a time consumption penalty and an action smoothness penalty; Step 5, constructing a parameter-sharing deep reinforcement learning policy network, namely adopting the proximal policy optimization (PPO) algorithm to build a deep neural network model comprising an Actor network and a Critic network, letting all homogeneous unmanned aerial vehicles share the same set of network parameters and output continuous flight control actions from their respective state vectors; and Step 6, deploying the trained, converged policy network to the unmanned aerial vehicle control system, which outputs control commands from the ray data and state information acquired in real time to drive the unmanned aerial vehicle's flight.
- 2. The method of claim 1, wherein each ray emitted in step 2 has a preset maximum detection distance; when a ray does not intersect an obstacle within the maximum detection distance, its returned distance is taken as the maximum detection distance (see the ray-perception sketch after the claims).
- 3. The method of claim 1, wherein the rays comprise a ray directed from the unmanned aerial vehicle's current position toward the target position, a ray directed in the opposite direction away from the target position, rays directed vertically upward and vertically downward in the world coordinate system, and remaining rays distributed at predetermined angles in the plane formed by the unmanned aerial vehicle's current position and the target position, wherein the ray density in the forward region toward the target position is higher than in the rearward region.
- 4. The method of claim 1, wherein the state space in step 2 is an observation vector based on local perception and comprises at least the unmanned aerial vehicle's current flight speed, the relative position of the target position with respect to the unmanned aerial vehicle's current position, the relative positions of the other unmanned aerial vehicles with respect to the current unmanned aerial vehicle, and the obstacle distance corresponding to each ray.
- 5. The method of claim 1, wherein the flight control actions of step 5 are represented in a continuous action space and comprise the unmanned aerial vehicle's forward speed command, lateral speed command and vertical speed command.
- 6. The method of claim 1, wherein the target guidance reward includes a reward when the unmanned aerial vehicle approaches the target position, a penalty when it moves away from the target position, a penalty when it deviates progressively from the target altitude, a penalty when it approaches an environmental boundary, a penalty when it approaches an obstacle, a penalty when it approaches another unmanned aerial vehicle, and a reward when it reaches the target.
- 7. The method of claim 1, wherein the comprehensive reward function is a weighted combination of a plurality of reward sub-items, different sub-items corresponding to different weights, wherein the weights of the goal-reaching reward and the collision penalty are greater than those of the remaining sub-items, and the weight of each sub-item is configured according to the flight scene or mission requirements (a weighting sketch follows the claims).
- 8. The method of claim 1, wherein the parameter-sharing deep reinforcement learning policy network updates its parameters in a centralized manner during the training phase and is invoked independently by each unmanned aerial vehicle during the execution phase, thereby realizing distributed collaborative flight without explicit communication (see the execution-phase sketch after the claims).
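The following sketches are illustrative only and form no part of the claims; all function names, layer sizes, weights and other specifics not stated in the claims are assumptions. First, a minimal sketch of the step-1 environment model of claim 1: buildings extracted from the cleaned point cloud are reduced to three-dimensional axis-aligned bounding boxes and indexed with an R-Tree (here via the Python `rtree` package) so that the obstacles near an unmanned aerial vehicle can be queried in real time.

```python
# Sketch of claim 1, step 1: 3-D R-Tree over building bounding boxes.
# Assumes the `rtree` package (libspatialindex bindings); reducing point-cloud
# clusters to AABBs is simplified to precomputed boxes.
from rtree import index

def build_obstacle_index(aabbs):
    """aabbs: list of (xmin, ymin, zmin, xmax, ymax, zmax) building boxes."""
    prop = index.Property()
    prop.dimension = 3                      # 3-D index for the urban volume
    idx = index.Index(properties=prop)
    for i, box in enumerate(aabbs):
        idx.insert(i, box)
    return idx

def nearby_obstacles(idx, pos, radius):
    """Ids of obstacle boxes intersecting a cube of half-width `radius`
    around the unmanned aerial vehicle at `pos` = (x, y, z)."""
    x, y, z = pos
    return list(idx.intersection((x - radius, y - radius, z - radius,
                                  x + radius, y + radius, z + radius)))
```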
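Next, a sketch of the ray perception of claims 2 to 4. A standard slab-method ray/box intersection stands in for whatever intersection routine the patent intends: each ray returns its hit distance, or the preset maximum detection distance on a miss (claim 2), and the per-ray distances are concatenated with the vehicle's own velocity and the relative positions of the target and the other vehicles into the claim-4 observation vector.

```python
# Sketch of claims 2-4: per-ray obstacle distances and the observation vector.
import numpy as np

def ray_aabb_distance(origin, direction, box, max_dist):
    """Slab-method ray/AABB test. Returns the hit distance along the
    unit-length `direction`, or max_dist when the ray misses (claim 2)."""
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    lo, hi = np.asarray(box[:3], dtype=float), np.asarray(box[3:], dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        t1, t2 = (lo - o) / d, (hi - o) / d
    tmin = np.nanmax(np.minimum(t1, t2))    # nan (grazing an axis) is skipped
    tmax = np.nanmin(np.maximum(t1, t2))
    if tmax >= max(tmin, 0.0) and tmin <= max_dist:
        return max(tmin, 0.0)
    return max_dist

def build_state(own_velocity, rel_target, rel_others, ray_dists, max_dist):
    """Claim-4 observation: own velocity, target offset, neighbour offsets,
    and ray distances normalised to [0, 1]."""
    return np.concatenate([own_velocity, rel_target, np.ravel(rel_others),
                           np.asarray(ray_dists) / max_dist])
```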
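A sketch of the multi-objective reward of claim 1 (step 4) and claims 6 and 7. The numeric weights below are assumptions; the patent fixes only that the goal-reaching reward and the collision penalty outweigh the remaining sub-items and that the weights are configurable per flight scene or mission.

```python
# Sketch of claims 6-7: weighted combination of reward sub-items.
import numpy as np

WEIGHTS = dict(progress=1.0, time=0.05, smooth=0.1, proximity=0.5,
               collision=10.0, goal=10.0)   # assumed values; claim 7 only
                                            # requires goal/collision dominance

def step_reward(prev_dist, dist, action, prev_action,
                min_obstacle_dist, safe_dist, collided, reached):
    r = WEIGHTS["progress"] * (prev_dist - dist)        # approach vs. retreat
    r -= WEIGHTS["time"]                                # per-step time cost
    r -= WEIGHTS["smooth"] * float(np.linalg.norm(      # action smoothness
        np.subtract(action, prev_action)))
    if min_obstacle_dist < safe_dist:                   # near-obstacle shaping
        r -= WEIGHTS["proximity"] * (safe_dist - min_obstacle_dist) / safe_dist
    if collided:
        r -= WEIGHTS["collision"]
    if reached:
        r += WEIGHTS["goal"]
    return r
```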
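A sketch of the parameter-sharing Actor-Critic network of claim 1, step 5, emitting the continuous three-axis speed commands of claim 5, written in PyTorch. The layer widths, the Gaussian action head and the clipped surrogate loss follow common PPO practice and are assumptions, not details fixed by the patent.

```python
# Sketch of claim 1 step 5 and claim 5: one network shared by all vehicles.
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim=3, hidden=128):  # sizes are assumed
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, hidden), nn.Tanh(),
                                   nn.Linear(hidden, act_dim))
        self.critic = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                    nn.Linear(hidden, hidden), nn.Tanh(),
                                    nn.Linear(hidden, 1))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = torch.tanh(self.actor(obs))   # normalised speed commands;
                                             # samples are clipped downstream
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        return dist, self.critic(obs).squeeze(-1)

def ppo_loss(dist, value, act, old_logp, adv, ret, clip=0.2):
    """Clipped PPO surrogate plus a value-regression term."""
    logp = dist.log_prob(act).sum(-1)
    ratio = (logp - old_logp).exp()
    policy = -torch.min(ratio * adv,
                        torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()
    return policy + 0.5 * (ret - value).pow(2).mean()
```

Because every homogeneous vehicle contributes transitions to, and acts from, this single parameter set, adding vehicles enlarges the training batch rather than the model, which is the scalability argument behind the parameter-sharing design.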
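Finally, a sketch of the execution phase of claim 1, step 6 and claim 8: after centralized training, every unmanned aerial vehicle independently queries the same frozen policy with only its own local observation, so no inter-vehicle communication is needed. The drone objects and the `sense` helper are hypothetical stand-ins for the on-board state acquisition.

```python
# Sketch of claim 8: decentralized execution from one shared, frozen policy.
import torch

@torch.no_grad()
def fly_step(policy, drones, sense):
    """`sense(drone)` is a hypothetical helper returning the claim-4
    observation vector for one vehicle; `policy` is a SharedActorCritic."""
    commands = {}
    for d in drones:
        obs = torch.as_tensor(sense(d), dtype=torch.float32)
        dist, _ = policy(obs)
        commands[d.id] = dist.mean.numpy()  # deterministic action at run time
    return commands
```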
Description
Urban multi-unmanned-aerial-vehicle collaborative route planning method based on deep reinforcement learning

Technical Field

The invention belongs to the technical fields of low-altitude airspace management, multi-unmanned-aerial-vehicle cooperative control and path planning, and particularly relates to an urban multi-unmanned-aerial-vehicle cooperative route planning method based on deep reinforcement learning.

Background

With the rapid development of the low-altitude economy, unmanned aerial vehicles are increasingly widely deployed in application scenes such as urban inspection, logistics distribution, emergency response and urban perception. In an actual urban environment, multiple unmanned aerial vehicles often need to perform tasks simultaneously in complex three-dimensional spaces with dense buildings and limited airspace. How to realize efficient collaborative flight and path planning of multiple unmanned aerial vehicles while guaranteeing flight safety has therefore become a key technical problem for low-altitude unmanned system applications.

Existing unmanned aerial vehicle path planning studies typically rest on idealized assumptions, such as simplifying the environment into regular geometry or pre-building an air-route network for the unmanned aerial vehicles to traverse. In a high-density urban environment, however, unmanned aerial vehicles inevitably need to traverse narrow spaces and highly constrained areas between buildings, which places higher demands on path planning algorithms in terms of three-dimensional spatial perception, trajectory rationality and multi-vehicle collaboration mechanisms.

Traditional path planning methods are mainly search-based or sampling-based. These methods obtain reliable results in single-vehicle scenes, but when extended to multi-vehicle cooperative tasks their computational complexity grows rapidly, the generated trajectories often lack smoothness, and they struggle to converge quickly to high-quality solutions in complex or narrow environments. Some path planning methods based on numerical optimization can handle complex constraints without explicit environment modeling, but their iterative solution process usually carries high computational cost and long solution delay, making it difficult to meet the real-time requirements of multi-vehicle collaborative flight.

In recent years, deep reinforcement learning has been increasingly introduced into unmanned aerial vehicle path planning thanks to its advantages in experience-based learning and environmental adaptation. Early studies focused on single-agent scenarios, followed by a gradual development of multi-agent reinforcement learning frameworks for collaborative tasks. While multi-agent reinforcement learning methods show good potential for collaborative decision-making, their training is often computationally expensive, and training stability and algorithm scalability face serious challenges as the number of agents grows. How to construct a multi-unmanned-aerial-vehicle path planning method that is safe, cooperative, scalable and sample-efficient in a complex urban three-dimensional environment therefore remains an important open problem.
Disclosure of Invention

The invention aims to solve the above problems in the prior art and provides an urban multi-unmanned-aerial-vehicle collaborative route planning method based on deep reinforcement learning. In this method, each unmanned aerial vehicle obtains surrounding environment information through a ray perception model; under an independent-policy reinforcement learning paradigm, multiple homogeneous unmanned aerial vehicles share a unified policy network through a parameter-sharing mechanism; and flight control in a continuous action space is realized in combination with the proximal policy optimization algorithm. The collaborative efficiency and scalability of the unmanned aerial vehicle system in complex environments are thereby improved while flight safety is ensured.

The technical scheme adopted by the invention is as follows. An urban multi-unmanned-aerial-vehicle collaborative route planning method based on deep reinforcement learning comprises the following steps. Step 1, three-dimensional flight environment modeling: cleaning urban three-dimensional point cloud data, removing noise, constructing a three-dimensional flight environment model based on R-Tree spatial index technology, and querying the spatial relationship between the unmanned aerial vehicles and the obstacles in real time, organizing the buildings and obstacles in the environment by adopting