CN-121995935-A - Multi-unmanned aerial vehicle navigation collaborative environment monitoring implementation method based on deep reinforcement learning

CN121995935A

Abstract

The invention discloses a multi-unmanned aerial vehicle navigation collaborative environment monitoring implementation method based on deep reinforcement learning, belonging to the technical field of unmanned aerial vehicle navigation. The method comprises: obtaining training data; performing deep reinforcement learning training on the training data; inputting the data of each round into a deep reinforcement learning model, which outputs a planned route; having the Internet of Things nodes randomly generate data as a Poisson process, adding the nodes holding data to the list of nodes the unmanned aerial vehicle must visit, and inputting these nodes into the model to obtain a path planning scheme; and, once a high-quality global flight path of the unmanned aerial vehicle is obtained, further introducing a multi-unmanned aerial vehicle task allocation method based on dynamic programming, which intelligently divides the path among the plurality of unmanned aerial vehicles according to their energy constraints.
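The abstract's data-arrival model — Internet of Things nodes generating data as a Poisson process, with data-holding nodes joining the unmanned aerial vehicle's to-visit list — can be sketched as follows. The function names and the rate parameter are illustrative assumptions, not taken from the patent:

```python
import random

def simulate_arrivals(num_nodes, rate, horizon, seed=0):
    """Each node generates data as an independent Poisson process:
    inter-arrival gaps are exponential with the given rate (an assumed,
    illustrative model). Returns each node's data-generation times
    within the time horizon."""
    rng = random.Random(seed)
    arrivals = []
    for _ in range(num_nodes):
        t, times = 0.0, []
        while True:
            t += rng.expovariate(rate)  # exponential gap <=> Poisson process
            if t > horizon:
                break
            times.append(t)
        arrivals.append(times)
    return arrivals

def to_visit(arrivals, now):
    """Nodes that hold data by time `now` join the to-visit list."""
    return [i for i, ts in enumerate(arrivals) if any(t <= now for t in ts)]
```

The to-visit list grows over time as nodes accumulate data, which is what makes the path planning problem dynamic rather than a one-shot traveling salesman instance.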

Inventors

  • WU PENGFEI
  • LIU XINYU
  • YANG QIHANG
  • SHA CHAO
  • HUANG HAIPING

Assignees

  • Nanjing University of Posts and Telecommunications (南京邮电大学)

Dates

Publication Date
2026-05-08
Application Date
2026-01-28

Claims (3)

  1. A method for implementing collaborative environment monitoring with multi-unmanned aerial vehicle navigation based on deep reinforcement learning, characterized by comprising the following steps:
     Step 1, obtain training data: uniformly distribute the unmanned aerial vehicle network nodes, and generate the required quantity of training data within a self-defined size interval using a random generation method.
     Step 2, perform deep reinforcement learning training on the training data: input the training data into a deep reinforcement learning model in the form of a space set; the model is trained on the data so that a neural network approximates the strategy function, yielding the unmanned aerial vehicle's path planning strategy and its probability distribution. The training data are expressed as a state space, an action space and a corresponding reward function space, where the state space is S = (node weight, age of information (AoI) of the Internet of Things node, unmanned aerial vehicle position, nodes to be visited, node round) and the action space is A = (unmanned aerial vehicle moving direction, next node to visit).
     Step 3, input the data of each round into the deep reinforcement learning model in the form of a space set; the model outputs a planned route, as shown in formula (1):
     (π, p_θ(π | X)) = M(X; θ)    (1)
     where M is the model, X the input data set, θ the model parameters, π the resulting strategy (the planned route) and p_θ(π | X) its probability distribution.
     Step 4, calculate the loss function value of the total path, where formula (2) is built on the total path length L(π | X) of the route:
     Loss(θ | X) = (L(π | X) − L(π_BL | X)) · log p_θ(π | X)    (2)
     Step 5, let X(t) be the original uniformly distributed data set; after each round of training, the training data are updated by gradient descent to obtain a data set with a more complex distribution, which is mixed with the original data set and used as the input of step 2. In formula (2), M is the model being trained, π_BL is the route produced by a baseline model, p_θ(π | X) is the strategy probability distribution output by model M, and T is a training-degree control parameter determined by the training progress.
     Step 6, the Internet of Things nodes randomly generate data as a Poisson process; the nodes holding data are added to the list of nodes to be visited by the unmanned aerial vehicle and are input into the deep reinforcement learning model to obtain a path planning scheme.
     Step 7, after the high-quality global flight path of the unmanned aerial vehicle is obtained, a multi-unmanned aerial vehicle task allocation method based on dynamic programming is further introduced, and the path is intelligently divided among the plurality of unmanned aerial vehicles according to their energy constraints.
  2. The method for implementing collaborative environment monitoring with multi-unmanned aerial vehicle navigation based on deep reinforcement learning according to claim 1, characterized in that in step 2 the data are input into the deep reinforcement learning model and the model outputs the unmanned aerial vehicle's path planning strategy and its probability distribution:
     Step 2.1, input the data into a neural network, which approximates the strategy function and outputs the probability of selecting action a in state S.
     Step 2.2, the reinforcement learning model comprises a state space, an action space and a reward function, where the state space is S = (node weight, AoI of the Internet of Things node, unmanned aerial vehicle position, nodes to be visited, node round), the action space is A = (unmanned aerial vehicle moving direction, next node to visit), and the reward function is r = total path length; the reward of action a is output.
  3. The method for implementing collaborative environment monitoring with multi-unmanned aerial vehicle navigation based on deep reinforcement learning according to claim 1, characterized in that step 7 comprises the following steps:
     Step 7.1, each node carries its position information, current buffered data quantity and corresponding AoI state; the flight energy consumption and the data acquisition and transmission energy of the unmanned aerial vehicle when executing the corresponding task segment are comprehensively considered, and the energy consumption E(i, j) of the continuous sub-path covering nodes i through j is calculated.
     Step 7.2, in the state transition process, the state DP(l, k) represents the minimum accumulated cost when k unmanned aerial vehicles have completed the visiting tasks of the first l nodes; the corresponding state transition relation can be expressed as DP(l, k) = min_{j<l} { DP(j, k−1) + E(j+1, l) }.
     Step 7.3, when all node visiting tasks are considered, the system selects the state with the minimum cost among those that do not exceed the maximum number of unmanned aerial vehicles as the final solution, and the optimal node-visiting sub-path of each unmanned aerial vehicle is determined by backtracking through the dynamic programming process.
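Claims 1 and 2 describe a policy network that outputs a probability distribution over next-visit actions, with a training-progress control parameter T, and a loss built from the total path length against a baseline model. A minimal sketch of those two pieces, in plain Python, treating T as a softmax temperature and all function names as illustrative assumptions:

```python
import math

def action_probabilities(scores, visited, T=1.0):
    """Masked softmax over candidate next nodes with temperature T:
    higher T flattens the distribution, lower T sharpens it, and
    already-visited nodes get probability zero."""
    logits = [s / T if not v else None for s, v in zip(scores, visited)]
    m = max(l for l in logits if l is not None)  # subtract max for stability
    exps = [math.exp(l - m) if l is not None else 0.0 for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def baseline_loss(log_probs, tour_lengths, baseline_lengths):
    """Policy-gradient loss with a baseline: the advantage is the sampled
    tour's total path length minus the baseline model's tour length on
    the same instance, weighted by the tour's log-probability and
    averaged over the batch."""
    batch = list(zip(log_probs, tour_lengths, baseline_lengths))
    return sum((length - b) * lp for lp, length, b in batch) / len(batch)
```

Minimizing this loss by gradient descent lowers the probability of routes longer than the baseline's and raises the probability of shorter ones, which matches the claimed training objective on total path length.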
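The allocation step of claim 3 can be sketched as a standard tour-splitting dynamic program with backtracking. The cost model below, where costs[i][j] is the energy for one unmanned aerial vehicle to serve the contiguous sub-path of global-route nodes i..j, is an assumption for illustration; the claim's E(i, j) would combine flight, data acquisition and transmission energy:

```python
def split_tour(costs, max_uavs):
    """DP(l, k) = minimum total cost for k UAVs to cover the first l
    nodes of the global route, where costs[i][j] (assumed, illustrative)
    is one UAV's energy for serving nodes i..j contiguously. Backtracking
    through the parent table recovers each UAV's sub-path."""
    n = len(costs)
    INF = float("inf")
    dp = [[INF] * (max_uavs + 1) for _ in range(n + 1)]
    parent = [[-1] * (max_uavs + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for l in range(1, n + 1):
        for k in range(1, max_uavs + 1):
            for j in range(l):  # last UAV serves nodes j..l-1
                cand = dp[j][k - 1] + costs[j][l - 1]
                if cand < dp[l][k]:
                    dp[l][k] = cand
                    parent[l][k] = j
    # cheapest state covering all n nodes without exceeding max_uavs
    best_k = min(range(1, max_uavs + 1), key=lambda k: dp[n][k])
    # backtrack to recover each UAV's contiguous sub-path
    paths, l, k = [], n, best_k
    while l > 0:
        j = parent[l][k]
        paths.append(list(range(j, l)))
        l, k = j, k - 1
    return dp[n][best_k], list(reversed(paths))
```

Because each UAV is assigned a contiguous segment of the global route, the transition only needs the split points, giving O(n^2 · k) time, which is what makes this allocation step cheap compared to re-planning each UAV's route from scratch.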

Description

Multi-unmanned aerial vehicle navigation collaborative environment monitoring implementation method based on deep reinforcement learning

Technical Field

The invention relates to the technical field of unmanned aerial vehicle navigation applications, and in particular to a multi-unmanned aerial vehicle navigation collaborative environment monitoring implementation method based on deep reinforcement learning.

Background

With the rapid development of Internet of Things technology, sensor nodes are widely applied in environmental monitoring, agricultural management, urban planning, smart homes, and other fields. Sensor nodes provide real-time information by collecting environmental data to support various intelligent applications. Unmanned aerial vehicles have become an important instrument for collecting data from Internet of Things nodes because of their flexibility and mobility: an unmanned aerial vehicle can cover a wide area and quickly visit scattered sensor nodes to collect their data. However, because node data is generated randomly and the unmanned aerial vehicle's energy is limited, how an unmanned aerial vehicle can efficiently complete the data collection task under various constraints is of significant research interest. In particular, optimizing the unmanned aerial vehicle's path planning so as to maximize its data collection efficiency while minimizing its energy consumption is a problem that urgently needs to be solved. Traditional path planning algorithms mainly include exhaustive search, dynamic programming, the nearest neighbor algorithm, the greedy algorithm, and quadratic optimization methods. These algorithms have achieved some success on the traveling salesman problem, but have many shortcomings in practical applications.
Although exhaustive search and dynamic programming can find the optimal solution, their computational complexity is high, making them unsuitable for large-scale problems. Approximation and heuristic algorithms such as the nearest neighbor and greedy algorithms are computationally efficient, but the optimality of their solutions is difficult to guarantee and they easily fall into local optima. To improve both solving efficiency and solution quality, advanced heuristic algorithms such as the genetic algorithm, simulated annealing, and the ant colony algorithm have been proposed. The genetic algorithm generates new solutions by simulating natural selection and genetic mechanisms; it is suitable for large-scale problems and has global search capability. Simulated annealing gradually reduces randomness by simulating the physical annealing process, can jump out of local optima, and has a wide search space. The ant colony algorithm simulates ant foraging behavior and searches for an optimal path through pheromone deposition and updating, making it suitable for dynamic optimization problems. These advanced heuristics improve path planning to a certain extent, but their long computation times and complex parameter settings make them difficult to apply to real-time path planning for multiple unmanned aerial vehicles. In recent years, with the development of deep learning and reinforcement learning, path planning methods based on deep reinforcement learning have become a research hotspot. Approaches such as centralized multi-unmanned-aerial-vehicle deep reinforcement learning, centralized-training distributed-execution multi-agent reinforcement learning, and graph neural networks have been applied to the path planning problem; they can adjust the planning strategy in real time in dynamic, complex environments and show good performance.
However, most of these algorithms perform strategy learning directly in a joint action space or rely on complex multi-agent cooperation mechanisms, and their generalization remains insufficient in flexible, changeable environments.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a path planning method for multi-unmanned-aerial-vehicle data collection that yields an effective and stable solution for random node distributions, maximizes the unmanned aerial vehicles' data collection rate under energy constraints, and ensures information freshness. A further object of the invention is to provide a corresponding device. To solve the above problems, the technical scheme is as follows: a multi-unmanned aerial vehicle navigation collaborative environment monitoring implementation method based on deep reinforcement learning comprises the following steps. Step 1, obtain training data: uniformly distribute the unmanned aerial vehicle network nodes, and generate the required quantity of training data within a self-defined size interval using a random generation method.