CN-121995936-A - Unmanned aerial vehicle data acquisition method based on prediction enhancement type deep reinforcement learning network
Abstract
The invention discloses an unmanned aerial vehicle data acquisition method based on a prediction enhancement type deep reinforcement learning network. The invention introduces a state prediction network into the MATD3 framework and combines it with an adaptive intrinsic reward structure, so that collaborative trajectory planning of an unmanned aerial vehicle swarm can be realized in unknown and dynamic environments. The invention improves space exploration efficiency and region partitioning capability, reduces repeated visits and cooperation conflicts among the unmanned aerial vehicles, and thereby improves the task completion rate and coverage performance.
Inventors
- SHI JIA
- TANG HAIPEI
- LI ZAN
- SUN WENTAO
- GUAN LEI
- SI JIANGBO
Assignees
- Xidian University (西安电子科技大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260128
Claims (10)
- 1. An unmanned aerial vehicle data acquisition method based on a prediction enhancement type deep reinforcement learning network, characterized in that the method is applied to each unmanned aerial vehicle in an unmanned aerial vehicle system and comprises: the unmanned aerial vehicle acquires the state at the current moment, inputs the state at the current moment into a self-deployed trained strategy network, and outputs the action at the current moment; the unmanned aerial vehicle executes the action at the current moment to acquire data from the corresponding ground data acquisition node; the trained strategy network is a trained Actor network obtained by jointly training, with a reinforcement learning method, an Actor network, a Critic network, a predictor and an exploration adapter deployed on each unmanned aerial vehicle in the unmanned aerial vehicle system, wherein the predictor and the exploration adapter are used to assist the training of the Actor network and the Critic network, the outputs of the predictor and of the exploration adapter are used to calculate an adaptive intrinsic reward of the unmanned aerial vehicle during training, and the adaptive intrinsic reward is used to guide the unmanned aerial vehicle to perform position exploration and region cooperation.
- 2. The unmanned aerial vehicle data acquisition method based on the prediction enhancement type deep reinforcement learning network of claim 1, wherein the state at the current moment comprises state information of the ground data acquisition nodes within the perception range at the current moment and state information of the unmanned aerial vehicle at the current moment; the state information of the ground data acquisition nodes within the perception range at the current moment comprises the positions of all ground data acquisition nodes within the perception range of the unmanned aerial vehicle at the current moment and the residual data quantity and generation time of each remaining data packet, and the state information of the unmanned aerial vehicle at the current moment comprises the position and residual energy of the unmanned aerial vehicle at the current moment; the action at the current moment comprises an angle controlling the movement direction of the unmanned aerial vehicle at the current moment and the flight distance of the unmanned aerial vehicle at the current moment.
- 3. The unmanned aerial vehicle data acquisition method based on the prediction enhancement type deep reinforcement learning network according to claim 2, wherein the predictor is a neural network, and the predictor is configured, for any unmanned aerial vehicle, to output a predictive probability from the input state information of the ground data acquisition nodes within the perception range of that unmanned aerial vehicle at the t-th time step, wherein the predictive probability represents the probability that a ground data acquisition node within the perception range at the t-th time step belongs to that unmanned aerial vehicle.
- 4. The unmanned aerial vehicle data acquisition method based on the prediction enhancement type deep reinforcement learning network of claim 3, wherein the loss function of the predictor is expressed as: L_pred = Σ_{i=1..N} CE(onehot(i), p(o_i^t)); wherein L_pred denotes the loss value, CE(·,·) denotes the cross-entropy loss function, onehot(·) denotes the one-hot encoding function, i denotes the i-th unmanned aerial vehicle in the unmanned aerial vehicle system, the i-th unmanned aerial vehicle denoting any one of the unmanned aerial vehicles, i takes values from 1 to N, N denotes the total number of unmanned aerial vehicles in the unmanned aerial vehicle system, o_i^t denotes the state information of the ground data acquisition nodes within the perception range of the i-th unmanned aerial vehicle at the t-th time step, and p(o_i^t) denotes the probability that the ground data acquisition node represented by o_i^t belongs to the i-th unmanned aerial vehicle.
- 5. The unmanned aerial vehicle data acquisition method based on the prediction enhancement type deep reinforcement learning network according to claim 2, wherein the exploration adapter is configured to take as input the state of any unmanned aerial vehicle of the unmanned aerial vehicle system at the t-th time step and to generate the space exploration degree of that unmanned aerial vehicle at the t-th time step, wherein the space exploration degree of any unmanned aerial vehicle at the t-th time step is expressed as: D_i^t = ||f(s_i^t) − f*(s_i^t)||² / μ; wherein D_i^t denotes the space exploration degree of the i-th unmanned aerial vehicle of the unmanned aerial vehicle system at the t-th time step, i takes values from 1 to N, N denotes the total number of unmanned aerial vehicles in the unmanned aerial vehicle system, s_i^t denotes the state of the i-th unmanned aerial vehicle at the t-th time step, μ denotes a running average of the values, f* denotes the embedded network with fixed network parameters in the exploration adapter, and f denotes the embedded network in the exploration adapter whose network parameters need to be trained.
- 6. The unmanned aerial vehicle data acquisition method based on the prediction enhancement type deep reinforcement learning network of claim 5, wherein the loss function of the exploration adapter is expressed as: L_adapt = E[ ||f(s_i^t) − f*(s_i^t)||² ]; wherein L_adapt denotes the loss value and E[·] denotes the expectation operation.
- 7. The unmanned aerial vehicle data acquisition method based on the prediction enhancement type deep reinforcement learning network according to claim 2, wherein, during training, after any unmanned aerial vehicle of the unmanned aerial vehicle system executes the action of the t-th time step, the total reward of the t-th time step comprises the environmental reward of the t-th time step and the adaptive intrinsic reward of the t-th time step, and the environmental reward of any unmanned aerial vehicle at the t-th time step is defined in terms of the following quantities: r_i^{env,t} denotes the environmental reward of the i-th unmanned aerial vehicle at the t-th time step, the i-th unmanned aerial vehicle denoting any one unmanned aerial vehicle of the unmanned aerial vehicle system; i takes values from 1 to N, and N denotes the total number of unmanned aerial vehicles in the unmanned aerial vehicle system; M denotes the total number of ground data acquisition nodes on the surface; T denotes the total number of time steps per training episode, and Δt denotes the duration of each time step; t_{i,k} denotes the time for the i-th unmanned aerial vehicle to collect the data packet of the k-th ground data acquisition node, g_k denotes the generation time of the earliest generated data packet in the k-th ground data acquisition node, and k takes values from 1 to M; w_k denotes the weight of the k-th ground data acquisition node; c denotes a preset constant term; A_th denotes a preset AoI (age of information) threshold value; and P_i^t denotes the penalty of the i-th unmanned aerial vehicle at the t-th time step for energy consumption or obstacle collision.
- 8. The unmanned aerial vehicle data acquisition method based on the prediction enhancement type deep reinforcement learning network according to claim 7, wherein the adaptive intrinsic reward of any unmanned aerial vehicle at the t-th time step is expressed as: r_i^{int,t} = D_i^t + r_i^{div,t} + r_i^{coop,t}; wherein r_i^{int,t} denotes the adaptive intrinsic reward of the i-th unmanned aerial vehicle at the t-th time step, D_i^t denotes the space exploration degree of the i-th unmanned aerial vehicle at the t-th time step, r_i^{div,t} denotes the position diversity reward of the i-th unmanned aerial vehicle at the t-th time step, and r_i^{coop,t} denotes the region cooperation reward of the i-th unmanned aerial vehicle at the t-th time step; s_i^t denotes the state of the i-th unmanned aerial vehicle at the t-th time step, which comprises the state information of the unmanned aerial vehicle itself at the t-th time step and the state information o_i^t of the ground data acquisition nodes within its perception range at the t-th time step; the position diversity reward is computed from the set of positions of the unmanned aerial vehicle at the time steps preceding the t-th time step, via the Euclidean distance between the current position and any position p′ in that set; the region cooperation reward is computed from the absolute value of p(o_i^t; θ), the probability that the ground data acquisition node represented by o_i^t belongs to the i-th unmanned aerial vehicle, wherein θ denotes the network parameters of the predictor.
- 9. The unmanned aerial vehicle data acquisition method based on the prediction enhancement type deep reinforcement learning network of claim 8, wherein, after any unmanned aerial vehicle executes the action of the t-th time step, the total reward of the t-th time step is expressed as: r_i^t = r_i^{env,t} + β · r_i^{int,t}; wherein r_i^t denotes the total reward of the t-th time step obtained after any unmanned aerial vehicle executes the action of the t-th time step, and β denotes a weight coefficient.
- 10. The unmanned aerial vehicle data acquisition method based on the prediction enhancement type deep reinforcement learning network of claim 1, wherein the application scenarios of the unmanned aerial vehicle system comprise a patrol scenario, an emergency response scenario and a mobile sensing scenario.
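The reward shaping described in claims 5 through 9 (an RND-style exploration degree, a position diversity term, a region cooperation term, and a weighted sum with the environmental reward) can be sketched as follows. This is an illustrative toy implementation under stated assumptions: all names (`exploration_degree`, `position_diversity`, `BETA`, the random linear embedding matrices) and the β value are ours, not the patent's, and the real embedding networks would be trained neural networks rather than fixed matrices.

```python
import numpy as np

# Illustrative sketch of the intrinsic/total reward structure of claims 5-9.
rng = np.random.default_rng(0)

# RND-style exploration degree (claim 5): squared error between a fixed
# random embedding network f* and a trainable embedding network f,
# normalized by a running average of past errors.
W_fixed = rng.standard_normal((8, 4))   # stand-in for fixed network f*
W_train = rng.standard_normal((8, 4))   # stand-in for trainable network f
running_mean = 1.0                      # running average mu (assumed init)

def exploration_degree(state: np.ndarray) -> float:
    err = np.sum((W_train @ state - W_fixed @ state) ** 2)
    return float(err / running_mean)

# Position diversity reward (claim 8): based on the Euclidean distance from
# the current position to previously visited positions (here: the nearest).
def position_diversity(pos: np.ndarray, visited: list) -> float:
    if not visited:
        return 0.0
    return float(min(np.linalg.norm(pos - p) for p in visited))

# Total reward (claim 9): environmental reward plus weighted intrinsic reward.
BETA = 0.1  # weight coefficient beta (assumed value)

def total_reward(r_env: float, r_explore: float,
                 r_div: float, r_coop: float) -> float:
    r_int = r_explore + r_div + r_coop  # claim 8: three intrinsic terms
    return r_env + BETA * r_int

state = rng.standard_normal(4)
r = total_reward(r_env=1.0,
                 r_explore=exploration_degree(state),
                 r_div=position_diversity(np.array([0.0, 0.0]),
                                          [np.array([3.0, 4.0])]),
                 r_coop=0.2)
```

In a full system the trainable embedding and the running mean would be updated every step, so states visited often yield small exploration degrees and novel states yield large ones, which is the adaptive behaviour the claims rely on.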
Description
Unmanned aerial vehicle data acquisition method based on prediction enhancement type deep reinforcement learning network

Technical Field
The invention belongs to the technical field of intelligent control of multiple unmanned aerial vehicles, and particularly relates to an unmanned aerial vehicle data acquisition method based on a prediction enhancement type deep reinforcement learning network.

Background
With the development of unmanned aerial vehicles in scenarios such as inspection, mapping and emergency support, multi-UAV systems (Multi-UAV Systems) are becoming mainstream in large-scale continuous tasks. To realize efficient collaborative trajectory planning in dynamic environments, multi-agent deep reinforcement learning (MADRL) has become an important line of solution. Traditional MADRL schemes such as MATD3 and MADDPG exhibit a certain degree of intelligence, but still have the following disadvantages: 1) lack of predictive capability for future environment states, so that action strategies lack foresight and it is difficult to maintain stable performance on long time scales; 2) insufficient cooperative efficiency among the unmanned aerial vehicles, i.e., different unmanned aerial vehicles often concentrate in similar areas, causing repeated visits and wasted resources; 3) insufficient or unstable exploration ability, i.e., without additional guidance rewards the agents easily fall into local optima at an early stage; 4) sparse rewards leading to low training efficiency, since external rewards tend to be sparse in multi-UAV tasks and the model finds it difficult to obtain effective training signals. Therefore, there is a need for a multi-agent reinforcement learning method that can address both the "lack of foresight" and the "lack of cooperative efficiency".
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an unmanned aerial vehicle data acquisition method based on a prediction enhancement type deep reinforcement learning network. The technical problems to be solved by the invention are realized by the following technical scheme. The invention provides an unmanned aerial vehicle data acquisition method based on a prediction enhancement type deep reinforcement learning network, applied to each unmanned aerial vehicle in an unmanned aerial vehicle system and comprising the following steps: the unmanned aerial vehicle acquires the state at the current moment, inputs the state at the current moment into a self-deployed trained strategy network, and outputs the action at the current moment; the unmanned aerial vehicle executes the action at the current moment to acquire data from the corresponding ground data acquisition node. The trained strategy network is a trained Actor network obtained by jointly training, with a reinforcement learning method, an Actor network, a Critic network, a predictor and an exploration adapter deployed on each unmanned aerial vehicle in the unmanned aerial vehicle system, wherein the predictor and the exploration adapter are used to assist the training of the Actor network and the Critic network, the outputs of the predictor and of the exploration adapter are used to calculate an adaptive intrinsic reward of the unmanned aerial vehicle during training, and the adaptive intrinsic reward is used to guide the unmanned aerial vehicle to perform position exploration and region cooperation.
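The inference flow described above (state in, self-deployed trained strategy network, action out) can be sketched as a minimal data structure plus a policy stub. All field names, dimensions, and the linear tanh-squashed policy below are illustrative assumptions; the actual trained Actor would be a deep network produced by the joint training procedure.

```python
from dataclasses import dataclass
import numpy as np

# Minimal sketch of the per-UAV inference flow in the disclosure.
# Field names and the linear policy stand-in are assumptions.

@dataclass
class UAVState:
    position: np.ndarray        # UAV position at the current moment
    residual_energy: float      # remaining energy
    node_info: np.ndarray       # flattened info of nodes in sensing range

@dataclass
class UAVAction:
    angle: float                # movement direction, radians in [0, 2*pi)
    distance: float             # flight distance in [0, d_max]

class PolicyStub:
    """Stand-in for a trained Actor network (a real one would be a DNN)."""
    def __init__(self, in_dim: int, d_max: float = 50.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((2, in_dim)) * 0.1
        self.d_max = d_max

    def act(self, s: UAVState) -> UAVAction:
        x = np.concatenate([s.position, [s.residual_energy], s.node_info])
        raw = self.W @ x
        # squash raw outputs into the valid action ranges
        angle = (np.tanh(raw[0]) + 1.0) * np.pi              # [0, 2*pi)
        distance = (np.tanh(raw[1]) + 1.0) / 2 * self.d_max  # [0, d_max]
        return UAVAction(float(angle), float(distance))

state = UAVState(position=np.array([10.0, 20.0]),
                 residual_energy=0.8,
                 node_info=np.zeros(4))
action = PolicyStub(in_dim=7).act(state)
```

The tanh squashing reflects the claim-2 action space (a bounded heading angle and a bounded flight distance) without assuming any particular network architecture.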
In some embodiments, the state at the current moment comprises state information of the ground data acquisition nodes within the sensing range at the current moment and state information of the unmanned aerial vehicle at the current moment, wherein the state information of the ground data acquisition nodes within the sensing range at the current moment comprises the positions of all ground data acquisition nodes within the sensing range of the unmanned aerial vehicle at the current moment and the residual data quantity and generation time of each remaining data packet, and the state information of the unmanned aerial vehicle at the current moment comprises the position and residual energy of the unmanned aerial vehicle at the current moment; the action at the current moment comprises an angle controlling the movement direction of the unmanned aerial vehicle at the current moment and the flight distance of the unmanned aerial vehicle at the current moment. In some embodiments, the predictor is a neural network configured, for any one of the unmanned aerial vehicles, to output a predictive probability from the input state information of the ground data acquisition nodes within the perception range of that unmanned aerial vehicle at the t-th time step, wherein the predictive probability represents the probability that a ground data acquisition node within the perception range at the t-th time step belongs to that unmanned aerial vehicle.
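The predictor just described (a classifier mapping each UAV's observed ground-node information to a probability over the UAVs, trained with cross-entropy against the one-hot label of the observing UAV) can be sketched with a single softmax layer. The layer, the dimensions, and all names below are illustrative assumptions; the patent's predictor is a neural network whose architecture is not specified here.

```python
import numpy as np

# Illustrative sketch of the predictor and its cross-entropy loss.
N = 3          # total number of UAVs (assumed)
OBS_DIM = 5    # dimension of the per-UAV ground-node observation (assumed)

rng = np.random.default_rng(1)
W = rng.standard_normal((N, OBS_DIM)) * 0.1   # predictor parameters (theta)

def predict_proba(obs: np.ndarray) -> np.ndarray:
    """p(o_i^t): probability that the observed nodes belong to each UAV."""
    logits = W @ obs
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

def predictor_loss(obs_per_uav: list) -> float:
    """Sum over i of cross-entropy between onehot(i) and p(o_i^t)."""
    loss = 0.0
    for i, obs in enumerate(obs_per_uav):
        p = predict_proba(obs)
        loss += -np.log(p[i] + 1e-12)          # CE with one-hot label i
    return float(loss)

observations = [rng.standard_normal(OBS_DIM) for _ in range(N)]
loss = predictor_loss(observations)
```

Minimizing this loss pushes the predictor to associate each region's node observations with the UAV that services them, which is what makes its output usable as a region cooperation signal during training.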