CN-122015841-A - Unmanned aerial vehicle track planning method and system based on integral compensation
Abstract
The application discloses an unmanned aerial vehicle track planning method and system based on integral compensation, relating to the technical field of unmanned aerial vehicle track planning. An agent is trained in an environment simulator by reinforcement learning, and the track planning problem is modeled as a partially or fully observable Markov decision process to obtain a baseline track-planning strategy. The state error of the agent is corrected in real time by an integral compensation mechanism, and the action instructions output by the agent are smoothed by an action filter. An integral-compensation reinforcement learning planning module optimizes the strategy, selects the strategy with the largest cumulative reward as the execution strategy using a population optimization method, and generates the final track by interacting with the environment simulator, thereby realizing smooth, flyable track planning with a high success rate.
Inventors
- ZHANG XINJU
- LI SHENGZHI
- WANG QI
- WANG YUSHENG
Assignees
- 应急管理部大数据中心 (Big Data Center of the Ministry of Emergency Management)
Dates
- Publication Date
- 20260512
- Application Date
- 20260109
Claims (10)
- 1. An unmanned aerial vehicle track planning method based on integral compensation, characterized by comprising the following steps: constructing an environment simulator simulating the flight environment of the unmanned aerial vehicle, wherein the environment simulator randomly generates threats and target positions at each reset; training an agent in the environment simulator by reinforcement learning, modeling the track planning problem as a partially observable or fully observable Markov decision process, and obtaining a baseline strategy for track planning; correcting the state error of the agent in real time based on an integral compensation mechanism, and smoothing the action instructions output by the agent with an action filter; and optimizing the strategy through an integral-compensation reinforcement learning planning module, wherein the planning module selects the strategy with the largest cumulative reward as the execution strategy using a population optimization method and interacts with the environment simulator to generate the final track.
- 2. The unmanned aerial vehicle track planning method based on integral compensation of claim 1, wherein the environment simulator simulates the static threats, dynamic threats, target points, and aircraft dynamics model of the unmanned aerial vehicle flight environment.
- 3. The unmanned aerial vehicle track planning method based on integral compensation as claimed in claim 2, wherein in the integral compensation mechanism the attenuation factor λ takes the value 0.8 and the compensation coefficient β takes the value 0.3, and the integrally compensated state error replaces the original state error without increasing the dimension of the state space, so as to eliminate the static position offset and drive the position tracking error to zero.
- 4. The unmanned aerial vehicle track planning method based on integral compensation as claimed in claim 3, wherein the integrally compensated state error in the integral compensation mechanism is defined as: ẽ_t = e_t + β · Σ_{k=1}^{t−1} λ^{t−k} · e_k; in the formula, e_t is the original state error at time t, ẽ_t is the compensated state error at time t, β is the compensation coefficient, and λ is the attenuation factor used to adjust the weight of past errors; the contribution of old errors decays exponentially with time; the summation from time step 1 to t−1 represents the sum of all past errors, whose weights decrease with increasing distance from the current time.
- 5. The unmanned aerial vehicle track planning method of claim 1, wherein the action filter comprises a momentum filter and an interpolation filter; the momentum filter calculates a local average of the action differences by an exponentially weighted average, and its calculation formulas are: m_n = α · m_{n−1} + (1 − α) · (a_t − â_{n−1}); ã_n = â_{n−1} + η · clip(m_n, −δ, δ); in the formulas, m_n is the exponentially weighted average of action differences at simulation step n; α is a weighting coefficient controlling the weight of historical values; m_{n−1} is the weighted average at the previous simulation step; a_t is the original action output by the agent at decision time t; â_{n−1} is the action actually transmitted to the simulator at the previous simulation step; clip(·, −δ, δ) is the clipping function limiting the difference to the range [−δ, δ]; ã_n is the smoothed action; and η is the action update rate controlling the change amplitude of the smoothed action; the interpolation filter generates an action transition function through Hermite interpolation, and its calculation formula is: s_t = (1 − w) · s_{t−1} + w · a_t; in the formula, s_t is the smoothed action at decision time t; s_{t−1} is the smoothed action at the previous decision time; a_t is the original action output by the agent at the current decision time t; and w is a weighting coefficient balancing new and old actions.
- 6. The unmanned aerial vehicle track planning method based on integral compensation as claimed in claim 1, wherein, when the strategy with the greatest cumulative reward is selected as the execution strategy, the cumulative reward uses a continuous heuristic bonus function in which the total reward R is computed from the Euclidean distance d between the current coordinate and the target point as a base reward r₀ plus a heuristic bonus whose intensity is adjusted by a weight factor k and which is governed by two weight parameters and a fixed distance threshold; the reward is negative infinity when an obstacle is encountered, 100 when the end point is reached, and 0 otherwise.
- 7. The unmanned aerial vehicle track planning method based on integral compensation of claim 6, wherein the optimization process of the integral-compensation reinforcement learning planning module comprises: setting a population size P = 100, an elite individual number E = 10, a planning horizon H = 250 steps, and a number of optimization iterations K = 10; predicting future states with a sub-environment simulator and calculating the cumulative reward J(θ); updating the strategy parameters by soft updating; falling back to the baseline strategy if the integral-compensation optimization fails; and the planning module supports formation planning of multiple unmanned aerial vehicles, realizing collaborative track generation by introducing formation states and rewards.
- 8. The unmanned aerial vehicle track planning method based on integral compensation as claimed in claim 7, wherein the network structure of the agent is an actor-critic (strategy-evaluation) framework, wherein the policy network comprises two hidden layers of 128 neurons each with ReLU activation, its output layer limits the control signal range through a tanh activation function, the critic network input comprises states and actions, its hidden-layer structure matches that of the policy network, its output layer estimates the Q value, and the network parameters are updated with gradient clipping and prioritized experience replay.
- 9. An unmanned aerial vehicle track planning system based on integral compensation, characterized by performing the steps of the unmanned aerial vehicle track planning method based on integral compensation as claimed in any one of claims 1 to 8, comprising: an environment simulator module for constructing an environment simulator simulating the flight environment of the unmanned aerial vehicle, the environment simulator randomly generating threats and target positions at each reset; a reinforcement learning training module for training an agent in the environment simulator by reinforcement learning, modeling the track planning problem as a partially observable or fully observable Markov decision process, and obtaining a baseline strategy for track planning; a compensation filtering module for correcting the state error of the agent in real time based on an integral compensation mechanism and smoothing the action instructions output by the agent with an action filter; and an integral-compensation planning module for optimizing strategies through the integral-compensation reinforcement learning planning module, selecting the strategy with the largest cumulative reward as the execution strategy using a population optimization method, and interacting with the environment simulator to generate the final track.
- 10. The unmanned aerial vehicle track planning system of claim 9, wherein the environment simulator module further comprises a sub-simulator unit for predicting future conditions, wherein the environment simulator module sets a random initial position at each reset so that the planning strategy does not depend on a fixed environment, and pose point smoothness is ensured through the action filter module.
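The integral-compensation recurrence of claims 3–4 and the two action filters of claim 5 can be sketched in Python. This is a minimal illustration reconstructed from the claim wording: the recurrences, symbol names, and all constants other than λ = 0.8 and β = 0.3 are assumptions, not the patent's verbatim formulas.

```python
import numpy as np

def compensated_error(errors, beta=0.3, lam=0.8):
    """Integrally compensated state error (claims 3-4, reconstructed form):
    e'_t = e_t + beta * sum_{k < t} lam**(t - k) * e_k.
    Past errors decay exponentially, so the compensated error can replace
    the raw error without adding a dimension to the state space."""
    errors = np.asarray(errors, dtype=float)
    t = len(errors) - 1
    integral = sum(lam ** (t - k) * errors[k] for k in range(t))
    return errors[t] + beta * integral

class MomentumFilter:
    """Momentum filter (claim 5, assumed form): exponentially weighted
    average of action differences, then a clipped, rate-limited update."""
    def __init__(self, alpha=0.9, eta=0.5, delta=0.2):
        self.alpha = alpha    # weight of historical values
        self.eta = eta        # action update rate
        self.delta = delta    # clip bound on the averaged difference
        self.m = 0.0          # EWA of action differences
        self.prev = 0.0       # action actually sent to the simulator last step

    def step(self, raw_action):
        diff = raw_action - self.prev
        self.m = self.alpha * self.m + (1.0 - self.alpha) * diff
        smooth = self.prev + self.eta * np.clip(self.m, -self.delta, self.delta)
        self.prev = smooth
        return smooth

def interpolate_action(prev_smooth, raw_action, w=0.3):
    """Interpolation filter (claim 5): blend the previous smoothed action
    with the new raw action; a linear blend is assumed here in place of
    the Hermite transition described in the claim."""
    return (1.0 - w) * prev_smooth + w * raw_action
```

Because the compensated error is a deterministic function of the error history, it can be fed to the agent in place of the raw error, which is how claim 3's "no increase in state-space dimension" property would be preserved.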
Description
Unmanned aerial vehicle track planning method and system based on integral compensation

Technical Field

The invention relates to the technical field of Unmanned Aerial Vehicle (UAV) track planning, in particular to an unmanned aerial vehicle track planning method and system based on integral compensation.

Background

Unmanned aerial vehicle track planning is the most important part of mission planning, and is also one of the key technologies for unmanned aerial vehicle collaborative combat and future networked combat. Reasonable planning enables unmanned aerial vehicles to effectively avoid threats, improving survival probability and combat efficiency. Solving the track planning problem with intelligent optimization algorithms is currently the most widely used approach, and the Ant Colony Optimization (ACO) algorithm based on a grid model is one of the most important methods. Compared with the Voronoi diagram method, it can automatically search for the minimum-cost flight path in free space without setting navigation nodes or constructing a Voronoi diagram, has stronger adaptive capacity, and suits combat conditions with different threat types. In contrast, reinforcement learning (RL) offers good real-time performance, excellent generalization, and a universal design flow, so it performs well on path planning problems in fields such as robotics and unmanned aerial vehicles. The probabilistic roadmap (PRM) method divides a large map into multiple local target points, and a reinforcement learning agent trained by the Deep Deterministic Policy Gradient (DDPG) algorithm then guides the robot toward the local target points, solving the problem of long-range path planning for robots in complex environments.
The whole map image can also be used as the observation state of the reinforcement learning agent, and the deep Q-network algorithm adopted there performs excellently in both static and dynamic obstacle environments. The above methods are all based on model-free reinforcement learning in an "offline training, online use" mode, but do not discuss countermeasures for when the reinforcement learning agent fails during unmanned aerial vehicle operation. While the gradient optimization method used in the offline training phase could be reused during operation to continue training the agent as a countermeasure, this is overly computationally expensive and reduces real-time performance. The dynamic anti-integral compensation (Integral Compensation, IC) mechanism contains internal state variables and is complex in design; using the projection theorem, it can convert non-convex stability criteria into linear matrix inequality (LMI) form stability criteria, has weak feasibility condition limitations and good system dynamic response, and can achieve global asymptotic stability when the system is open-loop stable but only local asymptotic stability when the system is open-loop unstable. The method therefore aims to solve the problems that traditional methods such as Ant Colony Optimization (ACO) and genetic algorithms involve a large computational burden, poor real-time performance, and a tendency to fall into local optima, and that existing methods have a low success rate in dynamic threat environments and lack effective constraints on the overload required by the track.
What is needed is an integral-compensation reinforcement learning planning method that simultaneously considers static and dynamic anti-integration compensation mechanisms, combines an integral compensation algorithm with a reinforcement learning algorithm, and meets unmanned aerial vehicle track planning requirements.

Disclosure of Invention

In view of the above, in order to solve the problems of existing unmanned aerial vehicle track planning methods, the present invention aims to provide an unmanned aerial vehicle track planning method and system based on integral-compensation reinforcement learning (RL-IC), which eliminates static offset and enhances action correlation by fusing an integral compensation mechanism with an RL framework, and introduces an environment simulator and an action filter to realize track planning that is smooth, flyable, and has a high success rate. The system supports single-vehicle and formation tasks and has strong generalization capability. To achieve the above purpose, the present invention provides the following technical solutions. In a first aspect, the invention provides an unmanned aerial vehicle track planning method based on integral compensation, comprising the following steps: Constructing an environment simulator simulating the flight environment of the