
CN-121981346-A - Multi-solution unmanned aerial vehicle path planning method based on reinforcement learning assisted evolution

CN-121981346-A

Abstract

A multi-solution unmanned aerial vehicle (UAV) path planning method based on reinforcement-learning-assisted evolution. The method first models the multi-solution UAV path planning problem as a continuous multi-solution optimization problem and initializes a population. In each evolutionary iteration, global, neighborhood, and individual features are computed for every individual to construct that individual's optimization state, and an agent then selects a mutation strategy for each individual according to its state. After all individuals have executed the differential evolution operations of mutation, crossover, and selection, the updated population is clustered to compute the reward value of the current iteration, and the agent's model parameters are updated with the PPO method. The trained model can then be used to optimize other multi-solution UAV path problems.

Inventors

  • Gong Yuejiao
  • Lian Hongqiao
  • Ma Zeyuan
  • Guo Hongshu
  • Zhang Xinglin

Assignees

  • South China University of Technology (华南理工大学)

Dates

Publication Date
2026-05-05
Application Date
2025-12-03

Claims (10)

  1. A multi-solution unmanned aerial vehicle path planning method based on reinforcement learning assisted evolution, characterized by comprising the following steps: using a trained agent model to adaptively select different mutation strategies for individuals during the differential evolution algorithm so as to locate more optimal path solutions, wherein training the agent model comprises the following steps: S1, problem modeling and initialization: modeling the unmanned aerial vehicle path planning problem as a continuous multi-solution optimization problem, sampling a batch of solutions as an initial population, and initializing the agent model; S2, constructing individual states: in each generation of the optimization process, extracting global features and neighborhood features for each individual and combining them with the individual features to form the state representation of each individual; S3, selecting a mutation strategy with the agent: taking a plurality of different mutation strategies as the action space, the agent assigns a search strategy to each individual according to its state; S4, updating the population: updating the positions and states of the individuals; S5, computing a clustering-based reward: after each generation's update is finished, clustering the population with a density-based clustering method, normalizing the objective values of the optimal solution in each cluster, and summing the normalized objective values as the reward; S6, updating the agent model: updating the parameters of the agent model with the proximal policy optimization algorithm according to the individual states and rewards; and repeating steps S2 to S6 until the optimization process ends.
  2. The multi-solution unmanned aerial vehicle path planning method based on reinforcement learning assisted evolution according to claim 1, wherein in step S1 the process of problem modeling and initialization is as follows: given an unmanned aerial vehicle path planning problem, first model it as a continuous optimization problem by sampling a plurality of waypoints between the starting point and the end point, so that each solution of the optimization problem is the coordinate sequence of the waypoints; then randomly sample a plurality of solutions in the feasible domain of the solution space to form an initial population, and randomly initialize the parameters of the agent model.
  3. The multi-solution unmanned aerial vehicle path planning method based on reinforcement learning assisted evolution according to claim 1, wherein in step S2 the individual states are constructed as follows: S2.1, computing global features, which comprise the average distance among individuals in the population, the standard deviation of the population's objective values, the number of remaining generations, the number of stagnation generations of the population, and the average objective value of the population; S2.2, computing the neighborhood features of each individual: first constructing the neighborhood of each individual with the K-nearest-neighbor algorithm, then computing, over all individuals in each neighborhood, the average distance, the standard deviation of objective values, the number of stagnation generations, the average objective value, and the rank of the current neighborhood, the rank being obtained by sorting the objective values of the best individual of each neighborhood; S2.3, computing individual features, namely, for each individual in the population: the distance and objective-value difference to the current optimal solution, the distance and objective-value difference to the historical optimal solution, the distance and objective-value difference to the neighborhood optimal solution, the number of stagnation generations, the objective value, the average distance and average objective-value difference to all other individuals in the population, and the average distance and average objective-value difference to all other individuals in the neighborhood; S2.4, concatenating the global features and the neighborhood features of each individual into a multi-dimensional group feature, the group feature and the individual features jointly forming the state of each individual.
  4. The multi-solution unmanned aerial vehicle path planning method based on reinforcement learning assisted evolution according to claim 1, wherein in step S3 the action space comprises five mutation strategies, namely Gaussian local search, a K-nearest-neighbor-based exploitation strategy, a K-nearest-neighbor-based exploration strategy, a global exploitation strategy, and a global exploration strategy.
  5. The multi-solution unmanned aerial vehicle path planning method based on reinforcement learning assisted evolution according to claim 4, wherein in step S3 the agent selects the mutation strategy as follows: S3.1, after the global features and the neighborhood features of each individual are concatenated, a population embedder produces a population embedding; the individual features of each individual pass through an individual embedder to produce an individual embedding, and the population embedding and the individual embedding are concatenated into the decision embedding of each individual; S3.2, all decision embeddings are fed into a self-attention encoding module for self-attention encoding among individuals, so as to enhance information exchange among different individuals; S3.3, each self-attention-encoded individual feature passes through a multi-layer perceptron and a Softmax layer to obtain an action probability distribution representing the sampling probabilities of the five mutation strategies, and sampling according to this distribution selects one mutation strategy for each individual.
  6. The method for planning a path of a multi-solution unmanned aerial vehicle based on reinforcement learning assisted evolution according to claim 1, wherein in step S4 each individual in the population performs the selected mutation strategy together with the differential evolution operations of crossover and selection, and the positions and states of the offspring individuals are updated.
  7. The multi-solution unmanned aerial vehicle path planning method according to claim 1, wherein in step S5 the reward value is calculated as follows: S5.1, DBSCAN clustering is performed on the sub-population after the evolutionary update, dividing the population into a plurality of clusters; S5.2, the objective values of the optimal solutions in each cluster are normalized and summed to obtain the final reward value, wherein at time t the reward function is: r_t = Σ_{i=1}^{K} f_i* / f_ub, where r_t is the reward value at time t, f_i* is the objective value of the optimal solution in cluster i after clustering, f_ub is the upper bound of the objective value used for normalization, and K is the number of clusters.
  8. The multi-solution unmanned aerial vehicle path planning method based on reinforcement learning assisted evolution according to claim 5, wherein in step S6 the update process of the agent model comprises: S6.1, passing the decision embeddings obtained in step S3.1 through a value network module, specifically adding the decision embeddings of all individuals and dividing by the number of individuals in the population to obtain a population average embedding, which then passes through a multi-layer perceptron to obtain an expected return value; S6.2, computing the gradients of the parameters of the agent's policy network module and value network module according to the proximal policy optimization algorithm and updating the agent model parameters, the model update at time t being: θ_{t+1} = θ_t + α (∇_θ J + ∇_θ L_V), with ∇_θ J = ∇_θ log π(a_t | P_t) · (r_t − V(P_t)) and L_V = MSE(V(P_t), r_t), where θ_t are the model parameters at time t, ∇_θ J is the gradient of the policy network module, ∇_θ L_V is the gradient of the value network module, r_t is the reward value obtained at time t, P_t is the population at time t, V(P_t) is the return estimated by the value network, MSE is the mean square error loss, ∇_θ denotes taking the gradient with respect to the model parameters, π is the probability distribution output by the policy network, a_t is the mutation strategy sampled at time t, and α is the learning rate.
  9. A computer device comprising a memory and a processor, the memory being electrically connected to the processor and storing a computer program, wherein the computer program, when executed by the processor, causes the processor to implement the method according to any one of claims 1 to 8.
  10. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the processor implements the method according to any one of claims 1 to 8.
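The S1–S6 training loop of claim 1, together with the crossover/selection step of claim 6, can be sketched in pure Python. This is a minimal illustration, not the patented implementation: the trained agent is replaced by a hypothetical random strategy selector, the state features of S2 are omitted, and the toy 1-D fitness function stands in for the path-planning objective.

```python
import math
import random

def fitness(x):
    # Toy multimodal objective on [-1, 1] (maximized); an illustration only.
    return math.sin(5 * x) * (1 - abs(x))

def select_strategy(state):
    # Placeholder for the trained agent of claim 1: a uniform random choice
    # over the five strategies named in claim 4 (hypothetical stand-in).
    return random.choice(["gauss", "knn_exploit", "knn_explore",
                          "global_exploit", "global_explore"])

def mutate(pop, i, strategy, f=0.5):
    x = pop[i]
    best = max(pop, key=fitness)
    if strategy == "gauss":                      # Gaussian local search
        return x + random.gauss(0, 0.05)
    a, b = random.sample(pop, 2)
    if strategy in ("knn_exploit", "global_exploit"):
        return best + f * (a - b)                # move toward a good solution
    return x + f * (a - b)                       # exploratory perturbation

def evolve(pop_size=20, generations=50, cr=0.9, lo=-1.0, hi=1.0):
    pop = [random.uniform(lo, hi) for _ in range(pop_size)]   # S1
    for _ in range(generations):
        for i in range(pop_size):
            state = None                                      # S2 (features omitted)
            strat = select_strategy(state)                    # S3
            trial = mutate(pop, i, strat)
            if random.random() > cr:                          # crossover (degenerate in 1-D)
                trial = pop[i]
            trial = min(hi, max(lo, trial))
            if fitness(trial) >= fitness(pop[i]):             # greedy selection (S4)
                pop[i] = trial
        # S5 (clustering reward) and S6 (PPO update) are omitted in this skeleton.
    return pop
```

With a real agent, `select_strategy` would consume the state of S2 and output one of the five actions per individual, as claim 5 describes.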
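The Softmax sampling step of claim 5 (S3.3) amounts to turning the MLP's output logits into a probability distribution over the five mutation strategies and sampling one action from it. A minimal sketch, with strategy names that are illustrative rather than taken from the patent:

```python
import math
import random

STRATEGIES = ["gauss", "knn_exploit", "knn_explore",
              "global_exploit", "global_explore"]

def softmax(logits):
    # Numerically stable softmax, as applied after the MLP head in S3.3.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_strategy(logits, rng=random):
    # Draw one of the five mutation strategies from the action distribution.
    probs = softmax(logits)
    r = rng.random()
    acc = 0.0
    for strat, p in zip(STRATEGIES, probs):
        acc += p
        if r <= acc:
            return strat
    return STRATEGIES[-1]   # guard against floating-point rounding
```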
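The clustering-based reward of claim 7 can be illustrated with a simplified 1-D density grouping standing in for DBSCAN. The normalization direction, dividing each cluster's best objective value by the upper bound f_ub and summing under a maximization objective, is an assumption consistent with the symbols listed in the claim.

```python
def density_clusters(points, eps=0.15, min_pts=2):
    # Simplified DBSCAN-like grouping for 1-D points: merge points whose
    # gap is at most eps (the patent uses full DBSCAN; this is a sketch).
    pts = sorted(points)
    clusters, current = [], [pts[0]]
    for p in pts[1:]:
        if p - current[-1] <= eps:
            current.append(p)
        else:
            if len(current) >= min_pts:
                clusters.append(current)
            current = [p]
    if len(current) >= min_pts:
        clusters.append(current)
    return clusters

def cluster_reward(points, objective, f_ub):
    # Reward of claim 7: r_t = sum over clusters of (best objective / f_ub).
    clusters = density_clusters(points)
    return sum(max(objective(p) for p in c) / f_ub for c in clusters)
```

Rewarding one term per cluster encourages the population to keep several distinct high-quality basins alive, which is the multi-solution goal of the method.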
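Claim 8 updates the agent with proximal policy optimization. The core of PPO's policy loss is the clipped surrogate objective, sketched here on scalar inputs; the clip coefficient 0.2 is a common default, an assumption rather than a value from the patent.

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    # PPO clipped surrogate: take the pessimistic minimum of the unclipped
    # and clipped probability-ratio terms, which bounds the policy update.
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum means a large ratio cannot be exploited to inflate the objective when the advantage is positive, while a harmful update (negative advantage, ratio drifting up) is still penalized in full.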

Description

Multi-solution unmanned aerial vehicle path planning method based on reinforcement learning assisted evolution

Technical Field

The invention relates to the field of reinforcement learning and intelligent optimization algorithms, and in particular to a multi-solution unmanned aerial vehicle path planning method based on reinforcement-learning-assisted evolution.

Background

With the wide application of unmanned aerial vehicles in fields such as logistics distribution, emergency rescue, and environmental monitoring, efficient, safe, and robust path planning technology has become a key link. Path planning aims to generate a feasible trajectory from a starting point to a target point for the unmanned aerial vehicle in a complex environment while satisfying multi-objective optimization requirements such as obstacle avoidance, time, and dynamics constraints. Multi-solution path planning provides decision makers with more flexible and diversified choices by finding several optimal paths that meet the requirements. Evolutionary algorithms have been widely applied to unmanned aerial vehicle path planning in recent years thanks to advantages such as strong global search capability and ease of parallelization. However, due to factors such as genetic drift and selection pressure, it is difficult for classical evolutionary algorithms to generate multiple feasible path solutions with significant differences in a single run.
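The single-run limitation the background describes can be seen with classic differential evolution. The sketch below is a minimal DE/rand/1 on a toy 1-D problem with two equally good minima; its greedy selection tends to concentrate the population rather than preserve distinct solutions. All names and parameter values are illustrative.

```python
import random

def de_rand_1(objective, bounds, pop_size=30, f=0.5, cr=0.9, generations=100):
    # Classic DE/rand/1 (minimization). In 1-D, binomial crossover
    # degenerates to choosing either the mutant or the parent.
    lo, hi = bounds
    pop = [random.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = random.sample(
                [x for j, x in enumerate(pop) if j != i], 3)
            trial = a + f * (b - c) if random.random() < cr else pop[i]
            trial = min(hi, max(lo, trial))
            if objective(trial) <= objective(pop[i]):   # greedy selection
                pop[i] = trial
    return pop
```

Running this on an objective with two global minima (e.g. min((x-1)^2, (x+1)^2)) typically yields a population crowded around one of them, which is the deficiency the niching and learning-based methods below try to remedy.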
To address this deficiency, existing multi-solution evolutionary methods fall mainly into three types: (1) introducing niching techniques (such as crowding mechanisms and population division) to maintain population diversity by making selection more flexible; (2) exploiting historical search experience to assist population resampling via subspace sampling, redundancy checks, or backtracking to previous optima; and (3) converting the multi-solution problem into a multi-objective optimization problem, balancing the quality and diversity of solutions by treating optimality and diversity as separate objectives. For example, VNCDE (Zhang Y H, Gong Y J, Gao Y, et al. Parameter-free Voronoi neighborhood for evolutionary multimodal optimization[J]. IEEE Transactions on Evolutionary Computation, 2019, 24(2): 335-349) assigns each individual a different mutation strategy according to the individual's neighborhood state to balance exploration and exploitation. However, such methods rely on expert knowledge to design strategies manually and lack flexibility across different problems. In recent years, meta-black-box optimization has combined neural networks to learn optimizer configurations automatically, reducing the dependence on expert knowledge. Reinforcement-learning-based meta-black-box optimization models the optimization process as a Markov decision process and trains an agent to dynamically adjust algorithm parameters or select operators, achieving good performance on optimization tasks. For example, RLEA-SSC (Xia H, Li C, Zeng S, et al. A reinforcement-learning-based evolutionary algorithm using solution space clustering for multimodal optimization problems[C]//2021 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2021: 1938-1945) uses reinforcement learning to decide which individuals to evolve, increasing the flexibility of the multi-solution optimization algorithm, but its generalization ability is limited and retraining is required when facing new problems.

Disclosure of Invention

The invention aims to provide a multi-solution unmanned aerial vehicle path planning method based on reinforcement-learning-assisted evolution, which uses a deep-reinforcement-learning-trained agent to adaptively select mutation strategies for an evolutionary algorithm, overcomes deficiencies of existing multi-solution methods such as insufficient flexibility, manual control dependent on expert knowledge, and poor generalization, and optimizes toward more optimal unmanned aerial vehicle paths with significant differences. The invention models the path planning problem as a continuous multi-solution optimization problem and uses a reinforcement-learning-trained agent to dynamically decide each individual's mutation strategy in the evolutionary algorithm, thereby improving the flexibility of the algorithm, improving optimization efficiency and performance, enhancing generalization, and enabling multiple optimal paths to be found across different unmanned aerial vehicle path planning problems. The invention is realized by at least one of the following technical schemes. A multi-solution unmanned aerial vehicle path planning method based on reinforcement learning assisted evolution comprises the following steps: using a trained agent model to adaptively select different mutation strategies for individuals in the differential evolution algorithm process so as to locate more optimal path solutions.