CN-121979238-A - Unmanned aerial vehicle flight intention recognition method and system based on reinforcement learning
Abstract
The invention provides an unmanned aerial vehicle flight intention recognition method and system based on reinforcement learning, belonging to the technical field of intelligent information processing and unmanned aerial vehicle systems. The method is oriented to single-UAV and multi-UAV cooperative scenes. First, multi-source observation data of an unmanned aerial vehicle group are acquired, and a standardized observation sequence is formed through time alignment, anomaly rejection and normalization. On this basis, an unmanned aerial vehicle group dynamic graph is constructed, and a particle swarm algorithm is introduced to optimize the graph-construction parameters so as to obtain a discriminative and stable group interaction structure. Then, message passing and graph-level aggregation are performed on the unmanned aerial vehicle group dynamic graph by a graph neural network to extract group cooperative features, and role decomposition is performed on individual unmanned aerial vehicles to obtain role results. The invention can realize highly robust and reliable flight intention recognition under multi-UAV cooperation and environmental disturbance, and has good engineering application prospects.
Inventors
- ZHANG ZHIYOU
Assignees
- 苏州工业职业技术学院 (Suzhou Institute of Industrial Technology)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-01-14
Claims (10)
- 1. The unmanned aerial vehicle flight intention recognition method based on reinforcement learning is characterized by comprising the following steps: S1, acquiring multi-source observation data of an unmanned aerial vehicle group to be identified in a target airspace, wherein the multi-source observation data at least comprise flight path data and state data, and performing time alignment and anomaly rejection on the multi-source observation data to obtain a standardized observation sequence; S2, constructing an unmanned aerial vehicle group dynamic graph based on the standardized observation sequence, wherein nodes of the dynamic graph correspond to individual unmanned aerial vehicles, edges of the dynamic graph correspond to interaction relations among the individual unmanned aerial vehicles, and the interaction relations are characterized by at least one of relative distance, relative heading and communication reachability; S3, inputting the unmanned aerial vehicle group dynamic graph into a graph neural network, extracting group cooperative features representing the cooperative behavior of the unmanned aerial vehicle group, and performing role decomposition on each individual unmanned aerial vehicle based on the group cooperative features to obtain role results; S4, constructing a multi-agent reinforcement learning strategy for unmanned aerial vehicle group flight intention recognition, wherein the strategy takes the group cooperative features and the role results as state input and introduces a causal consistency constraint; S5, in the recognition stage, processing the multi-source observation data obtained in real time according to steps S1 to S3 to obtain real-time group cooperative features and real-time role results, inputting these into the trained multi-agent reinforcement learning strategy, and outputting an intention result of the unmanned aerial vehicle group flight intention together with an evidence result corresponding to the intention result.
- 2. The unmanned aerial vehicle flight intention recognition method based on reinforcement learning according to claim 1, further comprising S6: performing domain drift detection on the multi-source observation data in the recognition stage; when the detected domain drift meets a preset trigger condition, performing online correction of the multi-agent reinforcement learning strategy based on the causal consistency constraint to obtain a corrected intention result and a corrected evidence result, and outputting the corrected intention result as the final unmanned aerial vehicle group flight intention recognition result.
- 3. The unmanned aerial vehicle flight intention recognition method based on reinforcement learning according to claim 1, wherein S1 specifically comprises: acquiring multi-source observation data of the unmanned aerial vehicle group to be identified in the target airspace, wherein the multi-source observation data at least comprise track data and state data, and adding a uniform time stamp and an unmanned aerial vehicle identifier to each observation record; performing time alignment on the multi-source observation data, mapping observation records from different sources onto the same time axis according to a preset sampling period, to obtain an aligned observation sequence; performing anomaly elimination on the aligned observation sequence, wherein the anomaly elimination at least comprises missing-segment processing, mutation-point screening and physical-constraint verification, to obtain a cleaned observation sequence; and performing normalization and unit unification on the cleaned observation sequence, and outputting a standardized observation sequence that serves as input to the subsequent steps.
- 4. The unmanned aerial vehicle flight intention recognition method based on reinforcement learning according to claim 1, wherein S2 specifically comprises: defining a node set of the unmanned aerial vehicle group dynamic graph based on the standardized observation sequence, wherein the nodes correspond one-to-one to individual unmanned aerial vehicles, and binding to each node a node state vector taken from the standardized observation sequence; constructing a candidate edge set based on the standardized observation sequence, computing, for any two nodes at the same time instant, a candidate interaction relation characterized by at least one of relative distance, relative heading and communication reachability; computing edge features for the candidate edge set and forming an initial assignment of edge weights, to obtain a candidate graph representation; and optimizing the edge-selection threshold and edge-weight parameters of the candidate graph representation with a particle swarm algorithm, so that the constructed graph yields a clearer cooperative structure in the subsequent role decomposition and is more stable under cross-environment disturbance, and outputting the unmanned aerial vehicle group dynamic graph that serves as input to the subsequent steps.
- 5. The unmanned aerial vehicle flight intention recognition method based on reinforcement learning according to claim 1, wherein S3 specifically comprises: inputting the unmanned aerial vehicle group dynamic graph into a graph neural network and performing message passing over node information and edge information to obtain a cooperative representation of each node; performing graph-level aggregation on the cooperative representations of the nodes to obtain group cooperative features characterizing the overall cooperative behavior of the unmanned aerial vehicle group; performing role candidate division of the individual unmanned aerial vehicles based on the group cooperative features to form a role candidate set, wherein the role candidate set describes the functional division of the individual unmanned aerial vehicles in group cooperation; and executing role determination on the role candidate set and outputting a role result that serves as input to the subsequent steps, wherein the role result at least comprises the role label and role confidence of each individual unmanned aerial vehicle.
- 6. The unmanned aerial vehicle flight intention recognition method based on reinforcement learning according to claim 1, wherein S4 specifically comprises: constructing training samples such that the state input at each training instant consists of the group cooperative features and the role results, and constructing the corresponding training target set; generating counterfactual disturbance samples from the standardized observation sequence in the training stage, wherein a counterfactual disturbance sample changes environment-related factors while keeping the task intention unchanged, so as to form contrastive sample pairs for a stability constraint; jointly optimizing the reward weight and the causal consistency constraint weight of the multi-agent reinforcement learning strategy with a particle swarm algorithm, so that the strategy simultaneously improves discrimination performance and discrimination consistency on the counterfactual disturbance samples; and training under the constraint of the joint optimization result to obtain the multi-agent reinforcement learning strategy, and fixing the verification rule of the causal consistency constraint as the basis for reasoning and correction in the recognition stage.
- 7. The unmanned aerial vehicle flight intention recognition method based on reinforcement learning according to claim 1, wherein S5 specifically comprises: acquiring multi-source observation data of the unmanned aerial vehicle group to be identified in real time, and generating a real-time standardized observation sequence according to the processing flow of S1; constructing a real-time unmanned aerial vehicle group dynamic graph from the real-time standardized observation sequence according to the processing flow of S2; processing the real-time unmanned aerial vehicle group dynamic graph according to S3 to obtain real-time group cooperative features and real-time role results; and inputting the real-time group cooperative features and real-time role results into the multi-agent reinforcement learning strategy, outputting the intention result of the unmanned aerial vehicle group flight intention, and synchronously outputting the evidence result corresponding to the intention result.
- 8. The unmanned aerial vehicle flight intention recognition method based on reinforcement learning according to claim 2, wherein S6 specifically comprises: performing domain drift detection on the multi-source observation data of the recognition stage and the corresponding group cooperative features to obtain a domain drift measurement; comparing the domain drift measurement with the preset trigger condition and determining that online correction must be executed when the trigger condition is met; and, when executing online correction, performing a lightweight update of the multi-agent reinforcement learning strategy under the causal consistency constraint based on the recent standardized observation sequence and the corresponding counterfactual disturbance samples, to obtain the corrected multi-agent reinforcement learning strategy.
- 9. The unmanned aerial vehicle flight intention recognition method based on reinforcement learning according to claim 8, wherein S6 further comprises adaptively optimizing the update step size and the trigger threshold of the online correction with a particle swarm algorithm, so that the correction process quickly recovers intention discrimination performance while maintaining stability, and outputting the final intention result and the final evidence result.
- 10. An unmanned aerial vehicle flight intention recognition system based on reinforcement learning, which adopts the unmanned aerial vehicle flight intention recognition method based on reinforcement learning according to any one of claims 1 to 9, characterized by comprising: a multi-source observation data acquisition and preprocessing module, used for acquiring multi-source observation data of the unmanned aerial vehicle group and performing time alignment, anomaly removal and normalization on the multi-source observation data; an unmanned aerial vehicle group dynamic graph construction module, used for constructing an unmanned aerial vehicle group dynamic graph based on the standardized observation sequence, wherein the dynamic graph comprises a node set and an edge set, the nodes correspond to individual unmanned aerial vehicles, and the edges represent interaction relations among the individual unmanned aerial vehicles; a group cooperative feature and role decomposition module, used for inputting the unmanned aerial vehicle group dynamic graph into the graph neural network, generating node cooperative representations of the individual unmanned aerial vehicles by a message passing mechanism, and performing graph-level aggregation on the node cooperative representations to obtain the group cooperative features; a multi-agent reinforcement learning strategy module, used for taking the global state formed by the group cooperative features and the role results as input in the training stage and outputting an intention probability vector of the unmanned aerial vehicle flight intention; a parameter joint optimization module, used for jointly optimizing the intention recognition loss weight and the causal consistency loss weight with a particle swarm algorithm, so as to determine the optimal weight parameters for training the multi-agent reinforcement learning strategy module; a recognition and evidence generation module, used for outputting an intention probability vector in the recognition stage based on the multi-agent reinforcement learning strategy fixed after training, and uniquely determining the flight intention of the unmanned aerial vehicle according to the maximum probability principle; and a domain drift detection and online correction module, used for performing domain drift detection on the group cooperative features in the recognition stage and performing online correction of the multi-agent reinforcement learning strategy based on the latest observation data when the detection result meets the preset trigger condition.
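To make the preprocessing of claim 3 concrete, the following is a minimal sketch of time alignment, anomaly rejection via a physical-constraint check, and normalization. The (uav_id, t, x, y) record layout, the 1 s sampling period and the 60 m/s speed bound are illustrative assumptions; the patent specifies only the processing stages, not these values.

```python
import math
from statistics import mean, stdev

def preprocess(records, period=1.0, v_max=60.0):
    """Align per-UAV records to a common time axis, reject physically
    impossible points, then z-score normalize the coordinates."""
    by_uav = {}
    for uid, t, x, y in records:
        by_uav.setdefault(uid, []).append((t, x, y))

    aligned = {}
    for uid, obs in by_uav.items():
        obs.sort()
        # Time alignment: snap each sample onto the preset sampling grid,
        # keeping the latest observation per grid tick.
        grid = {}
        for t, x, y in obs:
            grid[round(t / period)] = (x, y)
        # Anomaly rejection (physical-constraint verification): drop points
        # that would imply a speed above v_max between retained ticks.
        ticks = sorted(grid)
        clean = [(ticks[0], *grid[ticks[0]])]
        for k in ticks[1:]:
            pk, px, py = clean[-1]
            x, y = grid[k]
            if math.hypot(x - px, y - py) / ((k - pk) * period) <= v_max:
                clean.append((k, x, y))
        aligned[uid] = clean

    # Normalization: per-coordinate z-score over all retained samples.
    xs = [x for seq in aligned.values() for _, x, _ in seq]
    ys = [y for seq in aligned.values() for _, _, y in seq]
    mx, my = mean(xs), mean(ys)
    sx = stdev(xs) if len(xs) > 1 else 0.0
    sy = stdev(ys) if len(ys) > 1 else 0.0
    return {uid: [(k, (x - mx) / (sx or 1.0), (y - my) / (sy or 1.0))
                  for k, x, y in seq]
            for uid, seq in aligned.items()}
```

A point that jumps 990 m in one second is rejected here, while the following sample is re-validated against the last retained point, so a single outlier does not discard the rest of the track.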
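The dynamic-graph construction of claim 4 can be sketched as follows: edges connect UAV pairs whose relative distance is below a threshold, and a toy particle swarm searches for that edge-selection threshold. The fitness used here (reward a connected group, penalize dense cliques) is an illustrative stand-in for the patent's "clearer cooperative structure" objective, which is not specified numerically; all constants are assumptions.

```python
import math, random

def build_edges(positions, thr):
    """Candidate edge set: connect pairs closer than thr; the edge
    weight decays with relative distance."""
    ids = sorted(positions)
    edges = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = math.dist(positions[a], positions[b])
            if d < thr:
                edges[(a, b)] = math.exp(-d / max(thr, 1e-9))
    return edges

def _largest_component(ids, edges):
    adj = {i: set() for i in ids}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    seen, best = set(), 0
    for s in ids:
        if s in seen:
            continue
        stack, size = [s], 0
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n); size += 1
            stack.extend(adj[n] - seen)
        best = max(best, size)
    return best

def pso_threshold(positions, iters=30, particles=8, seed=0):
    """Scalar PSO over the edge-selection threshold."""
    rng = random.Random(seed)
    ids = sorted(positions)
    def fitness(thr):
        e = build_edges(positions, thr)
        return _largest_component(ids, e) - 0.1 * len(e)
    xs = [rng.uniform(1.0, 50.0) for _ in range(particles)]
    vs = [0.0] * particles
    pbest = xs[:]
    gbest = max(xs, key=fitness)
    for _ in range(iters):
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            vs[i] = (0.7 * vs[i] + 1.4 * r1 * (pbest[i] - xs[i])
                     + 1.4 * r2 * (gbest - xs[i]))
            xs[i] = min(50.0, max(1.0, xs[i] + vs[i]))
            if fitness(xs[i]) > fitness(pbest[i]):
                pbest[i] = xs[i]
            if fitness(xs[i]) > fitness(gbest):
                gbest = xs[i]
    return gbest, build_edges(positions, gbest)
```

On a tight three-UAV formation plus one far-away straggler, the optimized threshold links the formation while leaving the straggler isolated, which is the kind of stable, discriminative interaction structure the claim aims for.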
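Claim 5's message passing, graph-level aggregation and role decomposition can be illustrated with a deliberately simplified, parameter-free version: each round, a node's embedding becomes the edge-weighted average of itself and its neighbors, the group feature is the mean-pooled embedding, and the role heuristic (node nearest the group embedding is "leader") is purely a placeholder for the patent's learned role decomposition.

```python
import math

def message_passing(features, edges, rounds=2):
    """Parameter-free message passing: per round, average each node's
    embedding with its edge-weighted neighborhood. A real GNN layer
    would interpose learned transforms here."""
    h = {n: list(v) for n, v in features.items()}
    for _ in range(rounds):
        nxt = {}
        for n, vec in h.items():
            acc, w_sum = list(vec), 1.0
            for (a, b), w in edges.items():
                other = b if a == n else a if b == n else None
                if other is None:
                    continue
                for k in range(len(acc)):
                    acc[k] += w * h[other][k]
                w_sum += w
            nxt[n] = [x / w_sum for x in acc]
        h = nxt
    return h

def roles(h):
    """Graph-level aggregation (mean pooling) plus a heuristic role
    decomposition: role label and a confidence that decays with the
    node's distance from the group embedding."""
    dim = len(next(iter(h.values())))
    g = [sum(v[k] for v in h.values()) / len(h) for k in range(dim)]
    dists = {n: math.dist(v, g) for n, v in h.items()}
    leader = min(dists, key=dists.get)
    return {n: ("leader" if n == leader else "follower",
                1.0 / (1.0 + dists[n])) for n in h}, g
```

In a hub-and-spoke graph the hub node aggregates from all spokes, so its embedding ends up closest to the pooled group feature and it receives the "leader" label; this mirrors the claim's output of role labels with confidences.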
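The causal consistency constraint of claim 6 pairs each sample with a counterfactual twin (same intention, different environment factors) and penalizes disagreement between the two intention distributions. Below is one plausible loss construction; the patent fixes only the idea, so the symmetric-KL divergence and the weight `lam` (which the patent finds via particle swarm optimization) are assumptions.

```python
import math

def cross_entropy(p, label):
    """Discrimination term: negative log-probability of the true intention."""
    return -math.log(max(p[label], 1e-12))

def _kl(a, b):
    return sum(ai * math.log(max(ai, 1e-12) / max(bi, 1e-12))
               for ai, bi in zip(a, b))

def causal_consistency(p, p_cf):
    """Symmetric KL between the intention distribution on the original
    sample (p) and on its counterfactual twin (p_cf); zero when the
    strategy's judgment is invariant to the environmental disturbance."""
    return 0.5 * (_kl(p, p_cf) + _kl(p_cf, p))

def total_loss(p, p_cf, label, lam=0.5):
    # lam trades discrimination against counterfactual stability.
    return cross_entropy(p, label) + lam * causal_consistency(p, p_cf)
```

If the counterfactual twin is classified identically, the constraint contributes nothing; a prediction that flips under a mere environment change is penalized, which is exactly the "discrimination consistency" the claim optimizes for.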
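Finally, the domain drift trigger of claims 2 and 8 can be sketched as a one-dimensional standardized mean shift between a training-time reference window of a group cooperative feature and the recent window; a real system would use richer multivariate statistics, and the threshold of 3 reference standard deviations is an illustrative choice, not a value from the patent.

```python
from statistics import mean, stdev

def drift_score(reference, recent):
    """Domain drift measurement: how many reference standard deviations
    the recent feature mean has shifted from the reference mean."""
    mu = mean(reference)
    sd = stdev(reference) if len(reference) > 1 else 0.0
    return abs(mean(recent) - mu) / (sd or 1.0)

def should_correct(reference, recent, threshold=3.0):
    # Preset trigger condition: request the lightweight online correction
    # only when the drift measurement exceeds the threshold.
    return drift_score(reference, recent) > threshold
```

Gating the correction this way matches the claimed behavior: the solidified strategy keeps running unchanged under small fluctuations, and the lightweight update is invoked only when the observed distribution has genuinely moved.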
Description
Unmanned aerial vehicle flight intention recognition method and system based on reinforcement learning
Technical Field
The invention relates to the technical field of unmanned aerial vehicle system information processing, and in particular to an unmanned aerial vehicle flight intention recognition method and system based on reinforcement learning.
Background
With the wide application of unmanned aerial vehicles in fields such as inspection and monitoring, emergency rescue, security patrol and airspace management, the unmanned aerial vehicle operating environment increasingly presents the characteristics of task complexity, airspace densification and behavioral diversification. In these application scenarios, accurately identifying the flight intention of unmanned aerial vehicles is an important precondition for realizing risk early warning, cooperative scheduling and safety control. However, the existing unmanned aerial vehicle flight intention recognition technology still has the following defects. First, the prior art performs intention judgment based on the track features or rule models of a single unmanned aerial vehicle, usually relying on low-dimensional features such as speed, heading and acceleration, or on manually set thresholds, for classification; such methods can hardly describe the interaction relations among multiple unmanned aerial vehicles and can hardly identify group-level flight intentions such as formation flight, collaborative reconnaissance and induced interference. Second, although methods based on machine learning or deep learning introduce track prediction or behavior classification models, they mostly treat individual unmanned aerial vehicles as mutually independent objects and lack modeling of the overall structure and cooperative behavior of the unmanned aerial vehicle group; meanwhile, such methods are often highly dependent on the training data distribution, and recognition performance degrades markedly when the flight environment, communication conditions or sensor states change. Therefore, an unmanned aerial vehicle flight intention recognition method and system based on reinforcement learning is proposed. The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure, and may therefore include information that does not form the prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims at overcoming the defects of the prior art, and provides an unmanned aerial vehicle flight intention recognition method and system based on reinforcement learning that solve the technical problems identified in the background. To this end, the present invention provides the following technical solution: an unmanned aerial vehicle flight intention recognition method and system based on reinforcement learning, comprising the following steps: S1, acquiring multi-source observation data of an unmanned aerial vehicle group to be identified in a target airspace, wherein the multi-source observation data at least comprise flight path data and state data, and performing time alignment and anomaly rejection on the multi-source observation data to obtain a standardized observation sequence; S2, constructing an unmanned aerial vehicle group dynamic graph based on the standardized observation sequence, wherein nodes of the dynamic graph correspond to individual unmanned aerial vehicles, edges of the dynamic graph correspond to interaction relations among the individual unmanned aerial vehicles, and the interaction relations are characterized by at least one of relative distance, relative heading and communication reachability; S3, inputting the unmanned aerial vehicle group dynamic graph into a graph neural network, extracting group cooperative features representing the cooperative behavior of the unmanned aerial vehicle group, and performing role decomposition on each individual unmanned aerial vehicle based on the group cooperative features to obtain role results; S4, constructing a multi-agent reinforcement learning strategy for unmanned aerial vehicle group flight intention recognition, wherein the strategy takes the group cooperative features and the role results as state input and introduces a causal consistency constraint; S5, in the recognition stage, processing the multi-source observation data obtained in real time according to S1 to S3 to obtain real-time group cooperative features and real-time role results, inputting these into the trained multi-agent reinforcement learning strategy, and outputting an intention result of the unmanned aerial vehicle group flight intention together with an evidence result corresponding to the intention result.