CN-120469230-B - Self-learning training method and system for maneuvering flight control strategy of aircraft

CN120469230B

Abstract

The invention discloses a self-learning training method and system for a maneuvering flight control strategy of an aircraft, relating to the technical field of fighter aircraft control. The method comprises: establishing a flight dynamics model of a target fighter aircraft based on the engine performance and flight dynamics performance of the target fighter aircraft; constructing, based on the flight dynamics model and a reinforcement learning algorithm, a reinforcement learning agent that controls the control strategy of the target fighter aircraft; training the reinforcement learning agent against an opponent fighter aircraft controlled by an initial strategy, to obtain a trained reinforcement learning agent; and, based on a self-play training method, periodically replacing the opponent of the target fighter aircraft with the trained reinforcement learning agent and repeatedly training the agent until a target training period is reached, thereby obtaining the target reinforcement learning agent. The invention alleviates the technical problem that the prior art struggles to obtain broadly effective adversarial strategies.
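The training scheme in the abstract alternates between training an agent and freezing it as the next opponent. A minimal Python sketch of that loop, using placeholder names and update rules (`RLAgent`, `train_against`, and the `generation` counter are illustrative assumptions, not the patent's implementation):

```python
import copy

class RLAgent:
    """Placeholder for the reinforcement learning agent of the abstract."""
    def __init__(self):
        self.generation = 0          # how many self-play periods trained it

    def train_against(self, opponent, episodes):
        # Stand-in for the real RL update loop (e.g. DDPG training episodes)
        self.generation += 1

def self_play_training(target_periods=5, episodes_per_period=100):
    agent = RLAgent()
    opponent = RLAgent()             # stands in for the initial-strategy opponent
    for _ in range(target_periods):
        # Train against the current, frozen opponent ...
        agent.train_against(opponent, episodes_per_period)
        # ... then periodically replace the opponent with a frozen copy
        # of the newly trained agent, as the self-play method prescribes.
        opponent = copy.deepcopy(agent)
    return agent

target_agent = self_play_training()
```

The deep copy is the key design choice: the opponent must be a frozen snapshot, so later training of the agent does not also move the opponent.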

Inventors

  • DENG XIANGYANG
  • XU TAO
  • YU YINGFU
  • ZHANG YUCHEN
  • ZHU HONGJI

Assignees

  • Naval Aviation University of the Chinese People's Liberation Army (中国人民解放军海军航空大学)

Dates

Publication Date
2026-05-12
Application Date
2025-05-13

Claims (8)

  1. A method for self-learning training of an aircraft maneuvering flight control strategy, comprising: establishing a flight dynamics model of a target fighter aircraft based on the engine performance and flight dynamics performance of the target fighter aircraft; constructing, based on the flight dynamics model and a reinforcement learning algorithm, a reinforcement learning agent for controlling the control strategy of the target fighter aircraft; training the reinforcement learning agent against a fighter aircraft controlled by an initial strategy as an opponent, to obtain a trained reinforcement learning agent; and, based on a self-play training method, periodically replacing the opponent of the target fighter aircraft with the trained reinforcement learning agent and repeatedly training the reinforcement learning agent until a target training period is reached, to obtain the target reinforcement learning agent; wherein the reward function for training the reinforcement learning agent is R = λ1·r1 + λ2·r2, where r1 is the single-step reward value obtained by hierarchical analysis, r2 is the round reward value obtained from the radar-scan determination, and λ1 and λ2 are the coefficients of the single-step reward and the round reward, respectively; r1 = ω1·η_v + ω2·η_a + ω3·η_d + ω4·η_h, where η_v is the speed threat indicator, η_a is the angle threat indicator, η_d is the distance threat indicator, η_h is the height threat indicator, and ω1, ω2, ω3 and ω4 are the corresponding weight factors; and r2 takes the value r_win when the local aircraft wins and r_lose when the opponent wins.
  2. The method of claim 1, wherein the state space of the reinforcement learning agent comprises s = [h, Δx_N, Δx_E, Δx_U, Δv_N, Δv_E, Δv_U, v_N, v_E, v_U, ψ, θ, φ, ψ_e, θ_e], where h is the altitude of the local aircraft; Δx_N, Δx_E and Δx_U are respectively the north, east and up components of the position-vector difference between the local aircraft and the enemy aircraft in the local coordinate system; Δv_N, Δv_E and Δv_U are respectively the north, east and up components of the velocity-vector difference between the local aircraft and the enemy aircraft in the local coordinate system; v_N, v_E and v_U are respectively the north, east and up velocity components of the local aircraft; ψ is the nose heading of the local aircraft, θ is the pitch angle of the local aircraft, and φ is the roll angle of the local aircraft; ψ_e is the nose heading of the enemy aircraft and θ_e is the pitch angle of the enemy aircraft; and the action space of the reinforcement learning agent comprises a = [δ_e, δ_r, δ_a, δ_T], where δ_e denotes the elevator command, δ_r the rudder command, δ_a the aileron command, and δ_T the throttle command.
  3. The method of claim 1, wherein the speed threat indicator η_v is computed from v and v_e, the flight speeds of the local aircraft and the enemy aircraft, respectively; the angle threat indicator η_a is computed from the attack angles of the local aircraft and the enemy aircraft; the distance threat indicator η_d is computed from d, the distance between the local aircraft and the enemy aircraft, from the maximum attack distances of the local aircraft and the enemy aircraft, respectively, and from the maximum detection distance of the enemy aircraft; and the height threat indicator η_h is computed from h and h_e, the flight heights of the local aircraft and the enemy aircraft, respectively.
  4. The method of claim 1, wherein the fighter aircraft controlled by the initial strategy employs a fighter aircraft decision model that fuses PID control with preset prior knowledge.
  5. The method of claim 1, wherein training the reinforcement learning agent comprises training the agent with the DDPG algorithm, which adopts an Actor-Critic structure, wherein: the Actor network takes in observations, produces actions, and generates the final action after adding noise, a = μ(s_t | θ^μ) + N_t, where a denotes the final action, μ the Actor network, θ^μ the parameters of the Actor network, s_t the observation at the current step, and N_t the added noise; the Critic network completes its update by computing the difference between the current Q value and the target Q value, y_j = r_j + γ·Q′(s_{j+1}, μ′(s_{j+1} | θ^{μ′}) | θ^{Q′}), where y_j denotes the target Q value, r_j the reward obtained at the current step, γ the discount factor of the reward, Q′ the Critic target network, s_{j+1} and μ′(s_{j+1}) respectively the observation and action of the next interaction step, and θ^{Q′} the parameters of the target network; in the update phase, the Critic network is updated by means of the mean square error J(θ^Q) = (1/m)·Σ_{j=1}^{m} (y_j − Q(s_j, a_j | θ^Q))², where J(·) denotes the mean square error loss function, Q(s_j, a_j | θ^Q) the predicted Q value, θ^Q the network parameters, s_j and a_j respectively the observation and action of the j-th sample, and m the number of samples; and the Actor network is updated by the loss gradient ∇_{θ^μ}J ≈ (1/m)·Σ_{j=1}^{m} ∇_a Q(s, a | θ^Q)|_{s=s_j, a=μ(s_j)} · ∇_{θ^μ} μ(s | θ^μ)|_{s=s_j}, where ∇_{θ^μ}J denotes the loss gradient, ∇_a the gradient with respect to the action, and ∇_{θ^μ} the gradient with respect to the parameters of the Actor network.
  6. A self-learning training system for an aircraft maneuvering flight control strategy, characterized in that it implements the self-learning training method for an aircraft maneuvering flight control strategy according to any one of claims 1-5, the system comprising a building module, a first training module and a second training module; the building module is used for establishing a flight dynamics model of the target fighter aircraft based on the engine performance and flight dynamics performance of the target fighter aircraft, and for constructing, based on the flight dynamics model and a reinforcement learning algorithm, a reinforcement learning agent for controlling the control strategy of the target fighter aircraft; the first training module is used for training the reinforcement learning agent against a fighter aircraft controlled by an initial strategy as an opponent, to obtain a trained reinforcement learning agent; and the second training module is configured to, based on a self-play training method, periodically replace the opponent of the target fighter aircraft with the trained reinforcement learning agent and repeatedly train the reinforcement learning agent until a target training period is reached, to obtain the target reinforcement learning agent.
  7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1-5 when executing the computer program.
  8. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-5.
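The Actor-Critic updates recited in claim 5 follow the standard DDPG form. Below is a minimal NumPy sketch of the noisy action, the Critic's target computation, and the mean-square-error loss, with toy numbers chosen purely for illustration; none of this is the patent's code, and the function names are assumptions.

```python
import numpy as np

def actor_action(mu_s, noise):
    # Final action: a = mu(s_t | theta_mu) + N_t (exploration noise added)
    return mu_s + noise

def critic_targets(rewards, next_q, gamma=0.99):
    # Target Q values: y_j = r_j + gamma * Q'(s_{j+1}, mu'(s_{j+1}))
    return rewards + gamma * next_q

def critic_loss(pred_q, target_q):
    # Mean square error over m samples: J = (1/m) * sum_j (y_j - Q(s_j, a_j))^2
    return float(np.mean((target_q - pred_q) ** 2))

# Toy mini-batch of m = 4 transitions (illustrative numbers)
rewards = np.array([1.0, 0.5, -0.2, 0.0])
next_q  = np.array([2.0, 1.0,  0.5, 0.0])   # Q' from the target networks
pred_q  = np.array([2.9, 1.4,  0.3, 0.1])   # Q from the online Critic

y = critic_targets(rewards, next_q)
loss = critic_loss(pred_q, y)
```

In a full DDPG implementation the Critic is then stepped down this loss gradient, while the Actor follows the deterministic policy gradient given in claim 5; here only the scalar quantities are computed.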

Description

Self-learning training method and system for maneuvering flight control strategy of aircraft

Technical Field

The invention relates to the technical field of fighter aircraft control, and in particular to a self-learning training method and system for a maneuvering flight control strategy of an aircraft.

Background

Fighter aircraft maneuvering flight control aims to occupy a favorable situation and thereby gain battlefield dominance in an adversarial engagement. In actual combat, pilots must master the operation of complex aircraft and the use of their weapons while continuously assessing the battlefield situation as the positions and movements of both sides' aircraft evolve, in order to formulate an effective air combat strategy. Because of the high speed of the aircraft themselves, the adversarial situation changes rapidly, making it even harder for pilots to quickly grasp battlefield conditions and to quickly formulate or revise strategies. For these reasons, rapid and automated decision-making in air combat has become a central research topic. In recent years, intelligent game-based adversarial methods have been widely studied, and researchers have proposed many approaches such as expert systems, matrix games, and differential games, which have demonstrated a degree of intelligent adversarial capability. In 2019, Wang Xuan et al. discussed the use of an evolutionary expert-system tree in unmanned aerial vehicle air combat decision-making. Taking medium-range air combat in a two-dimensional adversarial environment as the study scenario, they combined a genetic algorithm with an expert system and established an aircraft combat decision system based on an evolutionary expert-system tree.
The decision system is constructed from series of rules: first, if-then statements describe the conditions arising in combat; state identifiers corresponding to partial states are then obtained; the resulting series of state identifiers are combined and processed by an evaluation layer and a decision layer; and finally the decision instruction output is produced. Experimental results verify that the decision system can adapt to battlefield conditions and achieve a certain winning rate even when its own flight performance is not dominant. However, the method has severe limitations: the if-then condition identification is mainly derived from human experience and is highly subjective, and as the number of inputs to the decision system grows, the rule base scales exponentially, leading to combinatorial explosion. In addition, policy optimization by genetic algorithm can hardly avoid the search difficulties caused by the algorithm's own gene length. The method has some applicability in the two-dimensional space studied, but its effectiveness in a three-dimensional combat environment closer to reality requires further verification. In 2022, Shouyi Li et al. proposed a fast dimension-reduction-based algorithm for solving large-scale matrix games, building on earlier matrix-game algorithms, and verified its feasibility in the field of unmanned aerial vehicle combat decision-making. The combat strategies of the two opposing parties are studied as the minimum computational units in the matrix game; each party has its own independent set of combat strategies, and it is assumed that one strategy is adopted at random during the engagement.
On this basis, the study proposes a dimension-reduction-based matrix-game solving algorithm for large-scale matrix games and verifies the existence of a Nash equilibrium in such games. It should be noted, however, that such methods take the adversarial strategy as the minimum test unit during experiments, so the assumptions about the adversarial strategy are subjective. Moreover, the assumption that both parties adopt random strategies during the engagement is a strong constraint that is difficult to satisfy in real adversarial environments. Because of the high-dimensional inputs in air combat, the heavy computation required for multi-continuous-variable control, and the need for continuous decision-making, traditional algorithms suffer from poor adaptability and complex computation and struggle to meet real-time requirements, so more advanced methods need to be introduced. In 2021, Dongyuan Hu effectively addressed the real-time computation problem by introducing a reinforcement learning method into the air combat field; the study proposes a training framework combining a planning algorithm, an LSTM algorit