CN-121997972-A - Collaborative maneuvering decision-making method for multi-UAV air combat game confrontation
Abstract
The invention discloses a collaborative maneuvering decision-making method for multi-UAV air combat game confrontation. By introducing an enemy behavior prediction model, the method accounts for the enemy's dynamic response during decision-making, improving the robustness and flexibility of the resulting decisions. Unlike conventional multi-UAV game research, the invention explicitly models the enemy strategy and performs online inference with deep reinforcement learning. Specifically, enemy behavior is predicted by an enemy strategy modeler, and perception and intention information is fused through a cognitive graph cooperative network to widen the global field of view of the decision. In addition, a multi-head self-attention mechanism processes situation-awareness information and refines the prediction and evaluation of enemy reactions, further enhancing the adaptability of the decision system. A centralized-training, distributed-execution architecture addresses the information-asymmetry problem in collaborative decision-making, so that each UAV can decide efficiently from incomplete information during the execution stage.
Inventors
- Wu Jiehong
- Zhang Nan
- Wang Mingche
- Liu Kui
- Pu Haiyin
- Yu Cunqian
- Liu Xiangqin
Assignees
- Shenyang Aerospace University (沈阳航空航天大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-29
Claims (6)
- 1. A collaborative maneuvering decision-making method for multi-UAV air combat game confrontation, characterized by comprising the following steps: constructing a three-dimensional continuous air combat simulation environment and defining task objectives for the opposing UAV groups, with UAV dynamics described by a six-degree-of-freedom motion model whose state variables comprise at least position, velocity, attitude angles, and angular velocity; constructing a UAV air combat situation assessment model under incomplete information, which evaluates the relative situation between the two sides at each time step as a function of factors such as relative angles and relative distances; in the execution stage, each UAV decides based on its local observation, which comprises its own state, the relative states of friendly UAVs within communication range, and enemy aircraft within perception range; constructing a joint action space that defines each UAV's executable action set, comprising speed control, attitude adjustment, and offensive tactical actions, to support collaborative decision modeling; designing a multi-objective composite reward function comprising at least a survival reward, a kill reward, a cooperation reward, an out-of-bounds penalty, and a round win/loss reward, with weight configuration guiding the UAVs toward a balanced optimization of individual survival, attack effectiveness, and group cooperation; establishing a hypernetwork-based enemy strategy modeler that predicts the action probability distribution of enemy UAVs at future moments from the historical environment state and the enemy action sequence, and outputs a prediction confidence; constructing a cognitive graph cooperative network that models the perception and intention interactions between UAVs as a graph structure and realizes multi-step information propagation and feature fusion between nodes through a diffusion graph attention mechanism, improving the group's global situation awareness and cooperative decision-making capability; and constructing a centralized evaluator that jointly optimizes the temporal-difference loss of the Critic network, the enemy-prediction loss, and the policy loss of the Actor network, so that each UAV can output its final maneuvering decision from local observation and the cooperative network in the distributed execution stage.
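As an editorial aid, the multi-objective composite reward in claim 1 can be sketched as a weighted sum of its named components. The component names and all numeric weights below are illustrative assumptions; the patent does not disclose concrete values.

```python
# Illustrative sketch of claim 1's composite reward: a weighted sum of
# survival, kill, cooperation, out-of-bounds, and round win/loss terms.
# All numbers are assumed for demonstration, not taken from the patent.

def composite_reward(components, weights):
    """Weighted sum of the per-step reward components."""
    return sum(weights[k] * components[k] for k in components)

weights = {
    "survival": 0.1,        # reward for staying alive each step
    "kill": 1.0,            # reward for defeating an enemy UAV
    "cooperation": 0.3,     # e.g. mutual cover / formation keeping
    "out_of_bounds": -0.5,  # penalty for leaving the combat area
    "round_result": 2.0,    # terminal win/loss signal
}

# one step where the UAV survives and partially satisfies the cooperation term
step = {"survival": 1.0, "kill": 0.0, "cooperation": 0.5,
        "out_of_bounds": 0.0, "round_result": 0.0}
r = composite_reward(step, weights)
```

Tuning the weight dictionary is what the claim refers to as "weight configuration guiding the UAVs toward balanced optimization".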
- 2. The collaborative maneuvering decision-making method for multi-UAV air combat game confrontation according to claim 1, wherein the blue-party agent set and the enemy agent set are defined, and the global joint state space at time t is constructed from the states of all UAVs, where p_j = [x_j, y_j, z_j]^T is the position, v_j = [v_xj, v_yj, v_zj]^T is the velocity, (phi_j, theta_j, psi_j) are respectively the roll, pitch, and yaw angles, and omega_j = [omega_xj, omega_yj, omega_zj]^T is the angular velocity, with the global timing information given as a time sequence of these states; during the execution phase, agent i can decide only from its local observation, which comprises its own state, the relative states of friendly UAVs in the neighbor set within communication range, and the relative states of enemy aircraft expressed in the body coordinate system; for any target, a relative state vector is constructed comprising the rotation matrix from the inertial frame to the body frame, the relative distance, the azimuth angle, the elevation angle, the enemy aircraft's aspect angle, and a hit indicator for the attack zone and the no-escape zone determined geometrically; to describe temporal information, an observation stack of length H is adopted.
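The relative-state geometry in claim 2 (relative distance, azimuth, and elevation in the observer's body frame) can be sketched as follows. For brevity this assumes a yaw-only rotation; the full method would use the complete roll-pitch-yaw rotation matrix from inertial to body axes, and the aspect angle and zone indicators are omitted.

```python
import math

# Sketch of claim 2's relative-state quantities. Assumption: yaw-only
# rotation into the body frame; function and variable names are illustrative.

def relative_state(own_pos, own_yaw, tgt_pos):
    dx = tgt_pos[0] - own_pos[0]
    dy = tgt_pos[1] - own_pos[1]
    dz = tgt_pos[2] - own_pos[2]
    # rotate the inertial offset into the body frame (yaw only)
    bx = math.cos(own_yaw) * dx + math.sin(own_yaw) * dy
    by = -math.sin(own_yaw) * dx + math.cos(own_yaw) * dy
    d = math.sqrt(dx * dx + dy * dy + dz * dz)        # relative distance
    azimuth = math.atan2(by, bx)                      # bearing in body frame
    elevation = math.asin(dz / d) if d > 0 else 0.0   # climb angle to target
    return d, azimuth, elevation

# target 100 m north and 100 m east at the same altitude, observer facing north
d, az, el = relative_state((0.0, 0.0, 0.0), 0.0, (100.0, 100.0, 0.0))
```

Stacking H consecutive outputs of such a function yields the observation stack the claim uses for temporal information.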
- 3. The collaborative maneuvering decision-making method for multi-UAV air combat game confrontation according to claim 1, wherein a joint action space is constructed and the actions of all UAVs are modeled uniformly: the executable action set of each UAV at time t is recorded, with its j-th element representing the j-th optional action of UAV i in the action space, covering speed adjustment, attitude adjustment, and attack actions; the joint action space is formed by the Cartesian product of all UAVs' action sets, and at time t the joint action of the whole system is expressed as the tuple of each UAV's chosen action; in the air combat scenario, speed control, attitude adjustment, and tactical action adjustment are thereby performed for each UAV.
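Claim 3's joint action space, the Cartesian product of each UAV's individual action set, can be sketched directly with `itertools.product`. The action names and team size are illustrative assumptions.

```python
from itertools import product

# Sketch of claim 3: each UAV has a finite set of maneuver primitives, and
# the team's joint action space is the Cartesian product of those sets.
# Action names and the three-UAV team are assumptions for illustration.

per_uav_actions = {
    "uav1": ["accelerate", "decelerate", "climb", "dive", "fire"],
    "uav2": ["accelerate", "decelerate", "climb", "dive", "fire"],
    "uav3": ["accelerate", "decelerate", "climb", "dive", "fire"],
}

joint_space = list(product(*per_uav_actions.values()))
n_joint = len(joint_space)  # 5 actions per UAV, 3 UAVs -> 5**3 joint actions
```

The exponential growth of `n_joint` with team size is exactly the dimensionality problem the description's background section attributes to multi-aircraft cooperation.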
- 4. The collaborative maneuvering decision-making method for multi-UAV air combat game confrontation according to claim 1, wherein, when the hypernetwork-based enemy strategy modeler is established, the hypernetwork generates the parameters of the enemy policy from the historical environment state and the enemy's previous action; the predictive distribution over enemy actions is then obtained by inference, together with a confidence coefficient c in [0,1], where a value near 1 indicates a reliable prediction.
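Claim 4 leaves the confidence formula unspecified. One simple, commonly used choice (an assumption here, not the patent's formula) is 1 minus the normalized entropy of the predicted action distribution, so a peaked prediction yields confidence near 1 and a uniform one near 0.

```python
import math

# Sketch of claim 4's confidence output. Assumption: confidence is defined
# as 1 - H(p)/H_max, which maps any action distribution into [0, 1].

def softmax(logits):
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    s = sum(e)
    return [x / s for x in e]

def confidence(probs):
    h = -sum(p * math.log(p) for p in probs if p > 0.0)  # Shannon entropy
    h_max = math.log(len(probs))                          # entropy of uniform
    return 1.0 - h / h_max

peaked = softmax([8.0, 0.0, 0.0, 0.0])  # modeler strongly favors one action
flat = softmax([0.0, 0.0, 0.0, 0.0])    # modeler is uninformative
c_peaked, c_flat = confidence(peaked), confidence(flat)
```

A downstream decision module could then weight the predicted enemy distribution by this confidence before fusing it into the cooperative network.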
- 5. The collaborative maneuvering decision-making method for multi-UAV air combat game confrontation according to claim 1, characterized by constructing a cognitive graph cooperative network that models the cooperative relationships among the UAVs: a perception-intention encoder encodes the local observation of UAV i together with the predicted enemy policy distribution; information is then propagated through a diffusion graph attention network over each node's neighbor set, with attention-based diffusion weights obtained by normalizing attention scores over that neighbor set; the final output is projected to obtain the policy action.
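One step of the diffusion graph attention propagation in claim 5 can be sketched as an attention-weighted average over neighbor features. The scalar features and the dot-product scoring used here are toy stand-ins for the patent's learned encoder and attention; the graph itself is illustrative.

```python
import math

# Sketch of claim 5's diffusion step: each node's feature is replaced by a
# softmax-weighted average of its neighbors' features. Scores, features, and
# the three-node graph are assumptions for demonstration.

def attn_weights(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [x / z for x in e]

def diffusion_step(features, neighbors):
    new = {}
    for i, nbrs in neighbors.items():
        scores = [features[i] * features[j] for j in nbrs]  # toy attention score
        w = attn_weights(scores)
        new[i] = sum(wk * features[j] for wk, j in zip(w, nbrs))
    return new

features = {"u1": 1.0, "u2": 2.0, "u3": 3.0}
neighbors = {"u1": ["u2", "u3"], "u2": ["u1", "u3"], "u3": ["u1", "u2"]}
fused = diffusion_step(features, neighbors)
```

Applying `diffusion_step` repeatedly gives the multi-step ("diffusion") information propagation the claim describes; the patent's version would operate on learned feature vectors rather than scalars.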
- 6. The collaborative maneuvering decision-making method for multi-UAV air combat game confrontation according to claim 1, wherein the Critic network evaluates a state-action value function with a temporal-difference loss L_critic = E_{(s,a,r,s')~D}[(Q(s,a) - y)^2], where D is the experience replay pool, s, a, r, s' are respectively a state, an action, a reward, and the next state, Q(s,a) is the action value predicted by the Critic network, y = r + gamma * Q'(s', a') is the target value, and gamma is the discount factor; the objective of the enemy modeler is to minimize the difference between the predicted enemy action distribution and the actually observed action, taking the mean-squared-error form L_pred = E[|| pi_hat(o^e_t) - a^e_t ||^2], where o^e_t is the observation information of the enemy, a^e_t is the actual enemy action at time t, and pi_hat(o^e_t) is the action predicted by the enemy modeler; the objective of the Actor network is to maximize the expected return, using the enemy policy prediction to increase robustness, with loss defined as L_actor = -E[Q(s, pi(o))], where pi(o) is the action distribution output by the Actor network in the given state and Q(s, pi(o)) is the Critic network's evaluation of that action; the final joint loss function is defined as the three-part weighted sum L = lambda_1 * L_critic + lambda_2 * L_pred + lambda_3 * L_actor, where lambda_1, lambda_2, lambda_3 are weighting coefficients.
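The three-part joint loss in claim 6 can be sketched numerically as follows. All scalar values, the weights lam1-lam3, and the single-sample evaluation (rather than an expectation over the replay pool) are toy assumptions for illustration.

```python
# Sketch of claim 6's joint loss: TD loss for the Critic, MSE loss for the
# enemy modeler, and a negated-Q Actor loss, combined as a weighted sum.
# All numbers are assumed toy values, not values from the patent.

def td_loss(q, reward, q_next, gamma):
    y = reward + gamma * q_next  # TD target from the target network
    return (q - y) ** 2

def prediction_loss(pred_probs, actual_probs):
    # mean-squared error between predicted and observed enemy action dists
    return sum((p - a) ** 2 for p, a in zip(pred_probs, actual_probs)) / len(pred_probs)

def joint_loss(l_critic, l_pred, l_actor, lam1, lam2, lam3):
    return lam1 * l_critic + lam2 * l_pred + lam3 * l_actor

l_c = td_loss(q=1.0, reward=0.5, q_next=1.0, gamma=0.9)  # (1.0 - 1.4)**2
l_p = prediction_loss([0.7, 0.3], [1.0, 0.0])
l_a = -1.2                                               # -Q(s, pi(o)) sample
total = joint_loss(l_c, l_p, l_a, lam1=1.0, lam2=0.5, lam3=1.0)
```

In training, `total` would be averaged over minibatches drawn from the replay pool D and backpropagated jointly through the Critic, the enemy modeler, and the Actor.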
Description
Collaborative maneuvering decision-making method for multi-UAV air combat game confrontation

Technical Field

The invention relates to a collaborative maneuvering decision-making method for multi-UAV air combat game confrontation, and belongs to the technical field of UAV cooperative combat and autonomous decision-making control.

Background

With the rapid development of UAV technology, UAVs are increasingly deployed in highly dynamic, strongly adversarial air combat environments. In a multi-UAV cooperative air combat scenario, the complexity of decision-making grows greatly: each UAV's decision depends not only on its own state and task, but also requires predicting enemy behavior, understanding the global situation, and cooperating efficiently with friendly aircraft. Traditional decision methods have clear limitations. Expert systems based on static rules or domain knowledge can execute preset tactics but lack the flexibility to cope with sudden situations and unknown strategies. Optimization-theoretic methods can search high-dimensional spaces but are computationally heavy and struggle to meet real-time decision requirements. Game-theoretic methods can model the strategic interaction of the two sides, but acquiring and processing enemy strategy information in real time is very challenging in a dynamic environment with incomplete information and rapidly evolving strategies. Moreover, these methods generally struggle with the curse of dimensionality and with information asymmetry in multi-aircraft cooperation. In recent years, deep reinforcement learning, through autonomous interaction between agent and environment, has offered a new approach to decision-making in complex dynamic environments. It has been applied to UAV air combat, enabling UAVs to adapt their maneuver strategies. However, in complex multi-aircraft adversarial games, especially under abrupt changes in the adversary's strategy, existing deep reinforcement learning methods still show clear limitations: first, most methods learn adversary behavior only implicitly, lacking explicit modeling and forward prediction of adversary strategies, which leads to delayed reactions and poor decision robustness when strategies mutate; second, under a distributed execution framework, achieving efficient group coordination from partial, incomplete information remains a key open problem; and third, effectively integrating situation awareness, intention reasoning, and collaborative decision-making into a global battlefield cognition is still imperfect in existing methods.

Disclosure of Invention

The invention aims to provide a collaborative maneuvering decision-making method for multi-UAV air combat game confrontation, addressing the uncertainty of enemy strategies and the difficulty of cooperative decision-making in the multi-UAV air combat game. The method specifically comprises the following steps: constructing a three-dimensional continuous air combat simulation environment and defining task objectives for the opposing UAV groups, with UAV dynamics described by a six-degree-of-freedom motion model whose state variables comprise at least position, velocity, attitude angles, and angular velocity; constructing a UAV air combat situation assessment model under incomplete information, which evaluates the relative situation between the two sides at each time step as a function of factors such as relative angles and relative distances; in the execution stage, each UAV decides based on its local observation, which comprises its own state, the relative states of friendly UAVs within communication range, and enemy aircraft within perception range; constructing a joint action space that defines each UAV's executable action set, comprising speed control, attitude adjustment, and offensive tactical actions, to support collaborative decision modeling; designing a multi-objective composite reward function comprising at least a survival reward, a kill reward, a cooperation reward, an out-of-bounds penalty, and a round win/loss reward, with weight configuration guiding the UAVs toward a balanced optimization of individual survival, attack effectiveness, and group cooperation; establishing a hypernetwork-based enemy strategy modeler, predicting the action probability distribution of enemy UAVs at future moments