
CN-120046494-B - Many-to-many attack-defense game method and device in an obstacle environment

CN 120046494 B

Abstract

The application discloses a many-to-many attack-defense game method and device for an obstacle environment, in the technical field of unmanned aerial vehicle (UAV) control. The method comprises the following steps: establishing a many-to-many attack-defense problem model of a UAV swarm; establishing a one-to-one attack-defense sub-game problem; based on the one-to-one sub-game problem, expanding the defender winning region under the differential game to obtain the expanded defender winning region under the differential game; expanding the defender winning region under reinforcement learning to obtain the expanded defender winning region under reinforcement learning; and making many-to-many attack-defense decisions based on the many-to-many attack-defense problem model and the two expanded defender winning regions, so as to obtain a many-to-many defense decision planning result.

Inventors

  • YAN RUI
  • CHEN YIYANG
  • DONG XIWANG
  • PAN CHENGWEI
  • XU HAO
  • WANG QING
  • HUA YONGCHAO
  • FENG ZHI

Assignees

  • Beihang University (北京航空航天大学)

Dates

Publication Date
2026-05-05
Application Date
2025-02-19

Claims (9)

  1. A many-to-many attack-defense game method in an obstacle environment, characterized by comprising the following steps: establishing a many-to-many attack-defense problem model of an unmanned aerial vehicle (UAV) swarm, wherein the UAV swarm comprises a plurality of attacker UAVs and a plurality of defender UAVs, and the many-to-many attack-defense problem model comprises a dynamics model of both sides of the attack-defense differential game and a polygonal-obstacle game scene model; establishing a one-to-one attack-defense sub-game problem, wherein the one-to-one attack-defense sub-game problem comprises a one-to-one attack-defense sub-game model, a defender winning region under the differential game, and a defender winning region under reinforcement learning, the defender winning region being composed of a plurality of defender winning states, each defender winning state being a state from which the defender UAV wins; this comprises: establishing the one-to-one attack-defense sub-game model; establishing models of the defender winning state, the defender winning region, the attacker winning state, and the attacker winning region; establishing, by means of differential-game and reinforcement-learning algorithms, the winning-state and winning-region models under those algorithms; establishing forward-reachable-set and backward-reachable-set region models; constructing, with a reinforcement-learning algorithm, a new defender winning region under reinforcement learning based on the reachable-set region models and the defender winning region under reinforcement learning; and expanding the new defender winning region under reinforcement learning to obtain the expanded defender winning region under reinforcement learning; expanding, based on the one-to-one attack-defense sub-game problem, the defender winning region under the differential game to obtain the expanded defender winning region under the differential game, which specifically comprises: establishing models of the Euclidean shortest-distance path, the Euclidean shortest-distance reachable region, the shortest-path region diagram, and the convex target coverage polygon; establishing the MOCG strategy; and constructing a defender-UAV algorithm for the target-approach differential game by extending the MOCG algorithm with the Euclidean shortest path, namely the MOCG-ESP strategy, wherein the MOCG-ESP strategy selects, for any state belonging to the defender winning region under the differential game, the MOCG strategy as the defender UAV's control input, and for all remaining states selects as the control input the unit vector pointing from the defender UAV's position toward the next point, taken in order, on the Euclidean shortest-distance path between the defender UAV's position and the attacker UAV's position; expanding the defender winning region under reinforcement learning to obtain the expanded defender winning region under reinforcement learning; and making a many-to-many defense decision based on the many-to-many attack-defense problem model, the expanded defender winning region under the differential game, and the expanded defender winning region under reinforcement learning, so as to obtain a many-to-many defense decision planning result.
  2. The many-to-many attack-defense game method in an obstacle environment according to claim 1, wherein the attacker winning state is a state in which the attacker UAV wins; and the winning-state and winning-region models under the differential-game and reinforcement-learning algorithms comprise the defender winning state under the differential game, the defender winning region under the differential game, the defender winning state under reinforcement learning, and the defender winning region under reinforcement learning.
  3. The many-to-many attack-defense game method in an obstacle environment according to claim 1, wherein expanding the defender winning region under reinforcement learning to obtain the expanded defender winning region under reinforcement learning specifically comprises: expanding the defender winning region under reinforcement learning by means of a proximal policy optimization (PPO) reinforcement-learning algorithm, to obtain the expanded defender winning region under reinforcement learning.
  4. The many-to-many attack-defense game method in an obstacle environment according to claim 1, wherein making the many-to-many attack-defense decision based on the many-to-many attack-defense problem model, the expanded defender winning region under the differential game, and the expanded defender winning region under reinforcement learning, so as to obtain the many-to-many defense decision planning result, specifically comprises: constructing a subgraph of the UAV swarm based on the many-to-many attack-defense problem model and the two expanded defender winning regions, wherein the subgraph comprises node features and edge features, the node features being the positions of the attacker UAVs and the defender UAVs, and the edge features indicating whether the current state of an attacker UAV and a defender UAV belongs to the expanded defender winning region under the differential game or to the union set, the union set being the union of the expanded defender winning region under the differential game and the expanded defender winning region under reinforcement learning; establishing, based on the subgraph, a binary integer programming problem model for matching UAV swarm tasks; and solving the binary integer programming problem model to obtain the many-to-many defense decision planning result.
  5. A many-to-many attack-defense game device in an obstacle environment, based on the many-to-many attack-defense game method in an obstacle environment as claimed in any one of claims 1 to 4, wherein the device comprises: a many-to-many attack-defense problem model establishing module, used to establish the many-to-many attack-defense problem model of the UAV swarm, wherein the UAV swarm comprises a plurality of attacker UAVs and a plurality of defender UAVs; a one-to-one attack-defense sub-game problem establishing module, used to establish the one-to-one attack-defense sub-game problem, wherein the one-to-one attack-defense sub-game problem comprises the one-to-one attack-defense sub-game model, the defender winning region under the differential game, and the defender winning region under reinforcement learning; a differential-game defender-winning-region expansion module, used to expand the defender winning region under the differential game based on the one-to-one attack-defense sub-game problem to obtain the expanded defender winning region under the differential game; a reinforcement-learning defender-winning-region expansion module, used to expand the defender winning region under reinforcement learning to obtain the expanded defender winning region under reinforcement learning; and a many-to-many defense decision module, used to make the many-to-many defense decision based on the many-to-many attack-defense problem model, the expanded defender winning region under the differential game, and the expanded defender winning region under reinforcement learning, so as to obtain the many-to-many defense decision planning result.
  6. The many-to-many attack-defense game device in an obstacle environment of claim 5, wherein the one-to-one attack-defense sub-game problem establishing module is configured to: establish the one-to-one attack-defense sub-game model; establish models of the defender winning state, the defender winning region, the attacker winning state, and the attacker winning region, wherein the attacker winning state is a state in which the attacker UAV wins; establish, by means of the differential-game and reinforcement-learning algorithms, the winning-state and winning-region models under those algorithms, comprising the defender winning state under the differential game, the defender winning region under the differential game, the defender winning state under reinforcement learning, and the defender winning region under reinforcement learning; establish the forward-reachable-set and backward-reachable-set region models; and construct, with a reinforcement-learning algorithm, a new defender winning region under reinforcement learning based on the reachable-set region models and the defender winning region under reinforcement learning, the new defender winning region being expanded to obtain the expanded defender winning region under reinforcement learning.
  7. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the many-to-many attack-defense game method in an obstacle environment as claimed in any one of claims 1 to 4.
  8. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the many-to-many attack-defense game method in an obstacle environment as claimed in any one of claims 1 to 4.
  9. A computer program product comprising a computer program which, when executed by a processor, implements the many-to-many attack-defense game method in an obstacle environment as claimed in any one of claims 1 to 4.
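(Not part of the patent's own text.) As a rough, hypothetical illustration of the "defender winning region" concept the claims build on, the sketch below tests winning-region membership for a drastically simplified one-to-one game: equal speeds, no obstacles, and a point target, where the region reduces to a distance comparison. The patent's actual regions are derived from differential-game analysis and reinforcement learning over polygonal-obstacle scenes; all names below are invented for illustration.

```python
import math

def defender_wins(defender, attacker, target, speed_ratio=1.0):
    """Simplified one-to-one target-defense winning-region test.

    Assumes an obstacle-free plane, point agents, and a point target.
    With equal speeds (speed_ratio = defender_speed / attacker_speed = 1),
    the defender can cut the attacker off before the target if and only if
    the defender is closer to the target: any attacker path to the target
    must cross the perpendicular bisector of the defender-attacker segment,
    which the defender reaches first. For other speed ratios this is only
    a crude time-to-target comparison, not an exact winning-region test.
    """
    return math.dist(defender, target) <= speed_ratio * math.dist(attacker, target)

def winning_region_grid(attacker, target, xs, ys):
    """Enumerate defender positions on a grid that are winning states."""
    return [(x, y) for x in xs for y in ys
            if defender_wins((x, y), attacker, target)]
```

A production version would replace the distance test with reachability analysis (the forward/backward reachable sets of claim 1) around the polygonal obstacles.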
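(Not part of the patent's own text.) Claim 4 casts defender-to-attacker matching as a binary integer program over winning-region membership. The sketch below is a hedged stand-in: brute-force enumeration replaces a real BIP solver, and a 0/1 win matrix abstracts the winning-region tests; all names are hypothetical.

```python
from itertools import permutations

def best_assignment(win_matrix):
    """Brute-force stand-in for the binary integer program matching
    defender UAVs to attacker UAVs.

    win_matrix[i][j] is 1 if the joint state of defender i and attacker j
    lies in an (expanded) defender winning region, else 0. Maximizes the
    number of attackers guaranteed to be intercepted, with each defender
    taking at most one attacker and vice versa. For brevity, when there
    are more defenders than attackers only the first ones are paired, and
    all assignments are enumerated, so this only suits small swarms.
    """
    n_att = len(win_matrix[0]) if win_matrix else 0
    r = min(len(win_matrix), n_att)
    best_score, best = -1, {}
    for perm in permutations(range(n_att), r):
        pairs = dict(enumerate(perm))  # defender index -> attacker index
        score = sum(win_matrix[i][j] for i, j in pairs.items())
        if score > best_score:
            best_score, best = score, pairs
    return best_score, best
```

A real implementation would hand the same 0/1 matrix to a BIP or assignment solver (e.g. Hungarian-style matching) instead of enumerating permutations.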

Description

Many-to-many attack-defense game method and device in an obstacle environment

Technical Field

The application relates to the technical field of unmanned aerial vehicle (UAV) control, and in particular to a many-to-many attack-defense game method and device in an obstacle environment.

Background

The UAV-swarm attack-defense differential game considers an adversarial scenario in which a group of evading UAVs (attacker UAVs) attempts to enter an area protected by a group of pursuing UAVs (defender UAVs); it is more challenging and has wider application scenarios than the plain pursuit-evasion differential game, for example attacks on key facilities or high-value targets. The problem is more challenging and more practical because an attacker UAV must not only avoid capture by the defender UAVs but also approach certain critical targets, while a defender UAV must not only intercept the attacker UAVs but also attend to the protection of the target area.

In obstacle-free environments, many researchers have studied the attack-defense differential game problem, with differences in game space, capture conditions, information structure, winning team, speed ratio, dynamics of the two sides, target area, and so on. However, little current research addresses the attack-defense differential game problem in obstacle environments. In recent years, learning-based methods have often been used to solve attack-defense differential games. The related art has developed an imitation-learning framework that uses a graph neural network and a centralized expert algorithm. To counter a faster evader, the related art has proposed a deep reinforcement learning (RL) approach that trains pursuit strategies against a fixed analytical evasion strategy. However, learning-based approaches may suffer from poor interpretability and a lack of performance guarantees, which are critical in an adversarial scenario. In addition, analyses of the attack-defense problem in obstacle environments often yield only rather conservative results, and in the UAV-swarm attack-defense problem in a polygonal-obstacle environment, learning-based methods intercept poorly.

Disclosure of Invention

The application aims to provide a many-to-many attack-defense game method and device in an obstacle environment that can intercept as many attacker UAVs as possible in the UAV-swarm attack-defense problem with polygonal obstacles, thereby protecting the target area to the greatest extent and improving the defense success rate. To achieve the above object, the application provides the following solutions. In a first aspect, the application provides a many-to-many attack-defense game method in an obstacle environment, comprising: establishing a many-to-many attack-defense problem model of a UAV swarm, wherein the UAV swarm comprises a plurality of attacker UAVs and a plurality of defender UAVs, and the many-to-many attack-defense problem model comprises a dynamics model of both sides of the attack-defense differential game and a polygonal-obstacle game scene model; establishing a one-to-one attack-defense sub-game problem, wherein the one-to-one attack-defense sub-game problem comprises a one-to-one attack-defense sub-game model, a defender winning region under the differential game, and a defender winning region under reinforcement learning, the defender winning region being composed of a plurality of defender winning states, each defender winning state being a state from which the defender UAV wins; expanding the defender winning region under the differential game based on the one-to-one attack-defense sub-game problem to obtain the expanded defender winning region under the differential game; expanding the defender winning region under reinforcement learning to obtain the expanded defender winning region under reinforcement learning; and making a many-to-many defense decision based on the many-to-many attack-defense problem model, the expanded defender winning region under the differential game, and the expanded defender winning region under reinforcement learning, so as to obtain a many-to-many defense decision planning result. Optionally, establishing the one-to-one attack-defense sub-game problem specifically comprises: establishing the one-to-one attack-defense sub-game model; establishing a defender winning state, a defender winning region, an attacker winning state and an at