CN-121997714-A - Reinforcement learning environment simulation platform and method for unmanned aerial vehicle defense

Abstract

This specification provides a reinforcement learning environment simulation platform and method for unmanned aerial vehicle defense, relating to the technical fields of unmanned aerial vehicle interception mechanisms and reinforcement learning. The simulation platform comprises an unmanned aerial vehicle model library, an initialization module, a reset module, an unmanned aerial vehicle defense stepping module, and an unmanned aerial vehicle defense human-machine interaction module. Current reinforcement-learning-based counter-unmanned-aerial-vehicle task allocation methods lack a corresponding reinforcement learning simulation environment. To address this, a general reinforcement learning environment model and a high-fidelity simulation platform are constructed by modeling and simulating unmanned aerial vehicles, the various types of interception equipment, the detection equipment, the interception engagement process, and the detection and tracking process. This solves the problems that agents obtained through existing reinforcement learning training are insufficiently reliable and suited only to specific scenarios, and it improves the agent's decision-making capability in complex environments.

Inventors

HAO WEI
DING WANYING
XIE SHAOFEI
YAN JINGRUI
GUO TAOYUAN

Assignees

航天江南(北京)创新技术研究院有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-28

Claims (10)

1. A reinforcement learning environment simulation platform for unmanned aerial vehicle defense, characterized by comprising an unmanned aerial vehicle model library, an initialization module, a reset module, an unmanned aerial vehicle defense stepping module, and an unmanned aerial vehicle defense human-machine interaction module; the unmanned aerial vehicle model library comprises abstract models of unmanned aerial vehicles, detection equipment, and six types of interception equipment, each model comprising an ID, a type, a state, a position attribute, and calculation methods adapted to the model; the initialization module is used for performing system initialization, target initialization, detection equipment initialization, and interception equipment initialization, and for generating an unmanned aerial vehicle instance dictionary, an unmanned aerial vehicle instance list, a detection equipment instance list, and an interception equipment instance list; the reset module is used for resetting the system parameters, the target instances, the detection equipment instances, and the interception equipment instances; the unmanned aerial vehicle defense stepping module comprises a feature acquisition sub-module and a state migration sub-module, wherein the feature acquisition sub-module is used for extracting feature tensors containing target information and interception equipment information by calculating the states of the target unmanned aerial vehicle, interception equipment, and detection equipment instances in the environment at the current moment; the human-machine interaction module comprises a scenario editing sub-module and an engagement simulation sub-module, wherein the scenario editing sub-module is used for setting system parameters, target parameters, and interception and detection equipment parameters, and for previewing equipment deployment positions and the effective coverage areas of the detection equipment and the interception equipment, and the engagement simulation sub-module is used for loading a scenario, dynamically simulating the engagement process, and dynamically adjusting the simulation speed.
2. The simulation platform of claim 1, wherein the feature acquisition sub-module is specifically configured for: S11, traversing the target instance list and updating, for each target, the list of IDs of the detection equipment able to track it; S12, traversing the interception equipment instance list and updating, for each piece of interception equipment, the list of IDs of the detection equipment able to guide it; S13, sampling the states of all targets to generate the target feature T, wherein the target states that can be sampled comprise target type, northeast coordinates, northeast-direction velocity, threat degree, and intercepted state; S14, calculating the firing condition matrix F of the interception equipment against the targets, wherein F is a three-dimensional matrix whose element indices start from 1, and the element $f_{1ij} = 1$ indicates that the i-th interception equipment has a firing condition against the j-th target, with $f_{1ij} = 0$ otherwise; S15, sampling the states of all interception equipment to generate the interception equipment feature A, wherein the interception equipment states that can be sampled comprise the predicted firing encounter time, the predicted firing encounter distance, and the kill probability against each target, the interception cost, the number of remaining resources, and the number of remaining channels; S16, normalizing the target feature and the interception equipment feature; S17, converting the format of the target feature T and the interception equipment feature A.
3. The simulation platform of claim 2, wherein the target feature T is represented in matrix form with elements $t_{ij}$, wherein element $t_{i1}$ represents the type of the i-th target, taking the value 1 for a micro unmanned aerial vehicle, 2 for a light unmanned aerial vehicle, and 3 for a loitering munition; $t_{i2}$ to $t_{i4}$ represent the three-dimensional northeast coordinates of the i-th target; $t_{i5}$ to $t_{i7}$ represent the northeast-direction velocity components of the i-th target; element $t_{i8}$ represents the threat degree of the i-th target; and element $t_{i9}$ represents the intercepted state of the i-th target, taking the value 1 if intercepted and 0 if not intercepted.
4. The simulation platform of claim 3, wherein the interception equipment feature A is represented in matrix form with elements $a_{mj}$, wherein $a_{m1}$ represents the interception cost of the m-th interception equipment, $a_{m2}$ represents the number of remaining resources of the m-th interception equipment, $a_{m3}$ represents the number of remaining channels of the m-th interception equipment, $a_{m,3n+1}$ represents the predicted firing encounter time of the m-th interception equipment against the n-th target, $a_{m,3n+2}$ represents the predicted firing encounter distance of the m-th interception equipment against the n-th target, and $a_{m,3n+3}$ represents the kill probability of the m-th interception equipment against the n-th target, with $a_{m,3n+3} = 0$ when the interception equipment has no firing condition against the target, namely when $f_{1mn} = 0$.
5. The simulation platform of claim 4, wherein the state migration sub-module is specifically configured for: S21, initializing a reward matrix R as a two-dimensional matrix of 6 rows and n columns, wherein n represents the number of attacking targets; S22, traversing the target instance list, responding to the input interception action, filling the reward matrix R, and recording reward information; S23, calculating the average reward r, wherein $r = \mathrm{sum}(\mathrm{rpn} \cdot R)/n$, $\mathrm{sum}(\cdot)$ denotes summation, and rpn is the list of positive and negative reward weights in the system parameters; S24, updating the simulation time by advancing it one simulation time step; S25, traversing the target instance list, the detection equipment instance list, and the interception equipment instance list to update the environment state; S26, determining whether the episode has ended; S27, outputting the reward r and the episode end flag done.
6. The simulation platform of claim 5, wherein updating the environment state comprises: S251, traversing the target instance list and calling the relevant methods of the target class to update the state, position, and velocity of each target; S252, traversing the interception equipment instance list and updating the state, azimuth angle, and pitch angle of the interception equipment, wherein the state attribute of the interception equipment is switched according to a random number theta; S253, traversing the detection equipment instance list and updating the state, pointing angle, allocated target list, and number of remaining channels of the detection equipment, wherein the state attribute of the detection equipment is switched according to a random number theta; S254, traversing the target instance list, determining for each intercepted target whether the interception process has finished, and updating the related attributes; S255, traversing the interception equipment instance list again and updating the allocated target list, state, and number of remaining channels of each interception equipment instance; S256, traversing the detection equipment instance list again and updating the allocated target list, state, and number of remaining channels of each detection equipment instance.
7. The simulation platform of claim 6, wherein the six types of interception equipment comprise missiles, anti-aircraft guns, lasers, high-power microwaves, electromagnetic interference, and combat unmanned aerial vehicles.
8. The simulation platform of claim 7, wherein the unmanned aerial vehicle model is provided with methods for initialization, state update, position update, speed update, azimuth calculation, and position deviation calculation; and the detection equipment and interception equipment models are provided with methods for initialization and pointing angle update.
9. The simulation platform of claim 8, wherein the initialization module comprises four sub-modules: a system initialization sub-module, a target initialization sub-module, a detection equipment initialization sub-module, and an interception equipment initialization sub-module; the system initialization sub-module is used for initializing the relevant parameters of the environment model and performing coordinate conversion; the target initialization sub-module is used for initializing the unmanned aerial vehicle model and generating the unmanned aerial vehicle instance dictionary and the unmanned aerial vehicle instance list; the detection equipment initialization sub-module is used for initializing the detection equipment model and generating the detection equipment instance list; and the interception equipment initialization sub-module is used for initializing the interception equipment model and generating the interception equipment instance list.
10. A reinforcement learning environment simulation method for unmanned aerial vehicle defense, applied to the simulation platform of any one of claims 1 to 9, comprising: T1, running the human-machine interaction module and editing an engagement scenario file through the scenario editing sub-module; T2, the initialization module performing system initialization, target initialization, detection equipment initialization, and interception equipment initialization; T3, the feature acquisition sub-module extracting a feature tensor containing target information and interception equipment information by calculating the states of the target unmanned aerial vehicle, the interception equipment, and the detection equipment in the environment at the current moment; T4, the state migration sub-module executing an action response according to the input interception action, updating the environment state, and calculating and outputting the action reward and an episode end flag; and T5, the human-machine interaction module previewing equipment deployment positions and the effective coverage areas of the detection equipment and the interception equipment, dynamically simulating the engagement process, and dynamically adjusting the simulation speed.
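
As an illustrative, non-limiting sketch (not part of the claims), the column layout of the interception equipment feature A in claim 4 and the average reward of step S23 in claim 5 could be expressed with NumPy as follows. The function names, argument names, and the assumption that rpn holds six weights (one per row of R) are hypothetical and are not taken from the specification; normalization (S16) is omitted.

```python
import numpy as np


def build_intercept_features(cost, resources, channels, t_enc, d_enc, p_kill, fire_ok):
    """Assemble A with shape (M, 3 + 3N) using the column layout of claim 4.

    cost, resources, channels: length-M sequences (1-based columns 1 to 3).
    t_enc, d_enc, p_kill, fire_ok: arrays of shape (M, N); fire_ok[m, n]
    stands in for f_{1mn} (1 = firing condition met, 0 = not met).
    """
    fire_ok = np.asarray(fire_ok, dtype=float)
    m_count, n_count = fire_ok.shape
    a = np.zeros((m_count, 3 + 3 * n_count))
    a[:, 0] = cost
    a[:, 1] = resources
    a[:, 2] = channels
    # 1-based columns 3n+1, 3n+2, 3n+3 (n = 1..N) are 0-based columns
    # 3n, 3n+1, 3n+2, i.e. strided slices starting at 3, 4 and 5.
    a[:, 3::3] = np.asarray(t_enc, dtype=float)
    a[:, 4::3] = np.asarray(d_enc, dtype=float)
    # The claim only states that the kill probability entry is zero when
    # there is no firing condition, so only that column is masked here.
    a[:, 5::3] = np.asarray(p_kill, dtype=float) * fire_ok
    return a


def average_reward(rpn, reward_matrix):
    """r = sum(rpn . R) / n for a 6-by-n reward matrix R (step S23).

    Assumption: rpn is a length-6 weight list, so rpn . R is a length-n
    vector of weighted per-target rewards.
    """
    reward_matrix = np.asarray(reward_matrix, dtype=float)
    n = reward_matrix.shape[1]
    weighted = np.asarray(rpn, dtype=float) @ reward_matrix
    return float(np.sum(weighted) / n)
```

With two pieces of interception equipment and two targets, `build_intercept_features` returns a 2-by-9 matrix, and `average_reward` reduces the 6-by-n reward matrix to a single scalar per step.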

Description

Reinforcement learning environment simulation platform and method for unmanned aerial vehicle defense

Technical Field

This document relates to the technical fields of unmanned aerial vehicle interception mechanisms and reinforcement learning, and in particular to a reinforcement learning environment simulation platform and method for unmanned aerial vehicle defense.

Background

With the development of unmanned aerial vehicle technology, unmanned aerial vehicles have gradually become a low-cost, highly flexible, highly destructive, and hard-to-detect means of damage or attack. How to make efficient use of existing interception equipment to defend against unmanned aerial vehicles under environmental constraints is an important current research topic. Training an agent through reinforcement learning to perform counter-unmanned-aerial-vehicle task allocation is one important approach. Reinforcement learning, however, requires the agent to interact with an environment: the agent influences the environment by executing actions, the environment updates its state in response and outputs an immediate reward, and the agent is trained to maximize the cumulative reward over one task episode. Mature reinforcement learning environment models have been developed in fields such as robot control and can support agent training, but no fully functional environment model is available in the field of unmanned aerial vehicle defense.

Current reinforcement-learning-based counter-unmanned-aerial-vehicle task allocation methods mainly simulate the environment state transition after a single interception through manually defined, simple state transition probabilities, as in patents CN202011482387.0 and CN202311702266.6. Such methods cannot evaluate the complete process, from discovering the target, tracking the target, and starting interception through the interception encounter to the end of the engagement, on the basis of the continuous motion of elements such as the unmanned aerial vehicles, the interception equipment, and the detection equipment. Instead, they are highly random and differ greatly from real conditions, so the trained agent is insufficiently reliable, suited only to specific scenarios, and insufficiently robust.

Disclosure of Invention

This specification provides a reinforcement learning environment simulation platform and method for unmanned aerial vehicle defense, to solve the problems that agents obtained through existing reinforcement learning training are insufficiently reliable and suited only to specific scenarios.
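
For orientation, the agent-environment interaction described in the Background (the agent executes an action, the environment updates its state and returns an immediate reward, and the cumulative reward over one episode is the training objective) can be sketched as below. `ToyDefenseEnv`, `run_episode`, and all parameters are hypothetical names introduced only for illustration. The toy dynamics are deliberately simplistic, of the hand-defined-probability kind that the Background criticizes, and do not represent the platform's models; only the reset/step loop structure is the point.

```python
import random


class ToyDefenseEnv:
    """Hypothetical stand-in: one target closes in, one interceptor may fire."""

    def __init__(self, start_distance=5, kill_probability=0.5, seed=0):
        self.start_distance = start_distance
        self.kill_probability = kill_probability
        self.rng = random.Random(seed)
        self.distance = start_distance

    def reset(self):
        """Restore the initial state and return it."""
        self.distance = self.start_distance
        return self.distance

    def step(self, action):
        """action: 1 = fire, 0 = hold. Returns (state, reward, done)."""
        if action == 1 and self.rng.random() < self.kill_probability:
            return self.distance, 1.0, True    # target intercepted
        self.distance -= 1                     # target closes in by one step
        if self.distance <= 0:
            return self.distance, -1.0, True   # target reached the defended asset
        return self.distance, 0.0, False


def run_episode(env, policy):
    """Run one episode and return the cumulative reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total
```

A policy here is any callable mapping the observed state to an action, for example `lambda state: 1` to fire at every step.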
In a first aspect, the specification provides a reinforcement learning environment simulation platform for unmanned aerial vehicle defense, which comprises an unmanned aerial vehicle model library, an initialization module, a reset module, an unmanned aerial vehicle defense stepping module, and an unmanned aerial vehicle defense human-machine interaction module; the unmanned aerial vehicle model library comprises abstract models of unmanned aerial vehicles, detection equipment, and six types of interception equipment, each model comprising an ID, a type, a state, a position attribute, and calculation methods adapted to the model; the initialization module is used for performing system initialization, target initialization, detection equipment initialization, and interception equipment initialization, and for generating an unmanned aerial vehicle instance dictionary, an unmanned aerial vehicle instance list, a detection equipment instance list, and an interception equipment instance list; the reset module is used for resetting the system parameters, the target instances, the detection equipment instances, and the interception equipment instances; the unmanned aerial vehicle defense stepping module comprises a feature acquisition sub-module and a state migration sub-module, wherein the feature acquisition sub-module is used for extracting feature tensors containing target information and interception equipment information by calculating the states of the target unmanned aerial vehicle, interception equipment, and detection equipment instances in the environment at the current moment; the human-machine interaction module comprises a scenario editing sub-module and an engagement simulation sub-module, wherein the scenario editing sub-module is used for setting system parameters, target parameters, and interception and detection equipment parameters, and for previewing equipment deployment positions and the effective coverage areas of the detection equipment and the interception equipment, and the engagement simulation sub-module is used for loading a scenario, dynamic simul