CN-121995932-A - Offshore defense penetration decision-making method, system, device and medium for an unmanned aerial vehicle cluster


Abstract

The invention relates to the technical field of intelligent decision-making and discloses an offshore defense penetration decision-making method, system, device and medium for an unmanned aerial vehicle cluster. The method comprises: obtaining battlefield environment information and a task target of the unmanned aerial vehicle cluster; generating a long-term strategic scheme through a large language model using a military knowledge base; running a defense penetration simulation according to the long-term strategic scheme and determining an instantaneous tactical strategy through a multi-agent reinforcement learning model; simulating the cluster's defense penetration against a new tactical strategy of the enemy using the long-term strategic scheme and the instantaneous tactical strategy, evaluating the performance of both and identifying the threats they face, so as to adjust the long-term strategic scheme; simulating the cluster's defense penetration again in different battlefield environments to obtain a plurality of instantaneous tactical strategies; and determining the offshore defense penetration strategy of the cluster according to the adjusted long-term strategic scheme and the best-performing instantaneous tactical strategy among the plurality of instantaneous tactical strategies.

Inventors

FU YANFANG
ZHANG JIAHAO
WANG XIAOFEI
Peng Yachen

Assignees

Xi'an Technological University (西安工业大学)

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-13

Claims (8)

1. An offshore defense penetration decision-making method for an unmanned aerial vehicle cluster, characterized by comprising the following steps:
acquiring original battlefield environment information and a task target of the unmanned aerial vehicle cluster;
performing retrieval and reasoning over a preset military knowledge base through a large language model, according to the battlefield environment information and the task target, to generate a long-term strategic scheme and a corresponding strategic reward constraint;
simulating the offshore defense penetration of the unmanned aerial vehicle cluster according to the long-term strategic scheme, and determining, through a multi-agent reinforcement learning model, an instantaneous tactical strategy of the unmanned aerial vehicle cluster according to the strategic reward constraint and the battlefield environment information of the cluster during defense penetration in the simulation;
simulating the offshore defense penetration process of the unmanned aerial vehicle cluster under a new tactical strategy of the enemy, using the long-term strategic scheme and the instantaneous tactical strategy, so as to evaluate the performance of the long-term strategic scheme and the instantaneous tactical strategy in the engagement and identify the threats they face;
adjusting the long-term strategic scheme according to the performance of the long-term strategic scheme and the instantaneous tactical strategy in the engagement and the threats they face, and simulating the offshore defense penetration process of the unmanned aerial vehicle cluster in different battlefield environments according to the adjusted long-term strategic scheme, so as to obtain a plurality of instantaneous tactical strategies; and
determining the offshore defense penetration strategy of the unmanned aerial vehicle cluster according to the adjusted long-term strategic scheme and the instantaneous tactical strategy that performs best in the engagement among the plurality of instantaneous tactical strategies.
2. The offshore defense penetration decision-making method for an unmanned aerial vehicle cluster according to claim 1, wherein performing retrieval and reasoning over the preset military knowledge base through the large language model, according to the battlefield environment information and the task target, to generate the long-term strategic scheme and the corresponding strategic reward constraint comprises: retrieving similar tactics and corresponding tactical terms from the military knowledge base through the large language model according to the battlefield environment information and the task target, and reasoning over the retrieved tactics and corresponding tactical terms to obtain the long-term strategic scheme and the corresponding strategic reward constraint.
3. The offshore defense penetration decision-making method for an unmanned aerial vehicle cluster according to claim 2, wherein the long-term strategic scheme comprises a plurality of defense penetration paths, grouping suggestions and action timings for the unmanned aerial vehicle cluster, and an electromagnetic spectrum usage scheme; and the strategic reward constraint adds reward items that conform to military rules on the basis of the unmanned aerial vehicle cluster achieving the task target.
4. The offshore defense penetration decision-making method for an unmanned aerial vehicle cluster according to claim 1, wherein determining the offshore defense penetration strategy of the unmanned aerial vehicle cluster according to the adjusted long-term strategic scheme and the instantaneous tactical strategy that performs best in the engagement among the plurality of instantaneous tactical strategies comprises:
adjusting the multi-agent reinforcement learning model according to the adjusted long-term strategic scheme and the instantaneous tactical strategy that performs best in the engagement among the plurality of instantaneous tactical strategies;
training a student model by a knowledge distillation method, with the large language model and the adjusted multi-agent reinforcement learning model serving as teacher models; and
determining, through the trained student model, offshore defense penetration strategies of different unmanned aerial vehicle clusters in different battlefield environments.
5. The offshore defense penetration decision-making method for an unmanned aerial vehicle cluster according to claim 4, wherein the trained student model is deployed on each unmanned aerial vehicle in the unmanned aerial vehicle cluster.
6. An offshore defense penetration decision-making system for an unmanned aerial vehicle cluster, the system comprising:
a data acquisition module, configured to acquire original battlefield environment information and a task target of the unmanned aerial vehicle cluster;
a strategy generation module, configured to perform retrieval and reasoning over a preset military knowledge base through a large language model, according to the battlefield environment information and the task target, to generate a long-term strategic scheme and a corresponding strategic reward constraint;
a tactics generation module, configured to simulate the offshore defense penetration of the unmanned aerial vehicle cluster according to the long-term strategic scheme, and to determine, through a multi-agent reinforcement learning model, an instantaneous tactical strategy of the unmanned aerial vehicle cluster according to the strategic reward constraint and the battlefield environment information of the cluster during defense penetration in the simulation;
a strategy evaluation module, configured to simulate the offshore defense penetration process of the unmanned aerial vehicle cluster under a new tactical strategy of the enemy, using the long-term strategic scheme and the instantaneous tactical strategy, so as to evaluate the performance of the long-term strategic scheme and the instantaneous tactical strategy in the engagement and identify the threats they face;
a strategy adjustment module, configured to adjust the long-term strategic scheme through the large language model according to the performance of the long-term strategic scheme and the instantaneous tactical strategy in the engagement and the threats they face, and to simulate the offshore defense penetration process of the unmanned aerial vehicle cluster in different battlefield environments according to the adjusted long-term strategic scheme, so as to obtain a plurality of instantaneous tactical strategies; and
an intelligent decision module, configured to determine the offshore defense penetration strategy of the unmanned aerial vehicle cluster according to the adjusted long-term strategic scheme and the instantaneous tactical strategy that performs best in the engagement among the plurality of instantaneous tactical strategies.
7. A computer device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the offshore defense penetration decision-making method for an unmanned aerial vehicle cluster according to any one of claims 1 to 5.
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the offshore defense penetration decision-making method for an unmanned aerial vehicle cluster according to any one of claims 1 to 5.
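
The strategic reward constraint described in claim 3 (reward items added on the basis of achieving the task target) can be illustrated with a minimal Python sketch. This is not the applicant's implementation: the `RuleTerm` type, the rule names, the weights and the state fields are all hypothetical and chosen only to show the additive structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical state representation: a plain dict of named quantities.
State = Dict[str, float]


@dataclass(frozen=True)
class RuleTerm:
    """One rule-conformance reward item (name, weight and check are illustrative)."""
    name: str
    weight: float
    satisfied: Callable[[State], bool]


def shaped_reward(task_reward: float, state: State, rules: List[RuleTerm]) -> float:
    """Return the task reward plus a weighted bonus for every rule the state satisfies."""
    bonus = sum(rule.weight for rule in rules if rule.satisfied(state))
    return task_reward + bonus


# Hypothetical rule set standing in for the constraint produced by the language model.
rules = [
    RuleTerm("keeps_group_spacing", 0.2, lambda s: s["spacing_error"] < 1.0),
    RuleTerm("within_time_window", 0.1, lambda s: s["delay"] <= 0.0),
]

# Only the first rule holds for this state, so only its weight is added.
example = shaped_reward(1.0, {"spacing_error": 0.5, "delay": 2.0}, rules)
print(example)
```

Keeping the rule items separate from the task reward means the rule set can be replaced when the long-term strategic scheme is adjusted, without redefining the task reward itself.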

Description

Offshore defense penetration decision-making method, system, device and medium for an unmanned aerial vehicle cluster

Technical Field

The invention relates to the technical field of intelligent decision-making, and in particular to an offshore defense penetration decision-making method, system, device and medium for an unmanned aerial vehicle cluster.

Background

Currently, path planning for offshore defense penetration by unmanned aerial vehicle clusters mostly adopts collaborative path planning methods based on multi-agent reinforcement learning, which are regarded as the mainstream frontier technology for this kind of complex decision problem. The core architecture of such a solution typically models the offshore defense penetration task as a partially observable Markov decision process. Each unmanned aerial vehicle in the system acts as an agent whose goal is to learn a cooperative strategy through trial-and-error interaction with the environment, so as to maximize the cumulative reward for completing the defense penetration task. The system generally comprises three modules. The situation awareness module uses a convolutional neural network or a recurrent neural network to process the local observation information of an individual (such as its own position and sensor data). The collaborative decision module adopts a centralized-training, distributed-execution framework, such as the MADDPG algorithm, and uses global information during training to guide the optimization of individual strategies so that cluster collaborative behaviors are learned. The path execution module converts the actions output by the decision module (such as heading and speed commands) into specific control signals.
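
The information split behind centralized training with distributed execution can be sketched in a few lines of Python. The sketch only shows which inputs each component sees; it is not MADDPG itself and includes no training. The linear `actor` and `centralized_critic` functions and all weights are hypothetical placeholders for the neural networks mentioned above.

```python
from typing import List, Sequence


def actor(local_obs: Sequence[float], weights: Sequence[float]) -> float:
    """Decentralized actor: maps one agent's own observation to a scalar action."""
    return sum(w * o for w, o in zip(weights, local_obs))


def centralized_critic(
    all_obs: Sequence[Sequence[float]],
    all_actions: Sequence[float],
    critic_weights: Sequence[float],
) -> float:
    """Centralized critic: scores the joint observations and actions of all agents.

    It is needed only during training; at execution time each agent runs its actor alone.
    """
    joint: List[float] = [x for obs in all_obs for x in obs] + list(all_actions)
    return sum(w * x for w, x in zip(critic_weights, joint))


# Two agents, each with a two-dimensional local observation (illustrative values).
observations = [[1.0, 0.0], [0.0, 2.0]]
actor_weights = [[0.5, 0.5], [0.5, 0.5]]

# Execution: every agent acts from its own observation only.
actions = [actor(obs, w) for obs, w in zip(observations, actor_weights)]

# Training: the critic additionally sees every agent's observation and action.
value = centralized_critic(observations, actions, [1.0] * 6)
print(actions, value)
```

Because the critic is dropped at execution time, each agent needs only its own sensors to act, which is what makes the framework usable when agents cannot share global information during a task.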
However, at the decision-framework level this scheme uses a "homogeneous single network" to handle a multi-scale spatio-temporal problem: the same neural network is responsible for both long-term task planning (such as selecting a penetration corridor) and instantaneous tactical maneuvering (such as evading missiles). With such a coupled architecture, the network parameters must trade off conflicting objectives against each other, which often yields decisions that are optimal in the short term but not in the long term. For example, an agent may learn overly conservative tactics and deviate severely from the optimal strategic route in order to avoid every instantaneous threat, or, conversely, ignore an immediate deadly threat in order to follow the macroscopic route faithfully. The resulting imbalance between strategic and tactical decisions limits the practical effectiveness of the approach in offshore defense penetration tasks of unmanned aerial vehicle clusters.

Disclosure of Invention

The invention aims to provide an offshore defense penetration decision-making method, system, device and medium for an unmanned aerial vehicle cluster, which can solve the above technical problems.
The invention provides an offshore defense penetration decision-making method for an unmanned aerial vehicle cluster, comprising the following steps:
acquiring original battlefield environment information and a task target of the unmanned aerial vehicle cluster;
performing retrieval and reasoning over a preset military knowledge base through a large language model, according to the battlefield environment information and the task target, to generate a long-term strategic scheme and a corresponding strategic reward constraint;
simulating the offshore defense penetration of the unmanned aerial vehicle cluster according to the long-term strategic scheme, and determining, through a multi-agent reinforcement learning model, an instantaneous tactical strategy of the unmanned aerial vehicle cluster according to the strategic reward constraint and the battlefield environment information of the cluster during defense penetration in the simulation;
simulating the offshore defense penetration process of the unmanned aerial vehicle cluster under a new tactical strategy of the enemy, using the long-term strategic scheme and the instantaneous tactical strategy, so as to evaluate the performance of the long-term strategic scheme and the instantaneous tactical strategy in the engagement and identify the threats they face;
adjusting the long-term strategic scheme according to the performance of the long-term strategic scheme and the instantaneous tactical strategy in the engagement and the threats they face, and simulating the offshore defense penetration process of the unmanned aerial vehicle cluster in different battlefield environments according to the adjusted long-term strategic scheme, so as to obtain a plurality of instantaneous tactical strategies; and
determining the offshore defense penetration strategy of the unmanned aerial vehicle cl