CN-121979618-A - Unmanned cluster multi-target capture resource-aware task allocation method, device, equipment and medium
Abstract
The application relates to the technical field of unmanned cluster cooperative strategies and discloses a resource-aware task allocation method, device, equipment and medium for unmanned cluster multi-target capture. The method comprises: modeling the association between agents and targets in an unmanned cluster capture scene; building a resource experience library and completing initial resource allocation at the capture start stage by an agent-target resource-ratio initialization method based on a heuristic algorithm; and sensing scene information by a reinforcement-learning-based adaptive ratio optimization method for dynamic capture resources, realizing dynamic adjustment and optimization of resources through reward-and-penalty learning. The application addresses key shortcomings of existing capture systems, including poor scalability, blind resource allocation and insufficient dynamic adaptability.
Inventors
- ZHU XIANQIANG
- LIU QITING
- LI MEIXUAN
- ZHAO CHENXU
- HE YIXIN
- YUAN PENGJIE
- LIN YICHENG
Assignees
- National University of Defense Technology of the Chinese People's Liberation Army (中国人民解放军国防科技大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-09
Claims (10)
- 1. An unmanned cluster multi-target capture resource-aware task allocation method, characterized by comprising the following steps: S10, modeling the association between agents and targets in an unmanned cluster capture scene, the unmanned cluster and the targets being modeled according to their functions and dependency relations by an artificial potential field method to form a task execution process; S20, building a resource experience library and completing initial resource allocation at the capture start stage by an agent-target resource-ratio initialization method based on a heuristic algorithm; S30, sensing scene information by a reinforcement-learning-based adaptive ratio optimization method for dynamic capture resources, and realizing dynamic adjustment and optimization of resources through reward-and-penalty learning.
- 2. The unmanned cluster multi-target capture resource-aware task allocation method according to claim 1, wherein the task execution process comprises designing the capture behavior logic of the unmanned cluster by a distributed cooperative method, and performing dynamic cooperative capture by the unmanned cluster through target tracking, cooperative encirclement and state switching.
- 3. The unmanned cluster multi-target capture resource-aware task allocation method according to claim 1, wherein in S20 the resource ratio is derived from task requirements, environment information and trained historical experience, the training comprising capture task requirement analysis and constraint system construction, capture task resource consumption cost design, and heuristic-algorithm initialization of the resource-ratio flow.
- 4. The unmanned cluster multi-target capture resource-aware task allocation method according to claim 1, wherein S30 specifically comprises constructing an online resource-ratio optimization framework driven by reinforcement learning, centered on real-time adaptation, dynamic tuning and convergence guarantees, in view of the dynamic and non-global information constraints of the capture scene, and continuously optimizing the resource allocation scheme through interactive iteration between the agent and the environment.
- 5. The unmanned cluster multi-target capture resource-aware task allocation method according to claim 2, wherein the target tracking specifically means that when an unmanned aerial vehicle obtains an estimated target position through its own detection or through cluster communication sharing, it generates a tracking velocity vector pointing at the target, the tracking velocity direction always pointing at the target; the cooperative encirclement specifically means that a short-range separation repulsion is designed to prevent unmanned aerial vehicles in the group from colliding or over-aggregating, and the final control velocity of each unmanned aerial vehicle is the weighted sum of the tracking vector and the separation repulsion, realizing dynamic balance before and after encirclement is completed; the state switching specifically means that the behavior mode is dynamically adjusted based on the number of pursuers of the same target in the communication topology, and if the current unmanned aerial vehicle's rank exceeds the required number of pursuers, it abandons the current target and continues searching.
- 6. The unmanned cluster multi-target capture resource-aware task allocation method according to claim 3, wherein the capture task requirement analysis and constraint system construction gives a quantitative criterion for judging capture success by combining geometric distribution with dynamic characteristics, and gives two types of constraints based on the actual feasibility of the task, namely an unmanned aerial vehicle allocation constraint and a success rate constraint; the capture task resource consumption cost design aims to minimize the comprehensive resource consumption of the capture task while avoiding the irrational scheme of increasing unmanned aerial vehicles without limit for the sake of time efficiency, thereby realizing efficient use of resources; the heuristic algorithm initializes the resource-ratio flow and, through quantized iterative search and multi-dimensional evaluation, outputs an initial capture resource-ratio scheme that satisfies the constraints and has the optimal comprehensive cost.
- 7. The unmanned cluster multi-target capture resource-aware task allocation method according to claim 4, wherein S30 comprises: constructing a reinforcement learning environment model fitting the dynamic characteristics of the capture, quantitatively defining the state and action spaces, and providing a basis for online interaction; taking capture efficiency, resource economy and dynamic suitability as optimization guidance, designing a weighted collaborative reward function, and balancing immediate benefit against the long-term objective; and constructing a dual-network structure of a policy network and a value network, wherein the policy network is a fully connected network that, given the capture state as input, outputs a probability distribution over resource-ratio adjustment amounts, and the value network is also a fully connected network that, given the capture state as input, outputs a state value used to evaluate the long-term benefit of the current situation.
- 8. An unmanned cluster multi-target capture resource-aware task allocation device, applying the unmanned cluster multi-target capture resource-aware task allocation method according to any one of claims 1 to 7, the device comprising: a capture scene modeling module for modeling the association between agents and targets in the unmanned cluster capture scene, the unmanned cluster and the targets being modeled according to their functions and dependency relations by an artificial potential field method to form a task execution process; an initial resource allocation module for building a resource experience library and completing initial resource allocation at the capture start stage through an agent-target resource-ratio initialization method based on a heuristic algorithm; and a resource dynamic adjustment and optimization module for sensing scene information by a reinforcement-learning-based adaptive ratio optimization method for dynamic capture resources and realizing dynamic adjustment and optimization of resources through reward-and-penalty learning.
- 9. An unmanned cluster multi-target capture resource-aware task allocation computer device, comprising at least one processor, at least one memory and a data bus, the processor and the memory communicating with each other through the data bus, the memory storing program instructions executable by the processor, and the processor invoking the program instructions to perform the unmanned cluster multi-target capture resource-aware task allocation method according to any one of claims 1 to 7.
- 10. A medium having stored thereon a computer program which, when executed by a processor, implements the unmanned cluster multi-target capture resource-aware task allocation method according to any one of claims 1 to 7.
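The distributed capture behavior described in claims 2 and 5 (a tracking vector toward the target, short-range separation repulsion, their weighted sum as the control velocity, and rank-based state switching) can be illustrated with a minimal sketch. The Python fragment below is a hypothetical illustration only: the gains `K_TRACK` and `K_SEP`, the radius `SEP_RANGE`, and the function names are assumptions for demonstration, not values or interfaces fixed by the patent.

```python
import math

# Hypothetical parameters; the claims do not specify numeric values.
K_TRACK = 1.0      # weight of the tracking (attraction) term
K_SEP = 0.8        # weight of the separation (repulsion) term
SEP_RANGE = 5.0    # short-range radius within which neighbours repel

def control_velocity(own_pos, target_pos, neighbours):
    """Final control velocity = weighted sum of a unit tracking vector
    pointing at the estimated target and short-range separation repulsion
    from nearby cluster members (2-D sketch)."""
    dx, dy = target_pos[0] - own_pos[0], target_pos[1] - own_pos[1]
    dist = math.hypot(dx, dy) or 1e-9
    # Tracking term: always points at the target.
    vx, vy = K_TRACK * dx / dist, K_TRACK * dy / dist
    # Separation term: repel neighbours closer than SEP_RANGE.
    for nx, ny in neighbours:
        ax, ay = own_pos[0] - nx, own_pos[1] - ny
        d = math.hypot(ax, ay)
        if 1e-9 < d < SEP_RANGE:
            w = K_SEP * (SEP_RANGE - d) / (SEP_RANGE * d)
            vx, vy = vx + w * ax, vy + w * ay
    return vx, vy

def keep_pursuing(own_rank, pursuers_needed):
    """State switching: a UAV whose rank among pursuers of the same target
    exceeds the required number abandons the target and keeps searching."""
    return own_rank <= pursuers_needed
```

For example, a UAV at the origin chasing a target at (10, 0) with one neighbour directly ahead at (1, 0) gets its forward velocity reduced by the repulsion term, which is the "dynamic balance" behavior the claim describes.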
Description
Unmanned cluster multi-target capture resource-aware task allocation method, device, equipment and medium

Technical Field

The application relates to the technical field of unmanned cluster cooperation strategies, and in particular to a resource-aware task allocation method, device, equipment and medium for unmanned cluster multi-target capture.

Background

Current research on unmanned cluster capture scenes has the following problems. When performing a capture task, existing research usually considers only completing the task, and seldom attends to the resource ratio between the unmanned cluster and the captured objects; this generally causes considerable resource waste or inefficiency. Existing resource allocation methods are also clearly insufficiently adaptive to dynamic environments: in multi-target capture scenes in particular, they lack the ability to adjust the resource ratio in real time as the environment changes, making it difficult to dynamically optimize the efficiency of the overall capture task. In view of the foregoing, a new technical solution for unmanned cluster multi-target capture resource-aware task allocation is needed.

Disclosure of Invention

The application aims to provide a resource-aware task allocation method, device, equipment and medium for unmanned cluster multi-target capture, so as to solve the problem in the prior art of optimally allocating multi-target capture resources in complex environments.
In order to achieve the above purpose, the application provides an unmanned cluster multi-target capture resource-aware task allocation method, comprising the following steps: S10, modeling the association between agents and targets in an unmanned cluster capture scene, the unmanned cluster and the targets being modeled according to their functions and dependency relations by an artificial potential field method to form a task execution process; S20, building a resource experience library and completing initial resource allocation at the capture start stage by an agent-target resource-ratio initialization method based on a heuristic algorithm; S30, sensing scene information by a reinforcement-learning-based adaptive ratio optimization method for dynamic capture resources, and realizing dynamic adjustment and optimization of resources through reward-and-penalty learning. Preferably, the task execution process comprises designing the capture behavior logic of the unmanned cluster by a distributed cooperative method, and performing dynamic cooperative capture by the unmanned cluster through target tracking, cooperative encirclement and state switching. Preferably, in S20 the resource ratio is derived from task requirements, environment information and trained historical experience, the training comprising capture task requirement analysis and constraint system construction, capture task resource consumption cost design, and heuristic-algorithm initialization of the resource-ratio flow.
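To make the S20 initialization concrete, the following Python sketch shows one plausible shape of a heuristic resource-ratio search: iterate over candidate UAVs-per-target ratios, discard candidates violating the allocation constraint (fleet size) and the success-rate constraint, and keep the feasible candidate with the lowest comprehensive cost. This is an illustrative assumption, not the patent's actual algorithm; the success model, cost model and all parameters are invented for the example.

```python
def init_resource_ratio(num_targets, fleet_size,
                        success_prob, cost_per_uav,
                        min_success=0.9, max_ratio=6):
    """Heuristic sketch of initial resource allocation.

    Searches for the UAVs-per-target ratio k that satisfies
    - the allocation constraint: k * num_targets <= fleet_size, and
    - the success-rate constraint: success_prob(k) >= min_success,
    while minimising a simple comprehensive cost. `success_prob` and
    `cost_per_uav` are illustrative models, not part of the patent.
    Returns (k, total_cost) or None if no feasible ratio exists."""
    best = None
    for k in range(1, max_ratio + 1):          # iterative search over ratios
        if k * num_targets > fleet_size:       # allocation constraint
            continue
        if success_prob(k) < min_success:      # success-rate constraint
            continue
        cost = k * num_targets * cost_per_uav  # comprehensive cost (sketch)
        if best is None or cost < best[1]:
            best = (k, cost)
    return best

# Illustrative success model: each extra pursuer halves the escape chance.
demo = init_resource_ratio(
    num_targets=3, fleet_size=12,
    success_prob=lambda k: 1 - 0.5 ** k, cost_per_uav=1.0)
```

Under these toy numbers the smallest feasible ratio is four UAVs per target; a real scheme would replace the lambda with the task-requirement analysis and cost design the claims describe.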
Preferably, S30 specifically comprises constructing an online resource-ratio optimization framework driven by reinforcement learning, centered on real-time adaptation, dynamic tuning and convergence guarantees, in view of the dynamic and non-global information constraints of the capture scene, and continuously optimizing the resource allocation scheme through interactive iteration between the agent and the environment. Preferably, the target tracking specifically means that when an unmanned aerial vehicle obtains an estimated target position through its own detection or through cluster communication sharing, it generates a tracking velocity vector pointing at the target, the tracking velocity direction always pointing at the target; the cooperative encirclement specifically means that a short-range separation repulsion is designed to prevent unmanned aerial vehicles in the group from colliding or over-aggregating, and the final control velocity of each unmanned aerial vehicle is the weighted sum of the tracking vector and the separation repulsion, realizing dynamic balance before and after encirclement is completed; the state switching specifically means that the behavior mode is dynamically adjusted based on the number of pursuers of the same target in the communication topology, and if the current unmanned aerial vehicle's rank exceeds the required number of pursuers, it abandons the current target and continues searching. Preferably, the capture task requirement analysis and constraint system construction gives a quantitative criterion for judging capture success by combining geometric distribution with dynamic characteristics, and gives two types of cons