CN-121998362-A - Unmanned ship cluster multi-target task allocation method based on improved DDQN algorithm
Abstract
An unmanned ship cluster multi-target task allocation method based on an improved DDQN algorithm belongs to the technical field of unmanned ship task planning. The method solves the problems of low task allocation efficiency, poor task allocation stability, overestimation of the Q value, and low sample utilization rate in the traditional DQN algorithm. The invention builds a distributed task allocation model containing capability constraints, combines a temperature-controllable Boltzmann strategy to enhance the exploration capability of the DDQN architecture, and uses prioritized experience replay to improve training efficiency and ensure sample utilization. The method enables the unmanned ship cluster to autonomously learn the optimal allocation strategy through interaction with the environment without prior knowledge, effectively improves the efficiency and robustness of task allocation, ensures the stability of task allocation, and realizes efficient, collaborative allocation of the unmanned ship cluster to multiple targets. Meanwhile, the DDQN architecture decouples action selection from value evaluation, eliminating the overestimation bias of the Q value. The method can be applied to unmanned ship target task allocation.
Inventors
- Zhao Enjiao
- Xia Shuqiang
- Li Yuchan
- Li Jintian
- Li Jialing
- Liu Xiaorui
- Zhao Yuxin
Assignees
- Harbin Engineering University (哈尔滨工程大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-29
Claims (10)
- 1. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm is characterized by comprising the following steps: step one, constructing an input state vector and an action space of the improved DDQN algorithm, wherein the improved DDQN algorithm is obtained by replacing the action selection strategy with a Boltzmann exploration strategy; the input state vector comprises the coordinates of all unmanned ships, the communication capability values, detection capability values, and motion capability values of all unmanned ships, and the current task states of all target unmanned ships; the action space consists of the serial numbers of the target unmanned ships that can be allocated; and step two, the improved DDQN algorithm allocates a target unmanned ship to each capturing unmanned ship according to the input state vector.
- 2. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm of claim 1, wherein, for any target unmanned ship, its current task state is defined as a three-dimensional vector: the 1st element represents the target unmanned ship's current communication capability normalized remaining demand ratio; the 2nd element represents the target unmanned ship's current detection capability normalized remaining demand ratio; the 3rd element represents whether the motion capability of the capturing unmanned ships allocated to the target unmanned ship meets the requirement, taking the value 1 if it does and 0 otherwise.
- 3. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm according to claim 2, wherein the communication capability normalized remaining demand ratio is calculated as the current remaining communication capability demand of the target unmanned ship normalized by its initial communication capability demand.
- 4. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm according to claim 3, wherein the detection capability normalized remaining demand ratio is calculated as the current remaining detection capability demand of the target unmanned ship normalized by its initial detection capability demand.
- 5. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm according to claim 4, wherein the reward function adopted in the training process of the improved DDQN algorithm is the total reward function value during training, obtained by summing the reward function value of each capturing unmanned ship over the total number of capturing unmanned ships, the reward function value of the unmanned ship team capturing each target unmanned ship, and the reward function value of each target unmanned ship over the total number of target unmanned ships.
- 6. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm of claim 5, wherein the reward function value of the i-th capturing unmanned ship is a weighted combination of three terms: an allocation reward granted to the i-th capturing unmanned ship according to the allocation result, which is a fixed value; a capability matching reward, namely the sum of the differences between the three capability values of the i-th capturing unmanned ship and the corresponding capability values of its allocated target unmanned ship; and the reciprocal of the distance from the i-th capturing unmanned ship to its allocated target unmanned ship; the three terms are scaled by weight coefficients.
- 7. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm of claim 6, wherein the reward function value of the unmanned ship team capturing the j-th target unmanned ship is a weighted combination of two terms: the total allocation reward of all capturing unmanned ships assigned to the team of the j-th target unmanned ship, namely the product of the number of unmanned ships in the team and the fixed allocation reward; and the reciprocal of the total distance from all capturing unmanned ships in the team to the j-th target unmanned ship; the two terms are scaled by weight coefficients.
- 8. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm of claim 7, wherein the reward function value of the j-th target unmanned ship is a weighted combination of two terms: an allocation reward of the j-th target unmanned ship, which takes a set value if any capturing unmanned ship is allocated to the j-th target unmanned ship and 0 otherwise; and a capability matching reward of the j-th target unmanned ship; the two terms are scaled by weight coefficients.
- 9. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm of claim 8, wherein, in the training process of the improved DDQN algorithm, the probability that the i-th sample in the experience replay pool is selected is P(i) = p_i^α / Σ_k p_k^α, where p_i denotes the priority of the i-th sample, p_i = |δ_i| + ε, ε is a positive constant, α is a hyperparameter, δ_i denotes the TD error of the i-th experience sample, p_k denotes the priority of the k-th experience sample, and the sum runs over the total number of experience samples in the experience replay pool.
- 10. The unmanned ship cluster multi-target task allocation method based on the improved DDQN algorithm of claim 9, wherein, when the improved DDQN algorithm outputs an action, the probability that an action a in the action space is selected is P(a) = e^(Q(a)/τ) / Σ_{a'∈A} e^(Q(a')/τ), where τ is the temperature parameter, Q(a) denotes the value obtained by selecting action a, Q(a') denotes the value obtained by selecting action a', e denotes the base of the natural logarithm, and A denotes the set of actions; the temperature parameter τ decays linearly with the training round: τ = τ₀ − λn, where τ₀ denotes the initial temperature parameter, λ denotes the decay rate of the temperature parameter, and n denotes the training round.
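The Boltzmann exploration of claim 10 can be illustrated with a short sketch. This is not the patented implementation: the function names, the temperature floor `tau_min`, and all hyperparameter values are hypothetical, added only to make the softmax selection and linear temperature decay concrete.

```python
import math
import random

def boltzmann_select(q_values, tau):
    """Sample an action index from the softmax distribution over Q-values:
    P(a) = exp(Q(a)/tau) / sum_a' exp(Q(a')/tau). Returns (index, probs)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / tau) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    idx = random.choices(range(len(q_values)), weights=probs)[0]
    return idx, probs

def decayed_tau(tau0, decay_rate, episode, tau_min=0.05):
    """Linear decay tau = tau0 - decay_rate * episode, clamped at a small
    positive floor (tau_min is an assumed safeguard, not from the patent)."""
    return max(tau_min, tau0 - decay_rate * episode)
```

A high temperature makes the distribution near-uniform (exploration); as τ decays toward the floor, selection concentrates on the highest-valued action (exploitation).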
Description
Unmanned ship cluster multi-target task allocation method based on improved DDQN algorithm

Technical Field
The invention belongs to the technical field of unmanned surface vehicle (USV) task planning, and particularly relates to an unmanned ship cluster multi-target task allocation method based on an improved DDQN algorithm.

Background
The unmanned surface vessel plays an important role in fields such as ocean resource exploration owing to its small size, high degree of intelligence, and good concealment. Facing complex and changeable ocean environments, a single unmanned ship can hardly meet task requirements, so cooperative operation of multiple unmanned ships has become an inevitable trend. Multi-target task allocation is a key link of cluster cooperation and directly determines the overall efficiency of the system. Traditional task allocation methods include the contract net protocol, auction algorithms, and the like; these methods often rely on accurate mathematical models or a large amount of prior knowledge, and suffer from high computational complexity and poor real-time performance when handling high-dimensional state spaces and dynamic environments. Although reinforcement learning methods such as the Deep Q-Network (DQN) provide new ideas for solving such problems, the conventional DQN algorithm suffers from overestimation of the Q value, low sample utilization, and a tendency to fall into local optima during action selection, making it difficult to guarantee the task allocation efficiency and convergence stability of unmanned ship clusters in complex environments.
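The Q-value overestimation discussed above is what the Double DQN (DDQN) architecture addresses by decoupling action selection from value evaluation: the online network picks the greedy action for the next state, and the separate target network scores it. A minimal sketch of that target computation follows; the function name and the `gamma` default are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def ddqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN bootstrap target.

    Vanilla DQN uses max_a Q_target(s', a), which biases the target upward.
    DDQN instead lets the online network SELECT the action and the target
    network EVALUATE it, removing the max-operator overestimation bias.
    """
    if done:
        return reward  # terminal transition: no bootstrap term
    a_star = int(np.argmax(next_q_online))        # selection: online network
    return reward + gamma * next_q_target[a_star]  # evaluation: target network
```

Note that when the two networks disagree on the best action, the evaluated value can be much lower than the vanilla-DQN max, which is exactly the intended correction.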
In summary, the conventional DQN algorithm still suffers from low task allocation efficiency, poor task allocation stability, overestimation of the Q value, and low sample utilization; researching a task allocation method that adapts to dynamic environments, converges quickly, and optimizes the allocation policy therefore has important practical value.

Disclosure of Invention
The invention provides an unmanned ship cluster multi-target task allocation method based on an improved DDQN algorithm, aiming to solve the problems of low task allocation efficiency, poor task allocation stability, overestimation of the Q value, and low sample utilization in the traditional DQN algorithm. The technical scheme adopted to solve the technical problem is an unmanned ship cluster multi-target task allocation method based on an improved DDQN algorithm, comprising the following steps: step one, constructing an input state vector and an action space of the improved DDQN algorithm, wherein the improved DDQN algorithm is obtained by replacing the action selection strategy with a Boltzmann exploration strategy; the input state vector comprises the coordinates of all unmanned ships, the communication capability values, detection capability values, and motion capability values of all unmanned ships, and the current task states of all target unmanned ships; the action space consists of the serial numbers of the target unmanned ships that can be allocated; and step two, the improved DDQN algorithm allocates a target unmanned ship to each capturing unmanned ship according to the input state vector.
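The prioritized experience replay used by the scheme samples transitions in proportion to priority p_i = |TD error| + ε raised to a hyperparameter α, so informative samples are replayed more often. The following is a simplified proportional-sampling sketch: it uses a plain list rather than the sum-tree structure common in practice, and the class and parameter names are hypothetical.

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay buffer.

    Priority: p_i = (|TD error| + eps) ** alpha
    Sampling probability: P(i) = p_i / sum_k p_k
    """
    def __init__(self, alpha=0.6, eps=1e-3):
        self.alpha, self.eps = alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        """Store a transition with priority derived from its TD error."""
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def probabilities(self):
        """Current selection probability of each stored sample."""
        total = sum(self.priorities)
        return [p / total for p in self.priorities]

    def sample(self, batch_size):
        """Draw a batch, with replacement, proportional to priority."""
        idx = random.choices(range(len(self.buffer)),
                             weights=self.priorities, k=batch_size)
        return [self.buffer[i] for i in idx]
```

The small constant ε keeps zero-error transitions sampleable, and α interpolates between uniform replay (α = 0) and fully greedy prioritization (α = 1).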
Further, for any target unmanned ship, its current task state is defined as a three-dimensional vector: the 1st element represents the target unmanned ship's current communication capability normalized remaining demand ratio; the 2nd element represents the target unmanned ship's current detection capability normalized remaining demand ratio; the 3rd element represents whether the motion capability of the capturing unmanned ships allocated to the target unmanned ship meets the requirement, taking the value 1 if it does and 0 otherwise. Further, the communication capability normalized remaining demand ratio is calculated as the current remaining communication capability demand of the target unmanned ship normalized by its initial communication capability demand. Further, the detection capability normalized remaining demand ratio is calculated as the current remaining detection capability demand of the target unmanned ship normalized by its initial detection capability demand. Further, the reward function adopted in the training process of the improved DDQN algorithm is the total reward function value during training, obtained by summing the reward function value of each capturing unmanned ship over the total number of capturing unmanned ships, the reward function value of the unmanned ship team capturing each target unmanned ship, and the reward function value of each target unmanned ship over the total number of target unmanned ships.