CN-121995735-A - Ship autonomous tracking control method integrating deep reinforcement learning and line-of-sight algorithm
Abstract
The invention provides a ship autonomous tracking control method integrating a deep reinforcement learning and line-of-sight algorithm, belonging to the technical field of intelligent ship control. A cooperative control framework, EEATD, integrating the line-of-sight guidance algorithm, drift angle compensation and deep reinforcement learning is used to split coupled links such as interference compensation and course adjustment in the path tracking problem, so that the deep reinforcement learning controller does not need to bear the complex adjustment task of global state coupling in path tracking, and only needs to accurately learn the rudder angle instruction output strategy from real-time states such as the deviation between the target course and the actual course. Meanwhile, a composite reward function is designed that comprehensively considers multiple objectives such as heading error, rudder angle change rate and rudder angle amplitude. The method is verified through experiments, and the results show that it effectively improves the accuracy and stability of track tracking, balances the smoothness of rudder angle adjustment, and exhibits stronger robustness under complex disturbances such as wind, waves and currents.
Inventors
- ZHAO XINHAO
- LI ZHIPENG
- HUI ZHIPENG
- CHEN GANG
- WANG ZIHAO
- ZHENG JIANBO
- SHEN MINXING
- LIU SHITONG
- QIN MING
Assignees
- 浙江省智能船舶研究院有限公司 (Zhejiang Intelligent Ship Research Institute Co., Ltd.)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-13
Claims (8)
- 1. A ship autonomous tracking control method integrating deep reinforcement learning and a line-of-sight algorithm, characterized by comprising the following steps: Step 1, building a Nomoto ship model to simulate the steering characteristics of the ship; Step 2, calculating a target course angle through the LOS guidance algorithm and performing drift angle compensation on the target course angle; Step 3, inputting the set course, the current course angle, the historical course angles, the rudder angle, the historical rudder angle data and the target course angle into the EEATD network architecture to obtain the real-time ship rudder angle; the EEATD network architecture is based on the TD3 reinforcement learning architecture, adopts a hierarchical feature learning network, and fuses a multi-head self-attention mechanism and a residual learning module; the EEATD network architecture performs sine and cosine coding on angles in the state space, merges historical time-sequence characteristics, and models the time-sequence dependence among the state features; the EEATD network architecture adopts a composite reward function, including heading error rewards, rudder angle rewards and stability rewards; and the EEATD network architecture randomly generates training paths, adds speed disturbance to the Nomoto ship model to simulate the influence of wind, waves and currents on the ship, and performs model training.
- 2. The method for autonomous tracking control of a vessel integrating deep reinforcement learning and line-of-sight algorithms as set forth in claim 1, wherein the Nomoto vessel model has the following specific formula: T·ψ̈ + ψ̇ = K·δ, wherein ψ is the course angle of the ship, ψ̇ is the first derivative of the course angle of the ship, ψ̈ is the second derivative of the course angle of the ship, δ is the rudder angle command, K represents the turning capability of the ship, and T is the course stability time constant; the model can be discretized to give the current course angle ψ_t as follows: ψ_t = a₁·ψ_{t−1} + a₂·ψ_{t−2} + b₀·δ_t + b₁·δ_{t−1} + b₂·δ_{t−2}, wherein ψ_{t−1} and ψ_{t−2} respectively represent the heading angles at the previous moment and the previous two moments, δ_t, δ_{t−1} and δ_{t−2} respectively represent the rudder angles at the current moment, the previous moment and the previous two moments, and a₁, a₂, b₀, b₁ and b₂ are model hyperparameters; the position of the ship at any moment, x_t and y_t, is: x_t = x₀ + ∫₀ᵗ U·cos ψ dτ, y_t = y₀ + ∫₀ᵗ U·sin ψ dτ, wherein U is the velocity of the vessel and (x₀, y₀) is the position of the ship at time 0.
- 3. The method for controlling autonomous tracking of a ship fusing a deep reinforcement learning and line-of-sight algorithm as set forth in claim 1, wherein the target course angle is obtained in step 2 through the LOS guidance algorithm by the following specific calculation: ψ_LOS = atan2(y_los − y, x_los − x), wherein (x_los, y_los) is the pre-aiming point in the LOS algorithm and (x, y) is the current position of the ship.
- 4. The method for autonomous tracking control of a vessel in which deep reinforcement learning and line-of-sight algorithms are integrated as set forth in claim 1, wherein the drift angle β is calculated as follows: β = atan2(v, U), wherein v is the transverse drift velocity of the vessel and U is the velocity of the ship; the compensated target course angle ψ_d is thereby obtained: ψ_d = ψ_LOS − β, wherein ψ_LOS is the target course angle calculated through the LOS guidance algorithm.
- 5. The ship autonomous tracking control method integrating deep reinforcement learning and line-of-sight algorithms according to claim 1, wherein the EEATD network architecture is based on the deep reinforcement learning framework TD3 and mainly comprises an online policy network (Actor) and an online value network (Critic); the Actor network takes the state information of the current ship as input and outputs the rudder action after passing through a plurality of linear layers, normalization layers and activation functions; the Critic network takes the state information of the current ship as input and extracts corresponding features through course-error, motion-state and historical-rudder-angle feature extraction networks; the motion features of the current ship are obtained through a motion feature extraction network composed of linear layers, normalization layers and activation functions; after the features are spliced, the fusion feature is obtained through a linear mapping and a multi-head self-attention mechanism, the whole process being: F = MultiHead(W·[h_e; h_m; h_δ; h_a]), wherein h_e, h_m, h_δ and h_a respectively represent the course error, motion state, historical rudder angle and action network features, W is a learnable two-dimensional tensor, MultiHead(·) is the multi-head attention function, and F is the fusion feature; the fusion feature is spliced through a feature fusion encoder and an error enhancement bypass to obtain the Q value in reinforcement learning, where a higher Q value indicates that the action better fits the current situation. The feature fusion encoder is composed of a linear layer, a normalization layer, a dropout layer and an activation function. The error enhancement bypass is composed of a linear layer, a normalization layer and an activation function.
- 6. The method for autonomous tracking control of a vessel with deep reinforcement learning and line-of-sight algorithm integrated as set forth in claim 5, wherein the EEATD network architecture comprises an online policy network π_φ(s), two online value networks Q_{θ1}(s, a) and Q_{θ2}(s, a), a target network π_{φ'} corresponding to the online policy network, and target networks Q_{θ1'} and Q_{θ2'} corresponding to the two online value networks, wherein s represents the state, a represents the action, and φ, θ1 and θ2 respectively represent the training parameters in the networks; the online policy network obtains the current optimal action value from the current state s and adds Gaussian noise ε, specifically: a = π_φ(s) + ε, ε ~ N(0, σ), wherein σ is the action noise standard deviation; interaction with the environment yields a new state s' and reward r, and the obtained data (s, a, r, s') is stored in an experience replay pool D; a batch of N items (s, a, r, s') is then randomly sampled from D; the target policy network calculates the optimal action from s' and adds clipped Gaussian noise ε̃ to obtain the noisy target action value ã, specifically: ã = π_{φ'}(s') + ε̃, ε̃ ~ clip(N(0, σ̃), −c, c), wherein σ̃ is the policy noise standard deviation and c is the policy noise cut-off value; the dual target value networks Q_{θ1'} and Q_{θ2'} calculate the expected return y from s' and ã: y = r + γ·min(Q_{θ1'}(s', ã), Q_{θ2'}(s', ã)), wherein γ is the discount factor; the dual online Critic networks Q_{θ1} and Q_{θ2} are then updated by minimizing the loss function, specifically: L(θ_i) = (1/N)·Σ (y − Q_{θ_i}(s, a))², i = 1, 2; the online policy network is responsible for policy optimization, which aims to maximize the predicted Q value of the first online value network.
- 7. The method for autonomous tracking control of a vessel in which deep reinforcement learning and line-of-sight algorithms are fused as set forth in claim 1, wherein the EEATD network architecture performs sine and cosine coding on angles in the state space and fuses historical timing characteristics, the specific form being: s_t = [sin ψ_t, cos ψ_t, sin ψ_{d,t}, cos ψ_{d,t}, sin e_t, cos e_t, δ_{t−1}, δ_{t−2}, …], wherein ψ_t represents the heading angle of the ship at moment t, ψ_{d,t} represents the target heading angle of the ship at moment t, e_t represents the error angle between the ship heading angle and the target heading angle at moment t, the subscript t denotes the values of the various parameters at moment t, and δ represents the value of the rudder angle at the corresponding moment; the action space is: A = [δ_min, δ_max], wherein the rudder angle δ lies within this range, and the whole action space limits the rudder angle feasible region through a continuous interval constraint.
- 8. The method for autonomous tracking control of a vessel in which deep reinforcement learning and line-of-sight algorithms are integrated as set forth in claim 1, wherein the heading error rewards comprise a heading error reward and an error rate reward; the rudder angle rewards comprise a rudder angle amplitude reward, a rudder angle change rate reward and an additional penalty term; the stability rewards comprise a reward granted when the system approaches the target course and maintains a steady state; finally, the overall composite reward function constructs a staged control strategy through differentiated weight configuration according to the magnitude of the current course error, realizing dynamic collaborative optimization of each sub-reward term.
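The discretized Nomoto steering model in claim 2 can be illustrated with a short simulation. This is a minimal sketch, not the patent's implementation: it integrates the continuous first-order form T·ψ̈ + ψ̇ = K·δ with Euler steps, and the function name, parameter values and constant ship speed U are illustrative assumptions.

```python
import math

def simulate_nomoto(K, T, rudder_cmds, dt=0.1, U=5.0, psi0=0.0, x0=0.0, y0=0.0):
    """Euler integration of the first-order Nomoto model T*psi'' + psi' = K*delta.

    K: turning-ability gain, T: course stability time constant (both illustrative).
    Returns the list of (x, y, psi) states, including the initial state.
    """
    psi, r = psi0, 0.0            # heading angle and yaw rate
    x, y = x0, y0
    track = [(x, y, psi)]
    for delta in rudder_cmds:
        r_dot = (K * delta - r) / T    # yaw acceleration from the Nomoto ODE
        r += r_dot * dt
        psi += r * dt
        x += U * math.cos(psi) * dt    # dead-reckoned position update
        y += U * math.sin(psi) * dt
        track.append((x, y, psi))
    return track
```

With zero rudder input the ship holds its heading and travels in a straight line, which is a quick sanity check on the integration.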
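The LOS guidance and drift compensation of claims 3 and 4 reduce to two small formulas. The sketch below assumes the standard atan2-based LOS form and the drift angle β = atan2(v, U); function names and argument conventions are illustrative, not taken from the patent.

```python
import math

def los_heading(ship_pos, aim_pos):
    """Target course angle toward the pre-aiming (look-ahead) point."""
    dx = aim_pos[0] - ship_pos[0]
    dy = aim_pos[1] - ship_pos[1]
    return math.atan2(dy, dx)

def drift_compensated_heading(ship_pos, aim_pos, v_sway, speed):
    """Subtract the drift angle beta = atan2(v, U) from the LOS course angle."""
    beta = math.atan2(v_sway, speed)   # angle between velocity direction and heading
    return los_heading(ship_pos, aim_pos) - beta
```

When the transverse drift velocity is zero the compensated course equals the raw LOS course, as expected.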
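The sine-and-cosine angle coding of claim 7 can be sketched as follows. The exact feature order and the set of historical rudder angles included are assumptions; the point is that each angle contributes a (sin, cos) pair so the network never sees the 2π discontinuity, and the heading error is wrapped to [−π, π).

```python
import math

def encode_state(psi, psi_target, rudder_hist):
    """Sin/cos-encode the heading, target heading and wrapped heading error,
    then append the historical rudder angles (feature order is illustrative)."""
    e = (psi_target - psi + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    feats = []
    for ang in (psi, psi_target, e):
        feats += [math.sin(ang), math.cos(ang)]
    return feats + list(rudder_hist)
</imports>
```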
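The two TD3 ingredients named in claim 6, clipped target-policy noise and the clipped double-Q target, can be written out in a few lines. This is a scalar illustration of the standard TD3 update, not the EEATD networks themselves; γ, σ and c take common default values here.

```python
import random

def clipped_noise(sigma=0.2, c=0.5, rng=random):
    """Target-policy smoothing noise: Gaussian, clipped to [-c, c]."""
    return max(-c, min(c, rng.gauss(0.0, sigma)))

def td3_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Clipped double-Q target: y = r + gamma * min(Q1', Q2') for non-terminal steps."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)
```

Taking the minimum of the two target critics is what counters Q-value overestimation; both online critics then regress toward this same target y.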
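The composite reward of claim 8 combines heading-error, rudder and stability terms. The weights, tolerance and bonus below are illustrative assumptions (the patent does not disclose numeric values); the structure, weighted penalties plus a bonus near the steady state, follows the claim.

```python
import math

def composite_reward(err, err_rate, delta, delta_rate,
                     w_err=1.0, w_rate=0.2, w_rudder=0.1, w_smooth=0.1,
                     stable_tol=math.radians(2.0), stable_bonus=0.5):
    """Weighted sum of heading-error, rudder-amplitude and rudder-rate penalties,
    plus a stability bonus near the target course (all weights are assumed)."""
    r = -w_err * abs(err) - w_rate * abs(err_rate)        # heading error terms
    r += -w_rudder * abs(delta) - w_smooth * abs(delta_rate)  # rudder terms
    if abs(err) < stable_tol and abs(err_rate) < 1e-3:    # stability bonus
        r += stable_bonus
    return r
```

The staged strategy of the claim would then amount to switching these weights according to the magnitude of the current course error.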
Description
Ship autonomous tracking control method integrating deep reinforcement learning and line-of-sight algorithm
Technical Field
The invention belongs to the technical field of intelligent ship control, and particularly relates to a ship autonomous tracking control method integrating deep reinforcement learning and a line-of-sight algorithm.
Background
As global trade integration deepens and marine activities such as ocean resource development become common, technology for exploring various kinds of equipment in the ocean is advancing. Surface unmanned ships have advantages such as a high degree of autonomy, strong environmental adaptability and excellent operating efficiency, and can be widely applied in marine activities such as ocean resource development, intelligent shipping and global trade transportation; they have therefore received wide attention. In such applications, autonomous track tracking control of the ship is the most basic and key technology. However, because a surface unmanned ship exhibits strong nonlinearity and strong coupling, together with large inertia, model parameter perturbation and external environmental disturbances such as wind, waves and currents, a drift angle (the angle between the actual velocity direction and the ship heading) is produced, so that high-performance autonomous track tracking control is not easy to achieve.
Disclosure of Invention
In order to solve the above problems, the invention provides a cooperative control framework EEATD integrating line-of-sight guidance (the LOS algorithm), drift angle compensation and deep reinforcement learning, which splits coupled links such as interference compensation and course adjustment in the path tracking problem, so that the deep reinforcement learning controller does not need to bear the complex adjustment task of global state coupling in path tracking, and only needs to accurately learn the rudder angle instruction output strategy from real-time states such as the deviation between the target course and the actual course.
The invention provides a ship autonomous tracking control method integrating deep reinforcement learning and a line-of-sight algorithm, comprising the following steps: Step 1, building a Nomoto ship model to simulate the steering characteristics of the ship; Step 2, calculating a target course angle through the LOS guidance algorithm and performing drift angle compensation on the target course angle; Step 3, inputting the set course, the current course angle, the historical course angles, the rudder angle, the historical rudder angle data and the target course angle into the EEATD network architecture to obtain the real-time ship rudder angle. The EEATD network architecture is based on the TD3 reinforcement learning architecture, adopts a hierarchical feature learning network, and fuses a multi-head self-attention mechanism and a residual learning module; it performs sine and cosine coding on angles in the state space, merges historical time-sequence characteristics, and models the time-sequence dependence among the state features; it adopts a composite reward function including heading error rewards, rudder angle rewards and stability rewards; and it randomly generates training paths, adds speed disturbance to the Nomoto ship model to simulate the influence of wind, waves and currents on the ship, and performs model training.
Preferably, the Nomoto ship model has the following specific formula: T·ψ̈ + ψ̇ = K·δ, wherein ψ is the course angle of the ship, ψ̇ is its first derivative, ψ̈ is its second derivative, δ is the rudder angle command, K represents the turning capability of the ship, and T is the course stability time constant; the model can be discretized to give the current course angle ψ_t as follows: ψ_t = a₁·ψ_{t−1} + a₂·ψ_{t−2} + b₀·δ_t + b₁·δ_{t−1} + b₂·δ_{t−2}, wherein ψ_{t−1} and ψ_{t−2} respectively represent the heading angles at the previous moment and the previous two moments, δ_t, δ_{t−1} and δ_{t−2} respectively represent the rudder angles at the current moment, the previous moment and the previous two moments, and a₁, a₂, b₀, b₁ and b₂ are model hyperparameters; the position of the ship at any moment, x_t and y_t, is: x_t = x₀ + ∫₀ᵗ U·cos ψ dτ, y_t = y₀ + ∫₀ᵗ U·sin ψ dτ, wherein U is the velocity of the vessel and (x₀, y₀) is the position of the ship at time 0.
Preferably, the target course angle is obtained in step 2 through the LOS guidance algorithm as follows: ψ_LOS = atan2(y_los − y, x_los − x), wherein (x_los, y_los) is the pre-aiming point in the LOS algorithm and (x, y) is the current position of the ship.
Preferably, the drift angle β is calculated as follows: β = atan2(v, U), wherein v is the transverse drift velocity of the vessel and U is the velocity of the ship; the compensated target course angle ψ_d is thereby obtained: ψ_d = ψ_LOS − β, wherein ψ_LOS is the target course angle calculated and obtained through the LOS guidance algorithm.