CN-121979196-A - Data-driven reinforcement learning ship automatic driving tracking control method

Abstract

The invention discloses a data-driven reinforcement learning ship automatic driving tracking control method. The method comprises: obtaining a three-degree-of-freedom ship physical model with unknown parameters and uncertain disturbance terms caused by sea conditions; collecting ship motion data with a Doppler log and reconstructing a data-driven ship model using a recurrent neural network (RNN); on this basis, designing a feedforward controller for the ship by the backstepping method; designing an optimal feedback controller for the ship by adaptive dynamic programming based on a radial basis function neural network (RBFNN); and combining the two controllers to realize data-driven automatic driving tracking control of the ship. The method addresses the difficulty, slowness and high cost of accurately acquiring ship model parameters while retaining the interpretability and stability of the physical model, and promotes innovation in ship navigation systems. Through the feedforward and optimal feedback control design, it realizes data-driven reinforcement learning automatic driving for ship trajectory tracking and improves the performance of the ship navigation system.

Inventors

XUE LEI
BAI WEIWEI
Dong Shoujian

Assignees

大连船舶重工集团有限公司

Dates

Publication Date: 20260505
Application Date: 20251222

Claims (7)

1. A data-driven reinforcement learning ship automatic driving tracking control method, characterized by comprising the following steps:
S1, acquiring a three-degree-of-freedom ship physical model with unknown parameters and uncertain disturbance terms caused by sea conditions, wherein the three-degree-of-freedom ship physical model comprises a kinematic model and a dynamic model;
S2, based on the three-degree-of-freedom ship physical model, constructing a data-driven model of the ship from ship motion data acquired by a Doppler log in combination with an RNN method;
S3, designing an automatic driving feedforward controller of the ship by the backstepping method based on the data-driven model of the ship;
S4, defining a cost function and a Hamilton-Jacobi-Bellman (HJB) equation based on the feedforward controller, and designing an optimal feedback controller for automatic ship driving;
S5, based on the optimal feedback controller, approximating the cost function with an RBFNN to obtain an estimated optimal feedback controller;
S6, combining the estimated optimal feedback controller with the feedforward controller to obtain a final controller.
2. The data-driven reinforcement learning ship autopilot tracking control method of claim 1, wherein step S1 includes:
S11, acquiring a three-degree-of-freedom ship physical model with unknown parameters and uncertain disturbance terms caused by sea conditions, wherein the kinematic and dynamic models are given by [formula], in which:
the position vector is composed of the ship's position coordinates (x, y) in the earth-fixed inertial coordinate system and the yaw angle, where x is the abscissa and y is the ordinate of the coordinate system, both in meters (m);
the velocity vector is composed of the ship's surge velocity, sway velocity and yaw rate in the body-fixed coordinate system, and is obtained from the ship motion data;
the control vector is composed of the surge force, sway force and yaw moment that the ship's propulsion system applies as control inputs;
the external environmental disturbance vector is composed of the transverse disturbance force, the longitudinal disturbance force and the yaw disturbance moment caused by wind, waves and currents in the body-fixed coordinate system, and is bounded above by a positive constant;
the coordinate transformation matrix satisfies [formula], where the matrix-dimension notation denotes a real matrix with i rows and j columns;
the inertia matrix is composed of the ship's own mass inertia and the hydrodynamic added mass;
the remaining matrices are the Coriolis and centripetal matrix and the linear hydrodynamic damping parameter matrix;
these matrices take the forms [formula], in which the inertia entries are unknown inertial parameters covering the ship's own mass, the added mass and the hydrodynamic derivatives, including mass inertia and hydrodynamic added inertia, and the damping entries are unknown linear hydrodynamic damping coefficients.
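The formulas of this claim are not reproduced in this text. As a rough illustration only, the sketch below assumes the widely used 3-DOF maneuvering form, kinematics eta_dot = R(psi) nu and dynamics M nu_dot + C(nu) nu + D nu = tau + tau_w, with diagonal M and D. The function names, the diagonal structure and the explicit-Euler step are assumptions made for the example, not details taken from the claim.

```python
import math

def rotation(psi):
    """Body-to-earth rotation matrix R(psi) for the planar (x, y, yaw) model."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(A, v):
    """Multiply a 3x3 matrix (list of rows) by a length-3 vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def step(eta, nu, tau, tau_w, m, d, dt):
    """One explicit-Euler step of the assumed model.

    eta = [x, y, psi] (earth-fixed), nu = [u, v, r] (body-fixed).
    m and d are hypothetical diagonal inertia and linear damping coefficients.
    """
    u, v, r = nu
    # Coriolis-centripetal term C(nu) nu for a diagonal inertia matrix.
    c_nu = [-m[1] * v * r, m[0] * u * r, (m[1] - m[0]) * u * v]
    nu_dot = [(tau[i] + tau_w[i] - c_nu[i] - d[i] * nu[i]) / m[i] for i in range(3)]
    eta_dot = matvec(rotation(eta[2]), nu)
    eta_next = [eta[i] + dt * eta_dot[i] for i in range(3)]
    nu_next = [nu[i] + dt * nu_dot[i] for i in range(3)]
    return eta_next, nu_next
```

In the claimed method the inertia and damping parameters are unknown, which is why the later steps replace this model with a data-driven one; the sketch only shows the structure that is being approximated.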
3. The data-driven reinforcement learning ship autopilot tracking control method of claim 1, wherein step S2 includes:
S21, based on the three-degree-of-freedom ship physical model, obtaining the nominal form of the dynamic system [formula]; then, according to the Stone-Weierstrass theorem, rewriting the nominal form of the ship dynamic system as [formula], in which the terms denote: the ideal RNN weight, which satisfies [formula]; the error between the ideal RNN weight and the estimated RNN weight; the estimate of the RNN weight; the RNN reconstruction error; and a monotonically increasing RNN activation function satisfying [formula], in which the terms denote the argument of the activation function and a positive constant;
S22, determining a data-driven model from the estimated RNN weight, the approximate ship dynamics data-driven model being [formula], in which the terms denote: the state of the data-driven model, which is obtained from the acquired ship motion data and satisfies [formula]; the error between the actual model state and the data-driven model state, i.e. the data-driven model state minus the acquired motion data; and a feedback term that compensates the reconstruction error and satisfies [formula], in which the terms denote a design parameter of the reconstructed model and the estimate of an adjustable parameter;
S23, differentiating the state error to obtain [formula], in which the terms denote the ideal adjustable parameter, which satisfies [formula], and the error between the ideal adjustable parameter and the estimated adjustable parameter;
S24, according to the derivative of the model state error and in combination with Lyapunov stability analysis, constructing adaptive laws for the RNN weight estimate and for the adjustable parameter so as to obtain both; the adaptive law of the RNN weight estimate is [formula], in which the terms denote the RNN learning rate and the update rate of the adjustable parameter; the adaptive law of the adjustable parameter estimate is [formula], in which the term denotes the update rate of the adjustable parameter;
S25, combining the ship physical model of step S11 with the approximate ship dynamics data-driven model of step S22 to construct a three-degree-of-freedom ship data-driven model [formula], in which the terms denote: the steady-state RNN weight, i.e. [formula]; a lumped term composed of the RNN reconstruction error, the external environmental disturbance vector, and the error between the ideal RNN weight and the steady-state weight; and a positive constant that is the upper bound of the lumped term and satisfies [formula].
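The idea behind steps S22 to S24 (run a model in parallel with the measured data, feed the state error back, and adapt the weight with a Lyapunov-derived law) can be sketched on a deliberately simplified scalar problem. Everything below is an assumption made for illustration: a single tanh "neuron" stands in for the RNN, the plant, gains and input signal are hypothetical, and the adaptive law is the textbook gradient-type law rather than the claim's own formula, which also involves an adjustable compensation parameter not modeled here.

```python
import math

def identify(w_true=-2.0, gamma=50.0, k=5.0, dt=1e-3, t_end=30.0):
    """Scalar data-driven identifier (illustrative stand-in, not the claimed law).

    Plant (plays the role of measured data): x'  = w_true * tanh(x) + u
    Model:                                   xh' = w_hat * tanh(x) + u - k * e
    State error:                             e   = xh - x
    Adaptive law:                            w_hat' = -gamma * e * tanh(x)
    With V = e^2/2 + (w_hat - w_true)^2 / (2*gamma), this gives V' = -k * e^2.
    """
    x, xh, w_hat = 0.5, 0.0, 0.0
    for i in range(int(t_end / dt)):
        u = math.sin(i * dt)      # assumed exciting input
        phi = math.tanh(x)        # activation evaluated on measured state
        e = xh - x                # model state minus measured state
        x_dot = w_true * phi + u
        xh_dot = w_hat * phi + u - k * e
        w_dot = -gamma * e * phi
        x += dt * x_dot
        xh += dt * xh_dot
        w_hat += dt * w_dot
    return xh - x, w_hat

final_error, w_estimate = identify()
```

The feedback term `-k * e` is what keeps the model state close to the data while the weight is still wrong; the adaptive law then removes the remaining mismatch as long as the input keeps the system excited.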
4. The data-driven reinforcement learning ship autopilot tracking control method of claim 1, wherein step S3 includes:
S31, according to the three-degree-of-freedom ship data-driven model, defining the controller as [formula], in which the terms denote the feedforward controller and the feedback controller; and defining the tracking errors of the ship as [formula], in which the terms denote: the trajectory reference signal; the virtual controller for automatic driving; the trajectory error between the ship's trajectory and the trajectory reference signal; and the velocity error between the ship's surge velocity, sway velocity and yaw rate and their desired values;
S32, differentiating the trajectory tracking error and combining it with the kinematics of the ship data-driven model of step S25 to obtain [formula], in which one term denotes [formula]; constructing a first Lyapunov function [formula]; and, according to the derivative of the first Lyapunov function and in combination with Lyapunov stability analysis, designing the virtual controller of the ship as [formula], in which the term denotes a control design parameter;
S33, to facilitate the optimal feedback control design, defining, from the velocity error of step S31, [formula]; and, according to the controller defined in step S31, differentiating the ship velocity error and combining it with the dynamics of the ship data-driven model of step S25 to obtain [formula], in which the term denotes a first derivative;
S34, constructing a second Lyapunov function from the first Lyapunov function as [formula]; according to the derivative of the second Lyapunov function and in combination with Lyapunov stability analysis, designing the ship feedforward controller as [formula], in which the term denotes a control design parameter; and substituting the ship feedforward controller into the derivative of the second Lyapunov function to obtain [formula].
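The first backstepping step (S32) can be illustrated as follows. The sketch assumes the standard kinematics eta_dot = R(psi) nu and the common choice of virtual controller alpha = R(psi)^T (-k1 z1 + eta_d_dot), with trajectory error z1 = eta - eta_d and a scalar gain k1. These symbols and the scalar gain are assumptions for the example; the claim's own formulas are not reproduced in this text.

```python
import math

def rotation(psi):
    """Body-to-earth rotation matrix R(psi) for the planar (x, y, yaw) model."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def virtual_control(eta, eta_d, eta_d_dot, k1):
    """Return (z1, alpha) for the assumed first backstepping step.

    z1    = eta - eta_d                      (trajectory error)
    alpha = R(psi)^T (-k1 * z1 + eta_d_dot)  (desired body-fixed velocity)
    """
    z1 = [e - ed for e, ed in zip(eta, eta_d)]
    rhs = [-k1 * z + dd for z, dd in zip(z1, eta_d_dot)]
    R = rotation(eta[2])
    # Multiply by the transpose of R: alpha_i = sum_k R[k][i] * rhs[k].
    alpha = [sum(R[k][i] * rhs[k] for k in range(3)) for i in range(3)]
    return z1, alpha

# If the ship velocity equalled alpha exactly, the error would obey z1' = -k1 * z1.
eta, eta_d, eta_d_dot, k1 = [1.0, 2.0, 0.3], [0.0, 0.0, 0.0], [0.5, 0.0, 0.1], 2.0
z1, alpha = virtual_control(eta, eta_d, eta_d_dot, k1)
R = rotation(eta[2])
z1_dot = [sum(R[i][k] * alpha[k] for k in range(3)) - eta_d_dot[i] for i in range(3)]
```

With the Lyapunov function V1 = z1^T z1 / 2, this choice gives V1' = -k1 z1^T z1 when the velocity tracks alpha, which is the kind of argument the first Lyapunov function in S32 is used for. The second step (S34) then designs the feedforward controller so that the velocity error shrinks as well.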
5. The data-driven reinforcement learning ship autopilot tracking control method of claim 1, wherein step S4 includes:
S41, according to the result of step S34, writing the ship automatic driving tracking optimization control system as [formula];
S42, defining the cost function of the automatic driving tracking optimization control system as [formula], in which the terms denote two positive definite matrices, each satisfying [formula], and t denotes time;
S43, defining a Hamiltonian from the cost function as [formula], in which the terms denote the partial derivative of one quantity with respect to another, and the transpose of a quantity;
S44, minimizing the cost function of the automatic driving tracking optimization control system of step S42 to obtain the optimal cost function [formula];
S45, obtaining the HJB equation from the optimal cost function as [formula], in which the term denotes a partial derivative of one quantity with respect to another;
S46, obtaining the optimal feedback controller from the HJB equation by a gradient descent method, the optimal feedback controller being [formula].
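For orientation, steps S42 to S46 follow the usual pattern of continuous-time optimal control. The block below writes that pattern out under assumptions: the error dynamics are taken to be affine in the feedback control, z' = f(z) + g(z) u, and the symbols z, f, g, Q and P are hypothetical names chosen for the example. The closed form for the optimal control is the standard stationarity condition of the Hamiltonian; the claim's own formulas are not reproduced in this text and may differ in detail.

```latex
% Assumed error dynamics (hypothetical symbols): \dot{z} = f(z) + g(z)\,u
\begin{aligned}
J(z) &= \int_{t}^{\infty} \left( z^{\top} Q z + u^{\top} P u \right) \mathrm{d}s,
      \qquad Q \succ 0,\; P \succ 0, \\
H(z, u, \nabla J) &= z^{\top} Q z + u^{\top} P u
      + \nabla J^{\top} \left( f(z) + g(z)\,u \right), \\
J^{*}(z) &= \min_{u} \int_{t}^{\infty} \left( z^{\top} Q z + u^{\top} P u \right) \mathrm{d}s, \\
0 &= \min_{u} H\!\left(z, u, \nabla J^{*}\right), \\
u^{*} &= -\tfrac{1}{2}\, P^{-1} g(z)^{\top} \nabla J^{*}(z).
\end{aligned}
```

The last line follows from setting the derivative of H with respect to u to zero, 2 P u + g(z)^T grad J* = 0. Because grad J* is not available in closed form for a nonlinear system, it has to be approximated, which is the role of the RBFNN in step S5.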
6. The data-driven reinforcement learning ship autopilot tracking control method of claim 1, wherein step S5 includes:
S51, approximating the optimal cost function with an RBFNN as [formula], in which the terms denote: the ideal RBFNN weight, which satisfies [formula]; the error of the neural network weight; the estimate of the neural network weight; the activation function of the neural network; the number of neurons in the hidden layer; and the approximation error of the neural network; and taking the partial derivative of this approximation to obtain the partial derivative of the optimal cost function [formula], in which the terms denote the partial derivatives of the activation function and of the approximation error with respect to the argument;
S52, determining a critic network from the estimated neural network weight to obtain the approximate optimal cost function [formula]; and taking the partial derivative of the approximate optimal cost function to obtain [formula];
S53, substituting the partial derivative of the optimal cost function of step S51 into the optimal feedback controller and the HJB equation to obtain [formula], in which the term denotes the system reconstruction residual, with [formula]; similarly, substituting the partial derivative of the approximate optimal cost function of step S52 to obtain the estimated optimal feedback controller and the estimated HJB equation, respectively [formula];
S54, substituting the estimated optimal feedback controller into the ship automatic driving tracking optimization control system of step S41 to obtain [formula];
S55, to minimize the estimated HJB equation, designing an RBFNN adaptive law by a gradient descent method as [formula], in which the terms denote the learning rate of the neural network, the tuning parameter of an additional stabilizing term, and a continuously differentiable Lyapunov function.
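The two ingredients of step S5, a Gaussian radial-basis approximation of a cost function and a gradient-descent update of its weights, can be sketched in a few lines. This is a simplified stand-in under stated assumptions: the input is scalar, the centers and width are hypothetical, and the weights are fitted to a known target J(z) = z^2 by minimizing a squared approximation error, whereas the claim minimizes the estimated HJB residual and adds a stabilizing term.

```python
import math

def rbf_features(z, centers, sigma):
    """Gaussian radial basis functions evaluated at scalar input z."""
    return [math.exp(-((z - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]

def mean_loss(w, samples, targets, centers, sigma):
    """Mean of 0.5 * (w . phi(z) - target)^2 over the sample set."""
    total = 0.0
    for z, j in zip(samples, targets):
        phi = rbf_features(z, centers, sigma)
        err = sum(wi * pi for wi, pi in zip(w, phi)) - j
        total += 0.5 * err * err
    return total / len(samples)

def train_critic(samples, targets, centers, sigma, lr=0.1, epochs=500):
    """Batch gradient descent on the mean squared approximation error."""
    w = [0.0] * len(centers)
    for _ in range(epochs):
        grad = [0.0] * len(centers)
        for z, j in zip(samples, targets):
            phi = rbf_features(z, centers, sigma)
            err = sum(wi * pi for wi, pi in zip(w, phi)) - j
            for i, pi in enumerate(phi):
                grad[i] += err * pi / len(samples)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

centers = [-1.0, -0.5, 0.0, 0.5, 1.0]               # hypothetical hidden-layer centers
samples = [i / 10.0 - 1.0 for i in range(21)]        # grid on [-1, 1]
targets = [z * z for z in samples]                   # stand-in cost, not from the claim
w_hat = train_critic(samples, targets, centers, sigma=0.5)
```

In the claimed scheme the same weight vector also defines the estimated optimal feedback controller through the partial derivative of the approximated cost, so the critic update and the control law are coupled; the sketch isolates only the approximation step.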
7. The data-driven reinforcement learning ship autopilot tracking control method of claim 1, wherein step S6 includes:
S61, combining the feedforward controller of step S34 with the estimated optimal feedback controller of step S53 to obtain the final controller of the ship [formula], in which the terms denote: the RNN weight, which is computed by obtaining the model error from the ship velocity data acquired by the Doppler log, comprising surge velocity, sway velocity and yaw rate, and substituting that error into the weight adaptive law of step S24; the derivative of the virtual controller; the design parameters; a positive definite matrix satisfying [formula]; and a tangent function;
S62, realizing automatic steering for ship trajectory tracking according to the final controller of the ship and the RBFNN adaptive law of step S55.

Description

Data-driven reinforcement learning ship automatic driving tracking control method

Technical Field

The invention relates to the technical field of intelligent ships, in particular to a data-driven reinforcement learning ship automatic driving tracking control method.

Background

The wide application of electronic information technology in shipbuilding has laid a technical foundation for the development of automatic ship steering. In particular, continuous progress in sensor technology and Internet of Things technology enables a ship to sense and acquire information about itself and the marine environment automatically; fusing the acquired data into existing technical systems drives the innovation and upgrading of ship navigation systems. In the prior art, automatic ship steering mostly relies on a physical model of the ship. However, the parameters of the physical model are difficult to acquire accurately, the acquisition period is long and the cost is high, which greatly restricts further research on ship navigation systems. Although a few studies realize ship autopilot purely by data-driven means, such methods depend heavily on data quality, are poorly interpretable and have limited generalization capability. Combining the physical model of the ship with a data-driven method to achieve accurate and efficient automatic driving is therefore of great research value. Meanwhile, complex sea conditions such as wind, waves and currents pose serious challenges to ship navigation. To ensure safe and efficient sailing, a ship must have good autopilot capability.
Optimal control theory, an advanced control concept that aims to obtain the maximum reward at the minimum cost, plays a vital role in artificial-intelligence-based control design and provides solid theoretical support and technical assurance for the intelligent transformation of ship navigation systems.

Disclosure of Invention

The invention aims to solve the following problems of the prior art: automatic ship steering mostly relies on a physical model of the ship, yet the parameters of the physical model are difficult to acquire accurately, the acquisition period is long and the cost is high, which greatly restricts further research on ship navigation systems; and although a few studies realize ship autopilot purely by data-driven means, such methods depend heavily on data quality, are poorly interpretable and have limited generalization capability.

In order to achieve the above object, the present invention provides a data-driven reinforcement learning ship autopilot tracking control method, comprising:
S1, acquiring a three-degree-of-freedom ship physical model with unknown parameters and uncertain disturbance terms caused by sea conditions, wherein the three-degree-of-freedom ship physical model comprises a kinematic model and a dynamic model;
S2, based on the three-degree-of-freedom ship physical model, constructing a data-driven model of the ship from ship motion data acquired by a Doppler log in combination with an RNN method;
S3, designing an automatic driving feedforward controller of the ship by the backstepping method based on the data-driven model of the ship;
S4, defining a cost function and a Hamilton-Jacobi-Bellman (HJB) equation based on the feedforward controller, and designing an optimal feedback controller for automatic ship driving;
S5, based on the optimal feedback controller, approximating the cost function with an RBFNN to obtain an estimated optimal feedback controller;
S6, combining the estimated optimal feedback controller with the feedforward controller to obtain a final controller.

Preferably, step S1 includes:
S11, acquiring a three-degree-of-freedom ship physical model with unknown parameters and uncertain disturbance terms caused by sea conditions, wherein the kinematic and dynamic models are given by [formula], in which: the position vector is composed of the ship's position coordinates (x, y) in the earth-fixed inertial coordinate system and the yaw angle, where x is the abscissa and y is the ordinate of the coordinate system, both in meters (m); the velocity vector is composed of the ship's surge velocity, sway velocity and yaw rate in the body-fixed coordinate system, and is obtained from the ship motion data; the control vector is composed of the surge force, sway force and yaw moment that the ship's propulsion system applies as control inputs; the external environmental disturbance vector is composed of the transverse disturbance force caused by wind, waves and currents on the ship in the body-fixed coordinate system, the force of longitudinal di