
CN-119509275-B - Rocket projectile attitude self-adaptive control method and system based on reinforcement learning

CN119509275B

Abstract

The invention discloses a rocket projectile attitude self-adaptive control method and system based on reinforcement learning, belonging to the technical field of rocket projectile attitude control. The method constructs a rocket projectile attitude adaptive controller through the back-stepping method. The adaptive controller adapts the rocket to unknown disturbance through a reinforcement-learning-based Actor-Critic (AC) structure, so that the system converges quickly to the desired state; the unknown nonlinearity is observed without depending on an internal model, and tracking control of the rocket projectile attitude is realized.

Inventors

  • NING XIN
  • SUN PENGYU
  • WANG ZHENG
  • BAI YUNFEI
  • CHAO LUJING

Assignees

  • Northwestern Polytechnical University (西北工业大学)

Dates

Publication Date
2026-05-08
Application Date
2024-11-19

Claims (5)

  1. A rocket projectile attitude adaptive control method based on reinforcement learning, characterized by comprising the following steps:
     S1, establishing a rocket projectile attitude kinematic model and a dynamics model, and converting both into the nonlinear model of a back-stepping method; the nonlinear model comprises a first-order subsystem and a second-order subsystem, and contains unknown disturbance and a control input;
     S2, constructing a virtual control law of the first-order subsystem and the tracking error derivative of the second-order subsystem, wherein the unknown disturbance appears in the tracking error derivative; the virtual control law of the first-order subsystem is:
     x2c = g1(x1)^(-1)(−k1e1 − f1(x1) + x1d), k1 > 0  (13)
     wherein x2c is the desired value, e1 is the tracking error of the yaw angle and pitch angle of the rocket projectile, and x1d is the tracking desired value of the yaw and pitch angles of the rocket projectile; the first-order subsystem and the second-order subsystem are respectively:
     ẋ1 = f1(x1) + g1(x1)x2,  ẋ2 = f2(x1, x2) + g2(x1, x2)u + d  (3)
     wherein f1(x1), f2(x1, x2), g1(x1), g2(x1, x2) are nonlinear matrices, u is the control signal, d is the unknown disturbance, x1 = (ψ θ)^T, where ψ, θ are the yaw angle and pitch angle of the rocket projectile, and x2 = (ω_y4 ω_z4)^T, where ω_y4, ω_z4 are the angular velocities of the rocket projectile about the body axes in the body coordinate system; the tracking error derivative of the second-order subsystem is:
     ė2 = ẋ2 − ẋ2c = f2(x1, x2) + g2(x1, x2)u + d − ẋ2c  (15)
     wherein ẋ2 is the derivative of the state variable, ẋ2c is the derivative of the virtual control quantity, d is the unknown disturbance, and d̂ is the estimate of the unknown disturbance;
     S3, fitting the unknown disturbance through an execution (actor) neural network to obtain the reinforcement-learning adaptive controller, and evaluating the result of the execution neural network through an evaluation (critic) neural network, the evaluation neural network being approximated through a penalty function; the update rates of the execution neural network and the evaluation neural network are obtained through a gradient descent method, the update target of the execution neural network being the minimum benefit error and the update target of the evaluation neural network being the minimum mean square error of the residual; in S3, the execution neural network fits the unknown disturbance as in equation (17);
     S4, the adaptive controller outputs control signals for controlling the yaw angle and pitch angle to realize rocket projectile attitude adaptation; the control law of rocket projectile control is given by equation (12), wherein the g2^(-1)(g1e1) term is the compensation for the angle tracking error e1.
  2. The rocket projectile attitude adaptive control method based on reinforcement learning according to claim 1, wherein in S3 the update rate of the execution neural network is given by equation (24), wherein Γ is the learning rate of the neural network and Ω is the gain coefficient.
  3. The rocket projectile attitude adaptive control method based on reinforcement learning according to claim 1, wherein in S3 the penalty function is approximated as in equation (22), wherein the basis functions are bounded, satisfying ‖Φc‖ ≤ ΦcM.
  4. The rocket projectile attitude adaptive control method based on reinforcement learning according to claim 3, wherein the update rate of the evaluation neural network is given by equation (26).
  5. A reinforcement-learning-based rocket projectile attitude adaptive control system for implementing the control method of claim 1, comprising: a nonlinear module for establishing a rocket projectile attitude kinematic model and a dynamics model and converting both into the nonlinear model of a back-stepping method, wherein the nonlinear model comprises a first-order subsystem and a second-order subsystem and contains unknown disturbance and a control input; a subsystem module for constructing the virtual control law of the first-order subsystem and the tracking error derivative of the second-order subsystem, the unknown disturbance appearing in the tracking error; a disturbance fitting module for fitting the unknown disturbance through an execution neural network to obtain the reinforcement-learning adaptive controller and evaluating the result of the execution neural network through an evaluation neural network, the evaluation neural network being approximated through a penalty function, the update rates of the execution and evaluation neural networks being obtained through a gradient descent method, the update target of the execution neural network being the minimum benefit error and the update target of the evaluation neural network being the minimum mean square error of the residual; and a control module for outputting, through the adaptive controller, control signals that control the yaw angle and pitch angle of the rocket projectile and realize rocket projectile attitude adaptation.
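The overall control signal of claim 1 can be illustrated with a short numerical sketch. The exact published form of equation (12) did not survive extraction, so the function below follows the standard backstepping construction, consistent only with the claim's statement that a g2^(-1)(g1e1) term compensates the angle tracking error e1; the gain k2, the matrices, and all numeric values are placeholder assumptions, not the patent's aerodynamic terms.

```python
import numpy as np

# Hedged sketch of a backstepping control law in the spirit of equation (12).
# The assumed form is u = g2^-1(-k2*e2 - f2 + x2c_dot - d_hat - g1^T e1),
# where the g1^T e1 term plays the compensating role the claim describes.

def control_law(e1, e2, f2, g1, g2, x2c_dot, d_hat, k2=3.0):
    """Assumed backstepping control law; all arguments are 2-vectors/2x2 matrices."""
    return np.linalg.solve(g2, -k2 * e2 - f2 + x2c_dot - d_hat - g1.T @ e1)

e1 = np.array([0.05, -0.02])    # yaw/pitch angle tracking error [rad]
e2 = np.array([0.10, 0.00])     # body-rate tracking error (omega_y4, omega_z4)
f2 = np.zeros(2)                # placeholder dynamics term f2(x1, x2)
g1 = np.eye(2)                  # placeholder kinematic matrix g1(x1)
g2 = 2.0 * np.eye(2)            # placeholder control-effectiveness matrix g2(x1, x2)
x2c_dot = np.zeros(2)           # derivative of the virtual control quantity
d_hat = np.array([0.01, 0.00])  # actor network's disturbance estimate

u = control_law(e1, e2, f2, g1, g2, x2c_dot, d_hat)
print(u)                        # control signal for yaw and pitch channels
```

Using `np.linalg.solve` rather than forming the matrix inverse keeps the g2^(-1)(...) application numerically well-conditioned.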

Description

Rocket projectile attitude self-adaptive control method and system based on reinforcement learning

Technical Field

The invention belongs to the technical field of rocket projectile attitude control, and relates to a rocket projectile attitude adaptive control method and system based on reinforcement learning.

Background

Guidance and precision of rocket projectiles have become a research hotspot in long-range precision strike. During flight, however, the mass, velocity, and aerodynamic coefficients of the projectile body change continuously, so the body parameters vary rapidly, the body dynamics are highly nonlinear and contain uncertain factors, and the flight parameters are mutually coupled and change severely over time. The system is therefore a multivariable, strongly coupled, time-varying system, and the attitude of the rocket projectile is difficult to control adaptively.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a rocket projectile attitude adaptive control method and system based on reinforcement learning, so as to solve the technical problem that the rocket projectile attitude is difficult to control adaptively in the prior art.
To achieve this purpose, the invention adopts the following technical scheme. A rocket projectile attitude adaptive control method based on reinforcement learning comprises the following steps:

S1, establishing a rocket projectile attitude kinematic model and a dynamics model, and converting both into the nonlinear model of a back-stepping method; the nonlinear model comprises a first-order subsystem and a second-order subsystem, and contains unknown disturbance and a control input.

S2, constructing a virtual control law of the first-order subsystem and the tracking error derivative of the second-order subsystem, the unknown disturbance appearing in the tracking error.

S3, fitting the unknown disturbance through an execution (actor) neural network to obtain the reinforcement-learning adaptive controller, and evaluating the result of the execution neural network through an evaluation (critic) neural network, the evaluation neural network being approximated through a penalty function. The update rates of the execution and evaluation neural networks are obtained through a gradient descent method; the update target of the execution neural network is the minimum benefit error, and the update target of the evaluation neural network is the minimum mean square error of the residual.

S4, the adaptive controller outputs control signals for controlling the yaw angle and pitch angle to realize rocket projectile attitude adaptation.
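The actor-critic adaptation of step S3 can be sketched as a minimal training loop. The patent's exact equations (17), (22), (24), and (26) are not reproduced in this text, so the radial basis features, the quadratic penalty, the discount factor, and all gains below are illustrative assumptions; only the structure (actor fits the disturbance, critic approximates a penalty, both trained by gradient descent) follows the text.

```python
import numpy as np

# Minimal actor-critic sketch of step S3. The actor network estimates the
# unknown disturbance d; the critic approximates a penalty signal; both are
# updated by gradient descent, as the method prescribes. Feature choice,
# penalty form, and gains are assumptions for illustration only.
centers = np.linspace(-1.0, 1.0, 8)
dt = 0.01

def phi(e2):
    """Bounded radial basis features (the claims require bounded bases)."""
    return np.exp(-(e2 - centers) ** 2 / 0.1)

W_a = np.zeros(8)            # actor weights:  d_hat = W_a . phi(e2)
W_c = np.zeros(8)            # critic weights: J_hat = W_c . phi(e2)
gamma_a, gamma_c = 5.0, 1.0  # learning rates (the Gamma of claim 2, assumed values)
e2 = 0.5                     # initial body-rate tracking error

for t in np.arange(0.0, 20.0, dt):
    p = phi(e2)
    d_hat = W_a @ p
    d = 0.3 * np.sin(t)                  # unknown disturbance (simulation only)
    e2 += dt * (-2.0 * e2 + d - d_hat)   # toy error dynamics driven by the mismatch
    # TD-style residual of the quadratic penalty e2^2 under the critic approximation
    delta = e2 ** 2 + 0.95 * (W_c @ phi(e2)) - W_c @ p
    W_c += gamma_c * dt * delta * p      # critic: descend the residual MSE
    W_a += gamma_a * dt * e2 * p         # actor: shrink the tracking (benefit) error

print(round(abs(e2), 3))                 # final tracking-error magnitude
```

The actor update has the sign of a standard adaptive law: the estimate d̂ enters the error dynamics with a minus sign, so increasing d̂ when e2 is positive acts as negative feedback.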
The invention is further improved as follows.

Preferably, in S1, the first-order subsystem and the second-order subsystem are respectively:

ẋ1 = f1(x1) + g1(x1)x2,  ẋ2 = f2(x1, x2) + g2(x1, x2)u + d  (3)

wherein f1(x1) = (0 0)^T, f2(x1, x2), g1(x1), g2(x1, x2) are nonlinear matrices, u is the control signal, d is the unknown disturbance, x1 = (ψ θ)^T, where ψ, θ are the yaw angle and pitch angle of the rocket projectile, and x2 = (ω_y4 ω_z4)^T, where ω_y4, ω_z4 are the angular velocities of the rocket projectile about the body axes in the body coordinate system.

Preferably, in S3, the virtual control of the first-order subsystem is:

x2c = g1(x1)^(-1)(−k1e1 − f1(x1) + x1d), k1 > 0  (13)

where x2c is the desired value, g1(x1) is shorthand for a nonlinear matrix, e1 is the tracking error of the yaw and pitch angles of the rocket projectile, and x1d is the tracking desired value of the yaw and pitch angles of the rocket projectile.

Preferably, in S2, the tracking error derivative of the second-order subsystem is:

ė2 = ẋ2 − ẋ2c = f2(x1, x2) + g2(x1, x2)u + d − ẋ2c  (15)

where ẋ2 is the derivative of the state variable, ẋ2c is the derivative of the virtual control quantity, d is the unknown disturbance, and d̂ is the estimate of the unknown disturbance.

Preferably, in S3, the execution neural network fits the unknown disturbance as in equation (17).

Preferably, in S3, the update rate of the execution neural network is given by equation (24), wherein Γ is the learning rate of the neural network and Ω is the gain coefficient.

Preferably, in S3, the penalty function is approximated as in equation (22), wherein the basis functions are bounded, satisfying ‖Φc‖ ≤ ΦcM.

Preferably, the update rate of the evaluation neural network is given by equation (26).

Preferably, in S2, the control law of rocket projectile control is given by equation (12), wherein the g2^(-1)(g1e1) term is the compensation for the angle tracking error e1.
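The subsystem model (3) and the virtual control law (13) can be evaluated numerically. The sketch below uses the printed equation (13) directly; the matrices g1, the gain k1, and the numeric state are illustrative placeholders, not the patent's aerodynamic terms (only f1(x1) = (0 0)^T is taken from the description).

```python
import numpy as np

# Hedged numerical sketch of equation (13):
#   x2c = g1(x1)^-1 (-k1*e1 - f1(x1) + x1d),  k1 > 0
# mapping the yaw/pitch tracking error e1 to desired body rates.

def virtual_control(x1, x1d, g1, f1, k1=2.0):
    """Equation (13) as printed; e1 = x1 - x1d is the angle tracking error."""
    e1 = x1 - x1d
    return np.linalg.solve(g1, -k1 * e1 - f1 + x1d)

x1  = np.array([0.10, -0.05])   # (psi, theta): current yaw and pitch [rad]
x1d = np.array([0.00,  0.00])   # tracking desired values of yaw and pitch
f1  = np.zeros(2)               # f1(x1) = (0 0)^T per the description
g1  = np.eye(2)                 # placeholder nonlinear matrix g1(x1)

x2c = virtual_control(x1, x1d, g1, f1)
print(x2c)                      # desired angular rates (omega_y4, omega_z4)
```

With g1 = I, f1 = 0, and x1d = 0, the law reduces to x2c = −k1·x1, i.e. proportional rate commands that drive the angles toward their desired values.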
A reinforcement-learning-based rocket projectile attitude adaptive control system comprises: a nonlinear module for establishing a rocket projectile attitude kinematic model and a dynamics model and converting both into the nonlinear model of a back-stepping method, wherein the nonlinear model comprises a first-order subsystem and a second-order subsystem, and the nonlinear model contains unknown disturbance and a control input.