
CN-115973454-B - Method for adjusting pose of failure spacecraft based on reinforcement learning

CN115973454B

Abstract

The invention discloses a method for rapidly adjusting the pose of a failed spacecraft based on reinforcement learning, comprising: step S1, establishing a mathematical model and a constraint model of the failed spacecraft's attitude based on the terminal constraint of the spacecraft attitude; step S2, establishing a judgment criterion and a Critic network based on the Long-term performance index function in a reinforcement learning algorithm; and step S3, establishing an adaptive control method based on a Backstepping control framework in combination with an Action network and the Critic network, so as to control the failed spacecraft to enter the terminal constraint domain. The invention achieves rapid attitude adjustment of the failed spacecraft before its attitude motion evolves, bringing it into the preset ignition-maneuver orientation.

Inventors

  • HUANG JING
  • MENG YIZHEN
  • TIAN LULU
  • SUN JUN
  • ZHU DONGFANG

Assignees

  • 上海航天控制技术研究所 (Shanghai Aerospace Control Technology Institute)

Dates

Publication Date
2026-05-08
Application Date
2022-12-23

Claims (4)

  1. A method for rapidly adjusting the pose of a failed spacecraft based on reinforcement learning, characterized by comprising the following steps:

     Step S1, establishing a failed-spacecraft attitude mathematical model and a constraint model based on the spacecraft attitude terminal constraint.

     The constraint model of the failed spacecraft comprises: according to the thruster installation layout and thrust vector of the failed spacecraft, the terminal constraint of the failed spacecraft is selected as

     -q_m ≤ q_2 ≤ q_m
     -ω_m ≤ ω_y ≤ ω_m
     g_min ≤ q_3/ω_z ≤ g_max

     wherein q_m, ω_m, g_min and g_max are respectively the upper limit of the second attitude-quaternion component, the upper limit of the pitch angular velocity, and the lower and upper limits of the ratio of the third attitude-quaternion component to the yaw angular velocity. These constraint factors are satisfied simultaneously through an ellipsoidal constraint domain s^2.

     The failed-spacecraft attitude mathematical model is the failed-spacecraft attitude dynamics and kinematics model, with the calculation formula

     q̇_v = (1/2)(q_v^× + q_4 I_3) ω
     q̇_4 = -(1/2) q_v^T ω
     J ω̇ = -ω^× J ω + τ + T_d

     wherein q = col(q_v, q_4) is the quaternion description of the spacecraft state, q_v = [q_1, q_2, q_3]^T, the subscript v denotes the quaternion vector part, q_1–q_4 are the four components of the spacecraft attitude quaternion, ω = [ω_x, ω_y, ω_z]^T is the angular velocity of the spacecraft body frame B relative to the inertial frame I, ω_x, ω_y and ω_z are the angular velocities about the spacecraft x, y and z axes, ω^× denotes the skew-symmetric cross-product matrix of ω, J is the positive-definite symmetric inertia matrix of the spacecraft, τ and T_d are respectively the control torque and the external disturbance plus system modelling error acting on the spacecraft, and I_n is the n-dimensional identity matrix with n = 3.

     Step S2, establishing a judgment criterion and a Critic network based on the Long-term performance index function in the reinforcement learning algorithm.

     Step S3, establishing an adaptive control method based on a Backstepping control framework in combination with an Action network and the Critic network, so as to control the failed spacecraft to enter the terminal constraint domain.
  2. The method for rapidly adjusting the pose of a failed spacecraft based on reinforcement learning according to claim 1, wherein step S2 comprises:

     The Long-term performance index function J(t) is established, wherein T > 0 is a small reinforcement-learning integration step and γ ∈ (0, 1) is a discount factor. If the control-system state enters the attraction domain, the control objective is achieved and the Long-term performance index function J(t) does not increase; if the control-system state deviates from the attraction domain, the controller adjusts the control output so that the state moves toward the terminal constraint domain or is kept within the constraint domain.

     Accordingly, the desired performance index is J_d(t) = 0, and p(s) is defined as part of the Long-term performance index, wherein s^2(t) denotes the ellipsoidal constraint domain at time t, s(ζ) denotes the square root of the ellipsoidal constraint domain at time ζ, ζ is the integration time variable, and c_p > 0 is a relaxation factor to be designed. Here p(s(ζ)) = 0 characterizes a good control output and p(s(ζ)) = 1 a poor one: the value 1 means the performance index function J(t) keeps increasing, the control result is poor, and the spacecraft attitude deviates from the terminal constraint domain; the value 0 means J(t) keeps decreasing, the control result is good, and the spacecraft attitude enters the terminal constraint domain.
  3. The method for rapidly adjusting the pose of a failed spacecraft based on reinforcement learning according to claim 2, wherein step S2 further comprises:

     Constructing a Bellman error equation establishing the relation between J(t-T) and J(t):

     J(t-T) = γ^{-1} (J(t) + p_c)

     wherein p_c is the penalized integral of the performance index function over the interval [t-T, t].

     The Critic network computation is completed by the temporal-difference method: the nonlinear performance index is estimated with an RBF neural network as Ĵ(t) = Ŵ_c^T H_c(x_c(t)), wherein H_c(x_c(t)) is the RBF nonlinear activation function, x_c(t) = [s, q_v^T, ω^T]^T, s denotes the square root of the ellipsoidal constraint domain, and Ŵ_c denotes the estimate of the ideal network weight.

     According to the Backstepping control framework, z_2 = q_v and z_3 = ω - ω_c are defined, with ω_c the designed virtual control quantity, wherein K_1 is a positive diagonal matrix.

     The RBF neural-network adaptation law is established, wherein p̂_c is the estimate of p_c, Δh_c(t) = H_c(x_c(t)) - γ H_c(x_c(t-T)), Λ_c is a positive-diagonal learning matrix, l_c, η_p, Η_p and l_Γ are positive constants to be designed, and K = [1, 1] is the dimension-matching matrix.
  4. The method for rapidly adjusting the pose of a failed spacecraft based on reinforcement learning according to claim 3, wherein step S3 comprises:

     The Action-network adaptation law is established, wherein Λ_a is a positive-diagonal learning-rate matrix, k_a is a positive constant to be designed, H_a(x_a) denotes the RBF nonlinear activation function, and Ŵ_a denotes the estimate of the ideal network weight.

     The highly reliable attitude-adjustment control law under the preset terminal-state constraint condition based on reinforcement learning is then established. To reduce the amount of online estimation, the estimate is computed using the norm, wherein K_θ denotes a positive constant to be designed, η is a positive-diagonal learning matrix, and l_h is a positive constant to be designed.
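As an illustrative aid, the attitude dynamics and kinematics model described in claim 1 can be propagated numerically. This is a minimal sketch assuming the standard quaternion attitude model (scalar part last); the function names, the explicit-Euler integrator, and the step size are illustrative choices, not part of the patent text.

```python
import numpy as np

def skew(v):
    """Skew-symmetric cross-product matrix: skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def attitude_derivatives(q, omega, J, tau, T_d):
    """Quaternion kinematics and rigid-body dynamics.

    q     : attitude quaternion [q1, q2, q3, q4], scalar part last
    omega : body angular velocity [wx, wy, wz] (rad/s)
    J     : positive-definite symmetric inertia matrix
    tau   : control torque; T_d : disturbance / modelling-error torque
    """
    q_v, q4 = q[:3], q[3]
    q_v_dot = 0.5 * (skew(q_v) + q4 * np.eye(3)) @ omega
    q4_dot = -0.5 * q_v @ omega
    omega_dot = np.linalg.solve(J, -skew(omega) @ (J @ omega) + tau + T_d)
    return np.concatenate([q_v_dot, [q4_dot]]), omega_dot

def euler_step(q, omega, J, tau, T_d, h=0.01):
    """One explicit-Euler propagation step with quaternion re-normalisation."""
    q_dot, omega_dot = attitude_derivatives(q, omega, J, tau, T_d)
    q_new = q + h * q_dot
    return q_new / np.linalg.norm(q_new), omega + h * omega_dot
```

A quick sanity check on the sign conventions: an equilibrium attitude with zero rates and zero torque stays put, while a pure x-axis torque spins up ω_x only.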
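Claim 3's Bellman relation J(t-T) = γ^{-1}(J(t) + p_c) rearranges into a temporal-difference residual e = Ŵ_c^T Δh_c + p_c with Δh_c = H_c(x_c(t)) - γ H_c(x_c(t-T)), which a Critic can be trained to shrink. The sketch below assumes Gaussian RBF features and a plain gradient step on that residual; the patent's full adaptation law (with Λ_c, l_c, η_p, etc.) is not reproduced in this text, so the learning rule shown is a simplified stand-in.

```python
import numpy as np

def rbf_features(x, centers, width=1.0):
    """Gaussian RBF activation vector H_c(x); `width` is an assumed parameter."""
    d = np.linalg.norm(centers - x, axis=1)
    return np.exp(-(d / width) ** 2)

def critic_td_step(W_c, x_prev, x_now, p_c, gamma, lr, centers):
    """One gradient step that shrinks the Bellman residual
    e = W_c^T (H(x_now) - gamma * H(x_prev)) + p_c."""
    dh = rbf_features(x_now, centers) - gamma * rbf_features(x_prev, centers)
    e = W_c @ dh + p_c
    return W_c - lr * e * dh
```

Each call reduces |e| as long as lr * ||Δh_c||^2 < 1, which is the usual step-size condition for this kind of residual-gradient update.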

Description

Method for adjusting pose of failure spacecraft based on reinforcement learning

Technical Field

The invention relates to the technical field of reinforcement-learning control of control systems, and in particular to a method for rapidly adjusting the pose of a failed spacecraft based on reinforcement learning.

Background

Since the first artificial spacecraft was launched in 1957, spacecraft applications and human society have become ever more closely linked, but as the number of objects entering outer space grows, the space-debris problem has become increasingly prominent. Failed spacecraft are an important source of near-earth-orbit debris: a spacecraft that remains in orbit long after failing occupies orbital resources, may generate large amounts of debris, cause serious accidents and even chain reactions, and has an extremely adverse impact on high-value spacecraft and normal spacecraft activities. There is therefore an urgent need for technology enabling fully autonomous, highly reliable and rapid maneuver control of failed spacecraft. Existing de-orbit means such as drag sails, electrodynamic tethers, solar sails and electric propulsion provide only millinewton-level thrust, have poor maneuvering performance and long de-orbit times, and cannot meet the rapid-disposal requirement when the spacecraft has a large mass or a high orbit.
A solid propulsion system can generate an extremely large total impulse in a short time, achieving a rapid ignition maneuver, and is easy to extend with autonomous function modules. Through highly reliable autonomous maneuver decisions, a fully autonomous system removes the dependence on the attitude-control capability of the spacecraft platform under unstable-attitude conditions and can make the maneuver process fully autonomous, making it an ideal choice for a fully autonomous, highly reliable, rapid spacecraft maneuver system.

Disclosure of Invention

Aiming at rapid attitude maneuver control of a failed spacecraft before its attitude motion evolves, the invention provides a reinforcement-learning-based method for rapidly adjusting the attitude of a failed spacecraft, which overcomes system uncertainties such as the moment of inertia and the influence of external disturbances, and drives the system into the terminal attitude-control region with high reliability and speed. To achieve the above object, the invention is realised by the following technical scheme. A method for rapidly adjusting the pose of a failed spacecraft based on reinforcement learning comprises the following steps. Step S1, establishing a mathematical model and a constraint model of the failed spacecraft's attitude based on the terminal constraint of the spacecraft attitude. Step S2, establishing a judgment criterion and a Critic network based on the Long-term performance index function in the reinforcement learning algorithm. Step S3, establishing an adaptive control method based on a Backstepping control framework in combination with an Action network and the Critic network, so as to control the failed spacecraft to enter the terminal constraint domain.
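The Backstepping error variables named in the claims (z_2 = q_v, z_3 = ω - ω_c) can be sketched as follows. Since the patent's formula for the virtual control ω_c is not reproduced in this text, the common choice ω_c = -K_1 q_v with K_1 positive diagonal is assumed here purely for illustration.

```python
import numpy as np

def backstepping_errors(q_v, omega, K1):
    """Backstepping error variables: z2 = q_v, z3 = omega - omega_c.

    omega_c = -K1 @ q_v is an assumed (standard) virtual-control form,
    not taken from the patent; K1 is a positive diagonal gain matrix.
    """
    omega_c = -K1 @ q_v
    return q_v, omega - omega_c, omega_c
```

Driving z_2 and z_3 to zero drives the quaternion vector part to zero (the target attitude) and the body rates to the virtual command, which is the usual two-stage logic of a Backstepping design.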
Optionally, step S1 comprises: the failed-spacecraft attitude mathematical model is the failed-spacecraft attitude dynamics and kinematics model, with the calculation formula

q̇_v = (1/2)(q_v^× + q_4 I_3) ω
q̇_4 = -(1/2) q_v^T ω
J ω̇ = -ω^× J ω + τ + T_d

wherein q = col(q_v, q_4) is the quaternion description of the spacecraft state, q_v = [q_1, q_2, q_3]^T, the subscript v denotes the quaternion vector part, q_1–q_4 are the four components of the spacecraft attitude quaternion, ω = [ω_x, ω_y, ω_z]^T is the angular velocity of the spacecraft body frame B relative to the inertial frame I, ω_x, ω_y and ω_z are the angular velocities about the spacecraft x, y and z axes, ω^× denotes the skew-symmetric cross-product matrix of ω, J is the positive-definite symmetric inertia matrix of the spacecraft, τ and T_d are respectively the control torque and the external disturbance plus system modelling error acting on the spacecraft, and I_n is the n-dimensional identity matrix with n = 3.

Optionally, the constraint model of the failed spacecraft includes: according to the thruster installation layout and thrust vector of the failed spacecraft, the terminal constraint is selected as

-q_m ≤ q_2 ≤ q_m
-ω_m ≤ ω_y ≤ ω_m
g_min ≤ q_3/ω_z ≤ g_max

wherein q_m, ω_m, g_min and g_max are respectively the upper limit of the second attitude-quaternion component, the upper limit of the pitch angular velocity, and the lower and upper limits of the ratio of the third attitude-quaternion component to the yaw angular velocity. These constraint factors are satisfied simultaneously through an ellipsoidal constraint domain s^2.

Optionally, step S2 comprises establishing the Long-term performance index function, wherein T > 0 is a small reinforcement-learning integration step and γ ∈ (0, 1) is a discount factor.