CN-122018403-A - Autonomous control method based on knowledge reforming and intuitionistic function heuristic of structure guidance

Abstract

The application provides an autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics, relating to the technical field of collision avoidance control. The method comprises: sorting and classifying orbit evolution data under different orbit altitudes and different disturbance environments; carrying out structure guidance through an orbit dynamics model; constructing a state space, an action space and a reward function for reinforcement learning training; training with an offline reinforcement learning algorithm to obtain an avoidance control strategy library; obtaining the system state quantity and mapping it through the avoidance control strategy library to obtain a reference control quantity; setting a prediction horizon and constructing a prediction model; constructing an objective function; and solving an optimization problem to obtain a control sequence. The method can improve the generalization and specificity of the control strategy, reduce the complexity of the optimization solution, ensure satellite orbit control precision, support long-term autonomous operation of the spacecraft in a complex environment, and ensure efficient and reliable avoidance control of the spacecraft.
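The final stage summarized above (a reference control quantity from the strategy library, a prediction model, and an objective function solved over a prediction horizon) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the application: the function names `step`, `rollout` and `mpc_cost`, the weights `q`, `w` and `p`, the semi-implicit Euler integrator, and the omission of perturbation acceleration. The candidate enumeration at the end is only a stand-in for a real optimization solver.

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2


def step(r, v, u, dt):
    """Advance position r and velocity v by one semi-implicit Euler step.

    Two-body gravity plus control acceleration u; perturbations are omitted
    here (an assumption of this sketch).
    """
    norm = math.sqrt(sum(c * c for c in r))
    a = [-MU_EARTH * c / norm**3 + uc for c, uc in zip(r, u)]
    v_next = [vc + ac * dt for vc, ac in zip(v, a)]
    r_next = [rc + vc * dt for rc, vc in zip(r, v_next)]
    return r_next, v_next


def rollout(r0, v0, controls, dt):
    """Return the predicted position after each control in the sequence."""
    r, v, positions = list(r0), list(v0), []
    for u in controls:
        r, v = step(r, v, u, dt)
        positions.append(r)
    return positions


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mpc_cost(r0, v0, controls, u_ref, r_des, dt, q=1.0, w=1.0, p=10.0):
    """Quadratic cost: position tracking, control-vs-reference, terminal position."""
    positions = rollout(r0, v0, controls, dt)
    running = sum(
        q * sq_dist(r, rd) + w * sq_dist(u, u_ref)
        for r, rd, u in zip(positions, r_des, controls)
    )
    terminal = p * sq_dist(positions[-1], r_des[-1])
    return running + terminal


# Hypothetical scenario: the desired orbit is the one produced by the reference control.
r0 = [7.0e6, 0.0, 0.0]           # m
v0 = [0.0, 7.546e3, 0.0]         # m/s
u_ref = [0.0, 1.0e-3, 0.0]       # m/s^2, stands in for the strategy-library output
horizon, dt = 5, 10.0
r_des = rollout(r0, v0, [u_ref] * horizon, dt)

# Stand-in for the optimizer: pick the cheapest of a few constant control sequences.
candidates = [[[0.0, s * 1.0e-3, 0.0]] * horizon for s in (0.0, 0.5, 1.0, 1.5)]
best = min(candidates, key=lambda seq: mpc_cost(r0, v0, seq, u_ref, r_des, dt))
first_control = best[0]  # receding horizon: apply only the first step, then re-solve
```

In this sketch the reference control pulls the solution toward the learned policy output, so the search only has to refine around it. A practical implementation would replace the candidate enumeration with a constrained numerical optimizer.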

Inventors

BAI XUE
Xin Linke
XU MING
WANG XIAOYI
QI XUGUANG
DING JIXIN
CHEN ZHAOYUE

Assignees

Beihang University (北京航空航天大学)

Dates

Publication Date: 20260512
Application Date: 20260210

Claims (10)

1. An autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics, comprising: selecting, according to the orbit altitude of the own satellite and the known disturbance conditions, a corresponding orbit dynamics model to model and simulate orbit evolution, thereby obtaining simulation data; combining an acquired historical data set with the simulation data, and sorting and classifying orbit evolution data under different orbit altitudes and different disturbance environments to obtain a basic orbit data set covering multiple orbit regions and multiple disturbance scenarios, wherein the historical data set comprises orbit parameters of satellites and orbit evolution records under different disturbance conditions; based on the basic orbit data set, carrying out structure guidance through the orbit dynamics model, selecting, combining and encoding parameters in the data, constructing a state space, an action space and a reward function for reinforcement learning training, and training with an offline reinforcement learning algorithm to obtain an avoidance control strategy library; acquiring a system state quantity containing the position, velocity, safety distance and altitude of the own satellite and the target satellite, and mapping the system state quantity through the avoidance control strategy library to obtain a reference control quantity; and constructing an objective function that penalizes deviation of the position from the desired orbit, deviation of the control quantity from the reference value, and deviation of the predicted terminal position, solving the optimization problem to obtain a control sequence, and controlling the own satellite through the control law of intuitive-function-inspired model predictive control (MPC).
2. The autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics of claim 1, wherein the state space design process comprises: according to the orbit determination and environment perception results represented by the basic orbit data set, and in combination with the selected orbit dynamics model, selecting and arranging the orbit quantities and environment quantities in the basic orbit data set to obtain a plurality of parameters directly related to the avoidance task, the plurality of parameters comprising the own-satellite position and velocity, the position and velocity of the target satellite or space debris to be avoided, the safety distance threshold, and the orbit altitude; uniformly encoding the state of each sample to obtain a comprehensive state vector; and performing, as needed, coordinate unification, scale normalization and numerical stability processing on the state components of the comprehensive state vector; wherein the own-satellite position and velocity characterize the orbit state of the own satellite in the selected reference coordinate system; the position and velocity of the target satellite or space debris to be avoided are expressed in the same reference frame as those of the own satellite; the safety distance threshold is preset according to task requirements and collision risk control requirements and is used to judge whether a potential collision threat exists; and the orbit altitude characterizes the current orbit region and corresponds to the respective orbit dynamics model.
3. The autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics of claim 2, wherein the orbit dynamics model in the inertial frame satisfies the expression: $\ddot{\mathbf{r}} = -\dfrac{\mu}{\|\mathbf{r}\|^{3}}\,\mathbf{r} + \mathbf{a}_{p} + \mathbf{a}_{c}$, where $\mathbf{r}$ is the position of the satellite in the inertial frame, $\ddot{\mathbf{r}}$ is the second derivative of $\mathbf{r}$ with respect to time, $\|\mathbf{r}\|$ is the modulus of $\mathbf{r}$, $\mathbf{a}_{p}$ is the perturbation acceleration, $\mathbf{a}_{c}$ is the control acceleration term, and $\mu$ is the constant describing the Earth's central gravitation; the action of the action space represents the reference control quantity within one control period, the reference control quantity is used to determine the maneuver command, and the form of the action corresponds to the selected control acceleration term in the orbit dynamics model.
4. The autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics of claim 3, wherein the reward function comprises a safety index and a resource consumption index, to guide reinforcement learning to trade off between the avoidance effect and propulsion resource consumption; the safety index is the relative distance between the own satellite and the target satellite: if the relative distance is greater than the preset safety distance threshold, a positive reward is given to the current-step strategy, and if the relative distance is less than the safety distance threshold, a negative reward is applied to the current-step strategy to penalize control sequences that fail to avoid effectively; the resource consumption index is the fuel consumption value corresponding to each maneuver step, which is added to the reward function as a penalty term, the penalty being positively correlated with the fuel consumption value, to guide reinforcement learning to preferentially select the avoidance maneuver with the lowest fuel consumption on the premise of meeting the safety requirement, thereby reducing overall propellant consumption.
5. The autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics of claim 2, wherein said acquiring a system state quantity containing the position, velocity, safety distance and altitude of the own satellite and the target satellite, mapping through the avoidance control strategy library to obtain a reference control quantity, distinguishing the internal dynamics state from the cost function state, setting a prediction horizon and constructing a prediction model comprises: acquiring the system state quantity at the current time, the system state quantity comprising the own-satellite position, own-satellite velocity, target-satellite position, target-satellite velocity, safety distance and altitude; calculating the reference control quantity based on the avoidance control strategy library, the reference control quantity being obtained by mapping from the system state quantity; distinguishing the internal dynamics state from the state used by the cost function, wherein the internal dynamics state is used to advance orbit evolution in the prediction model and satisfies [expression omitted], and the cost function selects the own-satellite position to participate in the quadratic penalty, the state vector used by the cost function satisfying [expression omitted]; and setting the prediction horizon length, constructing an internal prediction model based on a preset orbit dynamics model, and advancing the orbit evolution through the internal prediction model according to the internal dynamics states and corresponding control quantities of the current and future prediction steps.
6. The autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics of claim 5, wherein said constructing an objective function that penalizes deviation of the position from the desired orbit, deviation of the control quantity from the reference value, and deviation of the predicted terminal position, solving the optimization problem to obtain a control sequence, and controlling the own satellite through the control law of intuitive-function-inspired model predictive control (MPC) comprises: constructing the objective function, which penalizes deviation of the own-satellite position from the desired orbit state and deviation of the control quantity from the reference control quantity, and penalizes the own-satellite position deviation at the end of the prediction horizon; solving the optimization problem corresponding to the objective function to obtain a control sequence over the future prediction steps, executing only the first-step control quantity in the control sequence, and updating the internal dynamics state according to the actual satellite dynamics model; and judging whether the updated relative distance between the own satellite and the target satellite reaches the preset safety distance; if so, ending the control and entering the next control flow, and if not, repeating the above steps until the orbit avoidance is completed.
7. The autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics of claim 1, further comprising: constructing a data feedback mechanism in which, after each avoidance task is completed, the actual data obtained during execution of the task is fed back into the historical data set for retraining or incremental updating in subsequent offline reinforcement learning.
8. An autonomous control system based on structure-guided knowledge reforming and intuitive function heuristics, comprising: a modeling and simulation module, configured to select, according to the orbit altitude of the own satellite and the known disturbance conditions, a corresponding orbit dynamics model to model and simulate orbit evolution, thereby obtaining simulation data; a data processing module, configured to combine an acquired historical data set with the simulation data, and to sort and classify orbit evolution data under different orbit altitudes and different disturbance environments to obtain a basic orbit data set covering multiple orbit regions and multiple disturbance scenarios, wherein the historical data set comprises orbit parameters of satellites and orbit evolution records under different disturbance conditions; a strategy training module, configured to carry out structure guidance through the orbit dynamics model based on the basic orbit data set, to select, combine and encode parameters in the data, to construct a state space, an action space and a reward function for reinforcement learning training, and to train with an offline reinforcement learning algorithm to obtain an avoidance control strategy library; a model construction module, configured to acquire a system state quantity containing the position, velocity, safety distance and altitude of the own satellite and the target satellite, and to map the system state quantity through the avoidance control strategy library to obtain a reference control quantity; and an optimization control module, configured to construct an objective function that penalizes deviation of the position from the desired orbit, deviation of the control quantity from the reference value, and deviation of the predicted terminal position, to solve the optimization problem to obtain a control sequence, and to control the own satellite through the control law of intuitive-function-inspired model predictive control (MPC).
9. An electronic device comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics of any one of claims 1 to 7.
10. A computer-readable storage medium storing a program or instructions which, when executed by a processor, implement the autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics of any one of claims 1 to 7.
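As an illustration of the reward structure recited in claim 4 (a safety term based on relative distance plus a fuel penalty), a minimal sketch might look as follows. The function name `step_reward`, the numeric weights, and the treatment of a distance exactly equal to the threshold as unsafe are assumptions of this sketch; the claim itself specifies only the sign of the reward and that the penalty grows with fuel consumption.

```python
import math


def step_reward(r_own, r_target, d_safe, fuel_used,
                safe_bonus=1.0, unsafe_penalty=10.0, fuel_weight=0.1):
    """Reward for one maneuver step: safety term minus a fuel penalty.

    All weights are hypothetical placeholders, not values from the claims.
    """
    distance = math.dist(r_own, r_target)  # relative distance between the two objects
    # Positive reward when outside the safety threshold, negative otherwise.
    # Equality is treated as unsafe here (an assumption; the claim is silent on it).
    safety = safe_bonus if distance > d_safe else -unsafe_penalty
    # Penalty positively correlated with the fuel consumed in this step.
    return safety - fuel_weight * fuel_used


# Example: the same geometry scored with and without fuel expenditure.
reward_no_burn = step_reward([0.0, 0.0, 0.0], [3.0, 4.0, 0.0], d_safe=4.0, fuel_used=0.0)
reward_with_burn = step_reward([0.0, 0.0, 0.0], [3.0, 4.0, 0.0], d_safe=4.0, fuel_used=2.0)
```

With a fixed safety outcome, the step that burns less fuel always scores higher, which is the preference the claim describes for low-consumption avoidance maneuvers.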

Description

Autonomous control method based on knowledge reforming and intuitionistic function heuristic of structure guidance

Technical Field

The application relates to the technical field of collision avoidance control, and in particular to an autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics.

Background

During in-orbit satellite operation, the space environment is complex and changeable, and collision risk is persistent. To keep a satellite operating safely and stably in orbit, the space situation must be accurately assessed through core means such as dynamics modeling, optimal control, orbit prediction and environment perception, and reliable collision avoidance control commands must be generated from the assessment results. This is the core design requirement and key implementation path of a spacecraft autonomous collision avoidance control system.

In the prior art, two mainstream technical schemes exist for spacecraft autonomous collision avoidance control. The first uses model predictive control (MPC) to solve for collision avoidance control commands with the goal of minimum fuel consumption or shortest maneuver time. Related research has also proposed an MPC-based adaptive convexification method, which introduces an ellipsoidal keep-out zone (KOZ) and an adaptive reference point method to optimize MPC performance when handling the collision avoidance constraints between a spacecraft and a rotating target, improving the computational efficiency of constraint handling. The second introduces reinforcement learning (RL) into the field of spacecraft collision avoidance control. Relying on the optimization characteristics of its model, reinforcement learning autonomously learns an optimal control strategy through interaction with the environment. Related research models the collision avoidance problem as a Partially Observable Markov Decision Process (POMDP) and trains a decision agent with a Deep Recurrent Q-Network (DRQN), realizing spacecraft autonomous collision avoidance planning under incomplete information.

However, both schemes have obvious shortcomings in practice and struggle to meet the requirements of high real-time performance, high stability and high safety for on-orbit spacecraft collision avoidance control. The pure MPC method must solve an optimization problem online; as the number of constraints and the prediction horizon grow, the computational load expands rapidly. Even when an adaptive convexification strategy is introduced to improve computational efficiency, heavy computation and poor real-time performance remain when handling complex nonlinear constraints, which easily causes decision delay. The method also adapts poorly to uncertain environments, and the solution process is prone to failure or divergence. The pure reinforcement learning method lacks structure guidance, and its strategy output is unstable. Its training depends heavily on a large amount of valid data, with low training efficiency and long training time. Lacking guidance from the spacecraft dynamics structure, it easily outputs control strategies that are not physically feasible, and in complex or unknown space environments its strategy output can become inaccurate or even diverge. The stability and safety of collision avoidance control therefore cannot be guaranteed, and the on-orbit operating risk of the spacecraft increases greatly.

Disclosure of Invention

In view of the defects of the prior art, the application provides an autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics, which addresses the heavy computational load and poor real-time performance of the model predictive control (MPC) method, and the lack of structure guidance and unstable strategy output of reinforcement learning, in existing spacecraft autonomous collision avoidance control. In order to achieve the above purpose, the application is realized by the following technical scheme:

In a first aspect, an embodiment of the present application provides an autonomous control method based on structure-guided knowledge reforming and intuitive function heuristics, the method comprising: selecting, according to the orbit altitude of the own satellite and the known disturbance conditions, a corresponding orbit dynamics model to model and simulate orbit evolution, thereby obtaining simulation data; combining the obtained historical data set with the simulation data, and sorting and classifying orbit evolution data under different orbit altitudes and different disturbance environments to obtain a basic orbit data set covering a multi-orbit area and a mult