CN-122018319-A - Interference-aware MPC control method and system inspired by terminal policy optimization

Abstract

The application provides an interference-aware MPC control method and system inspired by terminal policy optimization, in the technical field of control. The method comprises: constructing a discrete-time nonlinear dynamics model containing time-varying interference; introducing a disturbance predictor whose model parameters are obtained by fitting historical disturbance data, and using it to predict future disturbances; obtaining the optimal parameters of a parameterized terminal cost function by supervised learning, which yields the optimal terminal cost function; constructing a terminal model predictive control (MPC) framework and the corresponding terminal MPC optimization problem, with the optimal terminal cost function embedded in place of the conventional terminal cost; and solving the constrained terminal MPC optimization problem and outputting a control law to complete online control. The application adapts to disturbance scenarios, reduces control errors caused by interference and other uncertainties, and improves the stability, accuracy, and reliability of online control of nonlinear unmanned systems.
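As a rough illustration of the first step, the sketch below simulates a discrete-time nonlinear model driven by a control input, a recurrent time-varying interference, and noise. Everything in it is an assumption made for illustration rather than the applicants' model: the scalar state, the transition function `f`, the sinusoidal `recurrent_disturbance`, and the additive Gaussian term standing in for white noise and modeling error.

```python
import math
import random
from typing import List


def recurrent_disturbance(k: int, amplitude: float = 0.2, period: int = 20) -> float:
    """Hypothetical time-varying interference d_k that recurs with a fixed period."""
    return amplitude * math.sin(2.0 * math.pi * k / period)


def f(x: float, u: float, d: float) -> float:
    """Hypothetical nonlinear state transition; continuously differentiable, f(0, 0, 0) = 0."""
    return x + 0.1 * math.sin(x) + 0.1 * u + d


def simulate(x0: float, controls: List[float], noise_std: float = 0.01, seed: int = 0) -> List[float]:
    """Roll out x_{k+1} = f(x_k, u_k, d_k) + w_k, where w_k lumps white noise and modeling error."""
    rng = random.Random(seed)
    states = [x0]
    for k, u in enumerate(controls):
        w_k = rng.gauss(0.0, noise_std)
        states.append(f(states[-1], u, recurrent_disturbance(k)) + w_k)
    return states


if __name__ == "__main__":
    trajectory = simulate(x0=1.0, controls=[-1.0] * 40)
    print(f"state after {len(trajectory) - 1} steps: {trajectory[-1]:.3f}")
```

Passing `noise_std=0.0` removes the noise term and leaves only the deterministic part of the rollout.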

Inventors

WANG XIAOYI
XU MING
QI XUGUANG
BAI XUE
DING JIXIN
CHEN ZHAOYUE

Assignees

Beihang University (北京航空航天大学)

Dates

Publication Date: 20260512
Application Date: 20260226

Claims (10)

1. An interference-aware MPC control method inspired by terminal policy optimization, comprising: constructing, in the control process of a nonlinear unmanned system, a discrete-time nonlinear dynamics model containing time-varying interference, which represents how the system state vector evolves with the control input, the time-varying interference, system white noise, and modeling errors; introducing a disturbance predictor $g$, obtaining disturbance model parameters $\hat{\psi}$ by fitting historical disturbance data, estimating the current disturbance $\hat{d}_k$ at the $k$-th step by sensing or identification, and predicting the disturbance at future step $i$ with the disturbance predictor $g$ to obtain the disturbance prediction result $\hat{d}_{k+i|k}$; obtaining the optimal parameters $\theta^*$ of a parameterized terminal cost function $V_\theta(x)$ by supervised learning, thereby obtaining the optimal terminal cost function $V_{\theta^*}(x)$; constructing a terminal model predictive control (MPC) framework, forming the corresponding terminal MPC optimization problem, and embedding the optimal terminal cost function $V_{\theta^*}(x)$ into the terminal MPC framework in place of the conventional terminal cost; and, combining the discrete-time nonlinear dynamics model with the disturbance prediction result $\hat{d}_{k+i|k}$, solving the constrained terminal MPC optimization problem at time $k$ and outputting a control law, so that control inputs are applied to the unmanned system based on the control law and online control is completed.
2. The MPC control method of claim 1, wherein the discrete-time nonlinear dynamics model satisfies $x_{k+1} = f(x_k, u_k, d_k) + w_k$, where $k$ is the discrete time step; $x_k$ and $x_{k+1}$ are the system state vectors at the $k$-th and $(k+1)$-th discrete time steps, respectively; $u_k$ is the control input; $d_k$ is a modelable, time-varying recurrent interference; $w_k$ represents at least one of system white noise and modeling error; and $f$ is a nonlinear state transition function characterizing the evolution of the system state, which is continuously differentiable on its domain and satisfies $f(0, 0, 0) = 0$. The system state vector includes at least one of relative position, velocity, attitude angle, and angular velocity, and the control input includes at least one of thruster pulses, moments, and force allocation.
3. The MPC control method of claim 2, wherein the terminal model predictive control MPC framework satisfies $V_N^*(x_k) = \min_{u_{k|k}, \dots, u_{k+N-1|k}} \sum_{i=0}^{N-1} \ell(x_{k+i|k}, u_{k+i|k}) + V_\theta(x_{k+N|k})$, where $x_{k+i|k}$ is the prediction made at the current time $k$ of the system state vector at future step $k+i$; $x_{k+N|k}$ is the prediction made at time $k$ of the system state vector at step $k+N$, with $N$ a positive integer; $u_{k+i|k}$ is the prediction made at time $k$ of the control input at step $k+i$; $\theta$ is the parameter to be estimated; $\ell(\cdot, \cdot)$ is the stage cost function, which characterizes, over the prediction horizon, the deviation from the desired state and the control resources consumed by the input; $V_\theta(\cdot)$ is the parameterized terminal cost function, which quantifies the aggregate cost of the system's future operation from the state at the end of the prediction horizon; and $V_N^*(x_k)$ is the optimal cost function, which characterizes the optimal accumulated cost achievable when the system is in state $x_k$ at the current discrete time step $k$ and the terminal MPC optimization problem is solved.
4. The MPC control method of claim 3, wherein the terminal model predictive control MPC framework includes a state set $\mathcal{X}$ and an input constraint set $\mathcal{U}$, and uses the terminal cost $V_{\theta^*}$ with the optimal parameters $\theta^*$ as the terminal term; the constraints of the terminal MPC framework satisfy $x_{k+i+1|k} = f(x_{k+i|k}, u_{k+i|k}, \hat{d}_{k+i|k})$, $x_{k+i|k} \in \mathcal{X}$, $u_{k+i|k} \in \mathcal{U}$, and $x_{k|k} = x_k$, where $i = 0, 1, \dots, N-1$; $\mathcal{X}$ is the state set; $\mathcal{U}$ is the input constraint set; $\hat{d}_{k+i|k}$ is the disturbance prediction result; $x_{k+i|k}$ is the prediction made at the current time $k$ of the system state vector at step $k+i$; and $x_{k|k}$ is the predicted state at the 0-th prediction step when predicting from the current discrete time step $k$, corresponding to the start of the prediction horizon.
5. The MPC control method of claim 4, wherein obtaining the optimal parameters $\theta^*$ of the parameterized terminal cost function $V_\theta(x)$ by supervised learning, thereby obtaining the optimal terminal cost function $V_{\theta^*}(x)$, comprises: acquiring an expert data set $\mathcal{D} = \{(x_j, u_j, J_j)\}_{j=1}^{M}$ covering interference samples at different times and different phases, where $x_j$ is a training state sample, $u_j = \kappa(x_j)$ is the baseline control input of a long-horizon MPC, $\kappa$ denotes the baseline controller, $J_j$ is the long-horizon cost corresponding to $x_j$, and $M$ is the number of samples; setting the parameterized terminal cost function to the linearly parameterized basis-function form $V_\theta(x) = \theta^{\top}\phi(x)$, where $\theta$ is the parameter to be estimated, $^{\top}$ denotes the transpose, and $\phi(x)$ is a predefined basis function vector; and solving the SMP convex optimization problem with a descent constraint over the parameter to be estimated $\theta$ to obtain the optimal parameters $\theta^*$ and thereby determine the optimal terminal cost function $V_{\theta^*}(x)$, the SMP convex optimization problem with the descent constraint being used to optimize the parameters to be estimated of the parameterized terminal cost function $V_\theta(x)$ in an offline supervised learning stage.
6. The MPC control method of claim 5, wherein the SMP convex optimization problem with the descent constraint satisfies $\min_{\theta \in \mathbb{R}^d} \frac{1}{M}\sum_{j=1}^{M}\big(V_\theta(x_j) - J_j\big)^2 + \lambda\|\theta\|_2^2$ subject to $V_\theta\big(f(x_j, u_j, \hat{d}_j)\big) - V_\theta(x_j) \le -\ell(x_j, u_j)$, $j = 1, \dots, M$, where $\lambda$ is a regularization coefficient that prevents overfitting of the parameterized terminal cost function; $\|\theta\|_2^2$ is the regularization term; $M$ is the total number of training samples; $x_j$ is the $j$-th training state sample; $u_j$ is the control input of the $j$-th sample; $\hat{d}_j$ is the disturbance estimate of the $j$-th sample; $J_j$ is the long-horizon cost corresponding to $x_j$; $\theta \in \mathbb{R}^d$ means that the parameter to be estimated belongs to the $d$-dimensional real vector space, $\mathbb{R}$ being the field of real numbers; $\ell(x_j, u_j)$ is the stage cost, which characterizes the instantaneous operating cost of the $j$-th sample and quantifies state tracking deviation and control energy consumption; and $V_\theta\big(f(x_j, u_j, \hat{d}_j)\big)$ is the terminal cost value of the next-step predicted state, computed from the training state sample $x_j$, control input $u_j$, and disturbance estimate $\hat{d}_j$ of the $j$-th training sample.
7. The MPC control method of claim 5, wherein the predefined basis function vector is at least one of a radial basis function vector and an orthogonal polynomial basis function vector; and, when the stage cost is quadratic, the SMP convex optimization problem with the descent constraint is a convex quadratic program with a quadratic objective and linear inequality constraints, which is solved offline with a convex optimization solver.
8. An interference-aware MPC control system inspired by terminal policy optimization, comprising: a model construction module, configured to construct, in the control process of a nonlinear unmanned system, a discrete-time nonlinear dynamics model containing time-varying interference, which represents how the system state vector evolves with the control input, the time-varying interference, system white noise, and modeling errors; a prediction module, configured to introduce a disturbance predictor $g$, obtain disturbance model parameters $\hat{\psi}$ by fitting historical disturbance data, estimate the current disturbance $\hat{d}_k$ at the $k$-th step by sensing or identification, and predict the disturbance at future step $i$ with the disturbance predictor $g$ to obtain the disturbance prediction result $\hat{d}_{k+i|k}$; a parameter learning module, configured to obtain the optimal parameters $\theta^*$ of a parameterized terminal cost function $V_\theta(x)$ by supervised learning, thereby obtaining the optimal terminal cost function $V_{\theta^*}(x)$; a framework construction module, configured to construct a terminal model predictive control (MPC) framework, form the corresponding terminal MPC optimization problem, and embed the optimal terminal cost function $V_{\theta^*}(x)$ into the terminal MPC framework in place of the conventional terminal cost; and a solving and control module, configured to combine the discrete-time nonlinear dynamics model with the disturbance prediction result $\hat{d}_{k+i|k}$, solve the constrained terminal MPC optimization problem at time $k$, and output a control law, so that control inputs are applied to the unmanned system based on the control law and online control is completed.
9. An electronic device, comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the interference-aware MPC control method inspired by terminal policy optimization according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a program or instructions which, when executed by a processor, implement the interference-aware MPC control method inspired by terminal policy optimization according to any one of claims 1 to 7.
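The offline learning step of claims 5 to 7 can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the applicants' implementation: the scalar model, the baseline controller $\kappa(x) = -0.7x$, the features $\phi(x) = [x^2, x^4]$, the squared-error fit with ridge term $\lambda\|\theta\|^2$, and the descent constraint $V_\theta(f(x_j, u_j, \hat{d}_j)) - V_\theta(x_j) \le -\ell(x_j, u_j)$ are all assumed. Because $V_\theta$ is linear in $\theta$, the problem is a convex QP. Claim 7 calls for a convex optimization solver; to stay dependency-free, this sketch instead enumerates candidate active sets, which is exact but only practical for a handful of parameters. All function names are hypothetical.

```python
import itertools
import math


def solve_linear(mat, rhs, eps=1e-12):
    """Gauss-Jordan elimination with partial pivoting; returns None if (near-)singular."""
    n = len(rhs)
    a = [list(row) + [r] for row, r in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < eps:
            return None
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def solve_small_qp(H, b, A, h, tol=1e-9):
    """min 0.5 θᵀHθ − bᵀθ s.t. Aθ ≤ h, with H positive definite.
    Tries active sets of at most d constraints and returns the first KKT point,
    which is the unique optimum by strict convexity. Exponential cost: tiny problems only."""
    d, m = len(b), len(h)
    for size in range(d + 1):
        for active in itertools.combinations(range(m), size):
            # KKT system: [H  A_Sᵀ; A_S  0] [θ; μ_S] = [b; h_S]
            kkt = [list(H[i]) + [A[j][i] for j in active] for i in range(d)]
            kkt += [list(A[j]) + [0.0] * size for j in active]
            sol = solve_linear(kkt, list(b) + [h[j] for j in active])
            if sol is None:
                continue
            theta, mu = sol[:d], sol[d:]
            primal_ok = all(sum(a * t for a, t in zip(A[j], theta)) <= h[j] + tol for j in range(m))
            if primal_ok and all(v >= -tol for v in mu):
                return theta
    return None  # no KKT point found: constraints infeasible


def features(x):
    """Hypothetical basis vector φ(x) = [x², x⁴] for a scalar state (d = 2)."""
    return [x * x, x ** 4]


def learn_terminal_cost(samples, step, stage_cost, lam=1e-3):
    """Fit θ in V_θ(x) = θᵀφ(x) from samples (x_j, u_j, d̂_j, J_j) by solving
    min (1/M) Σ (θᵀφ(x_j) − J_j)² + λ‖θ‖²  s.t.  V_θ(step(x_j, u_j, d̂_j)) − V_θ(x_j) ≤ −ℓ(x_j, u_j)."""
    m, d = len(samples), len(features(0.0))
    # Rewrite the objective as 0.5 θᵀHθ − bᵀθ + const.
    H = [[2.0 * lam if i == k else 0.0 for k in range(d)] for i in range(d)]
    b = [0.0] * d
    A, h = [], []
    for x, u, d_hat, J in samples:
        phi = features(x)
        for i in range(d):
            b[i] += 2.0 * J * phi[i] / m
            for k in range(d):
                H[i][k] += 2.0 * phi[i] * phi[k] / m
        # The descent constraint is linear in θ: θᵀ(φ(x⁺) − φ(x)) ≤ −ℓ(x, u).
        phi_next = features(step(x, u, d_hat))
        A.append([pn - p for pn, p in zip(phi_next, phi)])
        h.append(-stage_cost(x, u))
    return solve_small_qp(H, b, A, h)


def demo_step(x, u, d):
    """Hypothetical scalar prediction model x⁺ = 1.2x + u + d."""
    return 1.2 * x + u + d


def demo_stage_cost(x, u):
    """Hypothetical quadratic stage cost ℓ(x, u) = x² + 0.1u²."""
    return x * x + 0.1 * u * u


def make_demo_samples(horizon=60):
    """Hypothetical expert data: baseline κ(x) = −0.7x, disturbance estimates at different phases."""
    samples = []
    for j in range(6):
        x0, d_hat = 0.5 + 0.5 * j, 0.05 * math.sin(j)
        x, J = x0, 0.0
        for _ in range(horizon):  # long-horizon rollout with d̂ held constant
            u = -0.7 * x
            J += demo_stage_cost(x, u)
            x = demo_step(x, u, d_hat)
        samples.append((x0, -0.7 * x0, d_hat, J))
    return samples


if __name__ == "__main__":
    theta = learn_terminal_cost(make_demo_samples(), demo_step, demo_stage_cost)
    print("theta* =", theta)
```

In practice the matrices `H`, `b`, `A`, and `h` assembled above would be handed to an off-the-shelf QP solver in place of `solve_small_qp`.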

Description

Technical Field

The application relates to the technical field of control, and in particular to an interference-aware MPC control method and system inspired by terminal policy optimization.

Background

Model predictive control (MPC) is an efficient control method for nonlinear systems. Because it handles multiple constraints and multiple variables effectively, it is widely applicable to complex control scenarios such as unmanned systems, autonomous spacecraft exploration, and autonomous control for on-orbit servicing. The terminal cost (terminal value) is a core component of the MPC framework: it directly determines the solution efficiency of the MPC optimization problem, the control performance, and closed-loop stability, so a well-designed, well-matched terminal cost is key to improving the control robustness and online adaptability of unmanned systems in complex environments. However, complex control scenarios commonly contain structured time-varying interference that recurs over time, and the system is also susceptible to uncertainties such as white noise and modeling errors. A conventional, fixed terminal cost design struggles to meet high-precision, real-time control requirements. How to construct a terminal cost that adapts to time-varying interference, reduces estimation error, and balances real-time performance with robustness has therefore become a pressing technical problem in MPC. In recent years, there have been many attempts to remedy these shortcomings of terminal cost design.
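To make the role of the terminal cost concrete, the following sketch evaluates the objective $\sum_{i=0}^{N-1}\ell(x_{k+i|k}, u_{k+i|k}) + V_f(x_{k+N|k})$ for a scalar system, propagating the prediction with disturbance forecasts and discarding input sequences that leave a state interval. The model, costs, bounds, and the exhaustive grid search (a stand-in for a proper nonlinear programming solver) are hypothetical choices for illustration only.

```python
import itertools
import math
from typing import Callable, Optional, Sequence, Tuple


def mpc_step(
    x0: float,
    d_pred: Sequence[float],
    f: Callable[[float, float, float], float],
    stage_cost: Callable[[float, float], float],
    terminal_cost: Callable[[float], float],
    u_grid: Sequence[float],
    x_bounds: Tuple[float, float],
) -> Tuple[Optional[float], float]:
    """Brute-force terminal MPC for a scalar system:
    min over u_0..u_{N-1} in u_grid of  Σ_{i<N} ℓ(x_i, u_i) + V_f(x_N)
    s.t. x_{i+1} = f(x_i, u_i, d̂_i) and lo <= x_{i+1} <= hi.
    Returns (u_0, optimal cost), or (None, inf) if no sequence is feasible."""
    lo, hi = x_bounds
    best_u, best_cost = None, math.inf
    for seq in itertools.product(u_grid, repeat=len(d_pred)):
        x, cost, feasible = x0, 0.0, True
        for u, d_hat in zip(seq, d_pred):
            cost += stage_cost(x, u)
            x = f(x, u, d_hat)  # prediction uses the disturbance forecast
            if not lo <= x <= hi:
                feasible = False
                break
        if feasible:
            cost += terminal_cost(x)  # terminal term evaluated at x_N
            if cost < best_cost:
                best_u, best_cost = seq[0], cost
    return best_u, best_cost


if __name__ == "__main__":
    u0, cost = mpc_step(
        x0=1.0,
        d_pred=[0.05, 0.0, -0.05],                    # hypothetical disturbance forecast
        f=lambda x, u, d: 1.2 * x + u + d,
        stage_cost=lambda x, u: x * x + 0.1 * u * u,
        terminal_cost=lambda x: 1.4 * x * x,          # stands in for a learned V_θ*
        u_grid=[i / 4 for i in range(-8, 9)],         # U = {-2.0, -1.75, ..., 2.0}
        x_bounds=(-5.0, 5.0),
    )
    print(u0, cost)
```

In receding-horizon operation only the returned first input is applied; at step $k+1$ the problem is solved again with the new state and a fresh disturbance forecast. Grid search grows exponentially with the horizon, so it serves only to show the structure of the problem.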
On the one hand, some studies use data-driven or reinforcement learning methods to construct and approximate the MPC terminal cost (terminal value), learning it as a function that allows a shorter prediction horizon and improves local decision quality, and relying on the adaptivity of data-driven methods to improve how well the terminal cost adapts. On the other hand, improved algorithms such as conventional robust MPC and adaptive MPC build on mature theories of uncertainty handling, and address the effect of environmental uncertainty and external interference on MPC performance by introducing robust constraints, adaptive adjustment mechanisms, and similar means, thereby improving the system's disturbance rejection. Despite these improvements, the prior art still has clear shortcomings in complex time-varying interference scenarios and in real-time, resource-constrained deployments, and struggles to meet practical application requirements.
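Learned terminal costs of the kind discussed above are often linear in their parameters, which is also the form used in claims 5 and 7, $V_\theta(x) = \theta^\top\phi(x)$. A minimal sketch with Gaussian radial basis functions follows; the centers, width, state dimension, placeholder $\theta$, and function names are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_features(x: Sequence[float], centers: Sequence[Sequence[float]], width: float) -> List[float]:
    """Gaussian radial basis vector φ(x), with φ_i(x) = exp(-‖x − c_i‖² / (2·width²))."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
        for c in centers
    ]


def terminal_cost(theta: Sequence[float], x: Sequence[float], centers, width: float) -> float:
    """V_θ(x) = θᵀφ(x): nonlinear in the state x, but linear in the parameters θ."""
    return sum(t * p for t, p in zip(theta, rbf_features(x, centers, width)))


if __name__ == "__main__":
    # Hypothetical 2-D state (relative position, velocity) and a 3x3 grid of centers.
    centers = [(p, v) for p in (-1.0, 0.0, 1.0) for v in (-1.0, 0.0, 1.0)]
    theta = [1.0] * len(centers)  # placeholder; θ* would come from the offline fit
    print(terminal_cost(theta, (0.2, -0.1), centers, width=0.8))
```

Because $V_\theta$ is linear in $\theta$, a condition such as $V_\theta(x^+) - V_\theta(x) \le c$ is a linear inequality in $\theta$, which is what allows the offline fitting problem to remain convex.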
First, related research using data-driven or reinforcement learning methods designs the terminal cost under the assumption that environmental disturbances are static or stationary, and does not fully account for the dynamics of time-varying disturbances in complex environments. As a result, terminal cost estimation errors accumulate over time, online adaptability is insufficient, and the control requirements of the system under disturbance cannot be met effectively. Second, when handling structured time-varying disturbances that recur over time, improved algorithms such as conventional robust MPC and adaptive MPC have two core practical shortcomings: (1) to guarantee robustness they usually introduce considerable conservatism, which lowers the online solution efficiency of the MPC optimization problem and markedly degrades control performance; and (2) their computational and implementation costs are high, since they typically require computing time-varying invariant sets online or offline, or frequently re-estimating model parameters, which makes them difficult to deploy in scenarios with strict real-time requirements and limited resources, such as autonomous spacecraft exploration and autonomous control for on-orbit servicing.

Disclosure of Invention

In view of these shortcomings of the prior art, the application provides an interference-aware MPC control method and system inspired by terminal policy optimization, which address technical problems that the prior art cannot effectively resolve under complex time-varying interference: accumulation of MPC terminal cost estimation errors, poor online adaptability, high algorithmic conservatism, and difficulty of real-time deployment.
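The first shortcoming above concerns disturbances whose time variation is ignored. In the claimed method this is handled by a disturbance predictor $g$ fitted to historical disturbance data. The model class of $g$ is not specified in this section; the sketch below assumes, purely for illustration, a truncated Fourier series with known period fitted by ordinary least squares (parameters called `psi` here), plus a hypothetical geometrically decaying correction toward the current online estimate of $d_k$. The function names are illustrative, not taken from the application.

```python
import math
from typing import List, Optional, Sequence


def fourier_row(k: int, period: float, harmonics: int) -> List[float]:
    """Regressors [1, cos(ωk), sin(ωk), ..., cos(Hωk), sin(Hωk)] with ω = 2π / period."""
    w = 2.0 * math.pi / period
    row = [1.0]
    for h in range(1, harmonics + 1):
        row += [math.cos(h * w * k), math.sin(h * w * k)]
    return row


def solve(mat: List[List[float]], rhs: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting (small, well-posed systems only)."""
    n = len(rhs)
    a = [list(row) + [r] for row, r in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def fit_periodic_model(history: Sequence[float], period: float, harmonics: int = 1) -> List[float]:
    """Least-squares fit of psi in d_k ≈ psiᵀ·row(k) from past samples d_0, ..., d_{K-1}."""
    rows = [fourier_row(k, period, harmonics) for k in range(len(history))]
    p = len(rows[0])
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * d for r, d in zip(rows, history)) for i in range(p)]
    return solve(normal, rhs)


def predict(psi: Sequence[float], k: int, horizon: int, period: float, harmonics: int = 1,
            current_estimate: Optional[float] = None, decay: float = 0.8) -> List[float]:
    """Forecasts d̂_{k+i|k} for i = 0..horizon-1. If an online estimate of d_k is given,
    its residual against the fitted model is added and fades out geometrically."""
    def model(j: int) -> float:
        return sum(c * b for c, b in zip(psi, fourier_row(j, period, harmonics)))

    residual = 0.0 if current_estimate is None else current_estimate - model(k)
    return [model(k + i) + residual * decay ** i for i in range(horizon)]


if __name__ == "__main__":
    hist = [0.3 + 0.2 * math.sin(2 * math.pi * k / 20) for k in range(60)]  # synthetic history
    psi = fit_periodic_model(hist, period=20)
    print(predict(psi, k=60, horizon=5, period=20, current_estimate=0.35))
```

The period and number of harmonics would have to be chosen from knowledge of the recurrent interference; here they are simply given.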
To achieve the above objective, the application adopts the following technical solutions. In a first aspect, an embodiment of the application provides an MPC control method inspired by terminal policy optimization in