CN-121979041-A - Model-free output feedback control method for intermittent process with non-repeated interference


Abstract

A model-free output feedback control method for an intermittent process with non-repeated interference belongs to the technical field of industrial process control. The method comprises the following steps:

- giving a mathematical description of the optimal tracking control problem of a non-repetitive intermittent process;
- designing an optimal control law based on output feedback;
- constructing a game Q function based on two-dimensional output feedback for the intermittent process;
- solving the two-dimensional output feedback problem of batch operation by reinforcement learning;
- solving the output-feedback-based optimal control strategy iteratively, so that it finally converges to the ideal optimal value.

The method overcomes the complexity that the characteristics of two-dimensional systems introduce into traditional methods. Because the reinforcement-learning-based output feedback control method does not depend on a model of the system, it reduces the influence of unknown system dynamics on control performance and greatly improves the control effect.

Inventors

JIANG XUEYING
LI YAN
SHI HUIYUAN
ZUO ANFAN
YE HANWEN
SU CHENGLI
LI PING

Assignees

Liaoning Petrochemical University (辽宁石油化工大学)

Dates

Publication Date: 20260505
Application Date: 20260120

Claims (1)

1. A model-free output feedback control method for an intermittent process with non-repeated interference, comprising the following specific steps:

Step one: giving a mathematical description of the optimal tracking control problem of a non-repetitive intermittent process. The intermittent process with unknown system dynamics is represented by a linear state space equation. The output tracking error is appended to the state variables, which gives an intermittent process model with non-repeated interference. This model is then expanded into the incremental state space equation (1). In equation (1), one index denotes the time direction and the other denotes the batch direction. The variables are:
- the system output;
- the extended state in the time direction;
- the extended state in the batch direction;
- the increment of the control input between the current batch and the previous batch at corresponding times;
- the increment of the non-repeated interference between the current batch and the previous batch at corresponding times.

The system parameter matrices have appropriate dimensions.

Step two: designing an optimal control law based on output feedback. If the system state were measurable and the dynamics known, an optimal control input and a worst-case disturbance input would exist. The control strategy sets the control input and the disturbance input as feedback of the time-direction and batch-direction state variables. The method uses the current and historical states and output errors in the time and batch dimensions. In this way the optimal gain matrices of the controller can be learned without any prior knowledge of the system, approaching the optimal gains with faster convergence and smaller approximation error. The optimal control strategy, given by the corresponding gain matrices, is found by continuously iterating and learning from actual data.

Step three: constructing a game Q function based on two-dimensional output feedback for the intermittent process. The game Q function is obtained from the performance index function (2). Equation (2) contains:
- the weighting matrices of the respective quadratic terms;
- an attenuation factor;
- the sum of the time-direction and batch-direction components;
- the controller gains at the current moment.

Step four: solving the two-dimensional output feedback problem of batch operation by reinforcement learning. The two-dimensional output feedback game Q function is parameterized using the historical control inputs, disturbance inputs and system outputs in both the time domain and the batch domain. If the system satisfies the observability condition, its state can be expressed through measurable input and output information in the mathematical form of equations (3) to (6).

Step five: solving the output-feedback-based optimal control strategy iteratively, so that it finally converges to the ideal optimal value. Using equation (5), the state can be rewritten in output-feedback form as equation (7). The two-dimensional zero-sum game output feedback Q function can then be expressed as equation (8). Its kernel is partitioned into blocks as in equation (9), which involves the time component and the batch component at the current moment. Substituting equation (9) yields a new control gain expression, equation (10).

The optimal control input and interference input are obtained by solving the output-feedback-based Bellman equation iteratively from production data. In the two-dimensional state space, equations (11) to (14) are used. If the system dynamics satisfy the Kronecker product structure of equation (11), the Bellman equation can be solved efficiently using the matrix decomposition property of equation (13). The optimal control strategy is finally obtained through the iterative computation of equation (14).

The algorithm iteratively learns from the time-series and batch data of the injection molding process to continuously optimize the controller parameters, and the optimal control law is computed from them. When implemented on the injection molding process, it drives the system output to converge gradually to the desired set-point.
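Step one relies on an incremental, error-augmented model (equation (1)). The sketch below does not presume its exact form. It assumes a construction commonly used for batch processes, with the following definitions (all symbols here are illustrative assumptions):
- plant: x(t+1,k) = A x(t,k) + B u(t,k) + E w(t,k), with output y(t,k) = C x(t,k);
- batch-direction increments: Δx(t,k) = x(t,k) - x(t,k-1), and likewise Δu and Δw;
- tracking error: e(t,k) = r(t) - y(t,k).

These definitions give:

Δx(t+1,k) = A Δx(t,k) + B Δu(t,k) + E Δw(t,k)
e(t+1,k) = e(t+1,k-1) - CA Δx(t,k) - CB Δu(t,k) - CE Δw(t,k)

The extended state [Δx; e] is therefore driven by a time-direction component and by the previous batch's error. The code simulates two batches with different (non-repeated) disturbances and checks that both relations hold numerically.

```python
import random

# Hypothetical scalar batch process (illustrative values only):
#   x(t+1,k) = A*x(t,k) + B*u(t,k) + E*w(t,k),   y(t,k) = C*x(t,k)
A, B, C, E = 0.9, 0.5, 1.2, 0.3
T = 20                       # samples per batch
r = [1.0] * (T + 1)          # set-point trajectory r(t), identical in every batch


def run_batch(u, w, x0=0.0):
    """Simulate one batch and return states x(0..T) and outputs y(0..T)."""
    x = [x0]
    for t in range(T):
        x.append(A * x[t] + B * u[t] + E * w[t])
    return x, [C * xi for xi in x]


rng = random.Random(0)
u_prev = [rng.uniform(-1, 1) for _ in range(T)]   # input of batch k-1
u_curr = [rng.uniform(-1, 1) for _ in range(T)]   # input of batch k
# Non-repeated interference: an independent disturbance sequence in each batch.
w_prev = [rng.gauss(0, 0.1) for _ in range(T)]
w_curr = [rng.gauss(0, 0.1) for _ in range(T)]

x_prev, y_prev = run_batch(u_prev, w_prev)
x_curr, y_curr = run_batch(u_curr, w_curr)


def residual(t):
    """Mismatch between the assumed incremental model and direct simulation at time t."""
    dx = x_curr[t] - x_prev[t]                 # batch-direction increments
    du = u_curr[t] - u_prev[t]
    dw = w_curr[t] - w_prev[t]
    err_prev_next = r[t + 1] - y_prev[t + 1]   # e(t+1, k-1)
    # Predictions of the incremental, error-augmented model
    dx_next = A * dx + B * du + E * dw
    err_next = err_prev_next - C * A * dx - C * B * du - C * E * dw
    # The same quantities obtained by simulating batch k directly
    dx_next_sim = x_curr[t + 1] - x_prev[t + 1]
    err_next_sim = r[t + 1] - y_curr[t + 1]
    return max(abs(dx_next - dx_next_sim), abs(err_next - err_next_sim))


max_residual = max(residual(t) for t in range(T))
print(f"largest model residual over the batch: {max_residual:.1e}")
```

The residual stays at floating-point rounding level. This reflects the fact that the incremental form is an exact rewriting of the assumed plant, not an approximation.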

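Steps three to five of the claim learn a game Q function from data and iterate until the controller and worst-case disturbance gains converge. The sketch below does not presume the exact form of equations (2) and (8) to (14). It is only a hedged, simplified illustration of that idea: Q-function value iteration for a discounted zero-sum linear-quadratic game, in the style of model-free Q-learning for zero-sum games.

Assumptions and scope:
- It is applied to a hypothetical scalar plant whose state is assumed measurable.
- It omits the two-dimensional time/batch structure and the output-feedback state reconstruction of steps four and five.
- The learner only uses sampled tuples (x, u, w, x').
- The plant coefficients A, B, E and the weights QX, R, BETA, GAMMA are illustrative assumptions.
- The model-based function `reference_gains` exists solely to check the learned gains.

```python
import random

# Hypothetical scalar plant, used ONLY to generate data and a reference answer:
#   x' = A*x + B*u + E*w
A, B, E = 1.1, 1.0, 0.5
# Assumed weights: state weight, input weight, attenuation level, discount factor.
QX, R, BETA, GAMMA = 1.0, 1.0, 2.0, 0.9

def stage_cost(x, u, w):
    """Zero-sum stage cost: the controller minimizes it, the disturbance maximizes it."""
    return QX * x * x + R * u * u - BETA ** 2 * w * w

def features(x, u, w):
    """Quadratic basis so that Q(x, u, w) = theta . features(x, u, w)."""
    return [x * x, x * u, x * w, u * u, u * w, w * w]

def solve(mat, vec):
    """Solve mat @ sol = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    m = [list(row) + [v] for row, v in zip(mat, vec)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    sol = [0.0] * n
    for i in reversed(range(n)):
        sol[i] = (m[i][n] - sum(m[i][j] * sol[j] for j in range(i + 1, n))) / m[i][i]
    return sol

def gains_from_theta(th):
    """Saddle-point gains (K, L), with u = -K x and w = -L x, from the Q kernel."""
    hxu, hxw = th[1] / 2, th[2] / 2
    huu, huw, hww = th[3], th[4] / 2, th[5]
    return tuple(solve([[huu, huw], [huw, hww]], [hxu, hxw]))

def q_value(th, x, u, w):
    return sum(t * f for t, f in zip(th, features(x, u, w)))

# Exploratory data: random (x, u, w) and the measured next state x'.
rng = random.Random(1)
data = []
for _ in range(60):
    x, u, w = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append((x, u, w, A * x + B * u + E * w))

# Value iteration: fit Q_{j+1}(x,u,w) = cost + GAMMA * Q_j(x', -K_j x', -L_j x').
theta = None  # Q_0 = 0
for _ in range(100):
    gains = gains_from_theta(theta) if theta is not None else None
    rows, targets = [], []
    for x, u, w, xn in data:
        v_next = 0.0
        if gains is not None:
            k, l = gains
            v_next = q_value(theta, xn, -k * xn, -l * xn)
        rows.append(features(x, u, w))
        targets.append(stage_cost(x, u, w) + GAMMA * v_next)
    # Least-squares fit of the Q kernel via the normal equations.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(6)] for i in range(6)]
    atb = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(6)]
    theta = solve(ata, atb)

k_learned, l_learned = gains_from_theta(theta)

def reference_gains(iters=100):
    """Model-based value iteration on V(x) = p*x^2, for comparison only."""
    p = 0.0
    for _ in range(iters):
        s = GAMMA * p
        hxx, hxu, hxw = QX + s * A * A, s * A * B, s * A * E
        huu, huw, hww = R + s * B * B, s * B * E, -BETA ** 2 + s * E * E
        k, l = solve([[huu, huw], [huw, hww]], [hxu, hxw])
        p = hxx - (hxu * k + hxw * l)
    return k, l

k_ref, l_ref = reference_gains()
print(f"learned K={k_learned:.6f}, L={l_learned:.6f}; reference K={k_ref:.6f}, L={l_ref:.6f}")
```

With exact, noise-free data, each least-squares fit recovers the Q kernel exactly. The learned gains therefore match the model-based iteration to rounding error. With noisy production data, more samples and sufficient excitation of u and w would be needed. The negative w-w entry of the kernel confirms that a saddle point exists for the chosen attenuation level.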
Description

Model-free output feedback control method for intermittent process with non-repeated interference

Technical Field

This patent application belongs to the technical field of industrial process control. It relates in particular to a model-free output feedback control method for an intermittent process with non-repeated interference, which reduces the influence of unknown dynamics on the control performance of the system.

Background

Batch production is a discontinuous manufacturing mode organized by batches: raw materials are converted into finished products in independent batches. It is widely applied in industries with high added value or strong customization demands, such as chemicals, pharmaceuticals, food and new materials.

A typical characteristic of this production mode is the coexistence of batch independence and dynamic coupling. Each batch has its own feed, process parameters and equipment state, so differences in raw materials, environmental fluctuations or operating errors can cause quality fluctuations. Within a single batch, the process is strongly dynamic. Taking the injection molding industry as an example, the same reaction vessel may produce different products in sequence, and each batch must be thoroughly cleaned to avoid cross-contamination. Nevertheless, cleaning residues, dead zones in pipelines or differences in operating sequence can still cause the content of active ingredients to fluctuate between batches.
This complexity poses a twofold challenge for controlling batch production. Within a batch, dynamic uncertainty must be handled, and the quality of the current batch must be ensured through real-time monitoring and feedback adjustment. Between batches, repeatability must be improved by using historical data to iterate process parameters and narrow the range of quality fluctuation.

The core tension in controlling batch production is the trade-off between model accuracy and real-time performance. Traditional control methods rely on accurate mathematical models. Intermittent processes, however, often involve strong nonlinearity, time-varying characteristics and multivariable coupling. Building high-fidelity models is costly, and such models can hardly cover all operating conditions. For example, the distribution in the production process is commonly affected by temperature, pressure and catalyst concentration. Its dynamic model may contain dozens of partial differential equations, and the solution complexity grows exponentially with the number of reaction stages.

For this reason, advanced control strategies are gradually evolving toward data-driven methods and data-model fusion. Iterative learning control (ILC) corrects the control input by repeatedly executing the same task and using historical batch data, gradually approaching the optimal trajectory. It is particularly suitable for periodic batch production. Model predictive control (MPC) predicts future states based on a simplified dynamic model. It then optimizes the current control action subject to multivariable constraints (such as energy consumption, safety and yield), and it is widely applied to coordinated temperature-pressure control of chemical reactors.
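The iterative learning control idea described above can be illustrated with a minimal P-type ILC sketch. The sketch assumes a hypothetical first-order plant that repeats the same reference and the same, batch-invariant disturbance in every batch. The plant parameters `A_P` and `B_P`, the learning gain `GAIN` and the disturbance profile `dist` are all illustrative assumptions, not values from this application.

```python
# Hypothetical first-order plant, identical in every batch (illustrative values):
#   y(t+1) = A_P*y(t) + B_P*u(t) + d(t),   y(0) = 0
A_P, B_P = 0.2, 1.0
T = 30
GAIN = 0.9                                         # assumed learning gain
ref = [0.0] + [1.0] * T                            # desired output r(0..T)
dist = [0.05 * ((t % 5) - 2) for t in range(T)]    # repeated disturbance d(t)


def run_batch(u):
    """Simulate one batch with input profile u(0..T-1); return y(0..T)."""
    y = [0.0]
    for t in range(T):
        y.append(A_P * y[t] + B_P * u[t] + dist[t])
    return y


u = [0.0] * T                  # first batch: nothing learned yet
max_errors = []
for _ in range(40):
    y = run_batch(u)
    err = [ref[t] - y[t] for t in range(T + 1)]
    max_errors.append(max(abs(v) for v in err[1:]))
    # P-type update from the stored batch data: u(t) first affects y(t+1),
    # so it is corrected with the error one step ahead.
    u = [u[t] + GAIN * err[t + 1] for t in range(T)]

print(f"max |error|: first batch {max_errors[0]:.3f}, last batch {max_errors[-1]:.1e}")
```

For this plant, |1 - GAIN*B_P| + GAIN*B_P*(A_P + A_P^2 + ...) is at most about 0.325. The maximum tracking error therefore shrinks by at least that factor from batch to batch. The update can only cancel disturbances that repeat. If `dist` changed randomly between batches, as with the non-repeated interference considered in this application, the stored history could not anticipate it.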
Disclosure of Invention

This patent application provides a model-free output feedback control method, based on reinforcement learning and state reconstruction, for batch processes with non-repeated interference. The method:
- mitigates the influence of non-repeated interference on control performance;
- reduces dependence on a system model;
- learns continuously from data in both the time direction and the batch direction.

Using reinforcement learning, a more accurate control strategy is established and the optimal control strategy is obtained. This improves the control and tracking performance of the system and accelerates convergence.

This application is realized by the following technical scheme. A model-free output feedback control method based on reinforcement learning and state reconstruction is proposed for intermittent processes with non-repeated interference and unmeasurable states. The method first expresses the intermittent process with non-repeated interference in state space form, introduces the output tracking error and constructs an extended state space description. Then a value function, a Q function and an optimal performance index covering both the time and batch dimensions are defined. A corresponding two-dimensional Bellman equation that requires no process model information is designed to derive an analytical expression for the control law. On this basis, a model-free output feedback algorithm based on reinforcement learning and state reconstruction is