CN-121995766-A - Biomass fermentation safety reinforcement learning control method and system

CN121995766A

Abstract

The invention belongs to the technical field of bioengineering, and in particular relates to a biomass fermentation safety reinforcement learning control method and system. A virtual training environment integrating a mechanism model and a data-driven residual correction network is constructed, domain randomization is applied, and an agent is trained in this environment to maximize the long-term cumulative reward, yielding a robust control strategy. The trained strategy is deployed to the real fermentation system, where an independent safety layer corrects control instructions in real time according to hard constraint rules to ensure operational safety, while the strategy is fine-tuned and updated online based on actual operating data. The invention realizes a safe and smooth transition from virtual training to real application, and remarkably improves the final ethanol concentration and the overall control performance of the fermentation process.

Inventors

  • YU WENZHI
  • MA FENGYING
  • Zong Yanchen
  • JI PENG
  • LIU TONGJUN

Assignees

  • Qilu University of Technology (Shandong Academy of Sciences)

Dates

Publication Date
2026-05-08
Application Date
2026-02-27

Claims (10)

  1. A biomass fermentation safety reinforcement learning control method, characterized by comprising the following steps: obtaining historical data of a fermentation process, processing the historical data into a standardized state-action pair data set, and training an initial strategy network with the data set, the initial strategy network taking a standardized state vector as input and outputting a standardized action vector approximating the historical action data; establishing a mechanism model describing the fermentation process based on biochemical process dynamics equations, training a neural network with the historical data of the fermentation process to predict and compensate the deviation between the mechanism model and the real dynamics, forming a virtual training environment simulating the real fermentation state from the mechanism model and the residual correction network, and training an agent, wherein a reinforcement learning algorithm updates the actor network and associated critic network parameters during training so as to maximize the long-term cumulative reward expectation, until the average cumulative reward obtained by the agent in the virtual training environment converges to a maximum value or a preset number of training rounds is reached, and the actor network weights at that moment are stored to obtain a trained control strategy model; deploying the control strategy model to an actual fermentation scene, outputting action instructions according to actual fermentation state parameters, supervising and correcting the output action instructions according to preset process safety hard constraint rules, issuing only the action instructions conforming to the rules to an actuator, and updating the control strategy model according to the parameters of the actual fermentation state.
  2. The method of claim 1, wherein the state vector comprises at least one of temperature, pH, dissolved oxygen concentration, redox potential, residual sugar concentration, ethanol concentration, living cell concentration, carbon dioxide release rate, oxygen uptake rate, and a rate of change of at least one of the above parameters.
  3. The method of claim 1, wherein the biochemical process dynamics equations are the Monod equation and the Luedeking-Piret equation.
  4. The method of claim 1, wherein training the neural network with historical data of the fermentation process and predicting and compensating the deviation between the mechanism model and the real dynamics specifically comprises: acquiring the predicted state of the mechanism model and the corresponding real fermentation state in the historical fermentation data; calculating the deviation between the predicted state and the real fermentation state; and, taking the real state data as input and the deviation as the training target, training the residual correction network through supervised learning.
  5. The method of claim 1, further comprising, prior to training the agent, applying domain randomization to key kinetic parameters of the virtual training environment, the key kinetic parameters comprising at least a maximum specific growth rate of the microorganism, a substrate half-saturation constant, and a cell-to-substrate yield coefficient.
  6. The method of claim 1, wherein the reward function that maximizes the long-term cumulative reward expectation is a process-shaping reward expressed by a formula not reproduced in the source text, whose terms comprise a basic weight, a heavy-penalty weight, a product toxicity early-warning weight, a minor penalty, a time penalty term, and a process stability reward.
  7. The method of claim 1, wherein the process safety hard constraint rules at least comprise: if the real-time pH measurement is below a first safety threshold, forcibly correcting the pH adjustment term in the action instruction to add alkali solution; if the real-time fermentation broth volume measurement is above a second safety threshold, forcibly correcting the feed flow rate in the action instruction to zero; and if the real-time dissolved oxygen measurement is below a third safety threshold, forcibly correcting the action instruction to increase the agitation rate or aeration rate.
  8. The method of claim 1, wherein updating the control strategy model according to the parameters of the actual fermentation state specifically comprises: storing the states generated in the actual fermentation process, the actions corrected by the safety layer, and the corresponding reward values as interaction data in an online experience replay pool, and sampling data from the online experience replay pool at regular intervals to fine-tune the network parameters of the control strategy model.
  9. The method of claim 1, wherein updating the control strategy model according to the parameters of the actual fermentation state further comprises inputting actual sensor data into a data assimilation filter, and estimating and updating in real time the unmeasurable state variables or time-varying parameters in the virtual training environment to form a digital shadow synchronized with the physical fermentation tank.
  10. A biomass fermentation safety reinforcement learning control system, characterized by comprising: an initialization module configured to obtain historical data of a fermentation process, process the historical data into a standardized state-action pair data set, and train an initial strategy network with the data set, the initial strategy network taking a standardized state vector as input and outputting a standardized action vector approximating the historical action data; a digital twin pre-training module configured to construct a mechanism model describing the fermentation process based on biochemical process dynamics equations, train a neural network with the historical data of the fermentation process to predict and compensate the deviation between the mechanism model and the real dynamics, form a virtual training environment simulating the real fermentation state from the mechanism model and the residual correction network, and train an agent, wherein a reinforcement learning algorithm updates the actor network and associated critic network parameters during training so as to maximize the long-term cumulative reward expectation, until the average cumulative reward obtained by the agent in the virtual training environment converges to a maximum value or a preset number of training rounds is reached, and the actor network weights at that moment are stored to obtain a trained control strategy model; and an online fine-tuning and adaptation module configured to deploy the control strategy model to an actual fermentation scene, output action instructions according to actual fermentation state parameters, supervise and correct the output action instructions according to preset process safety hard constraint rules, issue only the action instructions conforming to the rules to an actuator, and update the control strategy model according to the parameters of the actual fermentation state.
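Claim 3 names the Monod and Luedeking-Piret equations; their standard textbook forms are reproduced below for reference (the symbols are conventional and are not taken from the patent text):

```latex
\mu = \mu_{\max}\,\frac{S}{K_s + S},
\qquad
\frac{dP}{dt} = \alpha\,\frac{dX}{dt} + \beta X
```

Here $\mu$ is the specific growth rate, $\mu_{\max}$ the maximum specific growth rate, $S$ the substrate concentration, $K_s$ the half-saturation constant, $P$ the product (ethanol) concentration, $X$ the cell concentration, and $\alpha$, $\beta$ the growth-associated and non-growth-associated production coefficients.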
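The three hard constraint rules of claim 7 can be sketched as an independent supervisory layer that force-corrects the policy's action before it reaches the actuators. All threshold values, field names, and correction magnitudes below are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sketch of the safety layer in claim 7: each rule checks a
# real-time measurement against its safety threshold and, on violation,
# overwrites the corresponding field of the action instruction.
PH_MIN = 4.5        # first safety threshold (pH), assumed value
VOLUME_MAX = 9.0    # second safety threshold (broth volume, L), assumed
DO_MIN = 10.0       # third safety threshold (dissolved oxygen, %), assumed

def safety_layer(state, action):
    """Return a corrected copy of `action`; the original is left intact."""
    safe = dict(action)
    if state["ph"] < PH_MIN:
        # Force the pH-adjustment term to add alkali solution.
        safe["alkali_feed_rate"] = max(safe.get("alkali_feed_rate", 0.0), 0.5)
    if state["volume"] > VOLUME_MAX:
        # Force the substrate feed flow rate to zero.
        safe["feed_rate"] = 0.0
    if state["dissolved_oxygen"] < DO_MIN:
        # Force an increase in the agitation (or aeration) rate.
        safe["agitation_rate"] = safe.get("agitation_rate", 0.0) + 50.0
    return safe

state = {"ph": 4.2, "volume": 9.5, "dissolved_oxygen": 25.0}
action = {"alkali_feed_rate": 0.0, "feed_rate": 1.2, "agitation_rate": 300.0}
corrected = safety_layer(state, action)
```

Because the layer only overwrites fields that violate a rule, any action already within the safe envelope passes through unchanged, which matches the claim's requirement that only rule-conforming instructions reach the actuator.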
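The online fine-tuning of claim 8 stores real-plant transitions in an experience replay pool and periodically samples mini-batches to update the policy. A minimal sketch of the data-handling side (class and method names are assumptions; the actual gradient update is omitted):

```python
import random
from collections import deque

class ReplayPool:
    """Online experience pool: (state, safety-corrected action, reward,
    next state) tuples from the real fermentation run, bounded in size."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop first

    def store(self, state, corrected_action, reward, next_state):
        self.buffer.append((state, corrected_action, reward, next_state))

    def sample(self, batch_size):
        # Uniformly sample a mini-batch for a fine-tuning step.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

pool = ReplayPool(capacity=1000)
for t in range(50):  # stand-in for interaction data from the real tank
    pool.store(state=(t,), corrected_action=(0.0,), reward=1.0,
               next_state=(t + 1,))
batch = pool.sample(32)  # fed to the actor/critic update at fixed intervals
```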

Description

Biomass fermentation safety reinforcement learning control method and system

Technical Field

The invention belongs to the technical field of bioengineering, and particularly relates to a biomass fermentation safety reinforcement learning control method and system.

Background

The statements in this section merely provide background of the present disclosure and do not necessarily constitute prior art.

Using biomass waste (such as agricultural straw) to produce fuel ethanol is an important way to realize resource recycling and renewable energy development. In the fermentation process, the final ethanol concentration is the core index determining production efficiency and economy. However, the process exhibits complex double inhibition by substrates and products, and the physiological state of the fermenting microorganism changes dynamically, placing extremely high requirements on process control. Currently, the main fermentation process control methods in industry rely on program control or proportional-integral-derivative (PID) control implemented by a Programmable Logic Controller (PLC) or Distributed Control System (DCS). These methods usually control based on preset fixed parameters or simple feedback logic (such as adjusting according to pH or temperature deviation); they have difficulty adapting to the complex nonlinear and time-varying dynamic characteristics of the fermentation process and cannot achieve global optimization. While some advanced control strategies, such as Model Predictive Control (MPC), have been introduced to improve performance, they rely heavily on accurate on-line measurements and accurate process models.
For a biomass hydrolysate fermentation system with complex and variable components, it is extremely difficult to build and maintain a high-precision mechanism model, so MPC performance degrades when coping with raw material fluctuation and model mismatch. Artificial intelligence techniques represented by reinforcement learning can be applied to intelligent control of biochemical processes owing to their strong environment-interaction and strategy self-optimization capabilities. However, these existing schemes tend to train in an idealized simulation environment, ignoring the differences between the simulation environment and the actual industrial scenario. When a strategy trained to perfection in simulation is deployed directly in a real industrial environment, its performance drops sharply or even fails completely due to model mismatch, raw material batch differences, sensor noise, scale-up effects, and the like.

Disclosure of Invention

The invention provides a biomass fermentation safety reinforcement learning control method and system, which provide a safe starting point through imitation learning, train a robust strategy using a high-fidelity digital twin fusing mechanism and data together with domain randomization, and adopt an online fine-tuning framework with a hard-constraint safety layer, so that the reinforcement learning agent can transition stably from virtual training to safe, reliable, and high-performance control of a real fermentation tank.
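The domain randomization mentioned above (and detailed in claim 5) can be sketched as resampling the key kinetic parameters of the virtual environment before each training episode, so the policy never overfits one nominal model. The nominal values and the ±20% sampling range here are illustrative assumptions:

```python
import random

# Nominal kinetic parameters of the virtual environment (assumed values):
# maximum specific growth rate, substrate half-saturation constant, and
# cell-to-substrate yield coefficient, as listed in claim 5.
NOMINAL = {"mu_max": 0.4, "K_s": 1.5, "Y_xs": 0.5}

def randomize_kinetics(nominal, spread=0.2, rng=random):
    """Sample each parameter uniformly within +/- spread of its nominal."""
    return {name: value * rng.uniform(1.0 - spread, 1.0 + spread)
            for name, value in nominal.items()}

# Called once per training episode, before the simulated batch starts.
params = randomize_kinetics(NOMINAL)
```

A policy trained across many such perturbed environments sees model mismatch during training itself, which is what lets it survive the raw-material and scale-up variation described above.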
The first aspect of the invention discloses a biomass fermentation safety reinforcement learning control method, comprising the following steps: obtaining historical data of a fermentation process, processing the historical data into a standardized state-action pair data set, and training an initial strategy network with the data set, the initial strategy network taking a standardized state vector as input and outputting a standardized action vector approximating the historical action data; establishing a mechanism model describing the fermentation process based on biochemical process dynamics equations, training a neural network with the historical data of the fermentation process to predict and compensate the deviation between the mechanism model and the real dynamics, forming a virtual training environment simulating the real fermentation state from the mechanism model and the residual correction network, and training an agent, wherein a reinforcement learning algorithm updates the actor network and associated critic network parameters during training so as to maximize the long-term cumulative reward expectation, until the average cumulative reward obtained by the agent in the virtual training environment converges to a maximum value or a preset number of training rounds is reached, and the actor network weights at that moment are stored to obtain a trained control strategy model; deploying the control strategy model to an actual fermentation scene, outputting action instructions according to actual fermentation state parameters, supervising and correcting the output action instructions according to preset process safety hard constraint