CN-122018346-A - Mushroom house environment control system and method based on reinforcement learning self-adaptive PID


Abstract

The invention discloses a mushroom house environment control system and method based on reinforcement learning self-adaptive PID, and relates to the technical field of agricultural intelligent control. The system comprises an environment sensing module, an execution regulation and control module, and a central control module. The central control module comprises a reinforcement learning model and a PID controller operating in cascade. The reinforcement learning model is trained on an optimization objective that includes a temporal smoothing regularization term, which is constructed from the second-order Taylor expansion of a local smoothing function of the control quantity in the continuous time domain. The PID parameters are optimized in real time by the reinforcement learning model, and temporal consistency regularization and an uncertainty-aware mechanism are introduced, so that the mushroom house environment is regulated accurately and stably. The invention aims to suppress action oscillation of the actuators while maintaining control accuracy, to extend equipment service life, and, in combination with a scientific hardware layout, to eliminate environmental gradients and improve cultivation returns.
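
As a rough illustration of what a temporal smoothing term of this kind can look like, the sketch below penalizes finite-difference estimates of the first and second time derivatives of the control quantity, which are the quantities a second-order Taylor expansion of the control signal exposes. The exact expression used by the invention is not reproduced in this text, so the form of the penalty, the function name `temporal_smoothness_penalty`, and the default weights `w1` and `w2` are assumptions made for illustration only.

```python
def temporal_smoothness_penalty(u_t, u_prev, u_prev2, dt, w1=1.0, w2=1.0):
    """Hypothetical smoothness penalty on three consecutive control quantities.

    Assumed form (NOT taken from the patent text):
        w1 * ((u_t - u_prev) / dt)**2
      + w2 * ((u_t - 2*u_prev + u_prev2) / dt**2)**2
    """
    if dt <= 0:
        raise ValueError("sampling period dt must be positive")
    # Backward-difference estimate of the first derivative du/dt.
    first_diff = (u_t - u_prev) / dt
    # Central second-difference estimate of the second derivative d2u/dt2.
    second_diff = (u_t - 2.0 * u_prev + u_prev2) / dt**2
    return w1 * first_diff**2 + w2 * second_diff**2


# A constant control signal incurs no penalty; a signal that flips back and
# forth between time steps is penalized on both terms.
steady = temporal_smoothness_penalty(0.5, 0.5, 0.5, dt=1.0)
oscillating = temporal_smoothness_penalty(1.0, 0.0, 1.0, dt=1.0)
print(steady, oscillating)
```

Adding such a term to the training objective makes abrupt reversals of the control quantity costly, which is the mechanism by which actuator oscillation would be discouraged.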

Inventors

ZHOU LINLI
WANG DAYONG
WU ZEFENG
ZENG HAN
SUN YOUQIANG

Assignees

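Hefei Institutes of Physical Science, Chinese Academy of Sciences [中国科学院合肥物质科学研究院]
Anhui Provincial Agricultural Information Center (Anhui Provincial Agricultural Museum) [安徽省农业信息中心(安徽省农业博物馆)]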

Dates

Publication Date: 20260512
Application Date: 20260416

Claims (14)

1. A mushroom house environment control system based on reinforcement learning self-adaptive PID, characterized by comprising: an environment sensing module for collecting environmental parameters inside the mushroom house in real time; an execution regulation and control module for regulating the environment inside the mushroom house in response to a control quantity; and a central control module connected to the environment sensing module and to the execution regulation and control module, the central control module comprising a reinforcement learning model and a PID controller operating in cascade; wherein the reinforcement learning model is configured to output a reinforcement learning strategy based on a current state vector, the state vector characterizing the deviation of the environmental parameters from their target values; the PID controller is configured to determine the PID control parameters of the current time step according to the reinforcement learning strategy, and to calculate the control quantity from the PID control parameters to drive the execution regulation and control module; and the reinforcement learning model is trained on an optimization objective that includes a temporal smoothing regularization term, the temporal smoothing regularization term being constructed from the second-order Taylor expansion of a local smoothing function of the control quantity in the continuous time domain and expressed as: [formula not reproduced in the source text], where the three control-quantity symbols denote the control quantities of the current time step and of the historical time steps, respectively, one symbol denotes the sampling period, and the two remaining symbols are weight coefficients.
2. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein the reinforcement learning model is optimized based on a reward function expressed as: [formula not reproduced in the source text], where the terms are, respectively, the absolute deviation of the environmental parameter from the target value, the rate of change of that deviation, the total energy consumption of the execution regulation and control module, and the rate of change of the control quantity between two adjacent time steps, each with its own weight coefficient.
3. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 2, wherein the four weight coefficients are set to the following specific values: [values not reproduced in the source text].
4. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein the central control module further comprises an uncertainty-aware adjustment module, the reinforcement learning strategy is composed of a plurality of parallel Actor networks, and the uncertainty-aware adjustment module is configured to estimate the strategy uncertainty in the current state from the statistical variance of the set of control outputs of the plurality of parallel Actor networks in the same state, and to scale the original control quantity based on the strategy uncertainty, the scaled control quantity being expressed as: [formula not reproduced in the source text], where one symbol denotes the original control quantity output by the reinforcement learning strategy and another denotes the uncertainty adjustment coefficient.
5. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 4, wherein the final optimization objective of the reinforcement learning model is: [formula not reproduced in the source text], where the terms are, respectively, the basic objective function of reinforcement learning, the uncertainty regularization term, and a penalty weight.
6. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein the state vector is formed by normalizing and combining parameters comprising: the deviation of each environmental parameter from its target value, the rate of change of the deviation of each environmental parameter, the PID control parameters of the current time step, and the current operating state of each device in the execution regulation and control module.
7. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein the central control module is further configured to execute fault-tolerant control logic that monitors in real time the deviation between the actual operating state of the execution regulation and control module and the state commanded by the control quantity, reissues the control quantity when the deviation exceeds 15%, and switches to a standby control mode when a deviation exceeding 15% is detected three consecutive times.
8. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein five cultivation frames are arranged in the mushroom house at equal intervals in the vertical direction, the environment sensing module comprises five sensor groups respectively arranged in the middle of each layer of the corresponding five cultivation frames, and each sensor group comprises a temperature sensor, a humidity sensor, an illumination sensor, a carbon dioxide concentration sensor, and an oxygen concentration sensor.
9. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein the execution regulation and control module comprises a heating system, the heating system comprises an electric heating cable laid on the floor of the mushroom house, and the electric heating cable covers the floor of the mushroom house in an S-shaped wiring pattern.
10. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein the execution regulation and control module comprises a humidification system, the humidification system comprises ultrasonic spray heads respectively installed at the bottom of the front end of each of the five cultivation frames, and the nozzle of each ultrasonic spray head is inclined upward by 15° relative to the horizontal plane.
11. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein the execution regulation and control module comprises a ventilation system, wherein the ventilation system comprises an axial flow fan installed in the center of the top of the mushroom house, and a first electric shutter and a second electric shutter, wherein the first electric shutter and the second electric shutter are symmetrically arranged on the upper part of the side wall of the mushroom house, and the axial flow fan is controlled in linkage with the first electric shutter and the second electric shutter.
12. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein the execution regulation and control module comprises an illumination system, the illumination system comprises LED light strips installed at the upper edge of each layer of the five cultivation frames, and the LED light strips have a red-to-blue light spectrum ratio of 7:3.
13. The mushroom house environment control system based on reinforcement learning self-adaptive PID of claim 1, wherein the execution regulation and control module comprises a shading system, the shading system comprises a double-layer sunshade cloth curtain mounted on the inside of the mushroom house roof, and the double-layer sunshade cloth curtain has a shading rate of 95%.
14. A mushroom house environment control method based on reinforcement learning self-adaptive PID, characterized in that the method is applied to the mushroom house environment control system based on reinforcement learning self-adaptive PID of any one of claims 1 to 13 and comprises the steps of: collecting environmental parameters inside the mushroom house in real time through the environment sensing module; outputting, by the central control module, a reinforcement learning strategy through the reinforcement learning model based on the current state vector; determining, by the central control module, the PID control parameters of the current time step through the PID controller according to the reinforcement learning strategy, and calculating the control quantity from the PID control parameters; and regulating, by the execution regulation and control module, the environment inside the mushroom house in response to the control quantity.
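
The four method steps of claim 14 (sense, output a strategy, compute the control quantity with PID, actuate) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the claimed implementation: `placeholder_policy` is a hand-written stand-in for the trained reinforcement learning model, the first-order thermal model in `run_control_loop` and all numeric constants are hypothetical, and only a single temperature channel with a heater duty cycle in [0, 1] is simulated.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete positional PID whose gains can be replaced at every time step."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def set_gains(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def placeholder_policy(state):
    """Stand-in for the trained reinforcement learning model (assumption).

    Maps the state vector (deviation, deviation rate) to PID gains. A real
    policy would be a trained network, not this hand-written rule.
    """
    error, _error_rate = state
    kp = 0.5 + 0.1 * min(abs(error), 5.0)  # larger deviation -> larger Kp
    return kp, 0.05, 0.01


def run_control_loop(target, initial_temp, steps, dt=1.0):
    """Sense -> policy -> PID -> actuate, on a toy first-order room model."""
    pid = PID(kp=0.5, ki=0.05, kd=0.01, dt=dt)
    temp = initial_temp
    prev_error = target - initial_temp
    history = []
    for _ in range(steps):
        error = target - temp                          # step 1: sense deviation
        state = (error, (error - prev_error) / dt)     # build the state vector
        pid.set_gains(*placeholder_policy(state))      # step 2: strategy -> gains
        u = max(0.0, min(1.0, pid.step(error)))        # step 3: clamped control quantity
        # Step 4: actuate. Hypothetical plant: heater input minus loss to 10 degC ambient.
        temp += dt * (2.0 * u - 0.1 * (temp - 10.0))
        prev_error = error
        history.append(temp)
    return history


history = run_control_loop(target=18.0, initial_temp=12.0, steps=200)
print(round(history[-1], 2))
```

The point of the cascade is that the policy never drives the actuator directly: it only retunes the gains, and the PID law still produces the control quantity. The sketch omits anti-windup, the temporal smoothing term, and the uncertainty scaling described in the other claims.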

Description

Mushroom house environment control system and method based on reinforcement learning self-adaptive PID

Technical Field

The invention relates to the technical field of agricultural intelligent control, and in particular to a mushroom house environment control system and method based on reinforcement learning self-adaptive PID.

Background

Mushroom cultivation is moving toward intensive, industrialized production, and the precision of environmental control directly determines mushroom yield and quality. Mushrooms have different requirements for temperature, humidity, carbon dioxide concentration, and other parameters at different growth stages. Traditional environment control systems rely on threshold switching, which causes the environmental parameters to fluctuate severely and the equipment to start and stop frequently. Some systems introduce conventional PID control, but with fixed parameters it is difficult to accommodate the large time lag, strong coupling, and nonlinear characteristics of a mushroom house. End-to-end control schemes based on reinforcement learning, proposed in recent years, are prone to action oscillation, which can mechanically damage the actuators and shorten equipment service life. In addition, the sensor arrangement and actuator layout of existing systems often lack a scientific basis, resulting in an uneven distribution of the indoor environmental field.

Therefore, how to design a mushroom house environment control system that adapts to complex environmental changes to achieve accurate control, effectively suppresses action oscillation to extend equipment service life, and is matched with a scientific hardware layout has become a technical problem in urgent need of a solution.
Disclosure of Invention

The main object of the invention is to provide a mushroom house environment control system and method based on reinforcement learning self-adaptive PID, that is, a system that adapts to complex environmental changes to achieve accurate control, effectively suppresses action oscillation to extend equipment service life, and is matched with a scientific hardware layout.

To achieve the above object, the invention provides a mushroom house environment control system based on reinforcement learning self-adaptive PID, comprising: an environment sensing module for collecting environmental parameters inside the mushroom house in real time; an execution regulation and control module for regulating the environment inside the mushroom house in response to a control quantity; and a central control module connected to the environment sensing module and to the execution regulation and control module, the central control module comprising a reinforcement learning model and a PID controller operating in cascade. The reinforcement learning model is configured to output a reinforcement learning strategy based on a current state vector, the state vector characterizing the deviation of the environmental parameters from their target values. The PID controller is configured to determine the PID control parameters of the current time step according to the reinforcement learning strategy, and to calculate the control quantity from the PID control parameters to drive the execution regulation and control module.

The reinforcement learning model is trained on an optimization objective that includes a temporal smoothing regularization term. The temporal smoothing regularization term is constructed from the second-order Taylor expansion of a local smoothing function of the control quantity in the continuous time domain and is expressed as: [formula not reproduced in the source text], where the three control-quantity symbols denote the control quantities of the current time step and of the historical time steps, respectively, one symbol denotes the sampling period, and the two remaining symbols are weight coefficients.

Preferably, the reinforcement learning model is optimized based on a reward function expressed as: [formula not reproduced in the source text], where the terms are, respectively, the absolute deviation of the environmental parameter from the target value, the rate of change of that deviation, the total energy consumption of the execution regulation and control module, and the rate of change of the control quantity between two adjacent time steps, each with its own weight coefficient.

Preferably, the specific values of the weight coefficients are set as follows: [values not reproduced in the source text].

Preferably, the central control module further comprises an uncertainty-aware adjustment module, the reinforcement learning strategy is composed of a plurality of parallel Actor networks, and the uncertainty-aware adjustment module is configured to estimate the strategy uncertainty in the current state from the statistical variance of the set of control outputs of the plurality of parallel Actor networks in the same state, scaling the original control quantity based on the strategy uncertainty