US-12619213-B2 - Controlling method and device for an industrial device

Abstract

Various embodiments include methods for controlling an industrial device. Some embodiments include: obtaining a state input characterizing a current state of the industrial device; processing the state input to generate an action output characterizing an expected action to be performed by the industrial device for the current state, based on a machine learning model trained based on states of the industrial device, actions each performed for each state of the industrial device and results each obtained by performing each action; and generating a control signal for the industrial device based on the action output.

Inventors

Xiang Li
Xiao Feng Wang
Fan Bo Meng

Assignees

SIEMENS AKTIENGESELLSCHAFT

Dates

Publication Date: 20260505
Application Date: 20200921

Claims (12)

1. A method for controlling an industrial device, the method comprising: obtaining a state input characterizing a current state of the industrial device; processing the state input to generate an action output characterizing an expected action to be performed by the industrial device for the current state, based on a machine learning model trained based on states of the industrial device, respective actions performed for each state of the industrial device, and results respectively obtained by performing each action; generating a control signal for the industrial device based on the action output; and updating the machine learning model by: obtaining a further state of the industrial device; processing the further state of the industrial device to generate a further action to be performed by the industrial device for the further state, based on the machine learning model; generating a state next to the further state based on the further state and the further action based on a device model that is trained based on (a) one or more state-action pairs, each comprising a respective state of the industrial device and a corresponding action performed for the respective state and (b) one or more next states each corresponding to a respective state-action pair; computing a result of the further action based on the further state and the state next to the further state; and updating the machine learning model based on the further state, the further action, and the result of the further action.
2. The method of claim 1, the method further comprising deriving the states, the actions, and the results of the industrial device from historical data of the industrial device and/or human expert knowledge.
3. The method of claim 2, further comprising training the machine learning model via on-policy learning or off-policy learning.
4. The method of claim 2, further comprising generating the machine learning model by training the machine learning model based on the states of the industrial device, the actions each performed for each state of the industrial device and the results each obtained by performing each action.
5. The method of claim 1, wherein updating the device model includes: obtaining one or more pairs of state and action from historical data of the industrial device; obtaining one or more next states each corresponding to one of the one or more pairs of state and action from the historical data of the industrial device; and updating the device model based on the obtained one or more pairs of state and action and the obtained one or more next states.
6. The method of claim 1, further comprising: determining if the expected action can be safely performed by the industrial device; and in response to a determination that the expected action can be safely performed, generating the control signal for the industrial device to perform the expected action, obtaining the result corresponding to the expected action, and updating the machine learning model based on the current state, the expected action and the result corresponding to the expected action.
7. The method of claim 6, the method further comprising in response to a determination that the expected action cannot be safely performed, updating the machine learning model based on the current state, the expected action and a predetermined result for the current state and the expected action.
8. A device for controlling an industrial device, the device comprising: an obtaining apparatus for obtaining a state input characterizing a current state of the industrial device; a processor for processing the state input to generate an action output characterizing an expected action to be performed by the industrial device for the current state, based on a machine learning model that is trained based on states of the industrial device, actions each performed for each state of the industrial device and results each obtained by performing each action, and generating a control signal for the industrial device based on the action output; and a controller for controlling the industrial device based on the control signal; and wherein the processor is further programmed to update the machine learning model by: obtaining a further state of the industrial device; processing the further state of the industrial device to generate a further action to be performed by the industrial device for the further state, based on the machine learning model; generating a state next to the further state based on the further state and the further action based on a device model that is trained based on (a) one or more state-action pairs, each comprising a respective state of the industrial device and a corresponding action performed for the respective state and (b) one or more next states each corresponding to a respective state-action pair; computing a result of the further action based on the further state and the state next to the further state; and updating the machine learning model based on the further state, the further action, and the result of the further action.
9. The device of claim 8, wherein the processor is further programmed to update the device model by: obtaining one or more pairs of state and action from historical data of the industrial device; obtaining one or more next states each corresponding to one of the one or more pairs of state and action from the historical data of the industrial device; and updating the device model based on the obtained one or more pairs of state and action and the obtained one or more next states.
10. The device of claim 8, wherein the processor is further programmed to: determine if the expected action can be safely performed by the industrial device; and in response to a determination that the expected action can be safely performed, generate the control signal for the industrial device to perform the expected action, obtain the result corresponding to the expected action, and update the machine learning model based on the current state, the expected action and the result corresponding to the expected action.
11. The device of claim 10, wherein the processor is further programmed to, in response to a determination that the expected action cannot be safely performed, update the machine learning model based on the current state, the expected action and a predetermined result for the current state and the expected action.
12. A controlling device for an industrial device, the controlling device comprising: one or more processors; and one or more memories storing instructions operable, when executed by the one or more processors, to cause the one or more processors to: obtain a state input characterizing a current state of the industrial device; process the state input to generate an action output characterizing an expected action to be performed by the industrial device for the current state, based on a machine learning model trained based on states of the industrial device, actions each performed for each state of the industrial device and results each obtained by performing each action; generate a control signal for the industrial device based on the action output; and update the machine learning model by: obtaining a further state of the industrial device; processing the further state of the industrial device to generate a further action to be performed by the industrial device for the further state, based on the machine learning model; generating a state next to the further state based on the further state and the further action based on a device model that is trained based on (a) one or more state-action pairs, each comprising a respective state of the industrial device and a corresponding action performed for the respective state and (b) one or more next states each corresponding to a respective state-action pair; computing a result of the further action based on the further state and the state next to the further state; and updating the machine learning model based on the further state, the further action, and the result of the further action.
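The model update recited in claims 1, 8, and 12 can be read as a model-based reinforcement learning loop: a learned device model predicts the next state, a result is computed from the state and its predicted successor, and the machine learning model is updated from the simulated experience. The following is a minimal sketch of that reading, not the patentee's implementation. The claims do not specify a learning algorithm, a result function, or a state representation, so the tabular Q-learning update, the integer states, the `TARGET` setpoint, the epsilon-greedy action choice, and every function name here are assumptions made for illustration.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)  # hypothetical setpoint adjustments
TARGET = 5            # hypothetical desired state value


def fit_device_model(history):
    """Learn (state, action) -> next state from historical triples (assumed lookup model)."""
    return {(s, a): s_next for s, a, s_next in history}


def compute_result(state, next_state):
    """Assumed result: how much closer the device moved to TARGET."""
    return abs(state - TARGET) - abs(next_state - TARGET)


def select_action(q, state, epsilon, rng):
    """Generate an action from the current model, with occasional exploration."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def update_policy(q, device_model, states, steps=5000,
                  alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Update the Q-table using only states predicted by the device model."""
    rng = random.Random(seed)
    for _ in range(steps):
        state = rng.choice(states)                      # further state
        action = select_action(q, state, epsilon, rng)  # further action
        next_state = device_model[(state, action)]      # state next to the further state
        result = compute_result(state, next_state)      # result of the further action
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (result + gamma * best_next - q[(state, action)])
    return q


# Toy historical data: the state moves by the action, clipped to the range 0..10.
STATES = list(range(11))
HISTORY = [(s, a, min(10, max(0, s + a))) for s in STATES for a in ACTIONS]

device_model = fit_device_model(HISTORY)
q_table = update_policy(defaultdict(float), device_model, STATES)
greedy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in STATES}
```

In this toy setup no action is ever applied to a real device during the update; all experience comes from `device_model`, which is the property the claimed update relies on. A lookup table is the simplest possible device model and would be replaced by a learned regressor for continuous states.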

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a U.S. National Stage Application of International Application No. PCT/CN2020/116538 filed Sep. 21, 2020, which designates the United States of America, the contents of which are hereby incorporated by reference in their entirety.
TECHNICAL FIELD
The present disclosure generally relates to industrial controls. Various embodiments of the teachings herein include machine learning based industrial controls.
BACKGROUND
Currently, most industrial devices are controlled by rules predefined by human experts or by controllers tuned by human experts. To achieve automatic control, the control of the industrial devices is generally guided by a simplified formulation of expert knowledge. For highly non-linear, multi-input, multi-output and delayed industrial devices, however, this cannot deliver satisfactory control performance.
For such devices, separate controls have been used for different device parameters with multiple control loops, but with multiple loops the control of the device can become unstable and sensitive to small perturbations. In some situations, an empirical control is introduced for a parameter with a highly inert response, but this cannot guarantee a consistent production quality of the industrial devices.
Alternatively, model predictive control is used for highly non-linear, multi-input, multi-output and delayed industrial devices. The core idea of model predictive control is to use a model to predict future plant output and to solve an optimization problem to select an optimal control. Designing the model requires substantial manual work and expert knowledge, and real-time control is not feasible because computing the optimal solution with the model takes too long.
SUMMARY
Various embodiments of the teachings of the present disclosure include controlling methods and/or devices for an industrial device that use a machine learning model to generate an expected action to be performed for a current state of the industrial device. The model is trained based not only on states of the industrial device and the actions each performed for each state, but also on the results each obtained by performing each action. The modelling does not require manual work, and thus the industrial device may be controlled with low cost and high time efficiency.
For example, some embodiments include a method for controlling an industrial device comprising: obtaining a state input characterizing a current state of the industrial device; processing the state input to generate an action output characterizing an expected action to be performed by the industrial device for the current state, based on a machine learning model that is trained based on states of the industrial device, actions each performed for each state of the industrial device and results each obtained by performing each action; and generating a control signal for the industrial device based on the action output.
In some embodiments, the actions and the results of the industrial device are derived from historical data of the industrial device and/or human expert knowledge.
In some embodiments, the machine learning model is trained via on-policy learning or off-policy learning.
In some embodiments, the machine learning model is generated by training the machine learning model based on the states of the industrial device, the actions each performed for each state of the industrial device and the results each obtained by performing each action.
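The control flow summarized above (state input, trained model, action output, control signal) can be sketched with a small tabular example. This is a sketch under stated assumptions rather than the disclosed implementation. The text does not name a learning algorithm, so Q-learning is swapped in here as one example of off-policy learning. The coarse temperature-band states, the three-action set, the numeric results, the logged next states, and all function names are hypothetical.

```python
from collections import defaultdict

ACTIONS = ("decrease", "hold", "increase")  # hypothetical discrete action set


def train_q_table(transitions, alpha=0.5, gamma=0.9, epochs=50):
    """Off-policy (Q-learning) training on logged (state, action, result, next_state) tuples."""
    q = defaultdict(float)  # maps (state, action) -> estimated long-run result
    for _ in range(epochs):
        for state, action, result, next_state in transitions:
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            target = result + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def expected_action(q, state):
    """Action output: the action with the highest learned value for the state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def control_signal(action, step=1.0):
    """Translate the action output into a hypothetical actuator increment."""
    return {"decrease": -step, "hold": 0.0, "increase": step}[action]


# Toy historical log (assumed): each tuple is (state, action, result, next_state).
LOG = [
    ("cold", "increase", 1.0, "ok"),
    ("cold", "hold", -1.0, "cold"),
    ("cold", "decrease", -1.0, "cold"),
    ("ok", "hold", 1.0, "ok"),
    ("ok", "increase", -1.0, "hot"),
    ("ok", "decrease", -1.0, "cold"),
    ("hot", "decrease", 1.0, "ok"),
    ("hot", "hold", -1.0, "hot"),
    ("hot", "increase", -1.0, "hot"),
]

q_table = train_q_table(LOG)                 # train on states, actions, results
action = expected_action(q_table, "cold")    # state input -> action output
signal = control_signal(action)              # action output -> control signal
```

Training here uses only logged data, which corresponds to deriving states, actions, and results from historical data; an on-policy variant would instead learn from the actions the current model itself selects.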
In some embodiments, the machine learning model is updated by obtaining a further state of the industrial device; processing the further state of the industrial device to generate a further action to be performed by the industrial device for the further state, based on the machine learning model; generating a state next to the further state based on the further state and the further action based on a device model that is trained based on each pair of state and action of the industrial device and a next state corresponding to the pair of the state and the action; computing a result of the further action based on the further state and the state next to the further state; and updating the machine learning model based on the further state, the further action and the result of the further action.
In some embodiments, the device model is updated by obtaining one or more pairs of state and action from historical data of the industrial device; obtaining one or more next states each corresponding to one of the one or more pairs of state and action from the historical data of the industrial device; and updating the device model based on the obtained one or more pairs of state and action and the obtained one or more next states.
In some embodiments, the method further comprises: determining if the expected action can be safely performed by the industrial device; and in response to a determinat