CN-121523067-B - Control strategy security layer mapping method and system based on proxy model
Abstract
The invention discloses a control strategy security layer mapping method and system based on a proxy model. The method comprises: obtaining an original control action from an upstream main controller; obtaining the current real-time state; performing forward prediction with a proxy model based on the original control action and the current real-time state to obtain prediction results for all spatial points; comparing the prediction results for all spatial points with a preset safety threshold to judge whether every prediction result is lower than the safety threshold; if not, solving a real-time optimization problem to determine a corrective action that satisfies the safety constraint condition, which is taken as the final control action; and if so, taking the original control action as the final control action. The method overcomes the technical problem that existing advanced control strategies cannot actively and accurately guarantee process safety while pursuing their optimization targets: safety violations are prevented proactively, and the characteristics of the complex system are reflected accurately while remaining efficient to compute.
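The decision flow summarized above (predict every spatial point, pass the action through if all points are below the threshold, otherwise find the closest safe action) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the two-component action (supply air temperature, relative airflow), the toy predictor `toy_predict` with its made-up `HEAT_RISE` factors, and the candidate grid `CANDIDATES` are all hypothetical, and a brute-force search over a discrete candidate grid stands in for the real-time optimization problem, whose solver is not specified here.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical action encoding: (supply air temperature in degC, relative airflow).
Action = Tuple[float, float]
Predictor = Callable[[Dict[str, float], Action], List[float]]


def is_safe(temps: Sequence[float], threshold: float) -> bool:
    """Safe only if every predicted spatial point is strictly below the threshold."""
    return all(t < threshold for t in temps)


def correct_action(state: Dict[str, float], a_orig: Action, predict: Predictor,
                   threshold: float, candidates: Sequence[Action]) -> Action:
    """Grid-search stand-in for the real-time optimization: among candidates that keep
    every point below the threshold, return the one with the smallest squared
    Euclidean distance to the original action."""
    feasible = [a for a in candidates if is_safe(predict(state, a), threshold)]
    if not feasible:
        raise RuntimeError("no candidate action satisfies the safety constraint")
    return min(feasible, key=lambda a: sum((x - y) ** 2 for x, y in zip(a, a_orig)))


def safety_layer(state: Dict[str, float], a_orig: Action, predict: Predictor,
                 threshold: float, candidates: Sequence[Action]) -> Action:
    """Pass the upstream action through if predicted safe, otherwise correct it."""
    if is_safe(predict(state, a_orig), threshold):
        return a_orig
    return correct_action(state, a_orig, predict, threshold, candidates)


# Toy example: all numbers below are hypothetical.
HEAT_RISE = (2.0, 4.0, 6.0)  # per-point temperature rise per unit load at unit airflow


def toy_predict(state: Dict[str, float], action: Action) -> List[float]:
    supply_temp, airflow = action
    return [supply_temp + k * state["load"] / airflow for k in HEAT_RISE]


CANDIDATES = [(float(t), f) for t, f in itertools.product(range(18, 23), (1.0, 1.25, 1.5, 2.0))]

if __name__ == "__main__":
    # Hottest point would reach 28.0 >= 27.5, so the action is corrected.
    print(safety_layer({"load": 1.0}, (22.0, 1.0), toy_predict, 27.5, CANDIDATES))  # (22.0, 1.25)
    # All points stay below 27.5, so the action passes through unchanged.
    print(safety_layer({"load": 0.5}, (22.0, 1.0), toy_predict, 27.5, CANDIDATES))  # (22.0, 1.0)
```

In practice the action components would be normalized before the distance is computed, since mixing degrees Celsius with airflow units makes "closest" depend on the chosen scales; a continuous solver would also replace the grid once the predictor is differentiable.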
Inventors
- WANG RUIHANG
- ZHOU XIN
- FU QIANG
Assignees
- 光合泰智(杭州)科技有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260115
Claims (6)
- 1. A control strategy security layer mapping method based on a proxy model, characterized by comprising the following steps: acquiring an original control action from an upstream main controller, wherein the original control action is a preliminary operation instruction generated by the upstream main controller to optimize a specific target, and the preliminary operation instruction comprises adjustments of the supply air temperature and the airflow; acquiring a current real-time state, wherein the current real-time state refers to the current operating conditions and environmental parameters of the system; performing forward prediction with a proxy model based on the original control action and the current real-time state to obtain prediction results for all spatial points, comprising: predicting the corresponding POD coefficients with the proxy model based on the original control action and the current real-time state, and reconstructing, by the proxy model, the temperature field distribution at each spatial point from the POD coefficients and pre-calculated spatial basis functions to obtain predicted temperatures for all spatial points; comparing the prediction results for all spatial points with a preset safety threshold to judge whether all of them are lower than the safety threshold; if the prediction result at any spatial point is not lower than the safety threshold, solving a real-time optimization problem to determine the corrective action that has the smallest difference from the original control action and satisfies the safety constraint condition, and taking it as the final control action, wherein the optimization target of the real-time optimization problem is $\min_{a_{\text{corr}}} \lVert a_{\text{corr}} - a_{\text{orig}} \rVert^{2}$, where $a_{\text{corr}}$ is the corrective action and $a_{\text{orig}}$ is the original control action; and if the prediction results at all spatial points are lower than the safety threshold, determining the original control action as the final control action.
- 2. The proxy model-based control strategy security layer mapping method according to claim 1, wherein predicting the corresponding POD coefficients with the proxy model based on the original control action and the current real-time state comprises: predicting the POD coefficients by $\mathbf{c} = f_{\theta}(s, a_{\text{orig}})$, where $f_{\theta}$ is the proxy model, $\mathbf{c}$ is the vector of POD coefficients, $s$ is the current real-time state, $a_{\text{orig}}$ is the original control action, and $\theta$ denotes the parameters of the proxy model.
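- 3. The proxy model-based control strategy security layer mapping method according to claim 2, wherein the temperature field distribution is represented as $T(x) = T_{\text{avg}} + \sum_{k=1}^{K} c_k \phi_k(x)$, where $\phi_k(x)$ is the $k$-th spatial basis function extracted by the POD technique, $c_k$ is the corresponding POD coefficient, $K$ is the number of retained modes, and $T_{\text{avg}}$ refers to the average value of the entire temperature field.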
- 4. The proxy model-based control strategy security layer mapping method according to claim 1, wherein the safety constraint condition is $\hat{T}(x; a_{\text{corr}}) \le T_{\text{safe}}$ for every spatial point $x$, where $\hat{T}(x; a_{\text{corr}})$ is the estimated temperature after executing the corrective action and $T_{\text{safe}}$ is the safe temperature limit, given by user agreements or relevant standards.
- 5. The proxy model-based control strategy security layer mapping method according to claim 1, wherein the proxy model is trained as follows: performing Latin hypercube sampling on the boundary conditions of an original CFD model to obtain sampled boundary conditions; performing CFD simulations under the sampled boundary conditions to generate diverse temperature field results, and calculating the corresponding POD coefficients and spatial basis functions; and adopting a multi-layer perceptron as the learning model and, in the training stage, searching for its optimal parameters through an optimization process based on the POD coefficients and spatial basis functions, thereby obtaining the proxy model.
- 6. A control strategy security layer mapping system based on a proxy model, wherein the system uses the control strategy security layer mapping method based on a proxy model according to any one of claims 1 to 5, and comprises: an original control action acquisition unit, configured to acquire the original control action from the upstream main controller; a real-time state acquisition unit, configured to acquire the current real-time state; a prediction unit, configured to perform forward prediction with the proxy model based on the original control action and the current real-time state to obtain prediction results for all spatial points; a comparison unit, configured to compare the prediction results for all spatial points with a preset safety threshold to judge whether all of them are lower than the safety threshold; a correction unit, configured to, if the prediction result at any spatial point is not lower than the safety threshold, solve a real-time optimization problem to determine a corrective action satisfying the safety constraint condition and thereby obtain the final control action; and a final action determining unit, configured to determine the original control action as the final control action if the prediction results at all spatial points are lower than the safety threshold; wherein the prediction unit comprises: a coefficient prediction subunit, configured to predict the corresponding POD coefficients with the proxy model based on the original control action and the current real-time state; and a temperature field reconstruction subunit, configured to reconstruct, by the proxy model, the temperature field distribution at each spatial point from the POD coefficients and the pre-calculated spatial basis functions to obtain predicted temperatures for all spatial points.
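The claims describe an offline pipeline (Latin hypercube sampling of boundary conditions, CFD runs, POD extraction, surrogate training) and an online prediction step $T(x) = T_{\text{avg}} + \sum_k c_k \phi_k(x)$. The sketch below walks through both under heavy simplifying assumptions: `toy_cfd` is a hypothetical analytic function standing in for CFD simulations; the spatial grid `X`, the boundary-condition ranges and the choice of two modes are made up; `T_avg` is taken as the per-point mean over the simulated runs; and a least-squares fit on hand-picked features (`features`) replaces the multi-layer perceptron named in claim 5. It illustrates the technique, not the patented implementation.

```python
import numpy as np

# Hypothetical 1-D grid of spatial points (e.g. normalized height along a rack).
X = np.linspace(0.0, 1.0, 50)


def latin_hypercube(n: int, bounds, rng: np.random.Generator) -> np.ndarray:
    """Basic Latin hypercube: in every dimension each of the n equal-width strata
    receives exactly one sample; strata are shuffled independently per dimension."""
    bounds = np.asarray(bounds, dtype=float)               # (d, 2) rows of [low, high]
    d = bounds.shape[0]
    u = (np.arange(n)[:, None] + rng.random((n, d))) / n  # one point per stratum
    for j in range(d):
        u[:, j] = u[rng.permutation(n), j]                 # decouple the dimensions
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])


def toy_cfd(bc: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for one CFD run; bc = (supply_temp, airflow, it_load)."""
    supply_temp, airflow, load = bc
    return supply_temp + load / airflow * (3.0 * X + 0.5 * np.sin(np.pi * X))


def pod(snapshots: np.ndarray, k: int):
    """snapshots: (n_points, n_runs). Returns mean field, k spatial modes, coefficients."""
    t_avg = snapshots.mean(axis=1, keepdims=True)          # assumed: per-point mean over runs
    u, _, _ = np.linalg.svd(snapshots - t_avg, full_matrices=False)
    phi = u[:, :k]                                         # orthonormal spatial basis functions
    coeffs = phi.T @ (snapshots - t_avg)                   # (k, n_runs) POD coefficients
    return t_avg[:, 0], phi, coeffs


def features(bc: np.ndarray) -> np.ndarray:
    """Hand-picked regression features; stand-in for what the MLP would learn."""
    supply_temp, airflow, load = np.atleast_2d(bc).T
    return np.column_stack([np.ones_like(supply_temp), supply_temp, load / airflow])


# Offline stage: sample boundary conditions, "simulate", extract POD, fit surrogate.
rng = np.random.default_rng(0)
BCS = latin_hypercube(40, [(16.0, 24.0), (0.8, 2.0), (0.5, 1.5)], rng)
SNAPSHOTS = np.column_stack([toy_cfd(bc) for bc in BCS])   # (50, 40)
T_AVG, PHI, COEFFS = pod(SNAPSHOTS, k=2)
W, *_ = np.linalg.lstsq(features(BCS), COEFFS.T, rcond=None)   # (3, k) weights


def predict_field(bc: np.ndarray) -> np.ndarray:
    """Online stage: predict POD coefficients, then T(x) = T_avg + sum_k c_k phi_k(x)."""
    c = (features(bc) @ W)[0]
    return T_AVG + PHI @ c


if __name__ == "__main__":
    bc_new = np.array([20.0, 1.0, 1.0])
    print(np.max(np.abs(predict_field(bc_new) - toy_cfd(bc_new))))
```

Because the toy field is, by construction, exactly two-dimensional after mean subtraction and its POD coefficients are linear in the chosen features, the surrogate reproduces `toy_cfd` at unseen boundary conditions to floating-point precision; real CFD data would need more modes and a genuinely nonlinear regressor such as the multi-layer perceptron of claim 5.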
Description
Technical Field
The invention relates to a data center thermal management method, in particular to a control strategy security layer mapping method and system based on a proxy model.
Background
As key information infrastructure, data centers are among the most energy-intensive facilities, and the cooling system accounts for a considerable share of their power consumption; improving the energy efficiency of data centers has therefore become an important research direction. To address this problem, both academia and industry are actively exploring advanced control strategies such as DRL (deep reinforcement learning) as replacements for conventional control methods (e.g., PID controllers). By interacting with the environment and learning by trial and error, a DRL agent can gradually form a long-term optimal control strategy that targets energy conservation. However, during learning, the "exploration" phase may lead the agent to execute unsafe actions, such as excessively reducing the cooling output, which may cause servers to overheat locally and thereby pose a downtime risk. Two main types of safety technology currently address this challenge: passive safety methods and active safety methods based on simple models. Passive safety methods typically rely on an after-the-fact penalty: when the system actually enters an unsafe state, the DRL agent receives a negative reward and learns to avoid that behavior. This approach is clearly unsuitable for mission-critical environments, because the system must first experience a dangerous situation. Active safety methods based on simple models attempt to prevent unsafe behavior by predicting the consequences of an action.
However, these methods often rely on simplified data center models (such as linear or nodal models) that cannot accurately capture the strong nonlinearity and non-uniformity of the temperature distribution within a data center. As a result, even if the "average" temperature is within the safe range, local hot spots may still occur in specific areas (e.g., the top of a rack) and constitute a safety hazard. In summary, existing safety technologies fall short when facing the non-uniform spatial characteristics and real-time decision-making requirements of complex systems. It is therefore necessary to design a new method that overcomes the technical problem that existing advanced control strategies cannot actively and accurately guarantee process safety while pursuing their optimization targets, so that safety violations can be prevented proactively and the characteristics of complex systems can be reflected accurately with efficient computation.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a control strategy security layer mapping method and system based on a proxy model.
To achieve this purpose, the invention adopts the following technical solution: a control strategy security layer mapping method based on a proxy model, comprising the following steps: acquiring an original control action from an upstream main controller; acquiring a current real-time state; performing forward prediction with a proxy model based on the original control action and the current real-time state to obtain prediction results for all spatial points; comparing the prediction results for all spatial points with a preset safety threshold to judge whether all of them are lower than the safety threshold; if the prediction result at any spatial point is not lower than the safety threshold, solving a real-time optimization problem to determine a corrective action that satisfies the safety constraint condition, thereby obtaining the final control action; and if the prediction results at all spatial points are lower than the safety threshold, determining the original control action as the final control action. Further, performing forward prediction with the proxy model based on the original control action and the current real-time state to obtain prediction results for all spatial points comprises: predicting the corresponding POD coefficients with the proxy model based on the original control action and the current real-time state; and reconstructing, by the proxy model, the temperature field distribution at each spatial point from the POD coefficients and pre-calculated spatial basis functions to obtain prediction results for all spatial points. The method further comprises the following steps of predicting corresponding POD coefficients by adopting a proxy model based on the original control action and the current real-time state, wherein the meth