CN-122028387-A - Intelligent regulation and control method for data center cooling system based on machine learning

Abstract

The invention belongs to the technical field at the intersection of machine learning and data center thermal management, and specifically discloses a machine-learning-based intelligent regulation and control method for a data center cooling system. In the method, each cooling unit is configured as an agent with sensing and decision-making capability. Distributed coordination is achieved through a digital pheromone communication mechanism: the openings of the air valve and the water valve are adjusted automatically according to the pheromone concentration gradient, and energy consumption and temperature deviation are optimized in combination with a local reinforcement learning model. When a fault occurs, adjacent agents automatically identify the abnormality and take over the cooling tasks. This technical scheme achieves sub-second real-time regulation, highly reliable self-healing operation, and a marked improvement in the cooling energy efficiency ratio.

Inventors

  • SUN MEILING
  • XIE MINTENG
  • LIU SHUAI
  • LI CHENGLONG
  • ZHOU XINYI
  • YU HONGYU
  • DONG CHENGYANG
  • LIU ZIHANG

Assignees

  • 大连高德瑞信科技有限公司

Dates

Publication Date
20260512
Application Date
20260410

Claims (10)

  1. A machine-learning-based intelligent regulation and control method for a data center cooling system, characterized by comprising the following steps: Step 1, configuring each cooling unit in a data center as an agent with independent sensing and decision-making capability, acquiring physical environment parameters of the local area where the cooling unit is located in real time by using a multi-source sensor array integrated with the cooling unit, and acquiring state information of other cooling units physically adjacent to the cooling unit; Step 2, constructing a digital pheromone communication mechanism that enables each agent to conduct distributed negotiation based on its local environment state and neighbor states without intervention of a central controller, wherein the digital pheromone represents local heat load intensity and cooling resource demand, and is dynamically propagated and updated among the agents; Step 3, each agent automatically adjusting the working parameters of its own actuator according to the concentration gradient of the digital pheromone, the adjustment following a bionic swarm cooperation rule, so that a plurality of adjacent agents automatically form a collaboration alliance when sensing a local overheating event and cooperatively allocate cooling resources; Step 4, introducing a machine learning model trained on historical operation data to generate a local decision strategy suited to each agent, wherein the machine learning model is built on a reinforcement learning framework and, through a reward function, continuously iterates and updates each agent's behavior strategy with local energy consumption and temperature deviation as optimization targets; Step 5, establishing a fault self-healing mechanism whereby, when a cooling unit operates abnormally or its communication is interrupted, adjacent agents automatically identify the abnormal state through changes in the digital pheromone signal and dynamically take over the cooling task it originally bore, ensuring that the thermal environment of the local area is maintained within a safety threshold.
  2. The machine-learning-based intelligent regulation and control method for a data center cooling system of claim 1, wherein each cooling unit in step 1 is provided with an embedded computing module and a multi-source sensor array; the physical environment parameters comprise air inlet temperature, air outlet temperature, chilled water inlet pressure, return water pressure, and airflow velocity at the air outlet end of the cabinet; the embedded computing module is preconfigured with high-frequency sampling logic, with a sampling period set between 10 milliseconds and 100 milliseconds; the raw current or voltage signals output by the sensor array are converted into digital values by an analog-to-digital conversion circuit and then passed to a local state evaluation unit of the embedded computing module; the local state evaluation unit applies moving-average filtering to the temperature and pressure data; meanwhile, each agent acquires, through its integrated communication interface and using a dynamic discovery protocol, state information of other cooling units adjacent to it in physical space; the state information is recorded in a neighbor list that includes each neighbor agent's unique identifier, geographic coordinates, rated cooling capacity, and current operating load rate.
  3. The machine-learning-based intelligent regulation and control method for a data center cooling system of claim 1, wherein the digital pheromone communication mechanism in step 2 adopts a lightweight message broadcasting protocol, and each agent periodically broadcasts its current digital pheromone value to other agents in its physical vicinity; the digital pheromone is defined as a multi-dimensional data structure containing the temperature deviation, pressure anomaly gradient, remaining cooling margin, and energy efficiency factor perceived by the current agent; each agent simultaneously receives the pheromone signals from its neighbors and fuses them by weighting; the fusion weight is dynamically adjusted according to physical distance and thermal coupling strength; the thermal coupling strength is characterized by the temperature correlation coefficient between two devices in historical operation data; the fused pheromone field gives the agent an overall heat load profile of the surrounding area, so that the agent can sense potential thermal threats outside its own area of responsibility; in the weighted fusion process, each received neighbor pheromone value is multiplied by its corresponding fusion weight, and the products for all neighbors are summed to serve as the correction basis for the local pheromone field.
  4. The machine-learning-based intelligent regulation and control method for a data center cooling system of claim 3, wherein the dynamic propagation and updating of the digital pheromone follows volatilization and diffusion logic that simulates biological characteristics; the digital pheromone concentration value attenuates as the propagation distance increases, and the attenuation coefficient is set dynamically according to the thermal diffusion characteristics of the environment; the attenuation law is that, as the physical distance between the receiving agent and the source agent increases, the effective pheromone concentration delivered to the receiving end decreases in inverse proportion to the square of the distance; the digital pheromone simulates the natural volatilization of biological pheromones, ensuring that the effective spatial range of a signal is confined to the thermal influence area and avoiding regulation errors caused by long-distance interference.
  5. The machine-learning-based intelligent regulation and control method for a data center cooling system of claim 1, wherein the collaboration alliance in step 3 is formed by a threshold trigger mechanism; when the comprehensive heat load index of a local area exceeds a preset cooperative trigger threshold, all agents in the area automatically switch from an independent operation mode to a cooperative mode; the comprehensive heat load index is a weighted sum of the local temperature deviation and the average neighbor pheromone concentration; in the cooperative mode, the agents share their respective real-time cooling output capacities and are assigned primary and auxiliary roles according to a preset priority rule; the priority rule takes into account the energy efficiency ratio and response speed of the equipment, assigning equipment with a higher energy efficiency ratio and closer to the hot spot the primary role, responsible for the main incremental cooling output, while other equipment takes auxiliary roles and provides redundant support; the agents notify one another of their remaining cooling margins through the digital pheromone, and when the remaining cooling margin of a peripheral agent is greater than a preset threshold, that agent automatically joins the collaboration alliance.
  6. The machine-learning-based intelligent regulation and control method for a data center cooling system of claim 1, wherein the working parameters of the actuator in step 3 are adjusted by calculating the concentration gradient between the local pheromone concentration and the surrounding pheromone concentration; the actuator comprises an air valve and a water valve, and the opening adjustment of the air valve and the water valve follows a nonlinear response rule output by the local machine learning model; the adjustment amplitude is positively correlated with the digital pheromone concentration gradient, that is, the larger the concentration gradient, the larger the opening adjustment; when the rate of change of the concentration gradient exceeds a preset protection threshold, the system introduces a smoothing suppression factor that limits the maximum displacement of the actuator per unit time, avoiding mechanical wear and hydraulic imbalance; the agent can also apply corrections based on outdoor environment parameters acquired in real time, and when the outdoor temperature is lower than a preset free-cooling threshold, the agent sends a higher-priority call request to the cooling tower agent through the digital pheromone, so that the external natural cold source is used preferentially.
  7. The machine-learning-based intelligent regulation and control method for a data center cooling system of claim 1, wherein the training data of the machine learning model in step 4 is derived from the data center's historical operation logs and comprises load change curves, ambient temperature and humidity sequences, and cooling equipment energy consumption records; during model training, an experience replay mechanism is adopted, in which high-value triples of state, action, and reward are stored in local memory, and the local decision function is optimized by a policy gradient algorithm; the reward function is designed so that a positive reward is given when the area temperature is maintained within the set range and the energy efficiency ratio increases, and a negative reward is given when the temperature exceeds the safety threshold or the actuation frequency of the actuator exceeds a preset frequency threshold; the optimization objective is to minimize the weighted sum of the local energy consumption value and the temperature deviation value, with the weight parameters of the decision function continually adjusted so as to maximize the expected cumulative reward.
  8. The machine-learning-based intelligent regulation and control method for a data center cooling system of claim 7, wherein the input feature vector of the machine learning model comprises the local temperature deviation, the historical energy consumption trend, the average neighbor pheromone concentration, and the current actuator state, the actuator state comprising the current air valve opening and water valve opening; model inference is executed on the embedded computing module at fixed time intervals, ensuring that regulation instructions are generated in real time; the machine learning model is a lightweight quantized neural network in which model parameters are reduced from 32-bit floating-point numbers to 8-bit integers, and the computation involved in inference is converted into fixed-point matrix multiplication; at the data processing level, the memory inside the embedded computing module is divided into a real-time data area, a history buffer area, and a strategy model area, with data exchanged over an internal high-speed bus, so that the total delay from sensing through decision to execution is kept within 200 milliseconds.
  9. The machine-learning-based intelligent regulation and control method for a data center cooling system of claim 1, wherein the fault self-healing mechanism in step 5 comprises an anomaly detection sub-module and a task reassignment sub-module; the anomaly detection sub-module judges whether a neighbor unit has failed by monitoring for continuous absence of the digital pheromone signal or a significant deviation from a preset threshold; once an adjacent agent has received neither the heartbeat signal nor a pheromone update from the neighbor for three consecutive sampling periods, the neighbor unit is judged to be in a failed state; the task reassignment sub-module reschedules the cooling resource allocation based on the capacity margins and heat conduction paths of the remaining available cooling units, so as to keep the heat load balanced; the task reassignment process follows a heat-conduction-path optimality principle and preferentially schedules other cooling devices on the same airflow loop as the failed unit; the share of the heat load originally borne by the failed unit is dynamically mapped into the decision spaces of the surrounding agents, and each surrounding agent automatically adjusts the sensitivity coefficient of its pheromone perception so that its local decision strategy produces a larger cooling output, filling the cooling gap caused by the fault.
  10. The machine-learning-based intelligent regulation and control method for a data center cooling system of claim 1, wherein the communication topology among the agents is a dynamically reconfigurable network whose connections are adjusted in real time according to the physical layout and operating state of the cooling units; when a cooling unit is added or removed, the system automatically triggers a topology discovery protocol and updates the neighbor lists of all agents, maintaining the integrity and robustness of distributed coordination; in a liquid cooling application scenario, the cooling units comprise a liquid cooling distribution unit and a secondary-side circulating pump, and the acquired data additionally include coolant flow rate, inlet-outlet temperature difference, and leak detection sensor state; a pressure sensitivity factor is added to the digital pheromone data structure, and when increased demand on one liquid cooling branch causes pressure fluctuation in the main pipe, state information is synchronized among all liquid cooling distribution unit agents through the digital pheromone mechanism; the fault self-healing mechanism further comprises automatic hydraulic balance recovery logic: when failure of the pump group of a liquid cooling loop is detected, the task reassignment sub-module calculates the hydrodynamic distribution of the whole network and, by adjusting the valve openings of other branches and exploiting the resulting pressure differences, directs coolant to automatically compensate the affected area.
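
The weighted fusion of claim 3, the inverse-square attenuation of claim 4, and the rate-limited valve adjustment of claim 6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `NeighborSignal`, `attenuate`, `fuse`, and `valve_step` are hypothetical, the pheromone is collapsed to a single scalar, the fusion weight is taken to be the thermal coupling strength alone, the gradient sign convention is assumed, and a linear gain stands in for the nonlinear response that the claims assign to the local machine learning model.

```python
from dataclasses import dataclass


@dataclass
class NeighborSignal:
    """Hypothetical record of one neighbor's broadcast (illustrative names)."""
    pheromone: float   # scalar stand-in for the multi-dimensional pheromone
    distance_m: float  # physical distance to the neighbor, in meters
    coupling: float    # thermal coupling strength, assumed to lie in [0, 1]


def attenuate(pheromone: float, distance_m: float, k: float = 1.0) -> float:
    """Inverse-square decay with distance; k is an assumed attenuation coefficient."""
    d = max(distance_m, 1.0)  # assumption: clamp so close neighbors are not amplified
    return k * pheromone / (d * d)


def fuse(neighbors: list[NeighborSignal]) -> float:
    """Sum over neighbors of fusion weight times attenuated pheromone value."""
    return sum(n.coupling * attenuate(n.pheromone, n.distance_m) for n in neighbors)


def valve_step(local: float, surrounding: float,
               gain: float = 0.1, max_step: float = 0.05) -> float:
    """Opening change that grows with the gradient, clipped per control period.

    Assumed sign convention: a positive gradient means the local zone carries
    more heat load than its surroundings, so the valve opens further.
    """
    gradient = local - surrounding
    raw = gain * gradient  # linear stand-in for the learned nonlinear response
    return max(-max_step, min(max_step, raw))


if __name__ == "__main__":
    field = fuse([NeighborSignal(4.0, 2.0, 0.5), NeighborSignal(9.0, 3.0, 1.0)])
    print(field, valve_step(local=5.0, surrounding=field))
```

Clipping the step to `max_step` plays the role of the smoothing suppression factor in claim 6: however steep the gradient, the actuator moves by at most a fixed amount per control period.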

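The reward logic of claim 7 and the three-period failure rule of claim 9 can likewise be sketched in Python. This is an illustration under stated assumptions: the function `reward`, the class `NeighborWatch`, all parameter names, and the reward magnitudes of plus or minus 1.0 are hypothetical, since the claims specify only the sign of the reward and the count of three consecutive missed sampling periods.

```python
from dataclasses import dataclass


def reward(temp_c: float, setpoint_c: float, band_c: float, safety_c: float,
           eer_delta: float, actions_per_min: float,
           max_actions_per_min: float) -> float:
    """Positive when temperature stays in band and EER rises; negative when the
    safety threshold or the actuation-frequency threshold is exceeded.
    The magnitudes (1.0) are assumptions, not values from the claims."""
    r = 0.0
    if abs(temp_c - setpoint_c) <= band_c and eer_delta > 0:
        r += 1.0
    if temp_c > safety_c or actions_per_min > max_actions_per_min:
        r -= 1.0
    return r


@dataclass
class NeighborWatch:
    """Tracks consecutive sampling periods without a heartbeat or pheromone update."""
    missed: int = 0
    limit: int = 3  # three consecutive sampling periods, as stated in claim 9

    def tick(self, update_received: bool) -> bool:
        """Call once per sampling period; returns True once the neighbor is judged failed."""
        self.missed = 0 if update_received else self.missed + 1
        return self.missed >= self.limit


if __name__ == "__main__":
    watch = NeighborWatch()
    history = [False, False, True, False, False, False]
    print([watch.tick(seen) for seen in history])
    print(reward(24.0, 24.0, 1.0, 30.0, 0.1, 2, 10))
```

Resetting the counter on any received update means that only an unbroken run of missed periods marks a neighbor as failed, which matches the "three consecutive sampling periods" wording.
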
Description

Intelligent regulation and control method for data center cooling system based on machine learning

Technical Field

The invention belongs to the technical field at the intersection of machine learning and data center thermal management, and particularly relates to a machine-learning-based intelligent regulation and control method for a data center cooling system.

Background

As internet infrastructure continues to evolve, the large-scale construction of data centers has placed stringent demands on thermal management systems. The cooling system is key infrastructure for keeping server hardware running stably, and its energy efficiency ratio directly determines the operating efficiency of the data center. In high-density deployment scenarios, fluctuations in the cooling environment are highly complex, which requires the control system to schedule resources quickly and accurately in response to real-time load changes. Machine-learning-based intelligent regulation methods attempt to achieve optimal parameter matching of the cooling equipment by constructing a mapping model between environment parameters and actuators. This technical direction aims to achieve cooperative control of equipment such as cooling towers, precision air conditioners, and water pumps by processing multidimensional sensing data. In the pursuit of energy efficiency optimization, the computational efficiency of the regulation scheme and the reliability of the whole system become the core indicators by which a technique is judged.
However, when a traditional centralized optimization model handles a high-dimensional state space with a large number of control variables, it runs into computational difficulty, and the decision generation period becomes too long to meet the real-time regulation requirements of dynamic loads. Because the system logic is tightly coupled to the central control unit, the risk of a single point of failure is extremely high: once the main control node suffers a logic abnormality or the communication link is interrupted, the whole cooling network is at risk of collapse, with no effective adaptive compensation or local self-healing capability. In addition, the prior art is weak at handling nonlinear thermal coupling and cooperative interaction among multiple devices, so that local overheating and resource redundancy coexist, affecting the operational safety of the data center. Accordingly, a machine-learning-based intelligent regulation and control method for a data center cooling system is desired.

Disclosure of Invention

The invention aims to provide a machine-learning-based intelligent regulation and control method for a data center cooling system that can solve the problems described in the background.
To achieve the above purpose, the technical scheme adopted by the invention is a machine-learning-based intelligent regulation and control method for a data center cooling system, comprising the following specific steps: Step 1, configuring each cooling unit in a data center as an agent with independent sensing and decision-making capability, wherein the cooling units include precision air conditioners, chillers and cooling towers, and each agent acquires temperature and pressure data of the local area where it is located in real time and acquires state information of other cooling units physically adjacent to it; Step 2, constructing a digital pheromone communication mechanism that enables each agent to conduct distributed negotiation based on its local environment state and neighbor states without intervention of a central controller, wherein the digital pheromone represents local heat load intensity and cooling resource demand, and is dynamically propagated and updated among the agents; Step 3, each agent automatically adjusting the working parameters of its actuator according to the concentration gradient of the digital pheromone, wherein the actuator comprises an air valve and a water valve, and the adjustment follows a bionic swarm cooperation rule, so that a plurality of adjacent agents automatically form a collaboration alliance when sensing a local overheating event and cooperatively allocate cooling resources; Step 4, introducing a machine learning model trained on historical operation data to generate a local decision strategy suited to each agent, wherein the machine learning model is built on a reinforcement learning framework, and continuously iterates and updates the behavior strategy of each agent by taking local energy consumption and temperature 
deviation as optimiza