CN-121979370-A - Control method, device, equipment and storage medium of server liquid cooling heat dissipation system
Abstract
The application relates to a control method, device, equipment and storage medium for a server liquid cooling heat dissipation system, and belongs to the field of heat sinks. The control method comprises: acquiring CPU heat load data, liquid cooling heat dissipation system state data and surrounding environment data of the liquid cooling heat dissipation system, and generating a CPU heat load trend and level; and, in response to the CPU heat load trend and level, adjusting one or more of the coolant flow in a plurality of branches, the energy consumption of the circulating pump and the energy consumption of the heat dissipation terminal, so as to keep the CPU temperature within a reasonable range. The method allocates heat dissipation resources through fine-grained management, regulates the energy consumption of the circulating pump and the heat dissipation terminal in real time, and monitors and controls the temperature of each CPU independently. This avoids wasting power in the circulating pump and the heat dissipation terminal, improves the heat dissipation efficiency of the liquid cooling heat dissipation system, and reduces its energy consumption.
Inventors
- LI YANXIONG
- PAN HUA
Assignees
- 东莞仁海科技股份有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260123
Claims (10)
- 1. A control method of a server liquid cooling heat dissipation system, the liquid cooling heat dissipation system comprising a coolant circulation pipeline, a circulating pump, a flow valve and a heat dissipation terminal, wherein the coolant circulation pipeline comprises a trunk line and a plurality of branches, the plurality of branches are used for cooling a plurality of CPUs of the server, the circulating pump is used for driving the coolant to circulate in the coolant circulation pipeline, the flow valve is used for adjusting the coolant flow in the branches, and the heat dissipation terminal is used for cooling the coolant in the trunk line, characterized in that the control method comprises: acquiring CPU heat load data, liquid cooling heat dissipation system state data and surrounding environment data of the liquid cooling heat dissipation system, and generating a CPU heat load trend and level; and, in response to the CPU heat load trend and level, adjusting one or more of the coolant flow in the plurality of branches, the energy consumption of the circulating pump and the energy consumption of the heat dissipation terminal, so as to maintain the CPU temperature within a reasonable range.
- 2. The control method according to claim 1, wherein the CPU heat load data comprises the CPU core average temperature, the CPU hot spot temperature, the CPU real-time power consumption and the CPU load rate; the liquid cooling heat dissipation system state data comprises the flow rate of the coolant in the trunk line and its temperature before entering and after exiting the branches, the flow rate of the coolant in each branch and its temperature before entering and after exiting the CPU, the temperature of the coolant before entering and after exiting the heat dissipation terminal, and the energy consumption of the circulating pump and the heat dissipation terminal; and the surrounding environment data of the liquid cooling heat dissipation system comprises the ambient temperature and humidity.
- 3. The control method according to claim 2, wherein the heat dissipation terminal has a fan, the energy consumption of the heat dissipation terminal includes the rotational speed of the fan, and the surrounding environment data of the liquid cooling heat dissipation system further includes the intake air temperature of the fan.
- 4. The control method according to claim 1, wherein the CPU heat load data and the liquid cooling heat dissipation system state data are acquired more frequently than the surrounding environment data of the liquid cooling heat dissipation system; and the higher the CPU load rate, the higher the acquisition frequency of the CPU heat load data and the liquid cooling heat dissipation system state data.
- 5. The control method according to claim 1, wherein acquiring the CPU heat load data, the liquid cooling heat dissipation system state data and the surrounding environment data of the liquid cooling heat dissipation system, and generating the CPU heat load trend and level, comprises: establishing a CPU heat load level system, and setting a trigger threshold and a corresponding control strategy for each level; acquiring the CPU heat load data, the liquid cooling heat dissipation system state data and the surrounding environment data of the liquid cooling heat dissipation system, adjusting the trigger threshold and the corresponding control strategy of each level, and determining the level of the current CPU heat load; and predicting the CPU heat load trend based on the acquired CPU heat load data.
- 6. The control method according to claim 5, wherein predicting the CPU heat load trend based on the acquired CPU heat load data comprises predicting the CPU heat load trend for the next 3 minutes based on the CPU heat load data of the past 5 minutes.
- 7. The control method according to claim 1, wherein the coolant flow velocity in the trunk line is in the range of 0.8 to 1.5 m/s; the temperature difference of the coolant in the trunk line between entering and exiting the branches is in the range of 5 to 8 ℃; the minimum coolant flow in each branch is not lower than 5% of the coolant flow in the trunk line; and the temperature of the coolant leaving the heat dissipation terminal is 35 ℃.
- 8. A control device of a server liquid cooling heat dissipation system, characterized by comprising: a data acquisition module configured to acquire CPU heat load data, liquid cooling heat dissipation system state data and surrounding environment data of the liquid cooling heat dissipation system; an analysis and judgment module configured to determine the CPU heat load trend and level based on the CPU heat load data, the liquid cooling heat dissipation system state data and the surrounding environment data of the liquid cooling heat dissipation system; and an adjusting module configured to adjust, based on the CPU heat load trend and level, one or more of the coolant flow in the branches, the energy consumption of the circulating pump and the energy consumption of the heat dissipation terminal, so as to maintain the CPU temperature within a reasonable range.
- 9. An apparatus comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the control method of any one of claims 1 to 7.
- 10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the control method of any one of claims 1 to 7.
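The following is a minimal Python sketch of how the steps in claims 5 to 7 might fit together; it is an illustration under stated assumptions, not the applicant's implementation. The claims do not name a prediction algorithm, so a least-squares linear fit stands in for it here. The level names, the power thresholds and the proportional allocation rule are likewise hypothetical; only the 5-minute history, the 3-minute horizon and the 5% branch minimum come from the claims.

```python
from typing import List, Sequence, Tuple

# Hypothetical trigger thresholds in watts (not taken from the claims).
LEVEL_THRESHOLDS_W = ((250.0, "high"), (150.0, "medium"))


def predict_trend(samples: Sequence[Tuple[float, float]],
                  horizon_s: float = 180.0) -> float:
    """Extrapolate (time_s, value) samples horizon_s past the last sample.

    Assumption: a least-squares straight line; the claims only say that
    the past 5 minutes are used to predict the next 3 minutes.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one instant")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    target_t = samples[-1][0] + horizon_s
    return mean_v + slope * (target_t - mean_t)


def classify_level(power_w: float) -> str:
    """Map a heat load to a level using the hypothetical thresholds."""
    for threshold, name in LEVEL_THRESHOLDS_W:
        if power_w >= threshold:
            return name
    return "low"


def allocate_flow(loads_w: Sequence[float], trunk_flow: float,
                  min_fraction: float = 0.05) -> List[float]:
    """Split trunk flow across branches in proportion to heat load.

    Every branch first receives min_fraction of the trunk flow (the 5%
    floor of claim 7); the remainder is shared in proportion to load.
    """
    n = len(loads_w)
    if n == 0 or n * min_fraction > 1.0:
        raise ValueError("invalid branch count for the minimum fraction")
    if any(load < 0 for load in loads_w):
        raise ValueError("heat loads must be non-negative")
    floor = min_fraction * trunk_flow
    spare = trunk_flow - n * floor
    total = sum(loads_w)
    if total == 0:
        return [trunk_flow / n] * n  # no load information: split evenly
    return [floor + spare * load / total for load in loads_w]


if __name__ == "__main__":
    # Five minutes of CPU power readings, one per minute (illustrative).
    history = [(0, 100), (60, 110), (120, 120),
               (180, 130), (240, 140), (300, 150)]
    predicted = predict_trend(history)
    print(predicted, classify_level(predicted))
    print(allocate_flow([300.0, 100.0], trunk_flow=10.0))
```

In this sketch the predicted load, rather than the current reading, selects the level, so that flow can be raised before the temperature rises; whether the claimed method combines the trend and the level in this way is not stated in the claims.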
Description
Control method, device, equipment and storage medium of server liquid cooling heat dissipation system
Technical Field
The present application relates to the field of heat sinks, and in particular to a control method, device, equipment and storage medium for a server liquid cooling heat dissipation system.
Background
With the rapid development of artificial intelligence, big data and cloud computing, the AI server, as the core carrier of computing power, is evolving towards multi-chip integration, high-density deployment and high-performance operation. To meet the intensive computing demands of large-scale deep learning training, massive data inference and similar workloads, an AI server usually integrates multiple high-performance CPUs and dedicated computing chips, such as GPUs (graphics processing units) and NPUs (neural processing units). Chip operating frequency and core density keep increasing, and the heat load per unit volume grows exponentially. Available data show that the power consumption of a single high-end AI computing chip has exceeded 300 W, and the total thermal power of a multi-chip AI server can reach thousands of watts. This heat density is far higher than that of a traditional general-purpose server and places strict requirements on the efficiency, temperature control precision and stability of the heat dissipation system. At present, heat dissipation schemes for AI servers fall into two main types: air cooling and liquid cooling.
Air cooling is widely used in low- and medium-load server scenarios because of its simple structure and low cost, but it is limited by the low convective heat transfer coefficient of air. In a high-density, high-heat-load AI server it easily becomes a heat dissipation bottleneck and cannot effectively deal with the local hot spots produced when multiple chips run in parallel. As a result, chips trigger frequency-reduction protection because of excessive temperature, hardware may even suffer high-temperature aging, and the release of computing power is severely restricted. Compared with air cooling, liquid cooling benefits from the high convective heat exchange efficiency and good temperature uniformity of liquids. It significantly improves heat dissipation capacity and effectively controls both the overall temperature and the local hot spots of a multi-chip cluster, and has therefore become the mainstream heat dissipation scheme for high-end AI servers. However, existing liquid cooling heat dissipation systems for AI servers mostly use constant-speed control or simple threshold-triggered passive control. Their core design goal is to meet the heat dissipation demand of the extreme heat load, and they lack dynamic balancing of heat dissipation efficiency against system power consumption. Several technical defects appear in actual operation, and such systems adapt poorly to the large load fluctuations and uneven heat load distribution of AI servers. The specific problems are as follows:
First, heat dissipation resources are allocated coarsely, so local overheating and wasted energy coexist.
Existing liquid cooling heat dissipation systems mostly use a fixed strategy that distributes the coolant flow evenly and do not consider the heat load differences among multiple CPUs and computing chips under different task scenarios. Chips with higher heat loads therefore overheat locally, while the branches of chips with lower heat loads carry redundant coolant flow, which wastes circulating pump power. Meanwhile, the circulating pump and the radiator fan mostly run at constant speed, or switch speed only on a single temperature threshold. Whether the system is at low, medium or high load, a high heat dissipation power output is maintained and cannot be adjusted dynamically to the actual heat load, so the overall power consumption of the liquid cooling system is high and the heat dissipation efficiency (heat dissipation power / system power consumption) is low.
Second, passive control responds with a delay, and temperature overshoot affects the stability of computing power. The load scenarios of an AI server (such as model training start-up, concurrent multitasking and computing cluster scheduling) vary significantly, and the thermal load can surge