CN-121979739-A - Heat dissipation disaster recovery method and device of fan, storage medium and electronic equipment
Abstract
The invention provides a heat dissipation disaster recovery method and device for a fan, a storage medium, and electronic equipment. The method comprises: detecting working state information of fans in a server; generating pre-warning information for a target fan according to the working state information; identifying the fan type of the target fan, and identifying the fault type of the target fan according to the working state information and the fan type, wherein the fan type represents the rotor type of the target fan and the cluster to which it belongs; and carrying out disaster recovery processing on the target fan according to the fault type. The invention solves the technical problems in the related art that fan disaster recovery is not timely and consumes a large amount of power, improves the heat dissipation efficiency of the fans, and effectively ensures stable operation of the server.
Inventors
- WANG ZIHAN
- ZHANG HAO
- HU YUANMING
Assignees
- 智锐达科技(杭州)有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251229
Claims (10)
- 1. A heat dissipation disaster recovery method for a fan, characterized by comprising: detecting working state information of a fan in a server; generating pre-warning information of a target fan according to the working state information; identifying a fan type of the target fan, and identifying a fault type of the target fan according to the working state information and the fan type, wherein the fan type is used for representing a rotor type and a cluster of the target fan; and carrying out disaster recovery processing on the target fan according to the fault type.
- 2. The method of claim 1, wherein generating the pre-warning information of the target fan according to the working state information comprises: for each target fan, determining, from the working state information, a pulse width modulation (PWM) value output to the target fan by a complex programmable logic device (CPLD) of the server, and an actual rotating speed fed back by the target fan; calculating an expected rotating speed of the target fan according to the PWM value; judging whether the rotating speed difference between the expected rotating speed and the actual rotating speed exceeds a preset range, and judging whether the actual rotating speed is lower than a lowest rotating speed; and if the rotating speed difference exceeds the preset range or the actual rotating speed is lower than the lowest rotating speed, determining that the target fan has failed, and generating the pre-warning information of the target fan.
- 3. The method of claim 1, wherein carrying out disaster recovery processing on the target fan according to the fault type comprises: judging whether the fan type of the target fan is a combined fan; if the fan type of the target fan is a combined fan, judging whether the fan module in which the target fan is located has an upper-lower layer structure, wherein the fan module comprises a plurality of fans, and each fan protects hardware of an independent partition; and if the fan module in which the target fan is located has an upper-lower layer structure, carrying out disaster recovery processing on the target fan in the fan module according to the fault type.
- 4. The method of claim 1, wherein identifying the fault type of the target fan according to the working state information and the fan type comprises: reading in-place information of the target fan from the working state information, wherein the in-place information is used for representing whether the target fan is inserted; if the in-place information indicates that the target fan is not in place, determining that all rotors of the target fan are damaged; if the in-place information indicates that the target fan is in place, reading the real-time rotating speed of each rotor in the target fan from the working state information; judging whether the real-time rotating speed is lower than the lowest rotating speed; if the real-time rotating speed is lower than the lowest rotating speed, determining that the corresponding rotor of the target fan is damaged; if all rotors of the target fan are damaged and the target fan is then monitored to recover from the damaged state to a normal state, determining that the fault type of the target fan is a start-up fault; if the rotor type of the target fan is multi-rotor and only some of the rotors are damaged, determining that the fault type of the target fan is a rotor fault; and if an adjacent fan in the cluster where the target fan is located is damaged, determining that the fault type of the target fan is a region fault.
- 5. The method of claim 1, wherein carrying out disaster recovery processing on the target fan according to the fault type comprises: if the fault type is a start-up fault, determining a fault recovery moment of the target fan and determining the adjacent fans in the cluster where the target fan is located, wherein the start-up fault is used for representing that all rotors of the target fan have recovered from a damaged state to a normal state; and starting from the fault recovery moment, within a first disaster recovery time, continuously controlling the target fan to run at a first preset rotating speed and controlling the adjacent fans to run at a second preset rotating speed, wherein the first preset rotating speed is higher than the second preset rotating speed.
- 6. The method of claim 1, wherein carrying out disaster recovery processing on the target fan according to the fault type comprises: if the fault type is a rotor fault, determining the maximum full rotating speed of the target fan, wherein the rotor fault is used for representing that some of a plurality of rotors of the target fan are damaged; and continuously controlling the target fan to run at the maximum full rotating speed within the second disaster recovery time.
- 7. The method of claim 1, wherein carrying out disaster recovery processing on the target fan according to the fault type comprises: if the fault type is a region fault, determining the real-time temperature of a heat dissipation object of the target fan from the working state information, wherein the region fault is used for representing that an adjacent fan in the cluster where the target fan is located is damaged; adaptively calculating an initial rotating speed according to the real-time temperature; and if the adjacent fans are multi-rotor fans, controlling the target fan to run at the initial rotating speed plus the second incremental rotating speed within the third disaster recovery time, wherein the first incremental rotating speed is larger than the second incremental rotating speed.
- 8. A heat dissipation disaster recovery device for a fan, characterized by comprising: a detection module, used for detecting working state information of a fan in a server; a generating module, used for generating pre-warning information of a target fan according to the working state information; an identification module, used for identifying the fan type of the target fan and identifying the fault type of the target fan according to the working state information and the fan type, wherein the fan type is used for representing the rotor type and the cluster of the target fan; and a disaster recovery module, used for carrying out disaster recovery processing on the target fan according to the fault type.
- 9. A storage medium, wherein a computer program is stored in the storage medium, and the computer program is arranged to perform, when run, the steps of the heat dissipation disaster recovery method of a fan as claimed in any one of claims 1 to 7.
- 10. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus, and wherein: the memory is used for storing a computer program; and the processor is used for executing the steps of the heat dissipation disaster recovery method of a fan according to any one of claims 1 to 7 by running the program stored in the memory.
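The pre-warning check of claim 2 and the fault classification of claim 4 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the names FanState, expected_rpm, needs_warning, damaged_rotors and classify_fault, the three threshold constants, the linear PWM-to-speed mapping, the per-rotor application of the claim 2 check, and the order in which the three fault types are tested are all hypothetical, since the claims do not specify them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

# Hypothetical thresholds; the claims only name them, they give no values.
MAX_RPM = 20000.0        # assumed full-speed rating of a rotor
MIN_RPM = 1500.0         # "lowest rotating speed"
TOLERANCE_RPM = 2000.0   # "preset range" for expected vs. actual speed


class FaultType(Enum):
    STARTUP = "start-up fault"
    ROTOR = "rotor fault"
    REGION = "region fault"


@dataclass
class FanState:
    pwm_percent: float              # PWM duty cycle output by the CPLD (0-100)
    rotor_rpm: List[float]          # real-time speed fed back by each rotor
    in_place: bool = True           # in-place information: is the fan inserted?
    neighbor_damaged: bool = False  # is an adjacent fan in the cluster damaged?


def expected_rpm(pwm_percent: float) -> float:
    """Assumed linear mapping from PWM duty cycle to expected speed."""
    return MAX_RPM * pwm_percent / 100.0


def needs_warning(state: FanState) -> bool:
    """Claim 2: warn if any rotor deviates too far from the expected
    speed or runs below the lowest speed."""
    target = expected_rpm(state.pwm_percent)
    return any(
        abs(target - rpm) > TOLERANCE_RPM or rpm < MIN_RPM
        for rpm in state.rotor_rpm
    )


def damaged_rotors(state: FanState) -> List[bool]:
    """Claim 4: an absent fan counts as all rotors damaged; otherwise a
    rotor is damaged when its speed is below the lowest speed."""
    if not state.in_place:
        return [True] * len(state.rotor_rpm)
    return [rpm < MIN_RPM for rpm in state.rotor_rpm]


def classify_fault(state: FanState, was_all_damaged: bool) -> Optional[FaultType]:
    """Claim 4 fault types. `was_all_damaged` records that every rotor was
    previously seen damaged. The test order below is an assumption."""
    damaged = damaged_rotors(state)
    if was_all_damaged and not any(damaged):
        return FaultType.STARTUP      # recovered from a fully damaged state
    if len(damaged) > 1 and any(damaged) and not all(damaged):
        return FaultType.ROTOR        # multi-rotor fan, only some rotors damaged
    if state.neighbor_damaged:
        return FaultType.REGION       # adjacent fan in the cluster is damaged
    return None
```

In this sketch a controller would call needs_warning on each polling cycle and, only for fans that raised a warning or were previously fully damaged, call classify_fault to choose among the disaster recovery branches of claims 5 to 7.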
Description
Heat dissipation disaster recovery method and device of fan, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of servers, and in particular to a heat dissipation disaster recovery method and device for a fan, a storage medium, and electronic equipment.
Background
In modern data centers and high-performance servers, fans serve as critical heat dissipation components and play an important role in ensuring system stability and efficient operation. As computing demands grow and hardware designs become more complex, the amount of heat generated inside the server keeps increasing, and the reliability of the heat dissipation system becomes critical. A fan fault or failure often causes local over-temperature, which degrades system performance and may even cause hardware damage or system crashes. Therefore, how to improve the disaster recovery capability of the server's heat dissipation system, so that the system maintains good heat dissipation at low power consumption and low noise even if one or more fans fail, is an important subject in server design.
In the related art, fan disaster recovery schemes typically include the following.
Fan redundancy design: many modern servers employ a redundant fan design, such as an N+1 redundancy architecture. The system is configured with more fans than actually needed (e.g., one extra fan), so that when a fan fails the remaining fans can continue to operate and meet the heat dissipation requirements of the system. The redundant fans are activated automatically when a fan fails, but this design requires additional fans, which severely impacts the server architecture design.
Fan fault monitoring: some servers use a hardware monitoring system based on the BMC (Baseboard Management Controller) to monitor the running state of the fans in real time. When a fan fails (e.g., stalls or runs at an abnormal speed), the monitoring system raises an alarm and notifies the administrator. The running state of the fan can be recorded, but a fault alarm usually requires manual intervention, and if it is not handled in time the server components may overheat, so the problem cannot be resolved automatically.
Automatic fan speed regulation: some server systems in the related art automatically regulate the rotating speed of the fans according to feedback from temperature sensors. The system dynamically adjusts the fan speed according to the temperature of hardware such as the CPU, GPU, and memory to increase heat dissipation efficiency. Although this approach works well when the fans operate normally, if a fan fails or stalls, the system typically cannot adjust or compensate the load of the other fans in time, so the system temperature may still exceed the safe range.
Emergency fan speed regulation: after a fan abnormality is detected, some server systems in the related art directly raise the other fans in the system to full speed to ensure the heat dissipation function of the server. However, this approach leads to a dramatic increase in server power consumption, and after the abnormal fan is restored, the normal rotating speed cannot be restored because of air backflow problems.
The related art schemes have the following disadvantages. Although a redundant fan design provides some protection when one fan fails, if several fans fail at the same time, or the redundant fan itself has a problem, a system relying on the redundant design may still face insufficient heat dissipation. Most current fan failure disaster recovery methods still rely on manual intervention: when a fan fails, the system can issue a warning, but an administrator typically has to replace the failed fan or adjust the fan configuration manually. During this time, the server may be exposed to a risk of overheating, resulting in hardware damage or a system crash. Automatic speed regulation that relies only on PID control driven by temperature sensors responds slowly; when a fan fails or stalls, the rotating speed is not raised in time, so the system temperature exceeds the safe range. Raising the other fans to the highest rotating speed when a fan is abnormal effectively avoids the heat dissipation risk, but makes the power consumption of the whole machine excessively high.
In view of the above problems in the related art, an efficient and accurate solution has not yet been found.
Disclosure of Invention
The invention provides a heat dissipation disaster recovery method and device of a fan, a storage medium and electronic equipment, and