CN-122019301-A - Health monitoring method and device of electronic equipment and electronic equipment
Abstract
Embodiments of the present application provide a health monitoring method and apparatus for an electronic device, and an electronic device. In the method, a system management interrupt is triggered periodically based on a timer. During interrupt processing, the current state of each device to be monitored is compared with its normal state to determine whether an abnormality has occurred, and an alarm is output if one has. The electronic device is thereby monitored in real time, so that an abnormal state of a monitored device can be detected before the device fails completely and an alarm can be raised in advance. The risk of a system fault can thus be anticipated, which improves the reliability and maintainability of the system. In particular, where the electronic device is a network server in a network using an FTTR architecture, abnormal states of monitored devices can be found promptly so that maintenance can be carried out in time, reducing serious problems such as network disconnection caused by hardware faults.
Inventors
- LI XUE
Assignees
- 新华三信息技术有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260123
Claims (10)
- 1. A health monitoring method for an electronic device, comprising: after an operating system of the electronic device is started, determining, based on a timer, whether the current moment has reached a trigger moment of a system management interrupt, and calling a system management interrupt program when the trigger moment is reached; and executing, by the system management interrupt program, interrupt processing, which comprises: scanning each device to be monitored to obtain current state information of each device to be monitored; comparing the current state information of each device to be monitored with normal state information of that device, and determining, based on the comparison result, whether the device to be monitored is abnormal; when an abnormality has occurred, outputting an alarm and ending the system management interrupt program; and when no abnormality has occurred, ending the system management interrupt program; wherein the normal state information of each device to be monitored is obtained and stored during startup of the operating system of the electronic device.
- 2. The health monitoring method of claim 1, wherein the devices to be monitored comprise one or more of a PCIe device, a memory device, a USB device, and a SATA device; and where the devices to be monitored include a PCIe device, a memory device, a USB device, or a SATA device: the step of scanning each device to be monitored to obtain the current state information of each device to be monitored comprises obtaining current in-place (presence) state information of the PCIe device, memory device, USB device, or SATA device; and the step of comparing the current state information of each device to be monitored with the normal state information of that device and determining, based on the comparison result, whether the device to be monitored is abnormal comprises determining that a device to be monitored is abnormal if its current in-place state information indicates that it is not in place.
- 3. The health monitoring method of claim 2, wherein the devices to be monitored further comprise a CPU; the step of scanning each device to be monitored to obtain the current state information of each device to be monitored further comprises obtaining the number of cores the CPU is currently running and the current operating frequency of the CPU; and the step of comparing the current state information of each device to be monitored with the normal state information of that device and determining, based on the comparison result, whether the device to be monitored is abnormal further comprises: determining that the CPU is abnormal if the number of cores the CPU is currently running is smaller than the number of CPU cores installed in the electronic device, or if the difference between the current operating frequency and the normal operating frequency is greater than a preset operating frequency difference threshold.
- 4. The health monitoring method of claim 2, wherein, where the devices to be monitored include a PCIe device: the step of scanning each device to be monitored to obtain the current state information of each device to be monitored further comprises obtaining current bandwidth information and current rate information of the PCIe device; and the step of comparing the current state information of each device to be monitored with the normal state information of that device and determining, based on the comparison result, whether the device to be monitored is abnormal further comprises: determining that the PCIe device is abnormal if the difference between the current bandwidth and the normal bandwidth is greater than a preset bandwidth difference threshold, or if the difference between the current rate and the normal rate is greater than a preset rate difference threshold.
- 5. The health monitoring method of claim 2, wherein, where the devices to be monitored include a memory device: the step of scanning each device to be monitored to obtain the current state information of each device to be monitored further comprises obtaining current power management information of the memory device by reading a power management register; and the step of comparing the current state information of each device to be monitored with the normal state information of that device and determining, based on the comparison result, whether the device to be monitored is abnormal further comprises: determining that the memory device is abnormal if the power management information of the memory device indicates an overcurrent or overvoltage condition.
- 6. The method for health monitoring according to claim 2, wherein the CPU of the electronic device triggers a system management interrupt after receiving a fault signal sent by any device to be monitored, and executes the system management interrupt program.
- 7. The health monitoring method of claim 1, further comprising: during startup of the operating system, reading, from a baseboard management controller, monitoring configuration information indicating whether periodic monitoring is to be performed, wherein the monitoring configuration information is set by a user through a control interface; and where the monitoring configuration information indicates that periodic monitoring is to be performed, after the operating system of the electronic device is started, determining, based on a preset trigger period, whether the current moment has reached the trigger moment of the system management interrupt, and calling the system management interrupt program when the trigger moment is reached.
- 8. A health monitoring apparatus for an electronic device, comprising: an interrupt period triggering module, configured to determine, based on a timer after an operating system of the electronic device is started, whether the current moment has reached a trigger moment of a system management interrupt, and to call a system management interrupt program when the trigger moment is reached; and an interrupt processing module, configured to execute interrupt processing of the system management interrupt program, which comprises: scanning each device to be monitored to obtain current state information of each device to be monitored; comparing the current state information of each device to be monitored with normal state information of that device, and determining, based on the comparison result, whether the device to be monitored is abnormal; when an abnormality has occurred, outputting an alarm and ending the system management interrupt program; and when no abnormality has occurred, ending the system management interrupt program; wherein the normal state information of each device to be monitored is obtained and stored during startup of the operating system of the electronic device.
- 9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method steps of any one of claims 1-7 when executing the program stored in the memory.
- 10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-7.
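The device-specific checks of claims 2 to 5 can be illustrated with a short Python sketch. It is not the applicant's firmware: the function name `check_device`, the dictionary keys, the threshold values, and the use of an absolute difference are all assumptions made for illustration, since the claims only state that the thresholds are "preset".

```python
from typing import Dict, List

# Hypothetical thresholds; the claims only say that they are preset.
FREQ_DIFF_THRESHOLD_MHZ = 500.0
BANDWIDTH_DIFF_THRESHOLD = 0.0
RATE_DIFF_THRESHOLD = 0.0


def check_device(kind: str, normal: Dict, current: Dict) -> List[str]:
    """Compare one device's current state with its normal state.

    Returns a list of problem descriptions; an empty list means no abnormality.
    """
    problems: List[str] = []

    # Claim 2: a PCIe, memory, USB, or SATA device that is not in place is abnormal.
    if kind in ("pcie", "memory", "usb", "sata"):
        if not current.get("present", False):
            return ["device not in place"]

    # Claim 3: fewer running cores than installed, or frequency drift beyond the threshold.
    if kind == "cpu":
        if current["cores"] < normal["cores"]:
            problems.append("fewer cores running than installed")
        if abs(current["freq_mhz"] - normal["freq_mhz"]) > FREQ_DIFF_THRESHOLD_MHZ:
            problems.append("operating frequency deviates from normal")

    # Claim 4: PCIe bandwidth or rate differs from normal by more than the threshold.
    if kind == "pcie":
        if abs(current["bandwidth"] - normal["bandwidth"]) > BANDWIDTH_DIFF_THRESHOLD:
            problems.append("bandwidth deviates from normal")
        if abs(current["rate"] - normal["rate"]) > RATE_DIFF_THRESHOLD:
            problems.append("rate deviates from normal")

    # Claim 5: power management information reports overcurrent or overvoltage.
    if kind == "memory":
        if current.get("overcurrent") or current.get("overvoltage"):
            problems.append("overcurrent or overvoltage reported")

    return problems
```

In this sketch a device that is not in place is reported immediately, without evaluating the remaining fields, because bandwidth or power readings from an absent device would not be meaningful.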
Description
Health monitoring method and device of electronic equipment and electronic equipment
Technical Field
The present application relates to the field of electronic device monitoring technologies, and in particular to a health monitoring method and apparatus for an electronic device, and an electronic device.
Background
At present, electronic devices such as network servers and computers mainly monitor the running condition of their software and hardware by means of interrupts. For example, when a serious hardware event occurs in the electronic device, such as a memory error, a Peripheral Component Interconnect (PCI) bus error, or a chipset failure, the system firmware triggers a System Management Interrupt (SMI) to record the event in the system event log and initiate a recovery procedure.
Such a hardware interrupt mechanism generally can only respond passively after a fault has occurred, and a serious hardware fault also requires the electronic device to be shut down for maintenance, which greatly affects its operation. For example, for a network server in a network using an FTTR (Fiber To The Room) architecture, by the time an SMI is triggered by a hardware failure, that failure may already cause serious problems such as network disconnection.
Therefore, how to predict the risk of a system fault before a device fails completely is a problem to be solved.
Disclosure of Invention
Embodiments of the present application aim to provide a health monitoring method and apparatus for an electronic device, and an electronic device, so as to predict the risk of a system fault in advance.
The specific technical solutions are as follows. In a first aspect, an embodiment of the present application provides a health monitoring method for an electronic device, including: after an operating system of the electronic device is started, determining, based on a timer, whether the current moment has reached a trigger moment of a system management interrupt, and calling a system management interrupt program when the trigger moment is reached; and executing, by the system management interrupt program, interrupt processing, which includes: scanning each device to be monitored to obtain current state information of each device to be monitored; comparing the current state information of each device to be monitored with normal state information of that device, and determining, based on the comparison result, whether the device to be monitored is abnormal; when an abnormality has occurred, outputting an alarm and ending the system management interrupt program; and when no abnormality has occurred, ending the system management interrupt program. The normal state information of each device to be monitored is obtained and stored during startup of the operating system of the electronic device.
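The flow of the first aspect (store the normal state at startup, check a timer, and on each trigger scan, compare, and alarm) can be simulated in a few lines of Python. This is only a sketch under stated assumptions: a real system management interrupt handler runs in firmware, and the names `HealthMonitor`, `capture_baseline`, `tick`, `handle_interrupt`, and `demo`, as well as the simple equality comparison, are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class HealthMonitor:
    scan: Callable[[], Dict[str, State]]  # hypothetical: returns current state per device
    period_s: float                       # hypothetical trigger period in seconds
    baseline: Dict[str, State] = field(default_factory=dict)
    next_trigger: float = 0.0
    alarms: List[str] = field(default_factory=list)

    def capture_baseline(self, now: float) -> None:
        """Startup step: store the normal state and arm the timer."""
        self.baseline = self.scan()
        self.next_trigger = now + self.period_s

    def tick(self, now: float) -> bool:
        """Timer check: call the handler only once the trigger moment is reached."""
        if now < self.next_trigger:
            return False
        self.next_trigger = now + self.period_s
        self.handle_interrupt()
        return True

    def handle_interrupt(self) -> None:
        """Stand-in for the interrupt program: scan, compare, alarm, then return."""
        current = self.scan()
        for name, normal in self.baseline.items():
            if current.get(name) != normal:
                self.alarms.append(f"{name}: expected {normal}, got {current.get(name)}")


def demo() -> List[str]:
    devices = {"pcie0": {"present": True}, "sata0": {"present": True}}
    # Copy each state so the stored baseline does not alias the live device table.
    monitor = HealthMonitor(
        scan=lambda: {k: dict(v) for k, v in devices.items()}, period_s=10.0
    )
    monitor.capture_baseline(now=0.0)
    monitor.tick(now=5.0)                # before the trigger moment: nothing happens
    devices["sata0"]["present"] = False  # simulate a drive dropping out
    monitor.tick(now=10.0)               # trigger moment reached: handler runs
    return monitor.alarms
```

Passing the time in as `now` rather than reading a clock keeps the sketch deterministic; the handler always returns after one scan, whether or not an alarm was recorded, mirroring the two "ending the system management interrupt program" branches of the method.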
In one possible implementation manner, the devices to be monitored include one or more of a PCIe device, a memory device, a USB device, and a SATA device. Where the devices to be monitored include a PCIe device, a memory device, a USB device, or a SATA device, the step of scanning each device to be monitored to obtain the current state information of each device to be monitored includes obtaining current in-place (presence) state information of the PCIe device, memory device, USB device, or SATA device; and the step of comparing the current state information of each device to be monitored with the normal state information of that device and determining, based on the comparison result, whether the device to be monitored is abnormal includes determining that a device to be monitored is abnormal if its current in-place state information indicates that it is not in place.
In one possible implementation manner, the devices to be monitored further include a CPU. The step of scanning each device to be monitored to obtain the current state information of each device to be monitored further includes obtaining the number of cores the CPU is currently running and the current operating frequency of the CPU; and the step of comparing the current state information of each device to be monitored with the normal state information of that device and determining, based on the comparison result, whether the device to be monitored is abnormal further includes: determining that the CPU is abnormal if the number of cores the CPU is currently running is smaller than the number of CPU cores installed in the electronic device, or if the difference between the current operating frequency and the normal operating frequency is greater than a preset operating frequency difference threshold.
In one possible implementation manner, in the case that the devi