EP-4738113-A1 - INTERRUPT HANDLING METHOD, APPARATUS AND DEVICE
Abstract
This application relates to the field of computer technologies and provides an interrupt processing method, apparatus, and device that resolve system suspension or firmware suspension caused by a large quantity of interrupts triggered by non-fatal errors. In the method, firmware receives a plurality of interrupts in a first duration, where the plurality of interrupts are triggered when non-fatal errors occur in a hardware layer. Based on interrupt information of the plurality of interrupts and layered masking information, the firmware masks interrupts that come from a target to-be-masked object corresponding to the plurality of interrupts and that arrive after the first duration. The layered masking information indicates a plurality of to-be-masked objects, which are obtained by layering the transmission of the interrupts and include the target to-be-masked object. The firmware sends the interrupt information of the plurality of interrupts to a kernel driver.
Inventors
- WU, Rongjun
- ZHOU, WEI
Assignees
- Huawei Technologies Co., Ltd.
Dates
- Publication Date: 20260506
- Application Date: 20240204
Claims (20)
- An interrupt processing method, applied to firmware, wherein the firmware is configured to run on a hardware layer, and the method comprises: receiving, by the firmware, a plurality of interrupts in a first duration, wherein the plurality of interrupts are triggered when non-fatal errors occur in the hardware layer; masking, by the firmware, based on interrupt information and layered masking information of the plurality of interrupts, an interrupt that is of a target to-be-masked object corresponding to the plurality of interrupts and that is after the first duration, wherein the layered masking information indicates a plurality of to-be-masked objects, a plurality of to-be-masked objects are obtained by layering transmission of the interrupts, and the plurality of to-be-masked objects comprise the target to-be-masked object; and sending, by the firmware, the interrupt information of the plurality of interrupts to a kernel driver.
- The method according to claim 1, wherein the plurality of to-be-masked objects comprise at least one of the following: a hardware module in the hardware layer, an interrupt line for transmission of an interrupt, and an interrupt bit of a status register for storing an interrupt.
- The method according to claim 1 or 2, wherein the masking, by the firmware, based on the interrupt information and the layered masking information of the plurality of interrupts, the interrupt that is of the target to-be-masked object corresponding to the plurality of interrupts and that is after the first duration comprises: when the plurality of interrupts belong to the target to-be-masked object, and a quantity of the plurality of interrupts reaches a preset threshold corresponding to the target to-be-masked object, masking, by the firmware, the interrupt that is of the target to-be-masked object and that is after the first duration, wherein interrupt information of each interrupt indicates a to-be-masked object corresponding to the interrupt.
- The method according to claim 3, wherein the interrupt information comprises at least one of the following: a hardware module identifier, an interrupt line identifier, and an error type of the non-fatal error and corresponding to the interrupt bit.
- The method according to claim 4, wherein the method further comprises: when the interrupt information comprises the hardware module identifier, determining, by the firmware based on the hardware module identifier in the interrupt information of each interrupt, a hardware module to which each interrupt belongs; when the interrupt information comprises the interrupt line identifier, determining, by the firmware based on the interrupt line identifier in the interrupt information of each interrupt, an interrupt line to which each interrupt belongs; and/or when the interrupt information comprises the error type of the non-fatal error and corresponding to the interrupt bit, determining, by the firmware based on the error type in the interrupt information of each interrupt, an interrupt bit to which each interrupt belongs.
- The method according to any one of claims 1 to 5, wherein the hardware layer comprises a first hardware system, the firmware comprises first firmware running on the first hardware system, the non-fatal errors comprise an RSA specification error and/or a safety error, and the first firmware is configured to mask a to-be-masked object corresponding to the RSA specification error and/or a to-be-masked object corresponding to the safety error.
- The method according to any one of claims 1 to 6, wherein the hardware layer further comprises a second hardware system, the firmware further comprises second firmware running on the second hardware system, the non-fatal errors comprise a non-RSA specification error, and the second firmware is configured to mask a to-be-masked object corresponding to the non-RSA specification error.
- The method according to claim 7, wherein for the non-RSA specification error, the method further comprises: unmasking, by the second firmware, the masked object in the plurality of to-be-masked objects after a delay of second duration.
- The method according to claim 8, wherein the unmasking is performed after processing of a last interrupt in the plurality of interrupts is completed; or the unmasking is performed in sequence.
- An interrupt processing method, applied to a kernel driver, wherein the kernel driver is configured to run on a hardware layer, and the method comprises: receiving, by the kernel driver, interrupt information of a plurality of interrupts sent by firmware, wherein the plurality of interrupts are received by the firmware in a first duration, and the plurality of interrupts are triggered when non-fatal errors occur in the hardware layer; and processing, by the kernel driver, the plurality of interrupts based on the interrupt information of the plurality of interrupts, wherein an interrupt that is of a target to-be-masked object corresponding to the plurality of interrupts and that is after the first duration is masked by the firmware, the firmware performs masking based on the interrupt information and layered masking information of the plurality of interrupts, a plurality of to-be-masked objects are obtained by layering transmission of the interrupts, and the plurality of to-be-masked objects comprise the target to-be-masked object.
- The method according to claim 10, wherein the plurality of to-be-masked objects comprise at least one of the following: a hardware module in the hardware layer, an interrupt line for transmission of an interrupt, and an interrupt bit of a status register for storing an interrupt.
- The method according to claim 10 or 11, wherein the interrupt information comprises at least one of the following: a hardware module identifier, an interrupt line identifier, and an error type of the non-fatal error and corresponding to the interrupt bit.
- The method according to any one of claims 10 to 12, wherein the non-fatal errors comprise an RSA specification error and/or a safety error, and the method further comprises: unmasking, by the kernel driver, the masked object in the plurality of to-be-masked objects after a delay of second duration.
- The method according to claim 13, wherein the unmasking is performed after processing of a last interrupt in the plurality of interrupts is completed; or the unmasking is performed in sequence.
- An interrupt processing apparatus, used in firmware, wherein the firmware is configured to run on a hardware layer, and the apparatus comprises: a receiving unit, configured to receive a plurality of interrupts in a first duration, wherein the plurality of interrupts are triggered when non-fatal errors occur in the hardware layer; a processing unit, configured to mask, based on interrupt information and layered masking information of the plurality of interrupts, an interrupt that is of a target to-be-masked object corresponding to the plurality of interrupts and that is after the first duration, wherein the layered masking information indicates a plurality of to-be-masked objects, a plurality of to-be-masked objects are obtained by layering transmission of the interrupts, and the plurality of to-be-masked objects comprise the target to-be-masked object; and a sending unit, configured to send the interrupt information of the plurality of interrupts to a kernel driver.
- The apparatus according to claim 15, wherein the plurality of to-be-masked objects comprise at least one of the following: a hardware module in the hardware layer, an interrupt line for transmission of an interrupt, and an interrupt bit of a status register for storing an interrupt.
- The apparatus according to claim 15 or 16, wherein the processing unit is further configured to: when the plurality of interrupts belong to the target to-be-masked object, and a quantity of the plurality of interrupts reaches a preset threshold corresponding to the target to-be-masked object, mask the interrupt that is of the target to-be-masked object and that is after the first duration, wherein interrupt information of each interrupt indicates a to-be-masked object corresponding to the interrupt.
- The apparatus according to claim 17, wherein the interrupt information comprises at least one of the following: a hardware module identifier, an interrupt line identifier, and an error type of the non-fatal error and corresponding to the interrupt bit.
- The apparatus according to claim 18, wherein the processing unit is further configured to: when the interrupt information comprises the hardware module identifier, determine, based on the hardware module identifier in the interrupt information of each interrupt, a hardware module to which each interrupt belongs; when the interrupt information comprises the interrupt line identifier, determine, based on the interrupt line identifier in the interrupt information of each interrupt, an interrupt line to which each interrupt belongs; and/or when the interrupt information comprises the error type of the non-fatal error and corresponding to the interrupt bit, determine, based on the error type in the interrupt information of each interrupt, an interrupt bit to which each interrupt belongs.
- The apparatus according to any one of claims 15 to 19, wherein the hardware layer comprises a first hardware system, the firmware comprises first firmware running on the first hardware system, the non-fatal errors comprise an RSA specification error and/or a safety error, and the first firmware is configured to mask a to-be-masked object corresponding to the RSA specification error and/or a to-be-masked object corresponding to the safety error.
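As an illustration only, the layered selection described in claims 2 to 5 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `InterruptInfo`, `ERROR_TYPE_TO_BIT`, `layered_objects`, and `select_targets` are hypothetical, the mapping from error type to interrupt bit is invented for the example, and the thresholds are kept per layer for brevity although the claims describe a preset threshold per to-be-masked object.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from an error type to its bit in a status register.
ERROR_TYPE_TO_BIT = {"ecc_ce": 0, "parity": 1}


@dataclass(frozen=True)
class InterruptInfo:
    """Interrupt information; any field may be absent (assumed layout)."""
    module_id: Optional[str] = None   # hardware module identifier
    line_id: Optional[int] = None     # interrupt line identifier
    error_type: Optional[str] = None  # error type tied to an interrupt bit


def layered_objects(info: InterruptInfo) -> list:
    """Return the to-be-masked object at each layer that this interrupt belongs to."""
    objects = []
    if info.module_id is not None:
        objects.append(("module", info.module_id))
    if info.line_id is not None:
        objects.append(("line", info.line_id))
    if info.error_type is not None:
        objects.append(("bit", ERROR_TYPE_TO_BIT[info.error_type]))
    return objects


def select_targets(infos: list, thresholds: dict) -> set:
    """Pick every object whose interrupt count in the window reaches its threshold."""
    counts = Counter(obj for info in infos for obj in layered_objects(info))
    return {
        obj for obj, n in counts.items()
        if n >= thresholds.get(obj[0], float("inf"))  # layers without a threshold are never masked
    }
```

With such a structure, a burst that is concentrated on one interrupt line can be masked at the line layer while the rest of the hardware module keeps reporting errors.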
Description
This application claims priority to Chinese Patent Application No. 202310942590.9, filed with the China National Intellectual Property Administration on July 27, 2023 and entitled "INTERRUPT PROCESSING METHOD, APPARATUS, AND DEVICE", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
This application relates to the field of computer technologies, and in particular, to an interrupt processing method, apparatus, and device.
BACKGROUND
In a system on a chip (SoC), a large quantity of non-fatal errors (NFEs) may occur suddenly. For example, the non-fatal errors may include correctable errors (CEs) and non-fatal uncorrectable errors. For example, when faults occur in a large quantity of areas in a memory of the SoC (for example, exceptions occur simultaneously in a plurality of pages), and/or the faulty areas are accessed frequently, a large quantity of non-fatal errors occur suddenly in the memory. These non-fatal errors trigger a large quantity of interrupts, affecting normal service access and even causing suspension of the system or of firmware in the system.
SUMMARY
This application provides an interrupt processing method, apparatus, and device, to resolve system suspension or firmware suspension caused by a large quantity of interrupts triggered by non-fatal errors.
To achieve the foregoing objective, the following technical solutions are used in embodiments of this application.
According to a first aspect, an interrupt processing method is provided, and is applied to firmware, where the firmware is configured to run on a hardware layer, and the hardware layer may include hardware modules such as a processor, an accelerator, a storage device, an I/O unit, a sensor, and a bus.
The method includes: The firmware receives a plurality of interrupts in a first duration, where the plurality of interrupts are triggered when non-fatal errors occur in the hardware layer, and the non-fatal errors may include correctable errors and non-fatal uncorrectable errors. The firmware masks, based on interrupt information and layered masking information of the plurality of interrupts, an interrupt that is of a target to-be-masked object corresponding to the plurality of interrupts and that is after the first duration, where the layered masking information indicates a plurality of to-be-masked objects, the plurality of to-be-masked objects are obtained by layering transmission of the interrupts, and the plurality of to-be-masked objects include the target to-be-masked object. The firmware sends the interrupt information of the plurality of interrupts to a kernel driver.
In the foregoing technical solution, when receiving, in the first duration, the plurality of interrupts that are triggered by the non-fatal errors, the firmware may send the interrupt information of the plurality of interrupts to the kernel driver. In addition, the firmware may further mask, based on the interrupt information and the layered masking information of the plurality of interrupts, the interrupt that is of the target to-be-masked object corresponding to the plurality of interrupts and that is after the first duration. The layered masking information indicates the plurality of to-be-masked objects, and the plurality of to-be-masked objects are obtained by layering the transmission of the interrupts. Therefore, with the layered masking strategy, a large quantity of non-fatal errors can no longer trigger a large quantity of interrupts, which avoids system suspension or firmware suspension.
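The firmware-side flow of the first aspect can be sketched roughly as follows. This is a toy model under stated assumptions rather than the method as implemented: `FirmwareSketch`, `on_interrupt`, and `end_of_first_duration` are hypothetical names, the count threshold used to pick the target to-be-masked object is an assumption made for the example, and a plain callback stands in for the channel between the firmware and the kernel driver.

```python
from collections import Counter
from typing import Callable, Hashable


class FirmwareSketch:
    """Toy model of the firmware-side flow; not the patented implementation."""

    def __init__(self, threshold: int, send_to_kernel: Callable[[list], None]):
        self.threshold = threshold            # assumed trigger for masking
        self.send_to_kernel = send_to_kernel  # stand-in for the firmware-to-driver channel
        self.pending = []                     # (source, info) pairs seen in the current window
        self.masked = set()                   # to-be-masked objects that are currently masked

    def on_interrupt(self, source: Hashable, info: dict) -> bool:
        """Record an interrupt; return False if its source is currently masked."""
        if source in self.masked:
            return False
        self.pending.append((source, info))
        return True

    def end_of_first_duration(self) -> set:
        """Close the window: mask busy sources and forward all recorded information."""
        counts = Counter(source for source, _ in self.pending)
        newly_masked = {s for s, n in counts.items() if n >= self.threshold}
        self.masked |= newly_masked
        self.send_to_kernel([info for _, info in self.pending])
        self.pending = []
        return newly_masked
```

In this sketch the interrupt information gathered during the first duration still reaches the kernel driver, while later interrupts from a masked source are dropped before they can load the firmware.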
In a possible implementation of the first aspect, the plurality of to-be-masked objects include at least one of the following: a hardware module in the hardware layer, an interrupt line for transmission of an interrupt, and an interrupt bit of a status register for storing an interrupt. In a possible example, a plurality of layers obtained through layering may include a hardware module layer, an interrupt line layer, and an interrupt bit layer. A to-be-masked object in the hardware module layer may include a plurality of different hardware modules. For example, the to-be-masked object in the hardware module layer includes a CPU, an accelerator, and a storage device. A to-be-masked object in the interrupt line layer may include a plurality of different interrupt lines. For example, the to-be-masked object in the interrupt line layer includes a plurality of interrupt lines identified by different interrupt numbers. A to-be-masked object in the interrupt bit layer may include a plurality of different interrupt bits, for example, different interrupt bits of the status register.
In a possible implementation of the first aspect, that the firmware masks, based on the interrupt information and the layered masking information of the plurality of interrupts, the interrupt that is of the target to-be-masked object corresponding to the plural