CN-122019241-A - PCIe device lost retry method, device and system
Abstract
An embodiment of the invention relates to a retry method for lost PCIe devices. The method comprises: obtaining power failure state information of a PCIe device; determining whether the power failure state information indicates a warm start; if so, scanning the slot of the PCIe device to obtain slot state information; determining whether the slot state information is first preset state information; and, if so, retrying the start of the PCIe device. The method can improve the device initialization success rate and the robustness of the system.
Inventors
- WANG BIN
- LI HUA
Assignees
- 上海凌华智能科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260413
Claims (10)
- 1. A PCIe device loss retry method, characterized by comprising the following steps: acquiring power failure state information of a PCIe device; determining whether the power failure state information is warm start state information; if so, scanning the slot of the PCIe device over the PCIe bus and over the I2C bus, respectively, to obtain slot state information; determining whether the slot state information is first preset state information; and if so, retrying the start of the PCIe device.
- 2. The method of claim 1, wherein scanning the slot of the PCIe device over the PCIe bus and over the I2C bus, respectively, to obtain slot state information specifically comprises: scanning the slot through the PCIe protocol to obtain PCIe state information; scanning the slot through the I2C protocol to obtain I2C state information; and combining the PCIe state information and the I2C state information to obtain the slot state information.
- 3. The method of claim 2, wherein the first preset state information is that the PCIe state is not in place and the I2C state is in place.
- 4. The method of claim 3, wherein retrying the start of the PCIe device comprises: determining whether a warm start counter is smaller than a preset number of times, wherein the warm start counter records the number of starts performed while the PCIe device is in the warm start state; if so, incrementing the warm start counter by 1 and incrementing a retry counter of the PCIe device by 1, wherein the retry counter records the number of retry starts of the PCIe device; and triggering the warm start state of the PCIe device, and initializing and starting the PCIe device.
- 5. The method according to claim 4, wherein the method further comprises: if the warm start counter is greater than or equal to the preset number of times, terminating the retry of the PCIe device and marking the PCIe device as abnormal.
- 6. The method according to claim 4, wherein the method further comprises: when the power failure state information is determined to be cold start state information, clearing the power failure state information of the PCIe device, resetting the warm start counter, and initializing and starting the PCIe device.
- 7. A PCIe device loss retry apparatus, comprising: a first acquisition module, configured to acquire power failure state information of a PCIe device; a first judging module, configured to determine whether the power failure state information is warm start state information; an obtaining module, configured to scan the slot of the PCIe device over the PCIe bus and over the I2C bus, respectively, to obtain slot state information when the first judging module determines that the power failure state information is warm start state information; a second judging module, configured to determine whether the slot state information is first preset state information; and a retry module, configured to retry the start of the PCIe device when the second judging module determines that the slot state information is the first preset state information.
- 8. The PCIe device loss retry apparatus of claim 7, wherein the obtaining module comprises: a PCIe state information obtaining unit, configured to scan the slot through the PCIe protocol to obtain PCIe state information; an I2C state information obtaining unit, configured to scan the slot through the I2C protocol to obtain I2C state information; and a slot state information obtaining unit, configured to combine the PCIe state information and the I2C state information to obtain the slot state information.
- 9. The PCIe device loss retry apparatus of claim 7, wherein the apparatus further comprises: an initialization starting module, configured to clear the power failure state information of the PCIe device, reset a warm start counter, and initialize and start the PCIe device when the first judging module determines that the power failure state information is cold start state information.
- 10. A PCIe device loss retry system, comprising a memory and a processor coupled to the memory, the memory storing a computer program, wherein the processor, when running the computer program, executes the PCIe device loss retry method of any one of claims 1 to 6.
Description
PCIe device lost retry method, device and system
Technical Field
The invention relates to the field of device control, and in particular to a PCIe device loss retry method, retry apparatus and retry system.
Background
PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) devices in servers or embedded devices have a small probability of being "lost" at startup (i.e., device loss), which affects system availability. In the prior art, a device is determined to be lost, and the retry mechanism is triggered, when no PCIe presence signal is received or when the initialization response of the target device times out. However, this retry mechanism cannot distinguish a "physical connection failure" from a "PCIe protocol layer abnormality", so false retries or missed retries occur easily, which leads to low fault detection efficiency and delayed system startup.
Disclosure of Invention
Therefore, to overcome at least some of the defects and shortcomings of the prior art, an embodiment of the invention provides a PCIe device loss retry method that improves the robustness of the system.
In one aspect, the PCIe device loss retry method comprises: obtaining power failure state information of a PCIe device; determining whether the power failure state information is warm start state information; if so, scanning the slot of the PCIe device over the PCIe bus and over the I2C bus, respectively, to obtain slot state information; determining whether the slot state information is first preset state information; and if so, retrying the start of the PCIe device.
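
As a non-limiting illustration, the two checks of the method can be sketched in Python as follows. The names `BootKind`, `Action`, `decide`, `scan_slot` and `first_preset_state` are hypothetical and do not appear in the application; the slot scan is passed in as a callable because the actual PCIe and I2C access depends on the platform. The sketch also assumes that any path other than "warm start and first preset state" simply requires no retry, which the text above does not state explicitly.

```python
from enum import Enum
from typing import Callable


class BootKind(Enum):
    COLD = "cold"
    WARM = "warm"


class Action(Enum):
    NO_RETRY = "no_retry"        # nothing to retry on this path
    RETRY_START = "retry_start"  # device looks lost: retry its start


def decide(boot_kind: BootKind,
           scan_slot: Callable[[], str],
           first_preset_state: str) -> Action:
    """Apply the two checks of the method to one slot."""
    # Check 1: the slot is scanned only after a warm start.
    if boot_kind is not BootKind.WARM:
        return Action.NO_RETRY
    # Check 2: retry only when the scan result equals the first preset state.
    if scan_slot() == first_preset_state:
        return Action.RETRY_START
    # Assumption: any other slot state needs no retry.
    return Action.NO_RETRY
```

Because the scan is performed only inside the warm start branch, a cold start never touches the PCIe or I2C bus in this sketch.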
In one embodiment of the invention, scanning the slot of the PCIe device over the PCIe bus and over the I2C bus to obtain slot state information specifically comprises: scanning the slot through the PCIe protocol to obtain PCIe state information; scanning the slot through the I2C protocol to obtain I2C state information; and combining the PCIe state information and the I2C state information to obtain the slot state information.
In one embodiment of the invention, the first preset state information is that the PCIe state is not in place and the I2C state is in place, that is, the device is not detected over PCIe but is detected over I2C.
In one embodiment of the invention, retrying the start of the PCIe device specifically comprises: determining whether a warm start counter is smaller than a preset number of times, wherein the warm start counter records the number of starts performed while the PCIe device is in the warm start state; if so, incrementing the warm start counter by 1 and incrementing a retry counter of the PCIe device by 1, wherein the retry counter records the number of retry starts of the PCIe device; and triggering the warm start state of the PCIe device, and initializing and starting the PCIe device.
In one embodiment of the invention, the method further comprises: terminating the retry of the PCIe device and marking the PCIe device as abnormal if the warm start counter is determined to be greater than or equal to the preset number of times.
In one embodiment of the invention, the method further comprises: clearing the power failure state information of the PCIe device, resetting the warm start counter, and initializing and starting the PCIe device when the power failure state information is determined to be cold start state information.
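
The embodiments above can be sketched, purely for illustration, as two small Python data types. `SlotState`, `RetryContext`, `max_warm_starts` and the method names are hypothetical; the application gives no concrete value for the preset number of times, and it does not say whether the retry counter or the abnormal mark is cleared on a cold start, so this sketch leaves both untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotState:
    pcie_present: bool  # result of the PCIe-protocol scan of the slot
    i2c_present: bool   # result of the I2C-protocol scan of the slot

    def is_first_preset_state(self) -> bool:
        # First preset state: PCIe not in place, I2C in place.
        return (not self.pcie_present) and self.i2c_present


@dataclass
class RetryContext:
    max_warm_starts: int         # the "preset number of times" (assumed value)
    warm_start_counter: int = 0
    retry_counter: int = 0
    abnormal: bool = False

    def on_cold_start(self) -> None:
        # Cold start: reset the warm start counter. Clearing the power
        # failure state information is platform specific and not modelled.
        self.warm_start_counter = 0

    def try_retry(self) -> bool:
        """Return True if another warm start retry should be triggered."""
        if self.warm_start_counter < self.max_warm_starts:
            self.warm_start_counter += 1
            self.retry_counter += 1
            return True
        # Limit reached: stop retrying and mark the device as abnormal.
        self.abnormal = True
        return False
```

With `max_warm_starts=3`, three consecutive calls to `try_retry()` return `True`; the fourth returns `False` and sets `abnormal`. Bounding the retries this way keeps a device that is physically connected but never answers over PCIe from causing an endless warm start loop.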
In another aspect, the PCIe device loss retry apparatus comprises a first acquisition module, a first judging module, an obtaining module, a second judging module and a retry module. The first acquisition module is configured to acquire power failure state information of the PCIe device. The first judging module is configured to determine whether the power failure state information is warm start state information. The obtaining module is configured to scan the slot of the PCIe device over the PCIe bus and over the I2C bus, respectively, to obtain slot state information when the first judging module determines that the power failure state information is warm start state information. The second judging module is configured to determine whether the slot state information is first preset state information. The retry module is configured to retry the start of the PCIe device when the second judging module determines that the slot state information is the first preset state information.
In one embodiment of the invention, the obtaining module comprises a PCIe state information obtaining unit, an I2C state information obtaining unit and a slot state information obtaining unit, wherein the PCIe state information obtaining unit is configured to scan the slot through the PCIe protocol to obtain PCIe state information, the I2C state information obtaining unit is configured to scan the slot through the I2C protocol to obtain I2C state information, and the slot state information obtaining unit is configured to combine the PCIe state information and the I2C state information to obtain the slot state information