US-12619486-B2 - Mechanism of enabling fault handling with PCIe re-timer
Abstract
An extension device is positioned within a point-to-point link to connect two devices, where the extension device includes error detection circuitry to detect a set of errors at the extension device. The extension device further includes memory to store an event register, where the extension device is to write data to the event register to describe detection of an error by the error detection circuitry. The extension device further includes a transmitter to transmit a notification signal to indicate the detection of the error and presence of the data in the event register associated with the error.
Inventors
- Haifeng Gong
- Manisha M. Nilange
- Shiwei XU
- Xiaoxia Fu
Assignees
- INTEL CORPORATION
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-07-24
Claims (20)
- 1. An apparatus comprising: a retimer to be positioned between two devices on a point-to-point link to extend a physical length of the link, wherein the retimer comprises: an upstream pseudo port to connect to a first one of the two devices; a downstream pseudo port to connect to a second one of the two devices; error detection circuitry to detect an error at the retimer; an error register comprising a plurality of register fields to indicate detection of any one of a plurality of error types by the error detection circuitry, wherein the retimer is to write data to a respective one of the plurality of register fields in the error register to identify a corresponding one of the plurality of error types applicable to the error as detected by the error detection circuitry, wherein the plurality of error types comprise at least one of receiver errors, internal retimer errors, or equalization errors; and circuitry to send a notification of the error to at least one of the two devices.
- 2. The apparatus of claim 1, wherein the notification comprises data from the error register.
- 3. The apparatus of claim 1, wherein the notification comprises an interrupt.
- 4. The apparatus of claim 1, wherein the notification comprises a sideband signal to be sent over a sideband channel, and the retimer comprises one or more pins to support the sideband channel.
- 5. The apparatus of claim 4, wherein the sideband channel comprises a system management bus (SMBUS).
- 6. The apparatus of claim 1, wherein the link is in compliance with a Peripheral Component Interconnect Express (PCIe)-based protocol.
- 7. The apparatus of claim 6, wherein the retimer comprises protocol circuitry to be protocol aware of a physical layer of the PCIe-based protocol.
- 8. The apparatus of claim 7, wherein the retimer is to participate in equalization of the link using the protocol circuitry.
- 9. The apparatus of claim 1, further comprising retimer circuitry to pass data received on the upstream port on to the downstream port and pass data received on the downstream port on to the upstream port.
- 10. The apparatus of claim 1, wherein the error register further comprises additional fields to indicate which error types in the plurality of error types the error detection circuitry is enabled to detect.
- 11. The apparatus of claim 1, wherein the plurality of error types comprise receiver errors, internal retimer errors, and equalization errors.
- 12. A method comprising: forwarding data using a retimer between a first device and a second device interconnected by a link; detecting, at the retimer, an error associated with the link or the retimer; determining, at the retimer, an error type for the error from a plurality of different error types, wherein the plurality of error types comprise at least one of receiver errors, internal retimer errors, or equalization errors; recording the error type of the error in an error register in the retimer; sending a notification of the error to at least one of the first device or the second device; and returning data from the error register to at least one of the first device or the second device.
- 13. The method of claim 12, wherein the link is compliant with a PCIe-based protocol, and the error register is based on the PCIe-based protocol.
- 14. The method of claim 12, wherein the retimer is compliant with a PCIe-based protocol.
- 15. A system comprising: a first device; a second device connected to the first device by a point-to-point link; and a retimer positioned between the first device and second device in the link to extend physical distance of the link, wherein the retimer comprises: a first pseudo port to connect to the first device; a second pseudo port to connect to the second device; error detection circuitry to detect an error at the retimer; an error register comprising a plurality of register fields to indicate detection of any one of a plurality of error types by the error detection circuitry, wherein the retimer is to write data to a respective one of the plurality of register fields in the error register to identify a corresponding one of the plurality of error types applicable to the error as detected by the error detection circuitry, wherein the plurality of error types comprise at least one of receiver errors, internal retimer errors, or equalization errors; and circuitry to return data from the error register to one of the first device or the second device.
- 16. The system of claim 15, further comprising an event handler to process the data from the error register responsive to the notification signal.
- 17. The system of claim 16, wherein the event handler is further to: generate log data for the error based on the data in the event register; and initiate corrective action for the error.
- 18. The system of claim 15, wherein the retimer further comprises a transmitter to send a notification signal to indicate detection of the error.
- 19. The system of claim 18, wherein the notification signal comprises an interrupt.
- 20. The system of claim 18, wherein the notification signal comprises a sideband signal to be sent over a sideband channel, and the retimer comprises one or more pins to support the sideband channel.
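Illustrative sketch (not part of the claims): the error register, enable fields, notification, and event handler recited above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The bit positions, the names `ErrorType`, `ErrorRegister`, `Retimer`, and `make_handler`, the callback standing in for an interrupt or sideband signal, and the read-and-clear behavior are all hypothetical; the claims name the error types but do not fix a register layout.

```python
from dataclasses import dataclass
from enum import IntFlag


class ErrorType(IntFlag):
    """Error types named in the claims; bit positions are assumed."""
    RECEIVER = 0x1
    INTERNAL = 0x2
    EQUALIZATION = 0x4


ALL_ERRORS = ErrorType.RECEIVER | ErrorType.INTERNAL | ErrorType.EQUALIZATION


@dataclass
class ErrorRegister:
    # One status field per error type (claim 1).
    status: ErrorType = ErrorType(0)
    # Additional fields saying which types detection is enabled for (claim 10).
    enable: ErrorType = ALL_ERRORS


class Retimer:
    def __init__(self, notify):
        self.reg = ErrorRegister()
        # Hypothetical stand-in for an interrupt or sideband notification.
        self._notify = notify

    def report(self, err: ErrorType) -> bool:
        """Record a detected error and notify; False if that type is disabled."""
        if not (self.reg.enable & err):
            return False
        self.reg.status |= err
        self._notify(self)
        return True

    def read_and_clear(self) -> ErrorType:
        """Return register data to the requester (clearing is an assumption)."""
        status = self.reg.status
        self.reg.status = ErrorType(0)
        return status


def make_handler(log):
    """Build an event handler that logs each error type found in the register."""
    def handler(retimer):
        status = retimer.read_and_clear()
        for err in ErrorType:
            if status & err:
                log.append(f"retimer error: {err.name}")
        # A real handler would initiate corrective action here (claim 17).
    return handler
```

As a usage example, `Retimer(make_handler(log)).report(ErrorType.EQUALIZATION)` records the error, invokes the handler, and leaves one entry in `log`. Keeping status and enable as separate fields lets software mask an error type without losing the ability to distinguish the others.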
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 17/255,317, filed Dec. 22, 2020, and entitled, “A MECHANISM OF ENABLING FAULT HANDLING WITH PCIE RE-TIMER,” which is a national stage application under 35 U.S.C. § 371 of PCT International Application Serial No. PCT/CN2018/108442, filed on Sep. 28, 2018, and entitled “ERROR REPORTING IN LINK EXTENSION DEVICES,” the entire disclosure of which is incorporated herein by reference.
FIELD
This disclosure pertains to computing systems, and in particular (but not exclusively) to extension devices in point-to-point interconnects.
BACKGROUND
Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a corollary, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple cores, multiple hardware threads, and multiple logical processors present on individual integrated circuits, as well as other interfaces integrated within such processors. A processor or integrated circuit typically comprises a single physical processor die, where the processor die may include any number of cores, hardware threads, logical processors, interfaces, memory, controller hubs, etc. As a result of the greater ability to fit more processing power in smaller packages, smaller computing devices have increased in popularity. The use of smartphones, tablets, ultrathin notebooks, and other user equipment has grown exponentially. However, these smaller devices are reliant on servers both for data storage and for complex processing that exceeds the capabilities of the form factor. Consequently, the demand in the high-performance computing market (i.e., server space) has also increased.
For instance, in modern servers, there is typically not only a single processor with multiple cores, but also multiple physical processors (also referred to as multiple sockets) to increase the computing power. But as the processing power grows along with the number of devices in a computing system, the communication between sockets and other devices becomes more critical. In fact, interconnects have grown from more traditional multi-drop buses that primarily handled electrical communications to full-blown interconnect architectures that facilitate fast communication. Unfortunately, as the demand for future processors to consume at even higher rates grows, a corresponding demand is placed on the capabilities of existing interconnect architectures.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates an embodiment of a computing system including an interconnect architecture.
FIG. 2 illustrates an embodiment of an interconnect architecture including a layered stack.
FIG. 3 illustrates an embodiment of a request or packet to be generated or received within an interconnect architecture.
FIG. 4 illustrates an embodiment of a transmitter and receiver pair for an interconnect architecture.
FIGS. 5A-5C illustrate simplified block diagrams of example links including one or more extension devices.
FIGS. 6A-6B illustrate simplified block diagrams of systems including example extension devices.
FIG. 7 illustrates a simplified block diagram of an example retimer device.
FIGS. 8A-8B illustrate examples of systems including example retimer devices equipped with error detection logic.
FIG. 9 is a representation of an example event register.
FIG. 10 is a simplified block diagram of an example retimer.
FIGS. 11A-11B are flow diagrams illustrating handling of errors detected by example extension devices.
FIG. 12 is a flowchart illustrating an example technique involving error detection at extension devices.
FIG. 13 illustrates an embodiment of a block diagram for a computing system including a multicore processor.
FIG. 14 illustrates an embodiment of a block diagram for a computing system including multiple processors.
DETAILED DESCRIPTION
In the following description, numerous specific details are set forth, such as examples of specific types of processors and system configurations, specific hardware structures, specific architectural and microarchitectural details, specific register configurations, specific instruction types, specific system components, specific measurements/heights, specific processor pipeline stages and operation, etc., in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice the present invention. In other instances, well-known components or methods, such as specific and alternative processor architectures, specific logic circuits/code for described algorithms, specific firmware code, specific interconnect operation, specific logic configurations, specific manufacturing techniques and materials, specific compiler implementations, specific expression of algorithms in code, specific power down and gating techniques/logic and other