EP-4526775-B1 - EFFICIENTLY STRIPING ORDERED PCIE WRITES ACROSS MULTIPLE SOCKET-TO-SOCKET LINKS

Inventors

  • GEETHA, VEDARAMAN
  • PFLEDERER, KEITH ROBERT

Dates

Publication Date
2026-05-06
Application Date
2023-03-31

Claims (15)

  1. A local socket (110) comprising: one or more local input-output, IO, coherence request nodes, RNI, (125) configured to communicate with corresponding one or more IO agents (120), the one or more local RNIs comprising a local RNI configured to communicate with an IO agent; and a plurality of local socket-to-socket, S2S, bridges configured to communicate with corresponding remote S2S bridges of a remote socket, the plurality of local S2S bridges comprising a first local S2S bridge (145) configured to communicate with a first remote S2S bridge (185), wherein the local RNI is configured to send (610), on behalf of the IO agent, a first remote write transaction for a first address in an intra-socket protocol to the first local S2S bridge, wherein the first local S2S bridge is configured to: send (620) a first address coherency request in an inter-socket protocol to a first remote S2S bridge of the remote socket upon receiving the first remote write transaction, the first address coherency request being a request to clean all coherent copies of the first address out of one or more caches and written to memory; receive (630) a first completion response in the inter-socket protocol from the first remote S2S bridge subsequent to sending the first address coherency request, the first completion response being a response indicating that a coherency for the first address is completed; and send (640) a first combination response in the intra-socket protocol to the local RNI upon receiving the first completion response, the first combination response indicating that the coherency for the first address is completed and a buffer is ready to receive a first data for the first address, wherein the local RNI is further configured to send (650) the first data for the first address in the intra-socket protocol to the first local S2S bridge upon receiving the first combination response, and wherein the first local S2S bridge is further configured to forward (660) the first data as a write transaction in the inter-socket protocol to the first remote S2S bridge upon receiving the first data.
  2. The local socket of claim 1, wherein the intra-socket protocol is a Coherent Hub Interconnect, CHI, protocol, wherein the inter-socket protocol is a Cache Coherent Interconnect for Accelerators, CCIX, protocol, or both.
  3. The local socket of claim 2, wherein the first remote write transaction is a CHI WriteUnique or a CHI WriteNoSnp for the first address, wherein the first address coherency request is a CCIX CleanUnique transaction for the first address, wherein the first completion response is a CCIX Comp transaction for the first address, wherein the first combination response is a CHI CompDBID transaction for the first address, wherein the first data is sent in a CHI NCBWData transaction for the first address, wherein the first data is forwarded in a CCIX WriteBack+Data transaction for the first address, or any combination thereof.
  4. The local socket of claim 1, wherein the one or more local RNIs and the plurality of local S2S bridges are comprised on a single die or a system-on-chip, SoC.
  5. The local socket of claim 4, wherein the single die or SoC also comprises the one or more IO agents.
  6. The local socket of claim 4, wherein the local socket and the remote socket are separate dies or SoCs.
  7. The local socket of claim 1, wherein none of the one or more local RNIs comprises any hardware-coherent cache.
  8. The local socket of claim 7, further comprising: one or more local fully coherent request nodes, RNF, (135) configured to communicate with one or more core agents (130), wherein the one or more local RNFs do comprise one or more hardware coherent caches.
  9. The local socket of claim 1, wherein the plurality of local S2S bridges further comprises a second local S2S bridge configured to communicate with a second remote S2S bridge, wherein the local RNI is configured to send (615), on behalf of the IO agent, a second remote write transaction for a second address in the intra-socket protocol to the second local S2S bridge, wherein the second local S2S bridge is configured to: send (625) a second address coherency request in the inter-socket protocol to the second remote S2S bridge upon receiving the second remote write transaction, the second address coherency request being a request to clean all coherent copies of the second address out of one or more caches and written to memory; receive (635) a second completion response in the inter-socket protocol from the second remote S2S bridge subsequent to sending the second address coherency request, the second completion response being a response indicating that a coherency for the second address is completed; and send (645) a second combination response in the intra-socket protocol to the local RNI upon receiving the second completion response, the second combination response indicating that the coherency for the second address is completed and a buffer is ready to receive a second data for the second address, wherein the local RNI is further configured to send (655) the second data for the second address in the intra-socket protocol to the second local S2S bridge upon receiving the second combination response, and wherein the second local S2S bridge is further configured to forward (665) the second data in the inter-socket protocol to the second remote S2S bridge upon receiving the second data.
  10. The local socket of claim 9, wherein the local RNI sends the second remote write transaction after sending the first remote write transaction, and wherein if the local RNI receives the second combination response before the first combination response, the local RNI is configured to wait until the first combination response is received, send the first data for the first address to the first local S2S bridge after receiving the first combination response, and send the second data for the second address to the second local S2S bridge after sending the first data to the first local S2S bridge.
  11. The local socket of claim 9, wherein the intra-socket protocol is a Coherent Hub Interconnect, CHI, protocol, the inter-socket protocol is a Cache Coherent Interconnect for Accelerators, CCIX, protocol, or both, and wherein any combination of the following are true: the second remote write transaction is a CHI WriteUnique or a CHI WriteNoSnp for the second address, the second address coherency request is a CCIX CleanUnique transaction for the second address, the second completion response is a CCIX Comp transaction for the second address, the second combination response is a CHI CompDBID transaction for the second address, the second data is sent in a CHI NCBWData transaction for the second address, and the second data is forwarded in a CCIX WriteBack+Data transaction for the second address.
  12. The local socket of claim 1, wherein the local socket is suitable for being incorporated into an apparatus selected from the group consisting of a music player, a video player, an entertainment unit, a navigation device, a communications device, a mobile device, a mobile phone, a smartphone, a personal digital assistant, a fixed location terminal, a tablet computer, a computer, a wearable device, an Internet of things, IoT, device, a laptop computer, a server, and a device in an automotive vehicle.
  13. A method (600) for operating a local socket, the method comprising: sending (610), by a local input-output, IO, coherence request node, RNI, on behalf of an IO agent, a first remote write transaction for a first address in an intra-socket protocol to a first local S2S bridge, the local RNI being one of one or more local RNIs of the local socket configured to communicate with corresponding one or more IO agents, and the first local S2S bridge being one of a plurality of local S2S bridges of the local socket configured to communicate with corresponding remote S2S bridges of a remote socket; sending (620), by the first local S2S bridge, a first address coherency request in an inter-socket protocol to a first remote S2S bridge upon receiving the first remote write transaction, the first address coherency request being a request to clean all coherent copies of the first address out of one or more caches and written to memory; receiving (630), by the first local S2S bridge, a first completion response in the inter-socket protocol from the first remote S2S bridge subsequent to sending the first address coherency request, the first completion response being a response indicating that a coherency for the first address is completed; sending (640), by the first local S2S bridge, a first combination response in the intra-socket protocol to the local RNI upon receiving the first completion response, the first combination response indicating that the coherency for the first address is completed and a buffer is ready to receive a first data for the first address; sending (650), by the local RNI, the first data for the first address in the intra-socket protocol to the first local S2S bridge upon receiving the first combination response; and forwarding (660), by the first local S2S bridge, the first data as a write transaction in the inter-socket protocol to the first remote S2S bridge upon receiving the first data.
  14. The method of claim 13, wherein the intra-socket protocol is a Coherent Hub Interconnect, CHI, protocol, wherein the inter-socket protocol is a Cache Coherent Interconnect for Accelerators, CCIX, protocol, or both.
  15. The method of claim 14, wherein the first remote write transaction is a CHI WriteUnique or a CHI WriteNoSnp for the first address, wherein the first address coherency request is a CCIX CleanUnique transaction for the first address, wherein the first completion response is a CCIX Comp transaction for the first address, wherein the first combination response is a CHI CompDBID transaction for the first address, wherein the first data is sent in a CHI NCBWData transaction for the first address, wherein the first data is forwarded in a CCIX WriteBack+Data transaction for the first address, or any combination thereof.
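The handshake recited in claims 1 and 13 (steps 610 to 660), using the message names listed in claims 3 and 15, can be illustrated with a small Python model of the first local S2S bridge. This is a hypothetical sketch, not the claimed hardware: the `Msg` record, the `LocalS2SBridge` class, the stubbed remote side, and the choice to treat data arriving before `CompDBID` as an error are illustrative assumptions. The opcode strings are copied from the claims and serve only as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    src: str
    dst: str
    opcode: str  # opcode labels copied from claims 3 and 15
    addr: int


class LocalS2SBridge:
    """Hypothetical model of the first local S2S bridge of claim 1."""

    def __init__(self, name: str, remote: str) -> None:
        self.name = name
        self.remote = remote
        self.pending: set[int] = set()  # CleanUnique sent, CCIX Comp not yet received
        self.ready: set[int] = set()    # CompDBID sent, buffer waiting for data

    def handle(self, msg: Msg) -> list[Msg]:
        """Consume one incoming message and return the messages it triggers."""
        if msg.opcode in ("WriteUnique", "WriteNoSnp"):
            # Step 620: ask the remote socket to clean all coherent copies.
            self.pending.add(msg.addr)
            return [Msg(self.name, self.remote, "CleanUnique", msg.addr)]
        if msg.opcode == "Comp":
            # Steps 630/640: coherency is done, so grant a data buffer to the RNI.
            self.pending.remove(msg.addr)
            self.ready.add(msg.addr)
            return [Msg(self.name, "RNI", "CompDBID", msg.addr)]
        if msg.opcode == "NCBWData":
            # Step 660: data is accepted only after CompDBID was sent for this address.
            if msg.addr not in self.ready:
                raise RuntimeError(f"data for {msg.addr:#x} arrived before CompDBID")
            self.ready.remove(msg.addr)
            return [Msg(self.name, self.remote, "WriteBack+Data", msg.addr)]
        raise ValueError(f"unexpected opcode {msg.opcode}")


def run_single_write(addr: int) -> list[str]:
    """Drive one remote write through the bridge; return the opcodes it emits."""
    bridge = LocalS2SBridge("S2S0", "remoteS2S0")
    trace: list[str] = []
    # Step 610: the RNI sends the remote write request on behalf of the IO agent.
    trace += [m.opcode for m in bridge.handle(Msg("RNI", "S2S0", "WriteUnique", addr))]
    # Remote side is stubbed: it simply answers CleanUnique with Comp.
    trace += [m.opcode for m in bridge.handle(Msg("remoteS2S0", "S2S0", "Comp", addr))]
    # Step 650: the RNI sends data only after it has received CompDBID.
    trace += [m.opcode for m in bridge.handle(Msg("RNI", "S2S0", "NCBWData", addr))]
    return trace


print(run_single_write(0x1000))  # ['CleanUnique', 'CompDBID', 'WriteBack+Data']
```

The model makes the two-phase structure explicit: the inter-socket coherency round trip (`CleanUnique`/`Comp`) completes before the bridge grants a buffer, and only then does the data cross the socket boundary as a write.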

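Claims 9 and 10 stripe two writes across two S2S bridges and allow their combination responses to return in either order, while the local RNI still sends write data in the order the writes were issued. The following Python sketch models only that ordering rule. The striping policy (interleaving 64-byte cache lines across links), the `OrderedRNI` class, and its method names are hypothetical, since the claims do not say how addresses are assigned to bridges; the sketch also assumes that each outstanding write targets a distinct address.

```python
from collections import deque


class OrderedRNI:
    """Hypothetical model of the local RNI ordering rule in claim 10."""

    def __init__(self, num_bridges: int, line_bytes: int = 64) -> None:
        self.num_bridges = num_bridges
        self.line_bytes = line_bytes
        # Issued writes whose data has not been sent yet, oldest first.
        self.issue_order: deque[int] = deque()
        # Addresses whose combination response (CompDBID) has already arrived.
        self.granted: set[int] = set()
        # (bridge index, address) pairs in the order data was actually sent.
        self.data_sent: list[tuple[int, int]] = []

    def bridge_for(self, addr: int) -> int:
        # Assumed striping policy: interleave cache lines across the S2S links.
        return (addr // self.line_bytes) % self.num_bridges

    def issue_write(self, addr: int) -> int:
        """Issue the remote write request (steps 610/615); return the bridge used."""
        self.issue_order.append(addr)
        return self.bridge_for(addr)

    def on_comp_dbid(self, addr: int) -> None:
        """Record a combination response, then send data strictly in issue order."""
        self.granted.add(addr)
        # Drain from the oldest write; stop at the first one still awaiting CompDBID.
        while self.issue_order and self.issue_order[0] in self.granted:
            head = self.issue_order.popleft()
            self.granted.remove(head)
            self.data_sent.append((self.bridge_for(head), head))  # steps 650/655


rni = OrderedRNI(num_bridges=2)
a, b = 0x000, 0x040  # older write A, younger write B
print(rni.issue_write(a), rni.issue_write(b))  # prints: 0 1 (different links)
rni.on_comp_dbid(b)  # B's combination response arrives first
print(rni.data_sent)  # prints: [] (B's data is held back)
rni.on_comp_dbid(a)
print(rni.data_sent)  # prints: [(0, 0), (1, 64)] (A's data, then B's)
```

In this model both coherency round trips can be outstanding at the same time on different links, so the long inter-socket latencies overlap; only the release of data from the RNI is serialized, which is what keeps the younger write from overtaking the older one.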
Description

FIELD OF DISCLOSURE

This disclosure relates generally to efficiently striping writes across socket-to-socket (S2S) links. The writes may be peripheral component interconnect express (PCIe) ordered writes.

BACKGROUND

PCIe writes from the same source often must be completed in order. This implies that no agent in the system should be able to see a stale value of address A while it is able to see an updated value of address B, where the write to address B is younger than the write to address A as issued by the PCIe agent. This is referred to as ordered write observation (OWO). Maintaining OWO within a single chip or socket is not much of a concern because in-socket latency is typically very short. However, when data is exchanged among separate sockets, maintaining OWO can be problematic because inter-socket latencies can be relatively long. Thus, when data is exchanged among different chips or sockets, performance can degrade due to these long latencies. Accordingly, there is a need for systems, apparatus, and methods that overcome the deficiencies of conventional data exchange among separate sockets.

Attention is drawn to US 2014/297967 A1, which describes methods and apparatus relating to an inter-queue anti-starvation mechanism with dynamic deadlock avoidance in a retry-based pipeline. Logic may arbitrate between two queues based on various rules. The queues may store data including local or remote requests, data responses, non-data responses, external interrupts, etc.

SUMMARY

The present invention is set forth in the independent claims. Further embodiments of the invention are described in the dependent claims.

The following presents a simplified summary relating to one or more aspects and/or examples associated with the apparatus and methods disclosed herein.
As such, the following summary should not be considered an extensive overview relating to all contemplated aspects and/or examples, nor should it be regarded as identifying key or critical elements relating to all contemplated aspects and/or examples or as delineating the scope associated with any particular aspect and/or example. Accordingly, the following summary has the sole purpose of presenting certain concepts relating to one or more aspects and/or examples of the apparatus and methods disclosed herein in a simplified form to precede the detailed description presented below.

An exemplary local socket is disclosed. The local socket may comprise one or more local input-output (IO) coherence request nodes (RNIs) configured to communicate with one or more corresponding IO agents. The one or more local RNIs may comprise a local RNI configured to communicate with an IO agent. The local socket may also comprise a plurality of local socket-to-socket (S2S) bridges configured to communicate with corresponding remote S2S bridges of a remote socket. The plurality of local S2S bridges may comprise a first local S2S bridge configured to communicate with a first remote S2S bridge. The local RNI may be configured to send, on behalf of the IO agent, a first remote write transaction for a first address in an intra-socket protocol to the first local S2S bridge. The first local S2S bridge may be configured to send a first address coherency request in an inter-socket protocol to the first remote S2S bridge of the remote socket upon receiving the first remote write transaction. The first address coherency request may be a request that all coherent copies of the first address be cleaned out of one or more caches and written to memory. The first local S2S bridge may also be configured to receive a first completion response in the inter-socket protocol from the first remote S2S bridge after sending the first address coherency request.
The first completion response may be a response indicating that coherency for the first address is complete. The first local S2S bridge may further be configured to send a first combination response in the intra-socket protocol to the local RNI upon receiving the first completion response. The first combination response may indicate that coherency for the first address is complete and that a buffer is ready to receive first data for the first address. The local RNI may also be configured to send the first data for the first address in the intra-socket protocol to the first local S2S bridge upon receiving the first combination response. The first local S2S bridge may additionally be configured to forward the first data as a write transaction in the inter-socket protocol to the first remote S2S bridge upon receiving the first data.

An exemplary method for operating a local socket is disclosed. The method may comprise sending, by a local input-output (IO) coherence request node (RNI) on behalf of an IO agent, a first remote write transaction for a first address in an intra-socket protocol to a first local S2S bridge. The local RNI may be one of one or more local RNIs of the local socket configured