CN-115004163-B - Apparatus and method for managing packet transfer across a physical layer interface of a memory architecture
Abstract
An apparatus and method manage packet transfer with a memory fabric whose physical layer interface has a higher data rate than the physical layer interface of another device, such as the apparatus that receives the incoming packets, wherein at least some of the packets include different instruction types. The apparatus and method determine a packet type of each incoming packet received from the memory fabric physical layer interface and, when the determined packet type is a type containing an atomic request, transfer the incoming packet with the atomic request, in preference to other packet types of incoming packets, to memory access logic that accesses local memory within the apparatus.
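The prioritization summarized above can be illustrated with a minimal sketch. It assumes a software model of the controller; the names `PacketType`, `Packet`, and `PriorityIngress` are hypothetical and do not appear in the disclosure, and a hardware implementation would use buffers and arbitration logic rather than Python queues.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto


class PacketType(Enum):
    ATOMIC = auto()
    LOAD = auto()
    STORE = auto()


@dataclass
class Packet:
    ptype: PacketType
    address: int


class PriorityIngress:
    """Hypothetical controller model: atomics drain before all other packets."""

    def __init__(self):
        self.first_priority = deque()  # packets containing atomic requests
        self.lower_priority = deque()  # every other request type

    def receive(self, packet):
        # Determine the packet type, then queue it in the matching buffer.
        if packet.ptype is PacketType.ATOMIC:
            self.first_priority.append(packet)
        else:
            self.lower_priority.append(packet)

    def next_for_memory_access_logic(self):
        # The first priority buffer is always served first; each buffer is FIFO.
        if self.first_priority:
            return self.first_priority.popleft()
        if self.lower_priority:
            return self.lower_priority.popleft()
        return None


ingress = PriorityIngress()
for p in [
    Packet(PacketType.LOAD, 0x100),
    Packet(PacketType.STORE, 0x200),
    Packet(PacketType.ATOMIC, 0x300),
]:
    ingress.receive(p)

order = []
while (pkt := ingress.next_for_memory_access_logic()) is not None:
    order.append(pkt.ptype.name)
print(order)  # ['ATOMIC', 'LOAD', 'STORE']
```

Although the atomic packet arrives last in this example, it is forwarded first, while the load and store packets keep their arrival order.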
Inventors
- BLAGODUROV SERGEY
Assignees
- Advanced Micro Devices, Inc. (超威半导体公司)
Dates
- Publication Date
- 20260421
- Application Date
- 20201002
- Priority Date
- 20191203
Claims (20)
- 1. A method for managing packet transmissions by a device, the method comprising: receiving, by a memory fabric physical layer interface, incoming packets for a local memory within the device, the local memory configured as fabric-attached memory and addressable by other devices, at least some of the packets including different instruction types, the memory fabric physical layer interface having a higher data rate than a data rate of a physical layer interface of the device; determining, by a controller of the device, a packet type of the incoming packets received from the memory fabric physical layer interface over the physical layer interface; and when the determined type of an incoming packet from the memory fabric physical layer interface is a type containing an atomic request, queuing incoming packets determined to contain the atomic request in a first priority buffer, queuing incoming packets determined to contain other types of requests in one or more priority buffers that are lower in priority relative to the first priority buffer, and transferring, by the controller, the incoming packets with the atomic request to memory access logic that accesses the local memory within the device in preference to other packet types of incoming packets.
- 2. The method of claim 1, wherein prioritizing transfer of the incoming packets with the atomic request over other packet types of incoming packets comprises: queuing incoming packets determined to contain the atomic request in the first priority buffer; queuing other packet types in a second priority buffer; and prioritizing the output of packets from the first priority buffer over the output of packets from the second priority buffer.
- 3. The method of claim 2, further comprising: accessing data defining at least a plurality of memory regions of the local memory of the device as priority memory regions, wherein each memory region has a maximum number of unrestricted memory fabric physical layer interface accesses allowed per time interval; maintaining a count of the number of memory accesses that have been made to a defined memory region through the memory fabric physical layer interface within the time interval; storing read packets in the second priority buffer when the maximum number of allowed memory fabric physical layer interface accesses is exceeded; and providing the stored packets from the second priority buffer to the memory access logic in a next time interval.
- 4. The method of claim 1, wherein prioritizing transfer of the incoming packets with the atomic request over other packet types of incoming packets comprises queuing incoming packets determined to contain a store request in a buffer while providing the incoming packets with the atomic request to the memory access logic.
- 5. The method of claim 3, further comprising: allocating the second priority buffer to include a plurality of second priority buffers, wherein each of the plurality of second priority buffers corresponds to a different defined memory region; and storing incoming packets determined to contain a type of read request in respective second priority buffers corresponding to the different defined memory regions based on addresses associated with the incoming packets.
- 6. An apparatus comprising: one or more processors; memory access logic operatively coupled to the one or more processors; a local memory within the apparatus, the local memory operatively coupled to the memory access logic and configurable as an addressable portion of memory that is addressable as fabric-attached memory and that is addressable by other devices through a memory fabric physical layer interface; a physical layer interface operatively coupled to the memory access logic and operative to receive incoming packets from the memory fabric physical layer interface for the local memory configured as fabric-attached memory, the memory fabric physical layer interface having a higher data rate than a data rate of the physical layer interface, at least some of the packets comprising different instruction types; and a controller operatively coupled to the physical layer interface and configured to: determine a packet type of the incoming packets received from the memory fabric physical layer interface over the physical layer interface; and when the determined type of an incoming packet is a type containing an atomic request, queue incoming packets determined to contain the atomic request in a first priority buffer, queue incoming packets determined to contain other types of requests in one or more priority buffers that are lower in priority relative to the first priority buffer, and transmit the incoming packets with the atomic request to the memory access logic in preference to other packet types of incoming packets.
- 7. The apparatus of claim 6, further comprising the first priority buffer and a second priority buffer having a priority lower than that of the first priority buffer, wherein the controller is further configured to prioritize the incoming packets with the atomic request over other packet types of incoming packets by: queuing incoming packets determined to contain the atomic request in the first priority buffer; queuing other packet types in the second priority buffer; and prioritizing the output of packets from the first priority buffer over the output of packets from the second priority buffer.
- 8. The apparatus of claim 6, further comprising a buffer operatively coupled to the controller, wherein the controller is further configured to prioritize the incoming packets with the atomic request over other packet types of incoming packets by queuing incoming packets determined to contain a store request in the buffer while providing the incoming packets with the atomic request to the memory access logic.
- 9. The apparatus of claim 7, further comprising a configuration register configured to include data defining at least a plurality of memory regions of the local memory of the apparatus as priority memory regions, wherein each memory region has a maximum number of unrestricted memory fabric physical layer interface accesses allowed per time interval, and wherein the controller is further configured to: maintain a count of the number of memory accesses that have been made to a defined memory region through the memory fabric physical layer interface within the time interval; store read packets in the second priority buffer when the maximum number of allowed memory fabric physical layer interface accesses is exceeded; and provide the stored packets from the second priority buffer to the memory access logic in a next time interval.
- 10. The apparatus of claim 9, wherein: the second priority buffer includes a plurality of second priority buffers, wherein each of the plurality of second priority buffers corresponds to a different defined memory region; and the controller is further configured to store an incoming packet determined to contain a type of read request in a respective second priority buffer corresponding to the different defined memory region based on an address associated with the incoming packet.
- 11. The apparatus of claim 6, further comprising a memory fabric bridge circuit operatively coupled to the controller and the memory fabric physical layer interface and operative to pass packets between the physical layer interface and the memory fabric physical layer interface.
- 12. An apparatus comprising: a local memory operatively coupled to memory access logic and configurable as an addressable portion of memory addressable by a memory fabric physical layer interface; a physical layer interface operative to receive incoming packets from the memory fabric physical layer interface, the memory fabric physical layer interface having a higher data rate than a data rate of the physical layer interface, at least some of the packets comprising different instruction types; an incoming packet buffer structure comprising a hierarchically ordered priority buffer structure comprising at least a first priority buffer and a second priority buffer having a lower priority than the first priority buffer; and a controller operatively coupled to the physical layer interface and the incoming packet buffer structure, wherein the controller is configured to: determine a packet type of the incoming packets from the memory fabric physical layer interface; store an incoming packet in the first priority buffer when the determined packet type indicates that an atomic request is present in the incoming packet; store the incoming packet in the second priority buffer when the determined packet type indicates that a load instruction is present in the incoming packet; and provide the stored incoming packets to the memory access logic in a hierarchical order according to a priority buffer order.
- 13. The apparatus of claim 12, further comprising a buffer operatively coupled to the controller, wherein the controller is further configured to prioritize the incoming packets with the atomic request over other packet types of incoming packets by queuing incoming packets determined to contain a store request in a store buffer while providing the incoming packets with the atomic request to the memory access logic.
- 14. The apparatus of claim 13, further comprising a configuration register configured to include data defining at least a plurality of memory regions of the local memory of the apparatus as priority memory regions, wherein each memory region has a maximum number of unrestricted memory fabric physical layer interface accesses allowed per time interval, and wherein the controller is further configured to: maintain a count of the number of memory accesses that have been made to a defined memory region through the memory fabric physical layer interface within the time interval; store packets in the second priority buffer when the maximum number of allowed memory fabric physical layer interface accesses is exceeded; and provide the packets stored in the second priority buffer to the memory access logic in a next time interval.
- 15. The apparatus of claim 12, wherein: the second priority buffer includes a plurality of second priority buffers, wherein each of the plurality of second priority buffers corresponds to a different defined memory region; and the controller is further configured to store incoming packets determined to contain a type of read request in respective second priority buffers corresponding to different defined memory regions based on addresses associated with the incoming packets.
- 16. A system comprising: a memory fabric operative to interconnect a plurality of distributed non-volatile memories; a first device operatively coupled to the memory fabric; and a second device operatively coupled to the memory fabric, the first device and the second device having a physical layer interface to receive memory access requests from each other via the memory fabric, the second device comprising: a local memory operatively coupled to memory access logic of the second device, the local memory being configurable as an addressable portion of the distributed non-volatile memories addressable by the memory fabric; the physical layer interface, operatively coupled to the memory access logic and operative to receive, from the first device through the memory fabric, incoming packets addressed to the local memory of the second device, the memory fabric having a higher data rate than a data rate of the physical layer interface, at least some of the packets comprising different instruction types; and a controller operatively coupled to the physical layer interface and configured to: determine a packet type of the incoming packets received from the memory fabric; and when the determined type of an incoming packet is a type containing an atomic request, queue the incoming packet in a first priority buffer, queue incoming packets determined to contain other types of requests in one or more priority buffers having a lower priority relative to the first priority buffer, and transmit the incoming packet with the atomic request to the memory access logic in preference to other packet types of incoming packets.
- 17. The system of claim 16, wherein: the second device includes the first priority buffer and a second priority buffer having a lower priority than the first priority buffer; and the controller is further configured to prioritize the incoming packets with the atomic request over other packet types of incoming packets by: queuing incoming packets determined to contain the atomic request in the first priority buffer; queuing other packet types in the second priority buffer; and prioritizing the output of packets from the first priority buffer over the output of packets from the second priority buffer.
- 18. The system of claim 16, wherein the second device includes a buffer operatively coupled to the controller, and the controller is further configured to prioritize the incoming packets with the atomic request over other packet types of incoming packets by queuing incoming packets determined to contain a store request in the buffer while providing the incoming packets with the atomic request to the memory access logic.
- 19. The system of claim 17, wherein the second device comprises a configuration register configured to include data defining at least a plurality of memory regions of the local memory of the second device as priority memory regions, wherein each memory region has a maximum number of unrestricted memory fabric accesses allowed per time interval, and wherein the controller is further configured to: maintain a count of the number of memory accesses that have been made to a defined memory region through the memory fabric within the time interval; store read packets in the second priority buffer when the maximum number of allowed memory fabric accesses is exceeded; and provide the stored packets from the second priority buffer to the memory access logic in a next time interval.
- 20. The system of claim 19, wherein: the second priority buffer includes a plurality of second priority buffers, wherein each of the plurality of second priority buffers corresponds to a different defined memory region; and the controller is further configured to store incoming packets determined to contain a type of read request in respective second priority buffers corresponding to different defined memory regions based on addresses associated with the incoming packets.
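The per-region access counting recited in the dependent claims (a count per defined memory region per time interval, with read packets held in a region-specific second priority buffer once the allowed maximum is reached) might be modeled as in the following sketch. The class `RegionThrottle`, its method names, and the address ranges are hypothetical. The sketch also makes two assumptions the claims do not state: reads outside any defined region are never deferred, and packets released at the start of the next interval count against that interval's allowance.

```python
from collections import deque


class RegionThrottle:
    """Hypothetical model of per-region access counting for read packets."""

    def __init__(self, regions, max_accesses):
        # regions: {name: (start, end)} with end exclusive; this dictionary
        # stands in for the data held in the configuration register.
        self.regions = regions
        self.max_accesses = max_accesses
        self.counts = {name: 0 for name in regions}
        # One second priority buffer per defined memory region.
        self.deferred = {name: deque() for name in regions}

    def region_of(self, address):
        for name, (start, end) in self.regions.items():
            if start <= address < end:
                return name
        return None

    def submit_read(self, address):
        """Return True if the read is forwarded now, False if it is deferred."""
        name = self.region_of(address)
        if name is None:
            return True  # assumption: undefined regions are not throttled
        if self.counts[name] >= self.max_accesses:
            self.deferred[name].append(address)
            return False
        self.counts[name] += 1
        return True

    def next_interval(self):
        """Reset the counts and release deferred reads up to the allowance."""
        released = []
        for name in self.regions:
            self.counts[name] = 0
            while self.deferred[name] and self.counts[name] < self.max_accesses:
                released.append(self.deferred[name].popleft())
                self.counts[name] += 1
        return released


throttle = RegionThrottle({"A": (0x0000, 0x1000)}, max_accesses=2)
results = [throttle.submit_read(a) for a in (0x10, 0x20, 0x30)]
released = throttle.next_interval()
print(results, released)  # [True, True, False] [48]
```

With an allowance of two accesses per interval, the third read to region A is held back and is provided at the start of the next interval.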
Description
Apparatus and method for managing packet transfer across a physical layer interface of a memory architecture
Government license rights
This invention was made with government support under the PathForward project with Lawrence Livermore National Security (prime contract number DE-AC52-07NA27344, subcontract number B620717) awarded by the United States Department of Energy (DOE). The government has certain rights in this invention.
Background
Systems are being deployed that use a memory-semantic fabric that extends the central processing unit (CPU) byte-addressable load-store memory model to an entire system, such as a data center. The memory fabric, also known as a Gen-Z fabric, is a switch-based point-to-point communication fabric that is external to processor systems-on-chip (SoCs), media modules, and other types of devices, and that allows those devices to interface with pools of external memory modules through the memory fabric in a system such as a data center. For example, some processor SoCs include a processor with a plurality of processing cores that communicate with local memory, such as dynamic random access memory (DRAM) or other suitable memory, via local memory access logic, such as a data fabric. Processor SoCs and other devices also need to interface with the memory fabric to use fabric-attached memory (FAM) modules, which may be, for example, external (e.g., non-local) memory attached directly to the data center memory fabric. In some systems, a FAM module has memory access logic to handle load and store requests but has little or no compute capability. Furthermore, the memory fabric attaches the FAM module as an addressable portion of the overall main memory. The FAM use case enables disaggregated memory pools in cloud data centers. With FAM modules, a host is not constrained by the memory capacity limitations of the local server.
Instead, the host gains access to a large pool of memory that is not attached to any host. The hosts coordinate to partition the memory among themselves or to share FAM modules. The Gen-Z fabric has emerged as a high-performance, low-latency memory-semantic fabric that can be used to communicate with each device in the system. There is a need for improved apparatus and methods for managing traffic across a physical layer interface of a memory fabric that employs fabric-attached memory.
Drawings
The implementations will be more readily understood from the following description when read with the accompanying drawings, in which like reference numerals designate like elements, and in which:
FIG. 1 is a block diagram illustrating a system employing an apparatus for managing packet transfer across a physical layer interface with a memory fabric in accordance with one example set forth in the disclosure;
FIG. 2 is a flow chart illustrating a method for managing packet transfers by a device coupled to a memory fabric physical layer interface in accordance with one example set forth in the disclosure;
FIG. 3 is a block diagram illustrating an apparatus for managing packet transfers in accordance with one example set forth in the disclosure;
FIG. 4 is a flow chart illustrating a method for managing packet transfers by a device coupled to a memory fabric physical layer interface in accordance with one example set forth in the disclosure;
FIG. 5 is a block diagram illustrating an apparatus for managing packet transfers in accordance with one example set forth in the disclosure; and
FIG. 6 is a flow chart illustrating a method for managing packet transfers by a device coupled to a memory fabric physical layer interface in accordance with one example set forth in the disclosure.
Detailed Description
Traffic bottlenecks may occur in the memory fabric.
The physical layer interface of the memory fabric, also referred to as a memory fabric physical layer (PHY) interface, operates at higher performance than the physical layer interfaces associated with a system on a chip (e.g., a host SoC) or other devices connected to the memory fabric PHY interface. For example, the signaling standard used for messaging through the memory fabric to enable access to FAM may run at about 56 GT/s, compared with 16 or 32 GT/s for Peripheral Component Interconnect Express (PCIe) interfaces on the SoC. In addition, the link width for the memory fabric is also designed to be larger. Some current processor SoC devices that interface with a PCIe bus use first-in-first-out (FIFO) buffers to queue packet traffic. However, differences in data rate and link width across PHY interfaces, such as between the PCIe PHY interface and the memory fabric PHY interface, remain potential bottlenecks for packet traffic. In some implementations, a device serves as an interface to manage traffic priorities at connection points between multiple physical layer interfaces, such as between a PCIe PHY interface