US-12619358-B1 - System and method for memory access distribution
Abstract
A system for accessing memory, comprising: transformation circuitry configured to: receive a memory access request; access a transformation mode value associated with the memory access request and indicative of an address transformation function; apply the address transformation function, indicated by the transformation mode value, to a memory address of the memory access request to compute a transformed memory address; and generate a new memory access request using the memory access request and the transformed memory address; and at least one memory area configured to serve the new memory access request according to the transformed memory address.
Inventors
- Elad RAZ
- ILAN TAYARI
- Dan SHECHTER
- Yoav Lossin
- Ronen Gal
- Daniel Greenspan
Assignees
- NEXT SILICON LTD
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-06-13
Claims (20)
- 1 . A system for accessing memory, comprising: transformation circuitry configured to: receive a memory access request; access a transformation mode value associated with the memory access request and indicative of an address transformation function of a plurality of address transformation functions, using different address transformation functions for different memory access requests; apply the address transformation function, indicated by the transformation mode value, to a memory address of the memory access request to compute a transformed memory address; and generate a new memory access request using the memory access request and the transformed memory address; and at least one memory area configured to serve the new memory access request according to the transformed memory address.
- 2 . The system of claim 1 , wherein generating the new memory access request comprises replacing the memory address with the transformed memory address in the memory access request.
- 3 . The system of claim 1 , further comprising map selection circuitry configured to: determine the transformation mode value based on the memory access request; and provide the transformation circuitry with an association between the transformation mode value and the memory access request.
- 4 . The system of claim 3 , wherein the map selection circuitry is further configured to insert the transformation mode value into the memory access request; wherein the map selection circuitry provides the association between the transformation mode value and the memory access request by inserting the transformation mode value into the memory access request; and wherein the transformation circuitry accesses the transformation mode value by extracting the transformation mode value from the memory access request.
- 5 . The system of claim 3 , wherein the system further comprises: a plurality of processing cores connected to a plurality of memory areas, wherein the at least one memory area is a member of the plurality of memory areas; and at least one hardware processor; wherein the map selection circuitry further comprises a mapping storage for storing at least one association between one or more memory access parameters and another transformation mode value; wherein the map selection circuitry and the transformation circuitry are associated with at least one processing core of the plurality of processing cores; wherein the map selection circuitry determining the transformation mode value based on the memory access request comprises: computing at least one additional memory access parameter according to the memory access request; and accessing the transformation mode value in the mapping storage according to the at least one additional memory access parameter, and wherein the at least one hardware processor is configured to: configure the at least one association in the mapping storage of the map selection circuitry; and configure the address transformation function in the map selection circuitry.
- 6 . The system of claim 5 , wherein the at least one processing core is configured to execute a software application; wherein executing the software application comprises issuing a plurality of memory access requests; and wherein configuring the address transformation function comprises configuring the map selection circuitry to use for each of the plurality of memory access requests an identified transformation mode value as the transformation mode value used to generate the new memory access request therefor.
- 7 . The system of claim 5 , further comprising at least one page table associated with the at least one processing core, the at least one page table comprising a plurality of page table entries (PTEs); wherein the at least one processing core is configured to execute a software application; wherein application memory of the software application is organized in a plurality of application memory pages, each of the plurality of application memory pages mapped to at least one of the plurality of memory areas via at least one PTE of the plurality of PTEs; wherein executing the software application comprises issuing a plurality of memory access requests to the plurality of application memory pages; and wherein configuring the address transformation function comprises: for each application memory page of the plurality of application memory pages, storing in the at least one PTE mapping the application memory page a page-specific transformation mode value indicative of one of the plurality of address transformation functions; and configuring the map selection circuitry to use the page-specific transformation mode value of a PTE of the plurality of PTEs as the transformation mode value when the memory address is in the at least one application memory page mapped by the PTE.
- 8 . The system of claim 7 , wherein the address transformation function indicated by the page-specific transformation mode value is selected based on at least one memory access metric of the application memory page.
- 9 . The system of claim 5 , wherein configuring the address transformation function in the map selection circuitry comprises adding to the mapping storage an association between at least one memory access parameter and the transformation mode value, where the at least one memory access parameter is computed using the memory access request.
- 10 . The system of claim 5 , wherein the at least one processing core is configured to execute a software application; wherein application memory of the software application is organized in a plurality of application memory pages; and wherein each of the plurality of address transformation functions defines a distribution pattern of an application memory page of the plurality of application memory pages across the plurality of memory areas.
- 11 . The system of claim 10 , wherein each memory area of the plurality of memory areas is associated with one or more memory storage entities of a plurality of memory storage entities, the one or more memory storage entities storing at least part of the memory area, wherein the plurality of memory storage entities is one of: a plurality of cache bins or a plurality of random access memory banks; and wherein the distribution pattern of the application memory page across the plurality of memory areas corresponds to a pattern of distribution of a plurality of application memory addresses of the application memory page across the plurality of memory storage entities.
- 12 . The system of claim 10 , wherein each memory area of the plurality of memory areas is associated with one or more memory access entities of a plurality of memory access entities, the one or more memory access entities controlling access to the memory area, wherein the plurality of memory access entities is one of: a plurality of HBM controllers, a plurality of cache controllers, or a plurality of double data rate (DDR) controllers; and wherein the distribution pattern of the application memory page across the plurality of memory areas corresponds to another pattern of distribution of a plurality of application memory addresses of the application memory page across the plurality of memory access entities.
- 13 . The system of claim 5 , wherein the at least one processing core is configured to execute a software application; wherein application memory of the software application is organized in a plurality of application memory pages; and wherein configuring the address transformation function further comprises: collecting a plurality of memory access statistical values during execution of the software application; and configuring the address transformation function based on the plurality of memory access statistical values.
- 14 . The system of claim 13 , further comprising: a telemetry collector configured to collect the plurality of memory access statistical values; and a memory manager configured to select the address transformation function for each application memory page based on the collected memory access statistics; wherein the memory manager is further configured to dynamically modify the address transformation function during execution of the software application responsive to changes in the plurality of memory access statistical values.
- 15 . The system of claim 5 , wherein the plurality of processing cores are implemented in a reconfigurable processing grid comprising a plurality of reconfigurable logical elements connected via a plurality of reconfigurable routing junctions; and wherein the reconfigurable processing grid comprises at least some of the plurality of memory areas.
- 16 . The system of claim 1 , wherein the transformation circuitry further comprises another mapping storage for storing at least one other association between one or more other memory access parameters and yet another transformation mode value; and wherein the transformation circuitry is further configured to: compute at least one other additional memory access parameter according to the new memory access request; and access the transformation mode value in the other mapping storage according to the at least one other additional memory access parameter; and wherein the system further comprises at least one other hardware processor configured to configure the at least one other association in the other mapping storage of the transformation circuitry.
- 17 . The system of claim 1 , wherein applying the address transformation function comprises: determining a linear page address for the memory address, where the linear page address represents a unique sequential range of memory addresses before any distribution transformation is applied; and applying a pattern-specific transformation to address bits within a page of memory according to the transformation mode value.
- 18 . The system of claim 1 , wherein the address transformation function comprises a bit transposition operation applied to at least part of the memory address.
- 19 . The system of claim 1 , wherein the memory access request is to an application memory page; wherein the system further comprises: at least one processing core configured to execute a software application, wherein application memory of the software application is organized in a plurality of application memory pages; map selection circuitry comprising a mapping storage for storing at least one association between one or more memory access parameters and another transformation mode value, indicative of one of the plurality of address transformation functions, the map selection circuitry configured to: receive the memory access request to the application memory page; compute at least one additional memory access parameter according to the memory access request; access a page-specific transformation mode value in the mapping storage according to the at least one additional memory access parameter, wherein the page-specific transformation mode value is associated with the application memory page; and provide an association between the page-specific transformation mode value and the memory access request; and wherein the transformation circuitry is further configured to: access the page-specific transformation mode value provided by the map selection circuitry as the transformation mode associated with the memory access request; and apply the respective address transformation function, indicated by the page-specific transformation mode value, as the address transformation function.
- 20 . A method for accessing memory, comprising: receiving a memory access request; accessing a transformation mode value associated with the memory access request and indicative of an address transformation function of a plurality of address transformation functions, using different address transformation functions for different memory access requests; applying the address transformation function, indicated by the transformation mode value, to a memory address of the memory access request to compute a transformed memory address; generating a new memory access request using the memory access request and the transformed memory address; and serving the new memory access request by at least one memory area according to the transformed memory address.
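For illustration only, the page-specific selection of claim 7 and the in-page bit transposition of claims 17 and 18 can be sketched in Python as follows. This is a minimal sketch and not the claimed circuitry: the page size, the field widths, the mode numbering, the dictionary standing in for the page table, and all function names are assumptions introduced here.

```python
PAGE_BITS = 12   # assumed 4 KiB application memory pages
LINE_BITS = 6    # assumed 64-byte access granularity
AREA_BITS = 2    # assumed 4 memory areas, selected by address bits [6, 7]

MODE_LINEAR = 0     # hypothetical mode: leave the in-page offset unchanged
MODE_TRANSPOSE = 1  # hypothetical mode: swap bits [6, 7] with bits [10, 11]


def swap_bit_fields(value: int, lo_a: int, lo_b: int, width: int) -> int:
    """Transpose two non-overlapping bit fields of the same width."""
    mask = (1 << width) - 1
    field_a = (value >> lo_a) & mask
    field_b = (value >> lo_b) & mask
    value &= ~((mask << lo_a) | (mask << lo_b))
    return value | (field_a << lo_b) | (field_b << lo_a)


def transform_address(address: int, page_modes: dict) -> int:
    """Apply the page-specific transformation to the in-page bits only."""
    page = address >> PAGE_BITS                  # linear page address, kept as-is
    offset = address & ((1 << PAGE_BITS) - 1)    # bits within the page
    mode = page_modes.get(page, MODE_LINEAR)     # stands in for the PTE lookup
    if mode == MODE_TRANSPOSE:
        offset = swap_bit_fields(offset, LINE_BITS, PAGE_BITS - AREA_BITS, AREA_BITS)
    return (page << PAGE_BITS) | offset


def area_of(address: int) -> int:
    """Memory area that would serve a (transformed) address in this sketch."""
    return (address >> LINE_BITS) & ((1 << AREA_BITS) - 1)
```

Under these assumptions, a page in the linear mode has consecutive 64-byte lines rotating across the four areas, whereas a page in the transposed mode is split into four contiguous 1 KiB chunks, each served by a single area. The two modes therefore define two different distribution patterns for a page, in the sense of claim 10, while the linear page address is unchanged.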
Description
BACKGROUND
Some embodiments described in the present disclosure relate to memory architecture in computing systems and, more specifically, but not exclusively, to dynamically distributing memory access operations across multiple physical memory areas.
In modern computing systems, the performance gap between the system's memory sub-system and the system's compute units that generate and process memory operations continues to grow. This gap presents a significant challenge to system performance, as memory access can become a bottleneck for overall system operation. To combat this performance disparity, computing systems typically employ multiple independent physical memory entities such as cache elements (cache-bins), High Bandwidth Memory (HBM) channels, memory controllers, and multiple HBM memory stacks with their internal subdivisions into bank-groups and banks.
As used herein, the term “memory area” refers to any physical memory region to which application memory is mapped and that can service a memory operation. A memory area may be stored in full or in part in a memory storage entity, for example a cache, a cache bin, a scratchpad, or a random access memory bank such as an HBM bank, a dynamic random-access memory (DRAM), or a synchronous dynamic random-access memory (SDRAM). A scratchpad may be a static random access memory (SRAM). A memory area may be accessed via one or more memory access entities that control access to the memory area, for example an HBM controller, a cache controller, a double data rate (DDR) controller, or a DRAM controller providing access to a memory component. Some examples of memory areas include L1 cache associated with a specific processing core, shared L2 or L3 cache distributed across multiple processing cores, HBM memory banks within a memory stack, random access memory (RAM) components, and DRAM controllers providing access to off-chip memory.
Distributing memory accesses across multiple physical memory areas is critical for modern high-performance computing systems for several reasons. First, it improves the total available memory bandwidth by allowing multiple memory operations to be processed in parallel across different memory areas. Second, it reduces memory access latency by minimizing contention for any single memory area, thereby decreasing queuing delays. Third, it enables more efficient utilization of cache resources by allowing programs to potentially use more cache capacity than would be available in a single cache area. Finally, it helps balance the load across memory resources, preventing any single component from becoming a performance bottleneck while others remain underutilized.
Some memory controllers apply a global address transformation function, also referred to as a “scrambling function,” to distribute outgoing memory requests across the available independent memory areas. As used herein, the term “address transformation function” refers to a function that maps a linear range of memory addresses to one or more physical memory areas. For brevity, henceforth the terms “address transformation function,” “transformation function,” and “distribution function” are used interchangeably.
SUMMARY
It is an object of some embodiments described in the present disclosure to provide a system and a method for distributing memory accesses in a computerized system. In such embodiments, transformation mode values are used to identify address transformation functions, and each memory access request is associated with a transformation mode value that is indicative of an address transformation function for the memory access request.
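By way of a non-limiting illustration, the association of a transformation mode value with each memory access request may be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than an implementation of any embodiment: the request fields, the two example functions (an identity mapping and an XOR fold), the mode numbering, and the 64-byte granularity with four memory areas are all hypothetical choices made here.

```python
from dataclasses import dataclass, replace
from typing import List

LINE_BITS = 6   # assumed 64-byte access granularity
AREA_BITS = 2   # assumed 4 memory areas
AREA_MASK = (1 << AREA_BITS) - 1


@dataclass(frozen=True)
class MemoryAccessRequest:
    address: int
    mode: int               # transformation mode value carried with the request
    is_write: bool = False


def identity(address: int) -> int:
    """Hypothetical mode 0: no redistribution."""
    return address


def xor_fold(address: int) -> int:
    """Hypothetical mode 1: XOR address bits [8, 9] into the area-select bits [6, 7]."""
    upper = (address >> (LINE_BITS + AREA_BITS)) & AREA_MASK
    return address ^ (upper << LINE_BITS)


# Mode value -> address transformation function (assumed table).
TRANSFORMATION_FUNCTIONS = {0: identity, 1: xor_fold}


def transform_request(request: MemoryAccessRequest) -> MemoryAccessRequest:
    """Generate a new request that carries the transformed address."""
    function = TRANSFORMATION_FUNCTIONS[request.mode]
    return replace(request, address=function(request.address))


def serving_area(request: MemoryAccessRequest) -> int:
    """Memory area that serves a request, chosen from its address bits."""
    return (request.address >> LINE_BITS) & AREA_MASK


def areas_for_stride(mode: int, stride: int, count: int) -> List[int]:
    """Areas hit by `count` requests spaced `stride` bytes apart under one mode."""
    requests = [MemoryAccessRequest(i * stride, mode) for i in range(count)]
    return [serving_area(transform_request(r)) for r in requests]
```

In this sketch, four requests with a 256-byte stride all land on the same area under mode 0, while the same requests under mode 1 are spread over all four areas. Because the mode travels with each request, a single request stream may mix both modes.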
In such embodiments, transformation circuitry is configured to receive a memory access request, access a transformation mode value that is associated with the memory access request, and apply the address transformation function that is indicated by the transformation mode value to an address of the memory access request, such that a memory area may serve a new memory access request according to the transformed memory address. Using a transformation mode value associated with the memory access request allows using different address transformation functions for different memory access requests, allowing more control over distribution of all the memory access requests in the computerized system. Controlling the distribution of memory access requests in the computerized system facilitates reducing overall latency of the memory access requests, and additionally or alternatively reducing latency of a subset of the memory access requests, and additionally or alternatively improving one or more memory access metrics in the computerized system, thus improving overall performance of the computerized system by one or more performance metrics. The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures. According to a fir