CN-122003672-A - Prefetching using a direct memory access engine
Abstract
A processing system (100) includes one or more DMA engines (150, 160) that load data from a memory (140) or another cache location without storing the data after it is loaded. As the data propagates through the caches ("intermediate caches") located between the DMA engine and the memory or other cache location holding the requested data, it is selectively copied into these intermediate caches based on their cache replacement policies. Rather than the DMA engine explicitly storing data into these intermediate caches, the cache replacement policies (512) of the intermediate caches determine whether the data is copied into each respective cache and the replacement priority of the data. By bypassing the store of the data, the DMA engine can prefetch into these intermediate caches without consuming unnecessary bandwidth or searching for memory locations in which to store the data, thereby reducing latency and saving energy.
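As a rough illustration of the mechanism the abstract describes (not part of the patent; the descriptor layout and names such as dma_desc and DMA_FLAG_PREFETCH_NO_STORE are hypothetical), a driver-side DMA command for a prefetch-without-store might look like this in C:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical DMA command flag: PREFETCH_NO_STORE tells the engine to
 * issue read requests for the range but discard the returned data instead
 * of writing it to a destination buffer. Intermediate caches may still
 * retain copies according to their own replacement policies. */
#define DMA_FLAG_PREFETCH_NO_STORE (1u << 0)

typedef struct {
    uint64_t src_addr;   /* address of the data to prefetch            */
    size_t   length;     /* number of bytes to touch                   */
    uint32_t flags;      /* e.g. DMA_FLAG_PREFETCH_NO_STORE            */
    uint8_t  priority;   /* optional priority carried by the command   */
} dma_desc;

/* Hypothetical driver helper: build a prefetch-only descriptor. The engine
 * loads the data (warming caches along the path) but never stores it. */
static dma_desc dma_make_prefetch(uint64_t src, size_t len, uint8_t prio)
{
    dma_desc d = {
        .src_addr = src,
        .length   = len,
        .flags    = DMA_FLAG_PREFETCH_NO_STORE,
        .priority = prio,
    };
    return d;
}

int main(void)
{
    /* Example: ask the engine to warm caches for a 4 KiB buffer at 0x1000. */
    dma_desc d = dma_make_prefetch(0x1000, 4096, /*prio=*/1);
    (void)d;  /* in real use, the descriptor would be submitted to the engine */
    return 0;
}
```

Because the descriptor carries no destination, the engine only issues reads; whether any intermediate cache keeps a copy is left entirely to that cache's replacement policy.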
Inventors
- Vydhyanathan Kalyanasundharam
- Christopher J. Brennan
- Joseph Greathouse
- Mark Fowler
Assignees
- Advanced Micro Devices, Inc.
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-06-14
- Priority Date: 2023-11-13
Claims (15)
- 1. A method, the method comprising: loading data from a memory in response to a request from a direct memory access (DMA) engine associated with a processor; selectively copying the data to one or more caches of the processor located between the DMA engine and the memory; and bypassing storing the data for the DMA engine.
- 2. The method of claim 1, wherein the request is a prefetch request.
- 3. The method of claim 1 or claim 2, the method further comprising: receiving, at the DMA engine, a command to send the request to load the data without storing the data.
- 4. The method of claim 3, wherein the command indicates a priority of the request.
- 5. The method of any one of claims 1-4, wherein the memory is one of a system memory or a second cache of the processor.
- 6. The method of any of claims 1-5, wherein selectively copying the data to the one or more caches is based on a cache replacement policy.
- 7. The method of claim 6, wherein the cache replacement policy determines the one or more caches into which the data is copied.
- 8. The method of claim 6, wherein the cache replacement policy determines a replacement priority for the data.
- 9. A processing system, the processing system comprising: a processor; a memory; a direct memory access (DMA) engine configured to load data from the memory; and one or more cache controllers configured to selectively copy the data to one or more caches of the processor located between the DMA engine and the memory, wherein the DMA engine is further configured to bypass storing the data.
- 10. The processing system of claim 9, wherein the DMA engine is further configured to bypass storing the data in response to receiving a command to send a request to load the data without storing the data.
- 11. The processing system of claim 10, wherein the request is a prefetch request.
- 12. The processing system of claim 10 or claim 11, wherein the command indicates a priority of the command.
- 13. The processing system of any of claims 9 to 12, wherein the memory is one of a system memory or a second cache of the processor.
- 14. The processing system of any of claims 9 to 13, wherein selectively copying the data to the one or more caches is based on a cache replacement policy.
- 15. The processing system of claim 14, wherein the cache replacement policy determines at least one of the one or more caches into which the data is copied and a replacement priority for the data.
Description
Prefetching using a direct memory access engine

Background

A system direct memory access (DMA) engine is a hardware device that coordinates direct memory access data transfers between devices within a computer system (e.g., input/output interfaces and a display controller) and memory, or between different locations in memory. The DMA engine is typically located on a processor such as a central processing unit (CPU) or an accelerated processing unit and receives commands from an application running on the processor. Based on the command, the DMA engine reads data from a DMA source (e.g., a first buffer defined in memory) and writes the data to a DMA destination (e.g., a second buffer defined in memory).

Disclosure of Invention

In embodiments described herein, techniques are provided for a DMA engine to load data from memory or another cache location without storing the data after loading it. In one example embodiment, a method may include loading data from a memory in response to a request from a direct memory access (DMA) engine associated with a processor, selectively copying the data to one or more caches of the processor located between the DMA engine and the memory, and bypassing storing the data for the DMA engine. In some embodiments, the request is a prefetch request. In some embodiments, the method further includes receiving, at the DMA engine, a command to send the request to load the data without storing the data. The command may indicate a priority of the request. The memory may be one of a system memory or a second cache of the processor. In some embodiments, selectively copying the data to the one or more caches is based on a cache replacement policy. In addition, the cache replacement policy may determine the one or more caches into which the data is copied. In some embodiments, the cache replacement policy determines a replacement priority for the data.

In another example embodiment, a processing system includes a processor, a memory, a direct memory access (DMA) engine configured to load data from the memory, and one or more cache controllers configured to selectively copy the data to one or more caches of the processor located between the DMA engine and the memory, wherein the DMA engine is further configured to bypass storing the data. The DMA engine may be further configured to bypass storing the data in response to receiving a command to send a request to load the data without storing the data. In some embodiments, the request is a prefetch request. In some embodiments, the command indicates a priority of the command. The memory may be one of a system memory or a second cache of the processor. Further, in some embodiments, selectively copying the data to the one or more caches is based on a cache replacement policy. The cache replacement policy may determine at least one of the one or more caches into which the data is copied and a replacement priority for the data.

Drawings

The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art, by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.

FIG. 1 is a block diagram of a processing system including a direct memory access (DMA) engine configured to prefetch data to one or more caches, according to some embodiments.

FIG. 2 is a block diagram of a processing system illustrating a DMA engine sending a prefetch request for data to system memory, and a cache between the system memory and the DMA engine selectively copying the data based on a cache replacement policy, according to some embodiments.

FIG. 3 is a block diagram illustrating a compute unit sending a prefetch command to a DMA engine and the DMA engine sending a prefetch request based on the command, according to some embodiments.

FIG. 4 is a block diagram of a portion of a processing system illustrating a prefetch request from a DMA engine propagating via a PCIe bus to a system memory, and an intermediate cache controller selectively copying the requested data based on cache replacement policies, according to some embodiments.

FIG. 5 illustrates a cache controller that selects where to insert data prefetched by a DMA engine in a replacement chain of a cache based on a cache replacement policy, according to some embodiments.

FIG. 6 is a flow diagram illustrating a method of prefetching data using a DMA engine, according to some embodiments.

Detailed Description

Conventional processors include one or more DMA engines to read and write blocks of data stored in system memory. The DMA engine relieves the processor core of the burden of managing the transfer. In response to a data transfer request from a processor core, the DMA engine provides the necessary control information to the corresponding source and destination so that the data transfer operation can be performed without delaying the computation code, allowing the communications and computation
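To make the cache-side behavior concrete (a minimal sketch, assuming an LRU-style replacement chain as suggested by FIG. 5; the cache_policy and maybe_insert_prefetched names and the shifting scheme are illustrative, not taken from the patent), an intermediate cache controller could decide whether and where to insert a DMA-prefetched line as follows:

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS 8  /* associativity of the illustrative cache set */

/* Hypothetical per-cache replacement policy for DMA-prefetched lines. */
typedef struct {
    bool    accept_prefetch;   /* copy DMA-prefetched data into this cache? */
    uint8_t insert_position;   /* 0 = most-recently-used end of the chain,
                                  WAYS-1 = least-recently-used (evict-first) */
} cache_policy;

/* A cache set modeled as a replacement chain: index 0 is the MRU slot,
 * index WAYS-1 is the LRU slot that will be evicted next. */
typedef struct {
    uint64_t tag[WAYS];
    bool     valid[WAYS];
} cache_set;

/* When a DMA prefetch response passes through this cache, the controller,
 * not the DMA engine, decides whether to keep a copy and at what priority.
 * Returns true if the line was inserted. */
static bool maybe_insert_prefetched(cache_set *set, const cache_policy *pol,
                                    uint64_t tag)
{
    if (!pol->accept_prefetch)
        return false;              /* policy says: let the data pass through */

    /* Shift entries from the insertion point toward the LRU end, dropping
     * the current LRU entry, then place the prefetched line at the position
     * the policy chose. */
    for (int i = WAYS - 1; i > pol->insert_position; --i) {
        set->tag[i]   = set->tag[i - 1];
        set->valid[i] = set->valid[i - 1];
    }
    set->tag[pol->insert_position]   = tag;
    set->valid[pol->insert_position] = true;
    return true;
}
```

Inserting near the LRU end gives prefetched lines a low replacement priority, so they are evicted quickly if demand accesses never arrive, while insertion at the MRU end treats them like demand-fetched data.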