CN-122029511-A - Adaptive system probe actions for minimizing input/output dirty data transmissions

CN122029511A

Abstract

An adaptive system probe action for minimizing input/output dirty data transmissions is described. In one or more implementations, a system includes a processor, a memory configured to store data, and a cache configured to store a portion of the data stored in the memory for execution by the processor. The system also includes a cache coherence controller that includes a cache line history. The cache coherence controller is configured to detect direct memory access requests from an input/output device. The direct memory access request is associated with an input/output operation involving data. The cache coherence controller is further configured to identify a cache line associated with the direct memory access request and selectively communicate probes to the cache based on a state of the cache line in response to the cache line history record including a dirty data transfer record corresponding to the cache line.

Inventors

  • Ou Li
  • Ganesh Balakrishnan
  • Amit Apte

Assignees

  • Advanced Micro Devices, Inc.

Dates

Publication Date
2026-05-12
Application Date
2024-06-14
Priority Date
2023-12-28

Claims (20)

  1. A system, the system comprising: a processor; a memory configured to store data; a cache configured to store a portion of the data stored in the memory for execution by the processor; and a cache coherence controller, the cache coherence controller comprising a cache line history, the cache coherence controller being configured to: identify a cache line associated with a direct memory access request associated with an operation involving the data; and selectively communicate probes to the cache in response to the cache line history including a dirty data transfer record corresponding to the cache line.
  2. The system of claim 1, wherein the cache coherence controller is configured to selectively communicate the probes to the cache based on a state of the cache line, the state of the cache line comprising a modified state, an exclusive state, a shared state, or an invalid state.
  3. The system of claim 2, wherein the cache coherence controller is further configured to transmit the probe to the cache to invalidate the cache line in response to determining that the state of the cache line is the exclusive state.
  4. The system of claim 2, wherein the cache coherence controller is further configured to transmit the probe to transition the state of the cache line from a dirty shared state to a clean shared state in response to determining that the state of the cache line is the shared state.
  5. The system of claim 1, wherein the processor comprises a first core and a second core, and wherein the cache comprises a first cache corresponding to the first core and a second cache corresponding to the second core.
  6. The system of claim 5, wherein the first cache stores the portion of the data in the cache line.
  7. The system of claim 5, wherein the first core transfers the portion of the data from the first cache to the second cache.
  8. The system of claim 5, wherein the first core modifies the portion of the data to create a new portion of the data, the new portion of the data being different from the portion of the data.
  9. The system of claim 8, wherein the cache coherence controller is further configured to update the cache line history to reflect that the portion of the data was modified before being transferred to the second cache.
  10. The system of claim 1, wherein the operation involving the data comprises an input or output operation.
  11. A cache coherence controller, the cache coherence controller comprising: a memory configured to store a cache line history record; and a hardware circuit configured to: identify a cache line associated with a direct memory access request associated with an operation involving data stored in a cache; and selectively communicate probes to the cache in response to the cache line history record including a dirty data transfer record corresponding to the cache line.
  12. The cache coherence controller of claim 11, wherein the hardware circuit comprises the memory.
  13. The cache coherence controller of claim 11, wherein the hardware circuit is configured to selectively communicate the probes to the cache based on states including a modified state, an exclusive state, a shared state, or an invalid state.
  14. The cache coherence controller of claim 13, wherein the hardware circuit is further configured to transmit the probe to transition the state of the cache line from a dirty shared state to a clean shared state in response to determining that the state of the cache line is the shared state.
  15. The cache coherence controller of claim 11, wherein the hardware circuit is further configured to update the cache line history record to reflect that a portion of the data was modified before being transferred to the cache.
  16. The cache coherence controller of claim 11, wherein the operation involving the data comprises an input operation or an output operation.
  17. A method, the method comprising: detecting, by a cache coherence controller, a direct memory access request from a direct memory access engine of an input/output device, the direct memory access request being associated with an input/output operation performed by the input/output device; and responsive to detecting the direct memory access request, selectively performing, by the cache coherence controller, an adaptive algorithm to ensure coherence of data between a plurality of caches and a memory, the data being associated with the input/output operation.
  18. The method of claim 17, wherein selectively performing the adaptive algorithm by the cache coherence controller comprises performing, by the cache coherence controller, the adaptive algorithm in response to the data being associated with a dirty data transfer from a first cache of the plurality of caches to a second cache of the plurality of caches.
  19. The method of claim 18, wherein performing the adaptive algorithm by the cache coherence controller comprises determining whether a state of a cache line of the first cache is an exclusive state or a shared state.
  20. The method of claim 19, the method further comprising: in response to determining that the state of the cache line of the first cache is the exclusive state, transmitting a probe to invalidate the cache line of the first cache; or in response to determining that the state of the cache line of the first cache is the shared state, transmitting a probe to transition the state of the cache line of the first cache from a dirty shared state to a clean shared state.
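The data structures recited in claims 1 and 11 (per-line states and a cache line history holding dirty data transfer records) can be illustrated with a small software model. This is a hypothetical Python sketch for illustration only, not the patented hardware; the names `CacheLineState`, `CacheLineHistory`, and their methods are invented here.

```python
from enum import Enum, auto

class CacheLineState(Enum):
    """MESI-style cache line states named in claim 2 (illustrative)."""
    MODIFIED = auto()
    EXCLUSIVE = auto()
    SHARED = auto()
    INVALID = auto()

class CacheLineHistory:
    """Model of the cache line history kept by the cache coherence
    controller (claims 1 and 11): it records whether a previous request
    for a given line triggered a dirty data transfer. Field and method
    names are assumptions for this sketch.
    """
    def __init__(self):
        # Line addresses for which a dirty data transfer was observed.
        self._dirty_transfers = set()

    def record_dirty_transfer(self, line_addr: int) -> None:
        self._dirty_transfers.add(line_addr)

    def has_dirty_transfer_record(self, line_addr: int) -> bool:
        return line_addr in self._dirty_transfers

history = CacheLineHistory()
history.record_dirty_transfer(0x1000)
print(history.has_dirty_transfer_record(0x1000))  # a recorded line
print(history.has_dirty_transfer_record(0x2000))  # an unrecorded line
```

In this model, the controller would consult `has_dirty_transfer_record` on each DMA read to decide whether to send a state-changing probe, mirroring the "in response to the cache line history including a dirty data transfer record" condition of claim 1.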

Description

Adaptive system probe actions for minimizing input/output dirty data transmissions

RELATED APPLICATIONS

The present application claims priority from U.S. patent application Ser. No. 18/399,283, entitled "ADAPTIVE SYSTEM PROBE ACTIONS TO MINIMIZE INPUT/OUTPUT DIRTY DATA TRANSFERS", filed on December 28, 2023, the entire disclosure of which is incorporated herein by reference.

Background

Various operations of a computing system involve transferring data between memory (e.g., main memory) and other devices (e.g., peripheral devices). Advances in memory access technology continue to be sought to improve the performance of computing systems. One example advancement is Direct Memory Access (DMA), a technique for enhancing the efficiency of data transfer between a device and memory. DMA allows devices to access memory directly rather than relying on the Central Processing Unit (CPU) for each data transfer. This greatly reduces the CPU workload and improves overall system performance.

Drawings

FIG. 1 is a block diagram depicting a non-limiting example system having a system on a chip (SoC), memory, and one or more I/O devices.
FIG. 2 is a block diagram depicting a non-limiting example of a cache coherence controller and a cache.
FIG. 3 is a block diagram depicting non-limiting example interactions between various components of the system depicted in FIG. 1.
FIG. 4 is a flow chart depicting a procedure in a non-limiting example implementation of a DMA controller configured to perform operations to minimize I/O dirty data transfers.
FIG. 5 is a flow chart depicting a procedure in a non-limiting example implementation of a cache coherence controller configured to selectively perform an adaptive algorithm.
FIG. 6 is a flow chart depicting another procedure in a non-limiting example implementation of a cache coherence controller configured to perform operations to minimize I/O dirty data transmissions.

Detailed Description

Overview

In I/O workloads, where there is a large amount of interaction between cores and I/O devices, that interaction can be reduced to a simpler producer-consumer model. In this model, the producer generates I/O data and the consumer processes it using DMA buffers allocated in system memory, which serve as the primary interface for their interaction. For example, in applications that use a high-speed Network Interface Card (NIC) to send data, a core acts as the producer, preparing and writing network packets into a DMA buffer, and the NIC acts as the consumer, reading packets from the DMA buffer and sending them to the network. Within this interaction model, high-performance I/O devices require the cooperation of multiple cores to generate or consume I/O data quickly enough to sustain the high I/O speeds that modern I/O devices can provide.

In a standard I/O software stack, the same DMA buffers are shared among the multiple cores that drive the same I/O device. Taking a Transmission Control Protocol/Internet Protocol (TCP/IP) implementation as an example, a shared buffer pool is pre-allocated for each compute non-uniform memory access ("NUMA") node. All cores from the same compute NUMA node retrieve buffers from the common pool before driving the NIC to send data and then return the buffers to the pool after the NIC has completed its transmission. This creates the potential for high latency when dirty data is transferred between cores, which can affect performance and I/O device capabilities. Under current coherency protocols, certain operational considerations arise when dirty data is handled within the context of a DMA read request.
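The shared buffer pool interaction described above (cores acquire pre-allocated DMA buffers, the NIC consumes them, and the buffers return to the pool) can be sketched as a simple producer-consumer exchange. This is a hypothetical Python illustration of the software pattern only; the class `DmaBufferPool` and its methods are invented names, not part of the disclosed system.

```python
from collections import deque

class DmaBufferPool:
    """Model of a pre-allocated buffer pool shared by the cores of one
    compute NUMA node (illustrative names and sizes)."""
    def __init__(self, count: int, size: int):
        self._free = deque(bytearray(size) for _ in range(count))

    def acquire(self) -> bytearray:
        # A core takes a buffer from the common pool before driving the NIC.
        return self._free.popleft()

    def release(self, buf: bytearray) -> None:
        # The buffer returns to the pool after the NIC completes transmission.
        self._free.append(buf)

pool = DmaBufferPool(count=4, size=1500)

# Producer (core): write a network packet into a pooled DMA buffer.
buf = pool.acquire()
packet = b"payload"
buf[: len(packet)] = packet

# Consumer (NIC, modeled): read the packet from the buffer, then recycle it.
sent = bytes(buf[: len(packet)])
pool.release(buf)
print(sent)  # b'payload'
```

Because the same buffer cycles through different cores' caches, a line modified by one core can later be read (as dirty data) on behalf of another, which is the latency problem the described techniques target.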
Conventional processes involve dispatching probes from a coherency point to a cache with the goal of fetching dirty data without causing a state change in the cache line (a no-op action). Once the I/O data has been generated and consumed by the core and the I/O device, respectively, the associated DMA buffers are re-integrated into the buffer pool via DMA reads, becoming available for the next I/O transaction. After consumption, the modified cache lines residing in the original core's cache become functionally redundant, meaning that retaining them does not directly serve subsequent computation or transaction processes.

The systems and techniques disclosed herein change the probe actions communicated to a core cache in response to a DMA read request. Instead of always performing the normal operation (i.e., not changing the cache line state), the coherency point operates in one of two probing modes for DMA reads, based on a history identifying whether a dirty data transfer was triggered by a previous request for the same cache line. In the first mode, the coherency point operates normally and does not change the cache line state. In the second mode, the coherency point changes the cache line state.
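The two probing modes can be summarized as a short decision routine. This is a hedged Python sketch of the logic described above and in claims 19-20; the function `select_probe_action`, the state names, and the returned action labels are assumptions made for illustration, not the actual hardware behavior.

```python
from enum import Enum, auto

class State(Enum):
    """Cache line states relevant to the probe decision (illustrative)."""
    MODIFIED = auto()
    EXCLUSIVE = auto()
    SHARED_DIRTY = auto()
    SHARED_CLEAN = auto()
    INVALID = auto()

def select_probe_action(state: State, has_dirty_history: bool) -> str:
    """Choose the probe action for a DMA read (sketch of the two modes)."""
    if not has_dirty_history:
        # Mode 1: no dirty transfer record for this line, so operate
        # normally and fetch the data without changing the line state.
        return "no_state_change"
    # Mode 2: a previous request for this line triggered a dirty data
    # transfer, so change the cache line state to avoid repeating it.
    if state is State.EXCLUSIVE:
        return "invalidate"                 # probe invalidates the line
    if state is State.SHARED_DIRTY:
        return "downgrade_to_clean_shared"  # dirty shared -> clean shared
    # Other states fall back to normal operation in this sketch.
    return "no_state_change"

print(select_probe_action(State.EXCLUSIVE, has_dirty_history=True))
print(select_probe_action(State.EXCLUSIVE, has_dirty_history=False))
```

In this sketch, invalidating an exclusive line (or downgrading a dirty shared line to clean shared) means the now-redundant modified copy need not be transferred as dirty data when the buffer is recycled for the next I/O transaction.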