US-12625810-B2 - Delayed cache entry invalidation update for potential overwrite re-use
Abstract
Techniques are disclosed relating to cache control in cache hierarchies. In some embodiments, processor execution circuitry is configured to perform operations on input operand data from a first-level cache, including a first operation that reads first data from an entry in the first-level cache and signals an invalidation of the first data. Control circuitry may set an indicator, in response to the first operation, to indicate that the entry in the first-level cache has a pending invalidation (e.g., a last-use indicator). The control circuitry may, in response to a second operation overwriting the entry in the first-level cache while the indicator is set, clear the indicator without invalidating a corresponding entry in a second-level cache. This may advantageously reduce invalidate operations and bandwidth to the second-level cache.
Inventors
- David K. Li
- Yoong Chert Foo
- Benjiman L. Goodman
- Chance C. Coats
Assignees
- APPLE INC.
Dates
- Publication Date
- 20260512
- Application Date
- 20241010
Claims (19)
- 1. An apparatus, comprising: first-level cache circuitry; second-level cache circuitry, wherein the first-level cache circuitry and second-level cache circuitry are at different levels in a cache hierarchy; processor execution circuitry configured to perform operations on input operand data from the first-level cache circuitry, including a first operation that reads first data from an entry in the first-level cache circuitry and signals an invalidation of the first data; and control circuitry configured to: set an indicator, in response to the first operation, to indicate that the entry in the first-level cache circuitry has a pending invalidation; in response to a second operation overwriting the entry in the first-level cache circuitry while the indicator is set, clear the indicator without invalidating a corresponding entry in the second-level cache circuitry; and in response to eviction of the entry in the first-level cache circuitry based on a replacement policy while the indicator is set, invalidate the corresponding entry in the second-level cache circuitry.
- 2. The apparatus of claim 1, wherein: the first-level cache circuitry is an operand cache; the second-level cache circuitry is a register file or a register data cache; and the first operation is a register read with a last-use indication that signals the invalidation.
- 3. The apparatus of claim 2, wherein the operand cache includes entries configured to store register data for multiple threads of single-instruction multiple-thread (SIMT) groups of graphics programs executed by the processor execution circuitry.
- 4. The apparatus of claim 1, wherein the control circuitry is configured to guarantee suppression of an invalidate operation to the second-level cache circuitry for the first operation if an immediately-subsequent operation to the first operation overwrites the entry in the first-level cache circuitry.
- 5. The apparatus of claim 4, wherein the processor execution circuitry executes a program that was compiled to re-use operand locations for instructions subsequent to operations that signal an invalidation of their read data.
- 6. The apparatus of claim 1, wherein the first-level cache circuitry is a closest cache level to the processor execution circuitry in the cache hierarchy.
- 7. The apparatus of claim 1, wherein the first operation is a load operation that specifies a memory address.
- 8. The apparatus of claim 1, wherein the entry in the first-level cache circuitry includes a valid indication.
- 9. The apparatus of claim 1, further comprising: replacement control circuitry configured to prioritize entries in the first-level cache circuitry indicated as having a pending invalidation for eviction.
- 10. The apparatus of claim 1, wherein the second-level cache circuitry is a write-back cache and is inclusive of data from locations cached in the first-level cache circuitry.
- 11. The apparatus of claim 1, further comprising: eviction port circuitry configured to handle evictions only for entries in the first-level cache circuitry indicated as having a pending invalidation.
- 12. A method, comprising: performing, by a computing system, operations on input operand data from a first-level cache, including a first operation that reads first data from an entry in the first-level cache and signals an invalidation of the first data; setting, by the computing system, an indicator, in response to the first operation, to indicate that the entry in the first-level cache has a pending invalidation; and clearing, by the computing system in response to a second operation overwriting the entry in the first-level cache while the indicator is set, the indicator without invalidating a corresponding entry in a second-level cache, wherein the second-level cache is a write-back cache and is inclusive of data from locations cached in the first-level cache.
- 13. The method of claim 12, wherein: the first-level cache is an operand cache; the second-level cache is a register file or a register data cache; and the first operation is a register read with a last-use indication that signals the invalidation.
- 14. The method of claim 12, wherein: the computing system guarantees suppression of an invalidate operation to the second-level cache for the first operation if an immediately-subsequent operation to the first operation overwrites the entry in the first-level cache.
- 15. The method of claim 14, further comprising: compiling a program that includes the first operation, including selecting a destination for the immediately-subsequent operation to overwrite the entry in the first-level cache.
- 16. The method of claim 12, wherein the first operation is a load operation that specifies a memory address.
- 17. The method of claim 12, further comprising: prioritizing, by the computing system, entries in the first-level cache indicated as having a pending invalidation for eviction.
- 18. The method of claim 12, wherein the second-level cache is a write-back cache and is inclusive of data from locations cached in the first-level cache.
- 19. A non-transitory computer-readable medium having instructions stored thereon that are executable by a computing device to perform operations comprising: compiling a program for execution by a processor that includes a first-level cache and a second-level cache, including: generating a first operation that reads first data and signals an invalidation of the first data; and generating a second operation, subsequent to the first operation in program order, including selecting a destination for the second operation to overwrite the first data; wherein the processor includes control circuitry configured to: read the first data from an entry in the first-level cache for the first operation; set an indicator, in response to the first operation, to indicate that the entry in the first-level cache has a pending invalidation; and in response to the second operation overwriting the entry in the first-level cache while the indicator is set, clear the indicator without invalidating a corresponding entry in the second-level cache.
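The compiler-side destination selection recited in the claims above (selecting a destination for a subsequent operation so that it overwrites data whose invalidation was just signaled) can be loosely sketched in Python. This is an illustration under stated assumptions, not the claimed method: the function name `assign_destinations`, the tuple-based program representation, and the single-definition requirement on virtual registers are all hypothetical, and the sketch assumes enough physical registers are available.

```python
def assign_destinations(program, num_regs):
    """Map virtual registers to physical registers (hypothetical sketch).

    program: list of (dst, srcs) tuples over virtual register names; each name
    is assumed to be defined exactly once, before any use.
    Returns a list of (dst_phys, [(src_phys, last_use_flag), ...]).
    """
    # Index of the final instruction that reads each virtual register.
    last_use = {}
    for i, (_, srcs) in enumerate(program):
        for v in srcs:
            last_use[v] = i

    free = list(range(num_regs))   # available physical registers, in order
    mapping = {}                   # virtual name -> physical register
    freed_by_prev = []             # registers released by the previous instruction
    out = []
    for i, (dst, srcs) in enumerate(program):
        srcs_out, freed_now = [], []
        for v in srcs:
            p = mapping[v]
            is_last = last_use[v] == i
            srcs_out.append((p, is_last))
            if is_last and p not in freed_now:
                freed_now.append(p)
        # Prefer a register whose last use was signaled by the immediately
        # preceding instruction, so this write overwrites that entry.
        candidates = [p for p in freed_by_prev if p in free]
        p_dst = candidates[0] if candidates else free[0]
        free.remove(p_dst)
        mapping[dst] = p_dst
        out.append((p_dst, srcs_out))
        # Registers read for the last time become available afterwards.
        free.extend(freed_now)
        freed_by_prev = freed_now
    return out


# Hypothetical five-instruction program over virtual registers a..e.
program = [
    ("a", []),
    ("b", []),
    ("c", ["a", "b"]),   # last use of a and b
    ("d", ["c"]),        # last use of c
    ("e", ["d"]),        # last use of d
]
allocation = assign_destinations(program, num_regs=4)
```

In this sketch the fourth instruction writes physical register 0, which the third instruction read with a last-use flag, even though register 3 was never used and would be the first choice of a plain in-order free list. That is the kind of immediately-subsequent overwrite for which the claims describe suppression of the invalidate operation.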
Description
The present application claims priority to U.S. Provisional App. No. 63/696,970, entitled “Delayed Cache Entry Invalidation Update for Potential Overwrite Re-use,” filed Sep. 20, 2024, the disclosure of which is incorporated by reference herein in its entirety.
BACKGROUND
Technical Field
This disclosure relates generally to computer processors and more particularly to cache invalidation control.
Description of Related Art
Computer processors often utilize multiple levels of data caches to store data that is likely to be accessed again (reducing latency relative to retrieving the data from memory, for example). In this context, a processor may perform various cache control operations, e.g., to evict data from one cache to another cache or invalidate data in one or more caches. Cache maintenance and control operations may utilize bandwidth to a given cache level. Therefore, it may generally be desirable to reduce cache control operations.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1A is a diagram illustrating an overview of example graphics processing operations, according to some embodiments.
FIG. 1B is a block diagram illustrating an example graphics unit, according to some embodiments.
FIG. 2 is a block diagram illustrating example cache circuitry that supports an invalidate-pending status, according to some embodiments.
FIG. 3 is a diagram illustrating example cache entry re-use and invalidation suppression in the context of an example instruction sequence, according to some embodiments.
FIG. 4 is a flow diagram illustrating an example compiler method for re-use, according to some embodiments.
FIG. 5 is a block diagram illustrating example replacement control circuitry that considers invalidate-pending status as an input, according to some embodiments.
FIG. 6 is a flow diagram illustrating an example method, according to some embodiments.
FIG. 7 is a block diagram illustrating an example computing device, according to some embodiments.
FIG. 8 is a diagram illustrating example applications of disclosed systems and devices, according to some embodiments.
FIG. 9 is a block diagram illustrating an example computer-readable medium that stores circuit design information, according to some embodiments.
DETAILED DESCRIPTION
Disclosed techniques allow delayed cache entry invalidation in certain scenarios, which may allow a compiler to potentially re-use the entry and avoid invalidation signaling to one or more cache levels. This may advantageously improve bandwidth for those cache levels, which may ultimately improve processor performance, reduce power consumption, or both.
Some processors may implement last-use signaling, e.g., where the compiler provides a last-use indication for one or more input operands of an instruction. When it is the last use of an operand, that operand does not need to be cached after the operation is complete (and indeed, the data should no longer be considered valid).
As one specific example, a read operand cache in a graphics processor may store register data very close to the execution circuitry (e.g., closer to the execution circuitry than a register file or level 0 data cache). When the compiler signals the last use of a register, that register can be re-used for other data and the operand data need not be cached. This is one example of a more general concept that a read of data may have an attached invalidation of the data.
Traditional implementations might treat this invalidation as a hazard and might immediately perform an invalidation operation to one or more cache levels further from the execution circuitry (e.g., for a last use of data in the operand cache, control circuitry might send an invalidation to the data cache at the next level in a cache hierarchy). In contrast, in disclosed embodiments, cache control circuitry may mark operand data as having a pending invalidation, which may be cleared if the corresponding location (e.g., register) is re-used.
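As a rough illustration of the contrast drawn above, the following Python sketch models a first-level operand cache in two modes: a "traditional" mode that sends an invalidate to the next cache level on every last-use read, and a deferred mode that only marks the entry as invalidate-pending. The class, field, and function names (`OperandCacheModel`, `pending_invalidate`, `l2_invalidates`, `run`) are hypothetical and chosen for illustration only; the model ignores timing, SIMT grouping, and replacement policy.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    valid: bool = False
    pending_invalidate: bool = False
    data: int = 0


@dataclass
class OperandCacheModel:
    """Toy model keyed by register number (hypothetical, not from the disclosure)."""
    deferred: bool = True                       # False models immediate invalidation
    entries: dict = field(default_factory=dict)
    l2_invalidates: int = 0                     # invalidates sent to the next level

    def read(self, reg, last_use=False):
        # A miss is modeled as a fill that creates a valid entry.
        entry = self.entries.setdefault(reg, Entry(valid=True))
        if last_use:
            if self.deferred:
                entry.pending_invalidate = True  # mark only, send nothing yet
            else:
                entry.valid = False
                self.l2_invalidates += 1         # immediate invalidate
        return entry.data

    def write(self, reg, value):
        entry = self.entries.setdefault(reg, Entry())
        entry.data = value
        entry.valid = True
        # Re-use of the location clears the pending status with no
        # invalidate sent to the next cache level.
        entry.pending_invalidate = False

    def evict(self, reg):
        entry = self.entries.pop(reg, None)
        # An entry evicted while still pending triggers the delayed invalidate.
        if entry is not None and entry.pending_invalidate:
            self.l2_invalidates += 1


def run(deferred):
    cache = OperandCacheModel(deferred=deferred)
    cache.write(3, 10)
    cache.read(3, last_use=True)   # last use of register 3
    cache.write(3, 20)             # register 3 re-used as a destination
    cache.read(3, last_use=True)   # last use again, with no re-use afterwards
    cache.evict(3)
    return cache.l2_invalidates
```

Under these assumptions, `run(True)` counts one invalidate where `run(False)` counts two: the overwrite of register 3 clears the pending status before any message to the next level is generated, and only the final eviction of a still-pending entry sends one.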
In this scenario, control circuitry may not send an invalidate command to other cache levels at all (even though an invalidate was attached to the read data), which may reduce bandwidth to the other cache level(s). These advantages may be particularly relevant in the context of single-instruction multiple-thread (SIMT) implementations in which data for multiple threads of a SIMT group may be cached together in a given cache entry.
In some embodiments, a compiler may attempt to re-use operand locations for instructions after operations that signal an invalidation of their read data, which may increase opportunities to suppress cache invalidation signaling. The processor may guarantee invalidation suppression if the compiler does so within certain parameters.
Also note that while register reads are discussed in various detailed examples for purposes of illustration, similar techniques may be used for other types of operand data (e.g., for traditional load operations) or for data from other types of data caches than operand caches.
Cache