US-12625704-B2 - Low power late-selected caches using a set-prediction history
Abstract
A method, computer program product, and computer system for reading data stored in a set associative cache. A cache read instruction that did not read the cache after being previously launched is relaunched after an effective address (EA) of the instruction was ascertained. A hash of the ascertained EA (EAHash) and a cache congruence class (CCC) are determined from the ascertained EA. A search is performed for a match of the EAHash and CCC of the ascertained EA to the EAHash and CCC, respectively, of an instruction whose EAHash, CCC, and set are stored in an instruction history stream. If the match is found, only read enables associated with the stored set of the match, which is a read enable of only one class of one address group in the cache, are activated. If the match is not found, all read enables of the one address group are activated.
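The selection logic summarized in the abstract can be illustrated with a minimal software sketch. The patent describes hardware logic, so this is only a behavioral model under stated assumptions: the names `HistoryEntry`, `find_inferred_set`, and `select_read_enables` are hypothetical, and mapping a set to its class as `set mod C` is an assumption that agrees with the even/odd case of claim 5 (C=2) but is not stated in the source for other values of C.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class HistoryEntry:
    ea_hash: int  # hash of the effective address (EAHash)
    ccc: int      # cache congruence class (CCC)
    set_id: int   # set stored for that earlier instruction


def find_inferred_set(history: Sequence[HistoryEntry],
                      ea_hash: int, ccc: int) -> Optional[int]:
    """Return the stored set of the first entry matching both EAHash and CCC."""
    for entry in history:
        if entry.ea_hash == ea_hash and entry.ccc == ccc:
            return entry.set_id
    return None


def select_read_enables(history: Sequence[HistoryEntry],
                        ea_hash: int, ccc: int,
                        num_classes: int) -> List[bool]:
    """One read enable per class of the addressed group."""
    inferred = find_inferred_set(history, ea_hash, ccc)
    if inferred is None:
        # No match: activate all read enables of the address group.
        return [True] * num_classes
    # Assumption: class index is (set mod C); matches even/odd for C=2.
    inferred_class = inferred % num_classes
    return [c == inferred_class for c in range(num_classes)]
```

With C=2, a history hit on an odd set enables only the odd class, while a miss enables both classes, which is the power-saving versus fallback behavior the abstract describes.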
Inventors
- David A. Hrusecky
- Wolfgang Penth
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-04-29
Claims (20)
- 1 . A computer-implemented method for reading data stored in a set associative cache, said method comprising: relaunching, by one or more processors of a computer system, a cache read instruction that did not read the set associative cache after being previously launched, wherein an effective address (EA) of the instruction was ascertained prior to said relaunching and after the instruction was previously launched, wherein the set associative cache comprises G address groups encompassing all of the cache's stored data, wherein each address group comprises S sets and C classes, wherein S mod C=0 and each class comprises S/C sets, wherein each class has a read enable, and wherein G is at least 1, S is at least 2, and C is at least 1; determining, by the one or more processors, a hash of the ascertained EA (EAHash) and a cache congruence class (CCC) from the ascertained EA; and searching, by the one or more processors, for a match of the EAHash and a CCC of the ascertained EA to an EAHash and a CCC, respectively, of an instruction whose EAHash, CCC, and set are stored in an instruction history stream, wherein if the match is found from said searching, then the stored set of the match is referred to as an inferred set which is indicative of a read enable of only one class of one address group, and the one or more processors activate only read enables associated with the inferred set which indicates the read enable of the only one class of the one address group in the cache; and wherein if the match is not found from said searching, then activating, by the one or more processors, all read enables of the one address group in the cache.
- 2 . The method of claim 1 , wherein C is at least 2, wherein the match is found from the search, and wherein said activating only read enables associated with the inferred set comprises: inferring a class associated with the inferred set; address decoding the relaunched instruction to determine the G address groups; logically combining the inferred class and the classes of the determined G address groups to identify the one class of the one address group in the cache; and activating the read enable of only the one class of the one address group in the cache.
- 3 . The method of claim 2 , wherein said logically combining is implemented via use of multiple AND gates comprising one AND gate for each class of each address group.
- 4 . The method of claim 1 , wherein each class of each address group includes at least one SRAM having global bit lines, wherein each class of each address group includes at least 2 subclasses, and wherein the method further comprises prior to execution of a read of the cache at the inferred set: identifying, by the one or more processors, a subclass of the one class of the one address group such that the subclass includes the inferred set; and precharging, by the one or more processors, only global bit lines associated with the identified subclass.
- 5 . The method of claim 4 , wherein C=2, wherein the C classes consist of a class of even numbered sets and a class of odd numbered sets, wherein the inferred set consists of a least significant bit and remaining upper bits, wherein said inferring the class of the inferred set utilizes the least significant bit, and wherein said identifying the subclass of the one class of the one address group utilizes the remaining upper bits.
- 6 . The method of claim 1 , wherein the instruction history stream is a dynamically changing data buffer of constant depth K that stores an array of data for each processed instruction of K previously processed instructions, wherein the arrays of data are sequentially ordered in the buffer according to a latest time of entry into the buffer of the processed instructions such that each new processed instruction entering the buffer results in the instruction having the earliest time of entry into the buffer being dropped out of the buffer, wherein each stored array includes an EAHash, CCC, and predicted set (SETP) determined set of a respective processed instruction that entered the buffer, and wherein K is at least 2.
- 7 . The method of claim 6 , wherein K is in a range of 3 to 5.
- 8 . The method of claim 6 , wherein each stored array further includes a valid bit (V) selected from the group consisting of 1 or 0 denoting that the processed instruction is valid or invalid, respectively, and wherein the valid bit is set to 1 for each processed instruction entering the buffer.
- 9 . The method of claim 8 , wherein in response to a determination that a CCC and a SETP determined set of a cache write instruction respectively matches a CCC and a SETP determined set in one stored array in the buffer and that the valid bit of the one stored array is 1, setting the valid bit to 0 for the one stored array.
- 10 . The method of claim 6 , wherein an invalid SETP determined set in an array of one instruction in the instruction history stream is indicative of a SETPmiss due to the one instruction having attempted to read non-existent data from a cache line in the cache at the invalid SETP determined set.
- 11 . The method of claim 6 , wherein said searching results in a multihit SETP determined set match due to a hit on two different set values.
- 12 . The method of claim 1 , wherein said relaunching comprises relaunching the cache read instruction from a load launch queue that includes the ascertained EA.
- 13 . The method of claim 1 , wherein the match is not found from the search.
- 14 . The method of claim 1 , wherein the match is found from the search, and wherein the method further comprises: obtaining, by the one or more processors, an actual set of the relaunched instruction from a predicted set (SETP) array; and determining, by the one or more processors, that the inferred set is not equal to the actual set so that data read from the cache is incorrect and cannot be used and in response, performing, by the one or more processors, an auto-correct process that mitigates incorrect data having been read from the cache.
- 15 . A computer program product, comprising one or more computer readable hardware storage devices having computer readable program code stored therein, said program code containing instructions executable by one or more processors of a computer system to implement a computer-implemented method for reading data stored in a set associative cache, said method comprising: relaunching, by the one or more processors, a cache read instruction that did not read the set associative cache after being previously launched, wherein an effective address (EA) of the instruction was ascertained prior to said relaunching and after the instruction was previously launched, wherein the set associative cache comprises G address groups encompassing all of the cache's stored data, wherein each address group comprises S sets and C classes, wherein S mod C=0 and each class comprises S/C sets, wherein each class has a read enable, and wherein G is at least 1, S is at least 2, and C is at least 1; determining, by the one or more processors, a hash of the ascertained EA (EAHash) and a cache congruence class (CCC) from the ascertained EA; searching, by the one or more processors, for a match of the EAHash and a CCC of the ascertained EA to an EAHash and a CCC, respectively, of an instruction whose EAHash, CCC, and set are stored in an instruction history stream, wherein if the match is found from said searching, then the stored set of the match is referred to as an inferred set which is indicative of a read enable of only one class of one address group, and the one or more processors activate only read enables associated with the inferred set which indicates the read enable of the only one class of the one address group in the cache; and wherein if the match is not found from said searching, then activating, by the one or more processors, all read enables of the one address group in the cache.
- 16 . The computer program product of claim 15 , wherein C is at least 2, and wherein the match is found from the search, and wherein said activating only read enables associated with the inferred set comprises: inferring a class associated with the inferred set; address decoding the relaunched instruction to determine the G address groups; logically combining the inferred class and the classes of the determined G address groups to identify the one class of the one address group in the cache; and activating the read enable of only the one class of the one address group in the cache.
- 17 . The computer program product of claim 15 , wherein each class of each address group includes at least one SRAM having global bit lines, wherein each class of each address group includes at least 2 subclasses, and wherein the method further comprises prior to execution of a read of the cache at the inferred set: identifying, by the one or more processors, a subclass of the one class of the one address group such that the subclass includes the inferred set; and precharging, by the one or more processors, only global bit lines associated with the identified subclass.
- 18 . A computer system, comprising one or more processors, one or more memories, and one or more computer readable hardware storage devices, said one or more hardware storage devices containing program code executable by the one or more processors via the one or more memories to implement a computer-implemented method for reading data stored in a set associative cache, said method comprising: relaunching, by the one or more processors, a cache read instruction that did not read the set associative cache after being previously launched, wherein an effective address (EA) of the instruction was ascertained prior to said relaunching and after the instruction was previously launched, wherein the set associative cache comprises G address groups encompassing all of the cache's stored data, wherein each address group comprises S sets and C classes, wherein S mod C=0 and each class comprises S/C sets, wherein each class has a read enable, and wherein G is at least 1, S is at least 2, and C is at least 1; determining, by the one or more processors, a hash of the ascertained EA (EAHash) and a cache congruence class (CCC) from the ascertained EA; searching, by the one or more processors, for a match of the EAHash and a CCC of the ascertained EA to an EAHash and a CCC, respectively, of an instruction whose EAHash, CCC, and set are stored in an instruction history stream, wherein if the match is found from said searching, then the stored set of the match is referred to as an inferred set which is indicative of a read enable of only one class of one address group, and the one or more processors activate only read enables associated with the inferred set which indicates the read enable of the only one class of the one address group in the cache; and wherein if the match is not found from said searching, then activating, by the one or more processors, all read enables of the one address group in the cache.
- 19 . The computer system of claim 18 , wherein C is at least 2, and wherein the match is found from the search, and wherein said activating only read enables associated with the inferred set comprises: inferring a class associated with the inferred set; address decoding the relaunched instruction to determine the G address groups; logically combining the inferred class and the classes of the determined G address groups to identify the one class of the one address group in the cache; and activating the read enable of only the one class of the one address group in the cache.
- 20 . The computer system of claim 18 , wherein each class of each address group includes at least one SRAM having global bit lines, wherein each class of each address group includes at least 2 subclasses, and wherein the method further comprises prior to execution of a read of the cache at the inferred set: identifying, by the one or more processors, a subclass of the one class of the one address group such that the subclass includes the inferred set; and precharging, by the one or more processors, only global bit lines associated with the identified subclass.
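The instruction history stream of claims 6 to 9 and the set decoding of claim 5 can be modeled with a short behavioral sketch. This is an illustration only, not the claimed hardware: the names `Entry`, `HistoryStream`, and `split_set` are hypothetical, the default depth of 4 is simply one value inside the range given in claim 7, and two behaviors are assumptions not stated in the claims, namely that the search skips entries whose valid bit is 0 and that the newest matching entry wins. Multihit handling (claim 11) is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass
class Entry:
    ea_hash: int
    ccc: int
    set_id: int         # SETP determined set
    valid: bool = True  # claim 8: V is set to 1 on entry into the buffer


class HistoryStream:
    """Constant-depth buffer of the K most recently processed instructions."""

    def __init__(self, depth: int = 4) -> None:
        if depth < 2:
            raise ValueError("K must be at least 2 (claim 6)")
        # deque(maxlen=K) drops the earliest entry when a new one arrives.
        self.entries: Deque[Entry] = deque(maxlen=depth)

    def record(self, ea_hash: int, ccc: int, set_id: int) -> None:
        self.entries.append(Entry(ea_hash, ccc, set_id))

    def invalidate_on_write(self, ccc: int, set_id: int) -> None:
        """Claim 9: a write matching CCC and set clears the valid bit."""
        for e in self.entries:
            if e.valid and e.ccc == ccc and e.set_id == set_id:
                e.valid = False

    def lookup(self, ea_hash: int, ccc: int) -> Optional[int]:
        # Assumptions: newest entry first, invalid entries ignored.
        for e in reversed(self.entries):
            if e.valid and e.ea_hash == ea_hash and e.ccc == ccc:
                return e.set_id
        return None


def split_set(set_id: int) -> Tuple[int, int]:
    """Claim 5 (C=2): the LSB picks the even/odd class, upper bits the subclass."""
    return set_id & 1, set_id >> 1
```

For example, `split_set(5)` yields class 1 (odd) and subclass index 2, so only the odd class's read enable and that subclass's global bit lines would need to be driven.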
Description
BACKGROUND
The present invention relates generally to reading data stored in a cache, and more specifically, to reading data stored in a set associative cache using an instruction history stream.
SUMMARY
Embodiments of the present invention provide a method, a computer program product, and a computer system, for reading data stored in a set associative cache. One or more processors of a computer system relaunch a cache read instruction that did not read the set associative cache after being previously launched, wherein an effective address (EA) of the instruction was ascertained prior to the relaunching and after the instruction was previously launched, wherein the set associative cache includes G address groups encompassing all of the cache's stored data, wherein each address group comprises S sets and C classes, wherein S mod C=0 and each class comprises S/C sets, wherein each class has a read enable, and wherein G is at least 1, S is at least 2, and C is at least 1. The one or more processors determine a hash of the ascertained EA (EAHash) and a cache congruence class (CCC) from the ascertained EA. The one or more processors search for a match of the EAHash and a CCC of the ascertained EA to an EAHash and a CCC, respectively, of an instruction whose EAHash, CCC, and set are stored in an instruction history stream. If the match is found from the search, then the stored set of the match is referred to as an inferred set and the one or more processors activate only read enables associated with the inferred set, which is a read enable of only one class of one address group in the cache. If the match is not found from the search, then the one or more processors activate all read enables of the one address group in the cache.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts a set associative cache, in accordance with embodiments of the present invention.
FIG. 2 depicts a load/store system for accessing data from a set associative cache organized into address groups configured with static random-access memories (SRAMs), in accordance with embodiments of the present invention.
FIG. 3 is a flow chart of a computer-implemented method for reading data stored in a set associative cache, in accordance with embodiments of the present invention.
FIG. 4 is a flow chart of a process for activating a read enable of only one class of one address group in the cache, in accordance with embodiments of the present invention.
FIG. 5 is a flow chart of a process for selectively precharging global bit lines of at least one static random-access memory (SRAM) in the cache, in accordance with embodiments of the present invention.
FIG. 6 illustrates a computer system, in accordance with embodiments of the present invention.
FIG. 7 depicts a computing environment which contains an example of an environment for the execution of at least some of the computer code involved in performing the inventive methods, in accordance with embodiments of the present invention.
DETAILED DESCRIPTION
Embodiments of the present invention provide a method, system and computer program product for accessing data in a set associative cache efficiently at reduced power consumption.
Computer processor Level 1 (L1) caching mechanisms are required to be as fast as possible to be effective, so much so that low-power alternative schemes are often not even considered, because the caching mechanisms' deliberate design style typically favors speed over power economy. L1 caches are typically built with set associativity arrangements that give rise to anywhere from 4 to 16 sets of data being accessed within the structure in parallel for highest performance.
Performance studies of different designs typically show that increasing the number of sets that make up the congruence class is the most effective way of increasing a cache's usefulness; for example, using 16 sets instead of 8 sets for a cache structure of similar capacity (32 K bytes for example). To that end, recently designed L1 cache mechanisms typically have a set-prediction scheme that runs somewhat in parallel to the addressing access method of the cache structure itself. The set-prediction method delivers a reasonable choice of which of the simultaneously accessed 4 to 16 sets, depending on the design, will be selected to continue forward in the pipeline toward delivering this data result to the central processing unit (CPU). The designs with the largest set-associativity will consume the most power but may be grudgingly justified by the need for high performance. Although power could be saved by a power-saving design that includes waiting for the set prediction to complete before accessing the cache, and then accessing only the set that is chosen by the set-prediction method, an intolerable delay would be placed onto every cache access, so that this power-saving design may not be sufficiently efficient to be useful in practice. Embodiments of the present invention provide a new p