
CN-122018985-A - Instruction cache management method, instruction cache, computing device and system

CN122018985A

Abstract

The disclosure relates to an instruction cache management method, an instruction cache, a computing device, and a system, for reducing the risk of long-term invalidation of an LRU replacement policy. The method comprises: receiving a memory access request for an instruction cache; determining the target cache group to which the memory access request maps; detecting whether the target instruction indicated by the memory access request exists in the target cache group; when the target instruction is missing from the target cache group and the target cache group is full, selecting a target cache line to be replaced from the target cache group based on the reference age value of each cache line in the target cache group, wherein the reference age value of a cache line is obtained by downsampling the original age value of the cache line by a set extraction step size, the original age value represents the number of consecutive misses of the cache line, and the reference age value represents the replacement priority of the cache line within the target cache group; and reading the target instruction from a lower-level memory of the instruction cache and writing the target instruction into the target cache line.

Inventors

  • Request for anonymity
  • Request for anonymity
  • Request for anonymity
  • Request for anonymity

Assignees

  • 上海壁仞科技股份有限公司 (Shanghai Biren Technology Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (13)

  1. An instruction cache management method, comprising: receiving a memory access request for an instruction cache, wherein the instruction cache comprises a plurality of cache groups, and each cache group comprises a plurality of cache lines; determining the target cache group to which the memory access request maps, and detecting whether the target instruction indicated by the memory access request exists in the target cache group; when the target instruction is missing from the target cache group and the target cache group is full, selecting a target cache line to be replaced from the target cache group based on the reference age value of each cache line in the target cache group, wherein the reference age value of a cache line is obtained by downsampling the original age value of the cache line by a set extraction step size, the original age value represents the number of consecutive misses of the cache line, and the reference age value represents the replacement priority of the cache line within the target cache group; and reading the target instruction from a lower-level memory of the instruction cache, and writing the target instruction into the target cache line.
  2. The method of claim 1, wherein the extraction step size is an integer power of 2, the original age value of the cache line is binary coded, a low-order field of the binary coding is used to accumulate the number of consecutive misses within the range defined by the extraction step size, and a high-order field of the binary coding is used to accumulate the reference age value of the corresponding cache line.
  3. The method of claim 1, wherein after writing the target instruction into the target cache line, the method further comprises: setting the original age value of the target cache line to 0; determining the weighted miss count for the current cache-line miss based on the weight value corresponding to the memory access request; and updating the original age values of the other cache lines in the target cache group based on the weighted miss count.
  4. The method of claim 3, wherein the weight value corresponding to the memory access request is determined based on at least one of a request type of the memory access request and an instruction address carried by the memory access request.
  5. The method of claim 4, wherein the weight value corresponding to the memory access request is determined based on the request type of the memory access request, and wherein, in a case where the memory access request is a prefetch instruction request, the weight value is a first weight value set for prefetch instruction requests, the first weight value being greater than a default weight value.
  6. The method of claim 5, wherein the first weight value is greater than or equal to the extraction step size and less than twice the extraction step size.
  7. The method of claim 4, wherein the weight value corresponding to the memory access request is determined based on the instruction address carried by the memory access request, and wherein, if the instruction address belongs to any address segment in a weight configuration table, the weight value is a second weight value configured for that address segment, the second weight value being greater than a default weight value.
  8. The method of claim 7, wherein, in a case where the memory access request is a prefetch instruction request, the weight value is the first weight value set for prefetch instruction requests; and in a case where the memory access request is an instruction fetch request, the weight value corresponding to the memory access request is determined based on the instruction address carried by the memory access request.
  9. The method of any of claims 3 to 8, wherein, before determining the weighted miss count for the current cache-line miss, the method further comprises: detecting whether a full-age cache line whose original age value has reached an upper limit exists among the other cache lines; if such a full-age cache line exists, keeping the original age values of the other cache lines unchanged; and if no such full-age cache line exists, executing the step of updating the original age values of the other cache lines in the target cache group.
  10. An instruction cache, comprising a plurality of cache groups, wherein each cache group comprises a plurality of cache lines, the instruction cache stores age information for each cache line, and the age information comprises an original age value and a reference age value, wherein the reference age value of a cache line is obtained by downsampling the original age value of the cache line by a set extraction step size, the original age value represents the number of consecutive misses of the cache line, and the reference age value represents the replacement priority of the cache line within the corresponding cache group.
  11. An instruction cache arranged to perform the method of any of claims 1 to 9.
  12. A computing device, comprising: a plurality of computing units; and an instruction cache connected to the computing units, wherein the instruction cache is the instruction cache of claim 10 or 11.
  13. A computing system, comprising a control device and the computing device of claim 12, wherein the control device is configured to control the computing device to perform a computing task.
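
The downsampling behind claims 1 and 2 can be illustrated with a short sketch. This is not code from the patent: the step size, field widths, and function names are illustrative assumptions, taking an extraction step size S = 2**K so that the reference age is simply the binary-coded original age shifted right by K bits, with the low K bits accumulating misses within one step (per claim 2).

```python
# Illustrative sketch of the age encoding in claim 2 (assumptions, not
# code from the patent). With extraction step size S = 2**K, the low K
# bits of the original age count consecutive misses within one step, and
# the high-order bits form the reference age used for replacement.

K = 2            # assumed: log2 of the extraction step size
STEP = 1 << K    # extraction step size S = 4

def reference_age(original_age: int) -> int:
    """Downsample the original age by the extraction step (claim 1)."""
    return original_age >> K  # equivalent to original_age // STEP

# The reference age advances only once per STEP consecutive misses, so a
# short burst of misses cannot immediately saturate replacement priority.
for orig in (0, 3, 4, 7, 8):
    print(orig, "->", reference_age(orig))
```

Because only the high-order field feeds the replacement decision, STEP consecutive misses are needed to raise a line's priority by one level, which is how the downsampling slows the saturation that claim 9's full-age check guards against.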

Description

Instruction cache management method, instruction cache, computing device and system

Technical Field

The present disclosure relates to the field of chip technologies, and in particular, to an instruction cache management method, an instruction cache, a computing device, and a computing system.

Background

High-performance computing devices, represented by graphics processing units (GPUs), play a central role in fields such as artificial intelligence inference and scientific computing. To meet the access requirements of massive parallel computing units, the storage architecture of such a computing device typically integrates a multi-level cache architecture. The instruction cache (I-Cache), a key storage unit for temporarily holding recently accessed instructions, directly determines the instruction fetch efficiency of the computing units.

Limited by capacity, the instruction cache must improve utilization efficiency through a replacement policy. When a cache set of the instruction cache is full and a new instruction needs to be loaded, a cache line within the set must be selected for replacement. A common replacement policy is Least Recently Used (LRU): the least recently accessed cache line within a set is selected for replacement. To implement LRU, the instruction cache maintains an age value for each cache line to characterize its replacement priority. In conventional implementations, when a cache line in a set is hit, its age value is set to 0 and the age values of the other cache lines in the same set are each incremented by 1. The smaller the age value, the more recently the cache line was accessed; the larger the age value, the longer the cache line has gone without access.
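
The conventional per-set ageing described above can be sketched as follows. This is a generic illustration of the background LRU scheme, not the disclosed method; the 3-bit saturation limit and the names are assumptions.

```python
# Background LRU ageing: on a hit, the hit line's age resets to 0 and
# every other line in the set ages by 1, saturating at an assumed limit.

AGE_MAX = 7  # assumed saturation limit for a 3-bit age field

def on_hit(ages: list[int], hit_way: int) -> None:
    for way in range(len(ages)):
        if way == hit_way:
            ages[way] = 0
        else:
            ages[way] = min(ages[way] + 1, AGE_MAX)

ages = [2, 5, 0, 7]
on_hit(ages, 2)
print(ages)  # -> [3, 6, 0, 7]; the replacement candidate is the largest age
```

Once most ages in a set saturate at AGE_MAX, the policy can no longer rank lines against each other, which is exactly the failure mode addressed by the disclosure.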
Under this replacement policy, in a highly parallel computing architecture such as a GPU, multiple computing units may access the same cache line during the same period. Such high-frequency, concentrated access over a short period can cause the age values of most cache lines in the same set to rapidly reach the set upper limit and lose the ability to distinguish replacement priority, resulting in long-term failure of the replacement policy.

Disclosure of Invention

It is an object of embodiments of the present disclosure to provide a new solution for instruction cache management that reduces the risk of long-term invalidation of LRU-based replacement policies.

According to a first aspect of the present disclosure, there is provided an instruction cache management method, the method comprising: receiving a memory access request for an instruction cache, wherein the instruction cache comprises a plurality of cache groups, and each cache group comprises a plurality of cache lines; determining the target cache group to which the memory access request maps, and detecting whether the target instruction indicated by the memory access request exists in the target cache group; when the target instruction is missing from the target cache group and the target cache group is full, selecting a target cache line to be replaced from the target cache group based on the reference age value of each cache line in the target cache group, wherein the reference age value of a cache line is obtained by downsampling the original age value of the cache line by a set extraction step size, the original age value represents the number of consecutive misses of the cache line, and the reference age value represents the replacement priority of the cache line within the target cache group; and reading the target instruction from a lower-level memory of the instruction cache, and writing the target instruction into the target cache line.
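
Victim selection under the disclosed policy can be sketched as follows, assuming an extraction step of 2**K; the first-index tie-break is an assumption, since the disclosure does not specify one.

```python
# Select the way to replace: the line with the largest reference age
# (original age downsampled by the extraction step) has the highest
# replacement priority. K and the tie-break rule are assumptions.

K = 2  # assumed: log2 of the extraction step size

def select_victim(original_ages: list[int]) -> int:
    ref_ages = [age >> K for age in original_ages]
    # max() returns the first maximal way, i.e. ties break toward way 0.
    return max(range(len(ref_ages)), key=lambda way: ref_ages[way])

print(select_victim([7, 8, 6, 3]))  # reference ages [1, 2, 1, 0] -> way 1
```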
Optionally, the extraction step size is an integer power of 2, the original age value of the cache line is binary coded, a low-order field of the binary coding is used to accumulate the number of consecutive misses within the range defined by the extraction step size, and a high-order field of the binary coding is used to accumulate the reference age value of the corresponding cache line.

Optionally, after writing the target instruction into the target cache line, the method further includes: setting the original age value of the target cache line to 0; determining the weighted miss count for the current cache-line miss based on the weight value corresponding to the memory access request; and updating the original age values of the other cache lines in the target cache group based on the weighted miss count.

Optionally, the weight value corresponding to the memory access request is determined based on at least one of a request type of the memory access request and an instruction address carried by the memory access request. Optionally, the weight value corresponding to the memory access request is det