CN-116257467-B - MCM-GPU self-adaptive last-level cache structure and cache switching method thereof
Abstract
The invention discloses an MCM-GPU self-adaptive last-level cache structure which is arranged in a GPU module and comprises a Tag Array and a Data Array, wherein the Data Array is used for storing data and the Tag Array is used for checking whether the data corresponding to an address is in the cache. A local access queue is used for storing access requests of the current GPU module, a remote access queue is used for storing access requests of other GPU modules, an LLC architecture change flag bit register is used for storing an LLC architecture change flag bit indicating whether the architecture organization of the current last-level cache needs to be changed, and an LLC architecture flag bit register is used for storing an LLC architecture flag bit indicating whether the current last-level cache is switched to the private last-level cache design or the shared last-level cache design. The invention supports dynamic switching between the shared last-level cache and the private last-level cache, can adaptively select the last-level cache architecture organization according to the characteristics of the running program, meets the program's memory access requirements, and improves the performance of the MCM-GPU.
Inventors
- ZHAO XIA
- WANG HUIQUAN
- ZHANG GUANGDA
- CHEN RENZHI
- WAN ZHONG
- ZHANG HONGYUN
- WANG LU
- FANG JIAN
Assignees
- 中国人民解放军军事科学院国防科技创新研究院 (National Innovation Institute of Defense Technology, Academy of Military Sciences, Chinese People's Liberation Army)
Dates
- Publication Date
- 20260508
- Application Date
- 20221230
Claims (5)
- 1. An MCM-GPU self-adaptive last-level cache structure, arranged in a GPU module of the MCM-GPU and comprising a Tag Array and a Data Array, wherein the Data Array is used for storing data and the Tag Array is used for checking whether the data corresponding to an address is in the cache, the self-adaptive last-level cache structure being characterized by further comprising: a local access queue for storing access requests of the current GPU module; a remote access queue for storing access requests of other GPU modules in the MCM-GPU; an LLC architecture change flag bit register for storing an LLC architecture change flag bit indicating whether the architecture organization of the current last-level cache needs to be changed; and an LLC architecture flag bit register for storing an LLC architecture flag bit indicating whether the current last-level cache is switched to the private last-level cache design or the shared last-level cache design when the architecture organization of the last-level cache needs to be changed; wherein, when the last-level cache uses the private last-level cache design, an access request sent by a stream processor in the current GPU module is placed into the local access queue to wait to access the last-level cache; when the last-level cache uses the shared last-level cache design, an access request sent by a stream processor in the current GPU module is placed into the local access queue of the current GPU module or the remote access queue of another GPU module according to the storage space accessed by its access address, and waits to access the last-level cache: if the storage space accessed by the access address is the local storage space, the access request is placed into the local access queue of the current GPU module; if the storage space accessed by the access address is a remote storage space, the access request is placed into the remote access queue of the GPU module corresponding to that remote storage space.
- 2. The MCM-GPU self-adaptive last-level cache structure according to claim 1, wherein an LLC architecture change flag bit of 0 indicates that the architecture organization of the current last-level cache does not need to be changed, and an LLC architecture change flag bit of 1 indicates that the architecture organization of the current last-level cache needs to be changed.
- 3. The MCM-GPU self-adaptive last-level cache structure according to claim 1, wherein an LLC architecture flag bit of 0 indicates that the current last-level cache is switched to the shared last-level cache design, and an LLC architecture flag bit of 1 indicates that the current last-level cache is switched to the private last-level cache design.
- 4. A cache switching method for the MCM-GPU self-adaptive last-level cache structure as recited in any one of claims 1-3, comprising: in response to a program being executed, reading the LLC architecture change flag bit in real time during execution of the program and judging whether the architecture organization of the current last-level cache needs to be changed; if a change is needed, reading the LLC architecture flag bit; and switching the current last-level cache to the private last-level cache design or the shared last-level cache design according to the read LLC architecture flag bit (a sketch of this procedure follows the claims).
- 5. The cache switching method according to claim 4, further comprising: flushing all cached data in the current last-level cache before the LLC architecture flag bit is read.
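The switching procedure of claims 4 and 5 amounts to: poll the LLC architecture change flag bit, and when it is set, flush the cache, read the LLC architecture flag bit, and adopt the organization it indicates. The following is a minimal sketch of that control flow, not the claimed hardware; the register and function names, the `AdaptiveLlcController` class, and the polling structure are illustrative assumptions.

```cpp
#include <atomic>
#include <cstdint>

// Flag encodings taken from claims 2 and 3.
enum class ChangeFlag : std::uint8_t { NoChange = 0, ChangeRequired = 1 };
enum class LlcMode    : std::uint8_t { Shared = 0, Private = 1 };

// The two control registers of the adaptive last-level cache structure.
struct AdaptiveLlcRegisters {
    std::atomic<ChangeFlag> architecture_change_flag{ChangeFlag::NoChange};
    std::atomic<LlcMode>    architecture_flag{LlcMode::Shared};
};

class AdaptiveLlcController {
public:
    explicit AdaptiveLlcController(AdaptiveLlcRegisters& regs) : regs_(regs) {}

    // Claim 4: during program execution, read the change flag; if a change is
    // required, read the architecture flag and switch the LLC organization.
    // Claim 5: flush all cached data before the architecture flag is read.
    void poll_and_switch() {
        if (regs_.architecture_change_flag.load() != ChangeFlag::ChangeRequired)
            return;                               // claim 2: 0 means no change is needed
        flush_all();                              // claim 5: write back and invalidate everything
        mode_ = regs_.architecture_flag.load();   // claim 3: 0 = shared, 1 = private
        regs_.architecture_change_flag.store(ChangeFlag::NoChange);
    }

    LlcMode mode() const { return mode_; }

private:
    void flush_all() { /* write back dirty lines and invalidate the Tag/Data Arrays */ }

    AdaptiveLlcRegisters& regs_;
    LlcMode mode_ = LlcMode::Shared;
};
```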
Description
MCM-GPU self-adaptive last-level cache structure and cache switching method thereof
Technical Field
The invention relates to the technical field of GPUs (graphics processing units), and in particular to an MCM-GPU self-adaptive last-level cache structure and a cache switching method thereof.
Background
With the continued development of GPU (Graphics Processing Unit) technology, the number of cores in GPUs has increased and so has their computing power; for example, from the Fermi to the Volta architecture, the number of stream processors (Streaming Multiprocessors, SMs) in a GPU increased from 14 to 80. Due to limitations of the manufacturing process and chip size, it is becoming increasingly difficult to integrate a large number of SMs on a single die. To further improve GPU performance and avoid the chip-size constraint, the multi-chip-module GPU (MCM-GPU) has been proposed: several GPU modules are packaged together to form a new chip using multi-module packaging, and the modules communicate with each other through technologies such as multilayer interconnection substrate routing and interposer routing, depending on the packaging technology. Each GPU module is connected to a DRAM (Dynamic Random Access Memory); the DRAM directly connected to a GPU module is the local storage space of that module, while the DRAM directly connected to other GPU modules is its remote storage space.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an MCM-GPU. The last-level cache (Last-Level Cache, LLC) of a GPU module is generally organized using one of two architectures: a shared last-level cache design and a private last-level cache design. In the shared last-level cache design, the stream processors of different GPU modules use the same last-level cache, and only one copy of each piece of data exists in the last-level cache, so the hit rate of the last-level cache can be greatly improved. In the private last-level cache design, the LLC on each GPU module is set as the private LLC of that module; an access request sent by a stream processor in the GPU module first accesses the LLC of that module, returns data on a hit, and accesses the corresponding storage space on a miss. Compared with the shared last-level cache architecture, the private last-level cache architecture can reduce communication among different GPU modules. However, although the shared last-level cache saves storage space, if the stream processor of one GPU module needs to access data in the last-level caches of other GPU modules, all such requests must reach the last-level caches of the corresponding GPU modules through the interconnection network between the GPU modules; a large number of parallel access requests then generate severe network-on-chip contention, reducing MCM-GPU performance.
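The two designs thus steer a stream processor's request to different LLCs. The following is a minimal sketch of that routing decision; the four-module configuration and the simple address-interleaved mapping of addresses to home modules are assumptions made for illustration and are not specified by the patent.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumGpuModules   = 4;    // assumed MCM-GPU configuration
constexpr std::size_t kInterleaveBytes = 256;  // assumed DRAM address interleaving granularity

enum class LlcMode { Shared, Private };

// Home module owning the DRAM partition (local storage space) of an address.
std::size_t home_module(std::uint64_t addr) {
    return (addr / kInterleaveBytes) % kNumGpuModules;
}

// Module whose LLC a stream processor's request is sent to under each design.
std::size_t llc_target(std::uint64_t addr, std::size_t requesting_module, LlcMode mode) {
    if (mode == LlcMode::Private) {
        // Private design: the request first accesses the requesting module's own LLC;
        // only a miss is forwarded to the home module's storage space afterwards.
        return requesting_module;
    }
    // Shared design: the request goes to the LLC of the address's home module and
    // must traverse the inter-module interconnection network when that module is remote.
    return home_module(addr);
}
```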
The private last-level cache stores data copies in the local last-level cache to reduce communication among different GPU modules, but if a program has a large shared data set, the large number of data copies wastes last-level cache space and produces an extremely high cache miss rate, and the large number of cache-miss requests must access the storage space of the corresponding GPU modules through the interconnection network between the GPU modules; on the one hand this causes contention among the GPU modules, and on the other hand it increases the memory access pressure, reducing MCM-GPU performance.
Disclosure of Invention
In order to solve some or all of the technical problems in the prior art, the invention provides an MCM-GPU self-adaptive last-level cache structure and a cache switching method thereof. The technical scheme of the invention is as follows: In a first aspect, an MCM-GPU self-adaptive last-level cache structure is provided, the self-adaptive last-level cache structure being arranged in a GPU module of the MCM-GPU and comprising a Tag Array and a Data Array, wherein the Data Array is used for storing data and the Tag Array is used for checking whether the data corresponding to an address is in the cache, and further comprising: a local access queue for storing access requests of the current GPU module; a remote access queue for storing access requests of other GPU modules in the MCM-GPU; an LLC architecture change flag bit register for storing an LLC architecture change flag bit indicating whether the architecture organization of the current last-level cache needs to be changed; and an LLC architecture flag bit register for storing an LLC architecture flag bit indicating whether the current last-level cache is switched to the private last-level cache design or the shared last-level cache design when the architecture organization of the last-level cache needs to be changed.
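As an illustration of how the components listed above fit together, the following is a minimal data-structure sketch; the queue type, the entry format, and the use of plain boolean fields for the two flag bit registers are assumptions made only to keep the example self-contained.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// One pending last-level cache access.
struct AccessRequest {
    std::uint64_t address;
    std::uint32_t source_module;  // GPU module whose stream processor issued the request
};

// Per-GPU-module self-adaptive last-level cache structure.
struct AdaptiveLlcSlice {
    // Conventional LLC storage: the Tag Array records which addresses are resident,
    // the Data Array holds the cached data itself.
    std::vector<std::uint64_t> tag_array;
    std::vector<std::uint8_t>  data_array;

    // Queues from claim 1.
    std::deque<AccessRequest> local_access_queue;   // requests from the current GPU module
    std::deque<AccessRequest> remote_access_queue;  // requests from other GPU modules

    // Control registers from claim 1 (encodings per claims 2 and 3).
    bool llc_architecture_change_flag = false;  // true (1): the organization must be changed
    bool llc_architecture_flag        = false;  // false (0): shared design, true (1): private design
};
```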