CN-121996582-A - Data caching hardware prefetching device and method

CN 121996582 A

Abstract

The invention discloses a data caching hardware prefetching device and method comprising a secure cache module, a page information propagation module, a security filter and a hardware prefetcher. The secure cache module is connected in parallel with the L1 data cache and receives memory access requests from the CPU core; the page information propagation module is connected between the TLB and the MSHR of the L1 data cache and is used for extracting page size information from the TLB and transmitting it to the L1 data cache MSHR; the security filter is connected between the secure cache module and the L1 data cache and is used for monitoring commit-time data flow related to memory access in the CPU core; and the hardware prefetcher is connected to the output end of the L1 data cache MSHR, receives miss request information tagged with the page size, and dynamically adjusts the prefetch boundary according to the page size information. The invention supports prefetching of long memory access streams across 4KB physical page boundaries, avoids the high latency and energy overhead caused by directly accessing the TLB, and effectively defends against side-channel attacks and cache information leakage caused by cross-page prefetching.

Inventors

  • LI DONGSHENG
  • ZHANG XIRAN
  • WU YE

Assignees

  • 南京英麒智能科技有限公司 (Nanjing Yingqi Intelligent Technology Co., Ltd.)

Dates

Publication Date
2026-05-08
Application Date
2025-12-25

Claims (9)

  1. A data cache hardware prefetching device, characterized by comprising a secure cache module, a page information propagation module, a security filter and a hardware prefetcher, wherein the secure cache module is connected in parallel with an L1 data cache and receives memory access requests of a CPU core; the page information propagation module is connected between a TLB and an MSHR of the L1 data cache and is used for extracting page size information from the TLB and transmitting the page size information to the L1 data cache MSHR; the security filter is connected between the secure cache module and the L1 data cache and is used for monitoring commit-time data flow related to memory access in the CPU core; and the hardware prefetcher is connected to an output end of the L1 data cache MSHR and is used for receiving miss request information tagged with the page size and dynamically adjusting a prefetch boundary according to the page size information.
  2. The data caching hardware prefetching apparatus of claim 1, wherein the secure cache module creates a timestamp for the data of each speculatively executed instruction during the CPU speculation phase, the instruction data of different time windows being isolated from each other to prevent data leakage across time windows.
  3. The data caching hardware prefetching device according to claim 1, wherein an internal encoder of the page information propagation module encodes the page size according to the TLB information, writes the encoded page size information into the MSHR extension bits, updates the MSHR page-size valid bit and the MSHR page-size data, sets the valid bit of the corresponding entry, calculates the prefetch boundary permitted by the page-size tag in the MSHR entry according to the page size, and finally propagates the page size information and the prefetch boundary to the hardware prefetcher, transmitting complete prefetch control information.
  4. The data cache hardware prefetching device of claim 1, wherein upon commit of a speculative load instruction, the security filter examines a hit-level field stored in the load queue; if the hit level is 00, the security filter directly discards the commit-time re-fetch request or commit-time write request; if the hit level is not 00, the re-fetch or commit-time write operation is allowed to proceed normally, transferring data from the secure cache to the L1 data cache, wherein a level of 00 indicates that the data came from the L1 data cache.
  5. The data cache hardware prefetching apparatus of claim 1, wherein the security filter adds a 1-bit L2 write-back bit to each cache line of the L1 data cache, sets the value of the write-back bit according to the original hit level when data moves from the secure cache to the L1 data cache, and subsequently, when the cache line is evicted from the L1 data cache, decides according to the write-back bit whether the data needs to be propagated to the L2 cache, thereby avoiding redundant write-back requests to a cache level where the data already exists.
  6. The data cache hardware prefetching apparatus of claim 1, wherein a prefetch generation unit in the hardware prefetcher generates prefetch requests based on the trained delta and confidence, including confidence calculation, target cache selection and prefetch-distance adaptation, wherein prefetch-distance adaptation comprises monitoring a late-prefetch rate and, if a threshold is exceeded, increasing the prefetch distance, generating prefetch requests for a plurality of delta addresses.
  7. The data caching hardware prefetching apparatus of claim 1, wherein when a load instruction commits, the hardware prefetcher performs a training process using a search window computed from the fetch delay, so that commit-time training can learn deltas as timely as those learned by access-time training; after training completes, it updates the instruction-pointer information in a history table, including recording the new commit time and address, updating the delta history and refreshing coverage statistics, while cleaning up load queue entries and releasing resources.
  8. A hardware prefetching method applied to the data caching hardware prefetching device according to claim 1, characterized in that: when a speculative load request is generated, the CPU core accesses the secure cache module and the L1 data cache simultaneously; if the secure cache module misses, the conventional cache hierarchy is queried in order and the data response is filled directly into the secure cache module, bypassing the L1 data cache, while the security filter records the cache level of the data source; if the L1 data cache misses, the page information propagation module obtains page size information from the TLB and stores it in the MSHR extension bits, then transmits it to the hardware prefetcher, which dynamically adjusts the prefetch boundary according to the page size information; and when the speculative instruction commits, the security filter determines, according to the previously recorded hit level, whether to filter redundant commit-time write operations, reducing unnecessary accesses to the cache hierarchy.
  9. The data caching hardware prefetching method of claim 8, wherein upon commit of the speculatively executed instructions, if the secure cache module hits, the data of the committed memory instruction is transferred from the secure cache module to the L1 data cache via a commit-time write operation.
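The commit-time filtering behavior recited in claims 4 and 5 can be sketched in software. The following is a minimal, hypothetical Python model; the hit-level encodings other than 00, and all names (`SecurityFilter`, `on_commit`, `l2_writeback`), are assumptions for illustration, not the patent's hardware:

```python
HIT_L1 = 0b00  # hit level 00: data originally came from the L1 data cache
HIT_L2 = 0b01  # assumed encoding for "data came from the L2 cache"

class SecurityFilter:
    """Toy model of the commit-time security filter (claims 4 and 5)."""

    def __init__(self):
        # address -> {"data": ..., "l2_writeback": 0 or 1}
        self.l1_lines = {}

    def on_commit(self, address, data, hit_level):
        # Claim 4: a hit level of 00 means the data is already in the
        # L1 data cache, so the redundant commit-time write is discarded.
        if hit_level == HIT_L1:
            return "discarded"
        # Otherwise the line moves from the secure cache into L1. Claim 5:
        # a 1-bit L2 write-back flag records whether a later eviction must
        # propagate the line to L2 (not needed if L2 already holds it).
        self.l1_lines[address] = {
            "data": data,
            "l2_writeback": 0 if hit_level == HIT_L2 else 1,
        }
        return "written"
```

In this model, a line sourced from L2 is written to L1 with the write-back bit cleared, so its eventual eviction sends no redundant request to L2.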

Description

Data caching hardware prefetching device and method

Technical Field

The present invention relates to the field of hardware prefetching technologies, and in particular to a data cache hardware prefetching device and method.

Background

The CPU introduces a paging mechanism for memory management. Each program or task running on the CPU has its own segments, which are defined by segment descriptors. As the program executes, when memory is to be accessed, the segment base address is added to the offset and the segmentation unit outputs a linear address. In simple segmentation mode, the linear address is the physical address. Under paged memory management, the memory space (e.g., 4 GB) is divided into pages of the same size, and the minimum page size is typically 4KB. However, existing data cache hardware prefetchers generally do not prefetch beyond 4KB physical page boundaries because the continuity of physical addresses cannot be guaranteed: consecutive addresses in the virtual address space may be far apart in the physical address space. Limiting data cache hardware prefetchers to a single 4KB physical page also limits their ability to speculate on long memory access streams. Implementing secure prefetching beyond a 4KB physical page requires direct access to the translation lookaside buffer (TLB) and reverse address translation, which introduces high latency and energy overhead and thus prevents secure prefetching across a 4KB physical page boundary in a CPU microarchitecture design. Furthermore, prefetching across 4KB physical page boundaries is susceptible to security problems, as an attacker may exploit it to mount side-channel attacks. Prefetchers cannot perceive the access rights of particular pages, so cross-page prefetching may load data from pages that would otherwise be inaccessible.
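The 4KB limitation described above can be illustrated with a minimal sketch. The following Python model of a conventional stride prefetcher (names and the stride/degree parameters are illustrative, not from the patent) drops every candidate address that leaves the current 4KB physical page:

```python
PAGE_SIZE_4K = 4096

def prefetch_candidates(miss_addr, stride, degree):
    """Generate prefetch addresses, dropping any that leave the 4KB page."""
    page_base = miss_addr & ~(PAGE_SIZE_4K - 1)
    out = []
    for i in range(1, degree + 1):
        candidate = miss_addr + i * stride
        # Physical contiguity is only guaranteed within one page, so a
        # conventional prefetcher stops at the page boundary -- the
        # limitation the invention removes.
        if candidate & ~(PAGE_SIZE_4K - 1) != page_base:
            break
        out.append(candidate)
    return out
```

For a miss near the end of a page, most of the prefetch degree is wasted: a miss at offset 0xF80 with a 64-byte stride yields only one in-page candidate, however long the underlying access stream is.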
Transient execution attacks such as Spectre/Meltdown and their variants may result in information leakage in the cache hierarchy.

Disclosure of Invention

The invention aims to provide a data caching hardware prefetching device and method that support prefetching of long memory access streams across 4KB physical page boundaries, avoid the high latency and energy overhead caused by directly accessing the TLB, and effectively defend against side-channel attacks and cache information leakage caused by cross-page prefetching. The data cache hardware prefetching device comprises a secure cache module, a page information propagation module, a security filter and a hardware prefetcher. The secure cache module is connected in parallel with an L1 data cache and receives memory access requests of a CPU core; the page information propagation module is connected between a TLB and an MSHR of the L1 data cache and is used for extracting page size information from the TLB and transmitting it to the L1 data cache MSHR; the security filter is connected between the secure cache module and the L1 data cache and is used for monitoring commit-time data flow related to memory access in the CPU core; and the hardware prefetcher is connected to an output end of the L1 data cache MSHR, receives miss request information tagged with the page size and dynamically adjusts the prefetch boundary according to the page size information. Preferably, during the CPU speculative execution phase, the secure cache module creates a timestamp for the data of each speculatively executed instruction, and instruction data in different time windows are isolated from each other, preventing data leakage across time windows.
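The page-size-aware boundary adjustment at the heart of the device can be sketched as follows. This is a minimal Python model; the two-bit page-size encoding (4KB/2MB/1GB) and the function names are assumptions for illustration:

```python
# Assumed two-bit page-size codes, as might be carried in MSHR extension bits.
PAGE_SIZES = {0b00: 4 << 10, 0b01: 2 << 20, 0b10: 1 << 30}  # 4KB / 2MB / 1GB

def prefetch_boundary(miss_addr, page_size_code):
    """Return the last byte address the prefetcher may touch for this miss.

    Instead of a fixed 4KB clamp, the boundary follows the actual page
    size propagated from the TLB via the MSHR.
    """
    size = PAGE_SIZES[page_size_code]
    page_base = miss_addr & ~(size - 1)
    return page_base + size - 1
```

Under a 2MB page, the same miss address yields a boundary 512 times farther out than under a 4KB page, which is what lets the prefetcher follow long access streams without consulting the TLB per prefetch.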
Preferably, the internal encoder of the page information propagation module encodes the page size according to the TLB information, writes the encoded page size information into the MSHR extension bits, updates the MSHR page-size valid bit and the MSHR page-size data, sets the valid bit of the corresponding entry, calculates the prefetch boundary permitted by the page-size tag in the MSHR entry according to the page size, and finally propagates the page size information and the prefetch boundary to the hardware prefetcher, transmitting complete prefetch control information. Preferably, when a speculative load instruction commits, the security filter examines the hit-level field stored in the load queue; if the hit level is 00, the security filter directly discards the commit-time re-fetch request or commit-time write request; if the hit level is not 00, the commit-time write operation is allowed to proceed normally and the data is transferred from the secure cache to the L1 data cache, wherein a level of 00 indicates that the data came from the L1 data cache. Preferably, the security filter adds a 1-bit L2 write-back bit to each cache line of the L1 data cache and sets the value of the write-back bit according to the original hit level when data moves from the secure cache to the L1 data cache.
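The encoder step of the page information propagation module — tagging the L1 miss's MSHR entry with the TLB page size, setting the valid bit, and computing the permitted prefetch boundary — can be modeled as below. All field names and the size encoding are illustrative assumptions, not the patent's RTL:

```python
from dataclasses import dataclass

@dataclass
class MSHREntry:
    """Toy MSHR entry with the assumed page-size extension bits."""
    miss_addr: int
    page_size_valid: bool = False
    page_size_code: int = 0      # e.g. 0b00 = 4KB, 0b01 = 2MB, 0b10 = 1GB
    prefetch_boundary: int = 0

def tag_mshr_with_page_size(entry, tlb_page_size):
    """Encode the TLB page size into the entry and set its valid bit."""
    entry.page_size_code = {4 << 10: 0b00, 2 << 20: 0b01, 1 << 30: 0b10}[tlb_page_size]
    entry.page_size_valid = True
    # Boundary permitted by the page-size tag: the end of the current page.
    page_base = entry.miss_addr & ~(tlb_page_size - 1)
    entry.prefetch_boundary = page_base + tlb_page_size - 1
    return entry
```

The tagged entry is then handed, together with the miss information, to the hardware prefetcher, which needs no further TLB access to know how far it may prefetch.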