CN-115309453-B - Cache access system supporting data prefetching of out-of-order processor
Abstract
The invention belongs to the technical field of integrated circuit design, and particularly relates to a cache access system supporting data prefetching in an out-of-order processor. The system comprises a LOAD access information tracking and sorting module, a LOAD access address history buffer, a prefetcher, and a target prefetch address buffer. The tracking and sorting module converts out-of-order LOAD access information into sequential LOAD access information, which allows the prefetcher to achieve more accurate training and target prefetch address prediction. Valid target prefetch addresses output by the prefetcher are stored in the target prefetch address buffer to await subsequent transmission, and the buffer is updated in real time to invalidate untimely addresses so that useless prefetch requests are not sent. The invention improves the efficiency of learning access patterns and the accuracy of address prediction, and reduces the cache system resources occupied by prefetch requests.
Inventors
- HAN JUN
- LIU XUDONG
Assignees
- Fudan University (复旦大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-07-14
Claims (3)
- 1. A cache access system supporting data prefetching in an out-of-order processor, characterized by comprising an address conversion system that converts out-of-order LOAD access information into sequential LOAD access information, and an invalidation system for the target prefetch addresses produced by a prefetcher. The hardware structure comprises a LOAD access information tracking and sorting module, a LOAD access address history buffer, the prefetcher, and a target prefetch address buffer. The LOAD access information tracking and sorting module converts out-of-order LOAD access information into sequential LOAD access information and feeds it to the prefetcher; the prefetcher uses the sequential access information to achieve more accurate training and target prefetch address prediction; valid target prefetch addresses output by the prefetcher are stored in the target prefetch address buffer to await subsequent transmission; and the target prefetch address buffer is updated in real time to invalidate untimely addresses so that useless prefetch addresses are not sent. The components are as follows: The LOAD access information tracking and sorting module is a circular queue with 3 position indices, namely a queue head, a queue tail, and a current access entry located between the head and the tail. The head and tail are kept consistent with the head and tail of the load address queue in the load-store unit, and the number of entries equals the number of entries of that load address queue, so that each entry of the circular queue corresponds to the entry at the same position in the load address queue. Because the load address queue holds access information in order from head to tail, the circular queue holds access information in the same order. Each entry of the circular queue contains the LOAD address, the LOAD instruction PC, whether the address attribute is cacheable, whether the data can be forwarded directly from the load-store unit, and whether the LOAD address missed or hit; the address, PC, and attribute information are obtained synchronously from the load address queue in the load-store unit, while the miss or hit information is obtained from the first-level data cache when the LOAD address is sent to it and is written into the corresponding entry of the circular queue; The LOAD access address history buffer is a first-in first-out queue whose number of entries is related to the time the prefetcher needs from receiving access information to generating a predicted target prefetch address; the queue stores the most recent sequential access addresses that the LOAD access information tracking and sorting module has fed to the prefetcher; The prefetcher is a data prefetching module whose implementation differs by prefetching algorithm but whose input and output interfaces are unified: the input is the sequential access information given by the LOAD access information tracking and sorting module, and the output is a predicted target prefetch address; the prefetcher performs pattern recognition and regularity capture on the input LOAD access information according to its specific prefetching algorithm in order to predict the target prefetch address; The target prefetch address buffer is a circular queue in which each entry stores one target prefetch address; a target prefetch address generated by the prefetcher is inserted at the tail of the circular queue after being judged valid; each entry holding a valid target prefetch address may be invalidated externally before the corresponding prefetch request is issued; and the buffer preferentially selects the first valid entry starting from the head of the queue to issue the corresponding prefetch request.
- 2. The cache access system supporting data prefetching in an out-of-order processor of claim 1, wherein the LOAD access information tracking and sorting module converts out-of-order LOAD access information into sequential LOAD access information as follows: (1) the module synchronously acquires the LOAD address, the LOAD instruction PC, whether the address attribute is cacheable, and whether the data can be forwarded from the load address queue in the load-store unit; (2) after the LOAD address is sent to the first-level data cache to determine a miss or a hit, the miss/hit information is written into the entry of the circular queue in the module that corresponds to the position of the LOAD address in the load address queue of the load-store unit; (3) the circular queue in the module maintains 3 position indices, namely a queue head, a queue tail, and a current access entry located between the head and the tail, where the head and tail are kept consistent with the head and tail of the load address queue in the load-store unit; (4) every cycle, the module checks whether the current access entry carries an address-hit flag, an address-miss flag, or a forwardable flag; if at least one of these 3 flags is set, the LOAD address and the LOAD instruction PC stored in the current access entry are sent to the LOAD access address history buffer and the prefetcher, and the current access entry position is advanced.
- 3. The cache access system supporting data prefetching in an out-of-order processor of claim 1, wherein target prefetch addresses from the prefetcher are invalidated in the target prefetch address buffer as follows: (1) when the prefetcher generates a new target prefetch address, its validity is judged from two aspects; in the first aspect, the valid entries of the LOAD access address history buffer and of the LOAD access information tracking and sorting module are searched for the same address, and if it exists, the target prefetch address has most likely been accessed by the processor recently and its data is already in the cache, so the prediction is invalid; (2) a target prefetch address judged valid is inserted at the tail of the target prefetch address buffer; every cycle, the target prefetch address in each entry of the buffer is compared with the hit and miss addresses given by the first-level data cache, and if the address stored in an entry equals such an address, that address has already been accessed by the demand stream, so the stored address is untimely, the entry is set invalid, and it is no longer used for prefetching.
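The in-order conversion flow of claims 1 and 2 can be sketched in software. This is a behavioral model, not the patent's hardware: the class and field names (`LoadTrackingQueue`, `Entry`, `allocate`, `record_result`, `emit_in_order`) are hypothetical, and store-to-load forwarding and the cacheable attribute check are simplified to flags. The key idea modeled is that hit/miss results arrive out of order by queue position, but the `current` pointer only advances past fully resolved entries, so the emitted stream is in program order.

```python
from collections import namedtuple

# Hypothetical entry layout; field names are illustrative, not from the patent.
Entry = namedtuple("Entry", "addr pc cacheable forwardable hit miss")

class LoadTrackingQueue:
    """Circular queue mirroring the load address queue of the load-store unit.

    Three position indices: head, tail, and a 'current' pointer between them.
    Out-of-order hit/miss results are written into the entry matching the
    load's queue position; 'current' advances past an entry only once one of
    its hit / miss / forwardable flags is set, so the addresses emitted to
    the prefetcher come out in program order.
    """

    def __init__(self, num_entries):
        self.entries = [None] * num_entries
        self.head = 0          # matches load address queue head
        self.tail = 0          # matches load address queue tail
        self.current = 0       # next entry to emit in order

    def allocate(self, addr, pc, cacheable, forwardable):
        # Step (1): address/PC/attribute info arrives in order at the tail.
        self.entries[self.tail] = Entry(addr, pc, cacheable, forwardable,
                                        hit=False, miss=False)
        self.tail = (self.tail + 1) % len(self.entries)

    def record_result(self, index, hit):
        # Step (2): out-of-order hit/miss result written by queue position.
        e = self.entries[index]
        self.entries[index] = e._replace(hit=hit, miss=not hit)

    def emit_in_order(self):
        # Step (4): each cycle, emit resolved entries starting at 'current'.
        emitted = []
        while self.current != self.tail:
            e = self.entries[self.current]
            if e is None or not (e.hit or e.miss or e.forwardable):
                break  # oldest unresolved load blocks emission
            emitted.append((e.addr, e.pc))
            self.current = (self.current + 1) % len(self.entries)
        return emitted
```

For example, if loads at 0x100, 0x140, 0x180 are allocated in order but their cache results come back for index 2 first, `emit_in_order` returns nothing until index 0 resolves, then releases the addresses strictly in program order.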
Description
Technical Field
The invention belongs to the technical field of integrated circuit design, and particularly relates to a cache access system supporting data prefetching in an out-of-order processor.
Background
Out-of-order execution and out-of-order memory access are common techniques for improving instruction execution efficiency and memory access efficiency in modern out-of-order processors; by changing the execution order of instructions and the order of memory accesses, the overall instruction execution time and memory access time are reduced. On the other hand, due to limitations such as application program complexity and cache size, the cache miss rate is high for some memory-intensive applications, and memory access has gradually become a major performance bottleneck of modern processors. Cache data prefetching is an effective way to reduce the cache miss rate and improve memory access efficiency, and advanced high-performance processors adopt efficient data prefetchers to improve memory access performance. The hardware that implements data prefetching is called a data prefetcher; data prefetching generally prefetches only LOAD data, which is the main memory access bottleneck. A hardware data prefetcher processes the input access addresses and other access information according to an internally implemented prefetching algorithm, attempts to capture and restore the access pattern of the application program, i.e., the regularity among access addresses, uses the summarized pattern to predict future access addresses, and finally prefetches the data at the predicted addresses into the cache; if the prediction is accurate, the original cache miss is avoided.
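The patent leaves the internal prefetching algorithm open ("different implementation modes according to different prefetching algorithms"). As one concrete illustration of the pattern-capture-and-predict loop described above, here is a minimal PC-indexed stride prefetcher, a classic algorithm chosen for illustration only; the class and method names are hypothetical.

```python
class StridePrefetcher:
    """Minimal PC-indexed stride prefetcher (illustrative; the patent does
    not mandate this algorithm). For each load PC it tracks the last
    address and last stride; when the same nonzero stride repeats, it
    predicts last_addr + stride as the target prefetch address."""

    def __init__(self):
        self.table = {}  # pc -> (last_addr, last_stride)

    def train(self, pc, addr):
        """Feed one in-order (PC, address) pair; return a predicted
        target prefetch address, or None if no stable pattern yet."""
        if pc not in self.table:
            self.table[pc] = (addr, 0)
            return None
        last_addr, last_stride = self.table[pc]
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        # Predict only when the stride repeats (pattern captured).
        if stride != 0 and stride == last_stride:
            return addr + stride
        return None
```

Trained on the sequence 0x100, 0x140, 0x180 from one PC, the third access confirms the 0x40 stride and the prefetcher predicts 0x1C0; this is exactly why the input stream must be in program order, since shuffled addresses would destroy the stride.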
Since the data prefetcher needs to capture and restore the access patterns of the application, the access addresses it receives must be sequential. In an out-of-order processor, however, the memory access addresses are out of order, and using out-of-order addresses greatly impairs the prefetcher's capture and restoration of access patterns, and thus its prediction accuracy. In addition, the prefetch requests sent by the data prefetcher occupy part of the cache system's resources; besides mispredicted addresses, even accurately predicted addresses that arrive too late cause useless resource consumption and may even interfere with the handling of normal memory access requests.
Disclosure of the Invention
To overcome the defects of the prior art, the invention provides a cache access system that supports data prefetching in an out-of-order processor and can filter untimely prefetch requests. In the invention, the cache access system mainly comprises an address conversion system that changes out-of-order LOAD access information into sequential LOAD access information, and an invalidation system for the target prefetch addresses produced by the prefetcher. The hardware structure comprises a LOAD access information tracking and sorting module, a LOAD access address history buffer, a prefetcher, and a target prefetch address buffer. The prefetcher achieves more accurate training and target prefetch address prediction by using the sequential access information; valid target prefetch addresses output by the prefetcher are stored in the target prefetch address buffer to await subsequent transmission; and the target prefetch address buffer is updated in real time to invalidate untimely addresses so that useless prefetch addresses are not sent.
The invention uses the LOAD access ordering information in the load queue of the load-store unit to order the out-of-order access addresses seen by the first-level data cache, so as to achieve more accurate training and prediction by the data prefetcher, and filters untimely prefetch requests by means of the short-term access address history. The cache access system for data prefetching in an out-of-order processor comprises a LOAD access information tracking and sorting module, a LOAD access address history buffer, a prefetcher, and a target prefetch address buffer, implemented in hardware as follows: The LOAD access information tracking and sorting module is a circular queue with 3 special position indices, namely a queue head, a queue tail, and a current access entry located between the head and the tail; the head and tail are kept consistent with the head and tail of the load address queue in the load-store unit, the number of entries of the circular queue equals that of the load address queue in the load-store unit, and each entry of the circular queue corresponds to the entry at the same position in the load address queue.
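The filtering and invalidation behavior of the target prefetch address buffer (claim 3) can be sketched as follows. This is a behavioral model under stated assumptions: the names (`TargetPrefetchBuffer`, `insert`, `invalidate_untimely`, `issue`) are hypothetical, and the recent-access check is modeled as a simple membership test against a set standing in for the LOAD access address history buffer and tracking module entries.

```python
class TargetPrefetchBuffer:
    """Buffer of predicted target prefetch addresses (behavioral sketch).

    A new address is accepted at the tail only if it does not appear in
    the recent access history (it would most likely already be cached).
    Each cycle, entries matching a demand hit/miss address from the L1
    data cache are marked invalid as 'untimely'; issue logic picks the
    first valid entry starting from the head."""

    def __init__(self, num_entries, recent_history):
        self.num_entries = num_entries
        self.slots = []                 # FIFO of [addr, valid] pairs
        self.recent = recent_history    # stands in for the history buffer

    def insert(self, target_addr):
        # Validity check: drop addresses the processor touched recently.
        if target_addr in self.recent:
            return False
        if len(self.slots) >= self.num_entries:
            return False                # buffer full; drop the prediction
        self.slots.append([target_addr, True])
        return True

    def invalidate_untimely(self, demand_addrs):
        # The demand stream reached these addresses first, so prefetching
        # them now would be useless: disable the matching entries.
        for slot in self.slots:
            if slot[1] and slot[0] in demand_addrs:
                slot[1] = False

    def issue(self):
        # Send the first valid entry starting from the head of the queue.
        while self.slots:
            addr, valid = self.slots.pop(0)
            if valid:
                return addr
        return None
```

The design point illustrated is that invalidation is cheap (a flag clear per matching entry) and happens before the prefetch request consumes cache-system bandwidth, which is how the system avoids issuing useless prefetches.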