EP-4462243-B1 - SYSTEMS AND METHODS FOR PREFETCHING DATA VIA A HOST-ACCESSIBLE PREFETCHER QUEUE
Inventors
- ZHANG, TONG
- LI, ZONGWANG
- ZHANG, DA
- PITCHUMANI, REKHA
- KI, YANG SEOK
Dates
- Publication Date
- 20260506
- Application Date
- 20240425
Claims (14)
- A computing system comprising: a storage device (102) including a first queue (206a), a first storage medium (120), and a second storage medium (118); a processor (106) configured to communicate with the storage device (102); and a memory (108) coupled to the processor (106), the memory (108) storing instructions that, when executed by the processor (106), cause the processor (106) to: access the first queue (206a) that has been exposed as a mapped queue to said processor (106) by the storage device (102); execute (700) a first command generated by a first application (600a), wherein the execution of the first command causes the processor (106) to access the mapped queue and store (702), into the mapped queue, a first address associated with first data, wherein the storage device (102) is configured to retrieve (704) the first address from the first queue (206a), retrieve (706) the first data from the first storage medium (120) based on the first address, and store (708) the first data to the second storage medium (118); and execute a second command generated by a second application (600b), wherein the execution of the second command causes the processor (106) to transmit a request to the storage device (102) to retrieve the first data from the second storage medium (118).
- The system of claim 1, wherein the instructions further cause the processor (106) to: based on storing the first address in the first queue (206a), modify a pointer (218, 220) identifying a location in the first queue (206a).
- The system of claim 1, wherein the first command includes a command to prefetch data stored in a location associated with the first address.
- The system of any one of claims 1 to 3, wherein the first storage medium (120) includes non-volatile memory, and the second storage medium (118) includes volatile memory.
- The system of any one of claims 1 to 4, wherein the first queue (206a) is mapped to a user space (410) associated with an application (112), wherein the first command is executed in the user space (410).
- The system of claim 1, wherein the instructions further cause the processor (106) to: execute a second command for moving second data stored in the storage device (102); based on the second command, store, into the first queue (206a), a second address associated with the second data, wherein the storage device (102) is configured to retrieve the second address from the first queue (206a), retrieve the second data from the first storage medium (120) based on the second address, and store the second data to the second storage medium (118).
- The system of claim 6, wherein the first command includes a first call to an operating system, and the second command includes a second call to the operating system, wherein the operating system is configured to manage storing of the first address and the second address into the first queue (206a).
- The system of claim 6 or 7, wherein the first queue (206a) is allocated to the first application (600a), and a second queue (206b) is allocated to the second application (600b).
- The system of claim 1, wherein the processor (106) is configured to: modify a pointer (218, 220) associated with the first queue (206a), based on the storing of the first address into the first queue (206a).
- The system of any one of claims 1 to 9, wherein the instructions further cause the processor to: identify a value for prefetching data; run an application; measure performance of the application; modify the value based on the performance; and determine that the performance satisfies a criterion.
- The system of claim 10, wherein the value is a stride of a second memory block to be prefetched, relative to a first memory block used by the application (112).
- The system of claim 10 or 11, wherein the value is identified in a prefetch command included in the application.
- The system of claim 12, wherein the first address identified based on the value is stored in the first queue (206a) based on the prefetch command.
- A method comprising: accessing, by a processor (106) of a computing system having a memory (108) coupled to the processor (106), a first queue (206a) that has been exposed as a mapped queue to said processor (106) by a storage device (102), wherein the first queue (206a) is included in the storage device (102) having a first storage medium (120) and a second storage medium (118); executing (700), by the processor (106), a first command generated by a first application (600a) for accessing the mapped queue, and storing (702), into the mapped queue, a first address associated with first data, wherein the storage device (102) is configured to retrieve (704) the first address from the first queue (206a), retrieve (706) the first data from the first storage medium (120) based on the first address, and store (708) the first data to the second storage medium (118); and executing, by the processor (106), a second command generated by a second application (600b), wherein the executing of the second command causes the processor (106) to transmit a request to the storage device (102) to retrieve the first data from the second storage medium (118).
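The queue mechanism recited in claims 1, 2 and 9 can be sketched in outline: the host stores a prefetch address into a queue mapped into its address space and modifies a tail pointer, while the storage device drains the queue and copies the addressed data from the first (slower) medium to the second (faster) medium. The following is a minimal single-process simulation of that data flow, not the claimed implementation; all names (PrefetchQueue, host_submit, device_drain) and the queue depth are illustrative assumptions.

```python
# Minimal simulation of the claimed prefetcher queue (illustrative only).
# Host side: store an address into the mapped ring queue, then advance the
# tail pointer (claims 1, 2, 9). Device side: retrieve each address and copy
# the data from the first (slow) medium to the second (fast) medium (claim 1).

QUEUE_DEPTH = 8  # assumed depth of the mapped queue


class PrefetchQueue:
    """Ring buffer standing in for the first queue (206a) mapped to the host."""

    def __init__(self, depth=QUEUE_DEPTH):
        self.slots = [None] * depth
        self.head = 0  # device-side read pointer
        self.tail = 0  # host-side write pointer

    def host_submit(self, address):
        """Host stores a prefetch address, then modifies the tail pointer."""
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            raise RuntimeError("queue full")
        self.slots[self.tail] = address
        self.tail = nxt  # pointer modification signals the device (claim 2)

    def device_drain(self, slow_medium, fast_medium):
        """Device retrieves addresses and stages data slow -> fast (claim 1)."""
        while self.head != self.tail:
            address = self.slots[self.head]
            fast_medium[address] = slow_medium[address]
            self.head = (self.head + 1) % len(self.slots)


# First application: request a prefetch of the data at address 0x10.
slow = {0x10: b"payload"}   # first storage medium (e.g. non-volatile)
fast = {}                   # second storage medium (e.g. volatile)
q = PrefetchQueue()
q.host_submit(0x10)
q.device_drain(slow, fast)

# Second application: the later read is now served from the fast medium.
print(fast[0x10])  # → b'payload'
```

In an actual device the tail update would be a doorbell write to a device-mapped register and the drain loop would run in device firmware; both sides run in one process here purely to show the data flow.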
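Claims 10 to 12 describe tuning a prefetch value, such as the stride of the next memory block relative to the block currently in use, by running the application, measuring its performance, and modifying the value until a criterion is satisfied. A toy sketch of that loop follows; the workload model (fixed access step), the hit-rate metric, and the 0.9 threshold are invented for illustration and are not part of the claims.

```python
# Illustrative sketch of the tuning loop in claims 10-12: pick a candidate
# prefetch stride, run the application, measure performance, and modify the
# stride until performance satisfies a criterion.

def run_app_hit_rate(stride, access_step=4, n_accesses=100):
    """Toy performance metric: the fraction of accesses that find their block
    already prefetched. The modeled application touches blocks 0, access_step,
    2*access_step, ..., and each access prefetches block (current + stride),
    i.e. a block at the given stride relative to the current block (claim 11).
    """
    prefetched = set()
    hits = 0
    for i in range(n_accesses):
        block = i * access_step
        if block in prefetched:
            hits += 1
        prefetched.add(block + stride)
    return hits / n_accesses


def tune_stride(max_stride=16, criterion=0.9):
    """Sweep candidate stride values and return the first whose measured
    performance satisfies the criterion (claim 10)."""
    for stride in range(1, max_stride + 1):
        perf = run_app_hit_rate(stride)
        if perf >= criterion:
            return stride, perf
    return None, 0.0


stride, perf = tune_stride()
print(stride, perf)  # → 4 0.99
```

With the modeled access step of 4, only a stride that matches the application's step produces hits, so the sweep settles on stride 4; a real tuning pass would measure wall-clock or I/O latency of the actual application instead of this synthetic hit rate.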
Description
FIELD
One or more aspects of embodiments according to the present disclosure relate to storage devices, and more particularly to prefetching data stored in a storage device.

BACKGROUND
An application may interact with a storage or memory device (collectively referenced as a storage device) for reading (or loading) and writing (or storing) data. Latencies are generally involved in accessing the storage device. The type of latency involved may depend on the storage medium included in the storage device. Certain storage media have lower latencies than other storage media. Thus, it may be desirable to manage the storing of data in the storage device so as to improve overall system performance and responsiveness.

The above information disclosed in this Background section is only for enhancement of understanding of the background of the present disclosure, and therefore, it may contain information that does not form prior art.

US 2021/224200 A1 discloses a computer program product, system, and method for staging data from storage to a fast cache tier of a multi-tier cache in a non-adaptive sector caching mode in which data staged in response to a read request is limited to the track sectors required to satisfy the read request. Data is also staged from storage to a slow cache tier of the multi-tier cache in a selected adaptive caching mode of a plurality of adaptive caching modes available for staging data of tracks. Adaptive caching modes are selected for the slow cache tier as a function of historical access ratios. Prestage requests for the slow cache tier are enqueued in one of a plurality of prestage request queues of various priority levels as a function of the selected adaptive caching mode and historical access ratios.

Yoon-Young Lee et al.: "Table-comparison prefetching in VIA-based parallel file system", doi:10.1109/CLUSTR.2001.959971, concerns table-comparison prefetching in a VIA-based parallel file system.
US 2006/179238 A1 discloses: In a microprocessor having a load/store unit and prefetch hardware, the prefetch hardware includes a prefetch queue containing entries indicative of allocated data streams. A prefetch engine receives an address associated with a store instruction executed by the load/store unit. The prefetch engine determines whether to allocate an entry in the prefetch queue corresponding to the store instruction by comparing entries in the queue to a window of addresses encompassing multiple cache blocks, where the window of addresses is derived from the received address. The prefetch engine compares entries in the prefetch queue to a window of 2M contiguous cache blocks. The prefetch engine suppresses allocation of a new entry when any entry in the prefetch queue is within the address window. The prefetch engine further suppresses allocation of a new entry when the data address of the store instruction is equal to an address in a border area of the address window.

Han Kyuhwa et al.: "Command queue-aware host I/O stack for mobile flash storage", doi:10.1016/J.SYSARC.2020.101758, concerns a command queue-aware host I/O stack for mobile flash storage.

WO 2020/172693 A2 relates to technology for pre-fetching data. An apparatus comprises a processor core, pre-fetch logic, and a memory hierarchy. The pre-fetch logic is configured to generate cache pre-fetch requests for a program instruction identified by a program counter. The pre-fetch logic is configured to track one or more statistics with respect to the cache pre-fetch requests. The pre-fetch logic is configured to link the one or more statistics with the program counter. The pre-fetch logic is configured to determine a degree of the cache pre-fetch requests for the program instruction based on the one or more statistics. The memory hierarchy comprises main memory and a hierarchy of caches. The memory hierarchy further comprises a memory controller configured to pre-fetch memory blocks identified in the cache pre-fetch requests from a current level in the memory hierarchy into a higher level of the memory hierarchy.

US 6 502 157 B1 discloses a bridge system and method for prefetching data to return to a read request from an agent. The bridge system includes at least one memory device including a counter indicating a number of prefetch operations to perform to prefetch all the requested data, a first buffer capable of storing prefetch requests, and a second buffer capable of storing read data. Control logic implemented in the bridge system includes means for queuing at least one prefetch operation in the first buffer while the counter is greater than zero. The control logic then executes a queued prefetch operation, subsequently receives the prefetched data, and stores the prefetched data in the second buffer. The stored prefetched data is returned to the requesting agent.

US 2019/129854 A1 discloses a computing device including a processor and a non-volatile dual in-line memory module (NVDIMM) connected to the processor. The NVDIMM includes a first