
US-12625808-B2 - Cache memory supporting data operations with parametrized latency and throughput

US 12625808 B2

Abstract

The present disclosure relates to a cache memory and methods that handle data forwarding from the cache memory to an action block to perform an action on the data. The action block performs an action on the data and outputs modified data in response to performing the action. The cache memory and methods use a latency parameter for data forwarding to prevent data hazards from occurring and to meet timing requirements and performance requirements of the cache memory.

Inventors

  • Ahmed ABDELSALAM
  • Vishalkumar Shantilal GONDALIYA
  • Anshuman Verma
  • Robert Groza, Jr.
  • Dongwook Lee
  • Ezzeldin Hamed

Assignees

  • MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date
2026-05-12
Application Date
2023-02-14

Claims (8)

  1. A method, comprising: receiving, at a cache memory, a cache input request for data; determining the data is in the cache memory; providing, in response to the cache input request, the data to an action block based on a latency parameter of an action performed by the action block, wherein the latency parameter identifies a number of clock cycles to complete the action by the action block; and receiving, from the action block, modified data in response to the action block performing the action on the data, wherein the modified data is output in a plurality of pipeline stages based on a cycles per operation parameter that identifies a throughput of the action block performing the action and a number of pipeline stages in the plurality of pipeline stages is equal to the number of clock cycles of the latency parameter.
  2. The method of claim 1, wherein providing the data to the action block is further based on a cycles per operation parameter that identifies a throughput of the action block for performing the action.
  3. The method of claim 1, further comprising: receiving a second cache input request for the data; determining a pipeline stage correlation between the cache input request and the second cache input request, wherein the pipeline stage correlation is a number of clock cycles between receiving the cache input request and receiving the second cache input request; and using the pipeline stage correlation and the latency parameter to identify a pipeline stage from the plurality of pipeline stages from which to read the modified data for the second cache input request.
  4. The method of claim 3, wherein the plurality of pipeline stages are output from the action block at different clock cycles.
  5. The method of claim 3, wherein the pipeline stage correlation is a number of clock cycles between receiving the cache input request and receiving the second cache input request.
  6. The method of claim 3, further comprising: providing, in response to the second cache input request, the modified data to the action block to perform the action on the modified data.
  7. The method of claim 3, wherein a number of pipeline stages to include in the plurality of pipeline stages is based on the latency parameter.
  8. The method of claim 1, wherein the latency parameter and the number of clock cycles to complete the action are modified based on different actions the action block applies to the data.
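As a rough illustration of claim 1, the following is a minimal cycle-level sketch of an action block whose pipeline depth equals a configurable latency parameter and whose acceptance rate is set by a cycles-per-operation parameter. All names (`ActionBlock`, `latency`, `cycles_per_op`) are hypothetical; the patent claims a method and hardware components, not this code.

```python
from collections import deque

class ActionBlock:
    """Pipelined unit that applies `action` to forwarded cache data.

    The pipeline holds one stage per clock cycle of latency, so the
    number of pipeline stages equals the latency parameter (claim 1).
    """

    def __init__(self, action, latency, cycles_per_op=1):
        self.action = action
        self.latency = latency              # clock cycles to complete the action
        self.cycles_per_op = cycles_per_op  # throughput: min cycles between accepts
        self.stages = deque([None] * latency)
        self._cooldown = 0

    def ready(self):
        """True if the block can accept new data this cycle."""
        return self._cooldown == 0

    def tick(self, data=None):
        """Advance one clock cycle; return modified data leaving the
        pipeline, or None. Data offered while not ready() is dropped,
        so a real cache would have to stall and retry it."""
        accepted = None
        if data is not None and self.ready():
            accepted = data
            self._cooldown = self.cycles_per_op
        if self._cooldown:
            self._cooldown -= 1
        finished = self.stages.pop()      # oldest stage exits the pipeline
        self.stages.appendleft(accepted)  # newest stage enters
        return self.action(finished) if finished is not None else None

# Forward data at cycle 0; the modified data emerges exactly
# `latency` cycles later.
blk = ActionBlock(action=lambda x: 2 * x, latency=3)
results = {}
for cycle in range(6):
    out = blk.tick(5 if cycle == 0 else None)
    if out is not None:
        results[cycle] = out
assert results == {3: 10}
```

A second request for the same data arriving `c` cycles after the first (the pipeline stage correlation of claim 3) could be served by reading the in-flight entry `c` stages into `stages` rather than re-issuing the action; that hazard-forwarding logic is omitted from this sketch.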

Description

BACKGROUND

Cache memories are small, fast memories that are widely used in different computing systems. The main purpose of utilizing cache memories is to bring data from main memories closer to processing units. However, different systems require operations or actions to be applied to the data read from the cache before passing it to the processing unit. Although the operations or actions may be simple, complex logic around these operations (e.g., forwarding logic, handling data hazards, etc.) is required to meet timing and performance requirements.

BRIEF SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Some implementations relate to a method. The method includes receiving, at a cache memory, a cache input request for data. The method includes determining the data is in the cache memory. The method includes providing, in response to the cache input request, the data to an action block based on a latency parameter of an action performed by the action block.

Some implementations relate to a cache memory. The cache memory includes a tag manager component that receives a cache input request for data and determines that the data is in the cache memory; and a data manager component that provides, in response to the cache input request, the data to an action block based on a latency parameter of an action performed by the action block.

Some implementations relate to a method. The method includes receiving, at a cache memory, a cache input request for data. The method includes determining the data is in a main memory. The method includes providing a read request for the data in the main memory. The method includes receiving, in response to the read request, the data in the cache memory. The method includes reading the cache input request that corresponds to the data. The method includes providing, in response to the cache input request, the data to an action block based on a latency parameter of an action performed by the action block and a cycles per operation parameter of the action block. The method includes receiving, from the action block, modified data in response to the action block performing the action.

Some implementations relate to a cache memory. The cache memory includes a tag manager component that receives a cache input request for data and determines that the data is in a main memory; a read request component that provides a read request for the data in the main memory; and a data manager component that receives, in response to the read request, the data in the cache memory; reads the cache input request that corresponds to the data; provides, in response to the cache input request, the data to an action block based on a latency parameter of an action performed by the action block and a cycles per operation parameter of the action block; and receives, from the action block, modified data in response to the action block performing the action.

Some implementations relate to a method. The method includes receiving modified data from an action block based on a latency parameter that identifies a number of clock cycles for the action block to perform an action on data. The method includes sending a write command to write the modified data to main memory in response to receiving a write request from a data manager component.

Some implementations relate to a cache memory. The cache memory includes a data manager component that receives modified data from an action block based on a latency parameter that identifies a number of clock cycles for the action block to perform an action on data; and a write back component in communication with the data manager component that sends a write command to write the modified data to main memory in response to receiving a write request from the data manager component.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the teachings herein. Features and advantages of the disclosure may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present disclosure will become more fully apparent from the following description and appended claims or may be learned by the practice of the disclosure as set forth hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other features of the disclosure can be obtained, a more particular description will be rendered by reference to specific implementations thereof which are illustrated in the appended drawings. For better understanding, the like elements have been desi