US-12625623-B2 - Multi-level memory system power management apparatus and method
Abstract
A multi-level memory architecture scheme to dynamically balance a number of parameters such as power, thermals, cost, latency and performance for memory levels that are progressively further away from the processor in the platform, based on how applications are using memory levels that are further away from the processor cores. In some examples, the decision making for the state of the far memory (FM) is decentralized. For example, a processor power management unit (p-unit), near memory controller (NMC), and/or far memory host controller (FMHC) makes decisions about the power and/or performance state of the FM at their respective levels. These decisions are coordinated to provide the optimal power and/or performance state of the FM for a given time. The power and/or performance states of the memories adapt to changing workloads and other parameters even when the processor(s) is in a particular power state.
Inventors
- CHIA-HUNG KUO
- Anoop Mukker
- Eng Hun Ooi
- Avishay Snir
- Shrinivas Venkatraman
- Kuan Hua Tan
- Wai Ben Lin
Assignees
- INTEL CORPORATION
Dates
- Publication Date
- 20260512
- Application Date
- 20201219
Claims (12)
- 1. An apparatus comprising: a plurality of processing cores; a first memory controller coupled to one or more first memory modules via a first link; a second memory controller coupled to one or more second memory modules via a second link and a second-memory module control circuit device, wherein the one or more second memory modules are second level or higher, writeable non-volatile memory modules; and a power management unit coupled to the plurality of processing cores, the first memory controller, and the second memory controller, wherein the power management unit is to determine power and/or performance policy and boundary conditions for the apparatus, and to communicate a power state for the first and/or second links via the first memory controller and/or the second memory controller, wherein the second memory controller manages power of the one or more second memory modules via the second link based on a dynamic profile of workload fed to the second memory controller from the power management unit, and the second-memory module control circuit device manages a second memory module power, performance or thermal state based on second memory module parameters.
- 2. The apparatus of claim 1, wherein the second-memory device has precedence over the second memory controller and/or the power management unit to decide the power state of the second link.
- 3. The apparatus of claim 1, wherein the second memory controller includes a timer to determine exit latency from a power state of the second link, wherein the exit latency is considered by the second memory controller to determine a power state of the second link.
- 4. The apparatus of claim 1, wherein the power management unit receives memory access pattern hints for an operating system, and provides the memory access pattern hints to the second memory controller, wherein the second memory controller considers the memory access pattern hints to determine a power state of the second link.
- 5. The apparatus of claim 1, wherein the power and/or performance policy includes Hour of battery life, and quality of service.
- 6. The apparatus of claim 1, wherein the boundary conditions include power envelope, thermal limit, and maximum supply current.
- 7. The apparatus of claim 1, wherein the first link is a double data rate link, and wherein the first memory modules comprise dynamic random-access memory.
- 8. The apparatus of claim 1, wherein the second link is a peripheral component interface express link, wherein the second memory modules have slower exit latency than an exit latency of the first memory modules.
- 9. The apparatus of claim 1, wherein power state of the first and/or second links is decoupled from power states of the plurality of processing cores.
- 10. A system comprising: far memory modules; near memory modules; a processor coupled to the far memory modules and the near memory modules; and a wireless device to allow the processor to communicate with another device, wherein the processor includes: a plurality of processing cores; a near memory controller coupled to the near memory modules via a first link; a far memory controller coupled to the far memory modules via a second link; and a power management unit coupled to the plurality of processing cores, the near memory controller, and the far memory controller, wherein the power management unit is to determine power and/or performance policy and boundary conditions for the processor, and to communicate a power state for the first and/or second links via the near memory controller and/or the far memory controller, wherein the far memory controller manages power of the far memory modules via the second link and based on a dynamic profile of workload fed to a memory control circuit device, wherein the memory control circuit device is coupled to the far memory modules and the far memory controller.
- 11. The system of claim 10, wherein the memory device has precedence over the far memory controller and/or the power management unit to decide the power state of the second link.
- 12. The system of claim 10, wherein the far memory controller includes a timer to determine exit latency from a power state of the second link, wherein the exit latency is considered by the far memory controller to determine a power state of the second link.
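As an illustration only, the exit-latency consideration of claims 3 and 12 and the device precedence of claims 2 and 11 might be sketched as follows. This is a minimal sketch, not the claimed implementation: the state names, power figures, exit latencies, the `latency_budget_us` parameter, and the `choose_link_state` function are all hypothetical. The sketch assumes the far memory controller is given a latency budget (for example, derived from p-unit policy and operating system hints) and picks the lowest-power link state whose exit latency fits that budget, unless the far-memory device requests a state of its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkState:
    name: str               # hypothetical label, not taken from the patent
    power_mw: float         # assumed link power while resident in this state
    exit_latency_us: float  # assumed time to return to the active state


# Illustrative values only; a real controller would measure exit latency
# (e.g., with the timer mentioned in claims 3 and 12).
LINK_STATES = [
    LinkState("active", 500.0, 0.0),
    LinkState("standby", 120.0, 5.0),
    LinkState("sleep", 10.0, 80.0),
]


def choose_link_state(latency_budget_us: float,
                      device_request: Optional[str] = None) -> LinkState:
    """Pick a power state for the second (far-memory) link.

    If the far-memory device requests a state, that request wins
    (the precedence of claims 2 and 11). Otherwise choose the
    lowest-power state whose exit latency fits the latency budget.
    """
    if device_request is not None:
        for state in LINK_STATES:
            if state.name == device_request:
                return state
        raise ValueError(f"unknown link state: {device_request}")

    if latency_budget_us < 0:
        raise ValueError("latency budget must be non-negative")

    # "active" has zero exit latency, so at least one candidate always fits.
    candidates = [s for s in LINK_STATES
                  if s.exit_latency_us <= latency_budget_us]
    return min(candidates, key=lambda s: s.power_mw)


if __name__ == "__main__":
    print(choose_link_state(10.0).name)
    print(choose_link_state(100.0, device_request="active").name)
```

With these assumed values, a 10 µs budget admits "active" and "standby" and the sketch selects the lower-power "standby"; a device request overrides the budget-based choice entirely.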
Description
BACKGROUND
Current memory architectures, where the power state of a memory is tightly coupled with a processor and/or system-on-chip (SoC) power state, work well for a single-level memory. Here, a single-level memory is a memory which is at a hierarchy above a processor cache. For example, dynamic random-access memory (DRAM) in an SoC which behaves as a main memory for a processor is a single-level memory. As memory architectures evolve to expand memory beyond the single-level memory (e.g., DRAM) to much denser two-level memory (2LM) with a second tier of memory or higher, platforms may not be able to afford tightly coupling the power state of the expansion memory to a processor activity state. Examples of higher-latency storage devices include a hard disk drive (HDD) and non-volatile off-die memory such as 3Dxpoint™ by Intel Corporation of California. One reason that platforms may not be able to afford tightly coupling the power state of the 2LM to a processor activity state is the significantly higher additive power of the connecting interfaces and the 2LM, and the costly thermal solutions needed to cool down the 2LM. Further, the 2LM may not be arbitrarily placed in a low power state because of the latency, performance and energy penalties associated with exiting the low power state when the processor or IO devices have to access the 2LM.
BRIEF DESCRIPTION OF THE DRAWINGS
The embodiments of the disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the disclosure, which, however, should not be taken to limit the disclosure to the specific embodiments, but are for explanation and understanding only.
FIG. 1 illustrates a memory management subsystem, in accordance with some embodiments.
FIG. 2 illustrates a coordinated power management system.
FIG. 3 illustrates a decoupled power management system, in accordance with some embodiments.
FIG. 4 illustrates a flowchart of a decoupled power management system for Hour of battery life (HOBL) and/or workloads with quality-of-service (QoS) requirements, in accordance with some embodiments.
FIG. 5 illustrates a smart device or a computer system or a SoC (System-on-Chip) with a decoupled power management system to optimize power, thermals, and latency, in accordance with some embodiments.
DETAILED DESCRIPTION
Some embodiments describe a multi-level memory architecture scheme to dynamically balance a number of parameters such as power, thermals, cost, latency and performance for memory levels that are progressively further away from the processor in the platform, based on how applications are using memory levels that are further away from the processor cores. The memory levels include a range of levels extending from a nearest memory level to a most distant memory level. The closest memory level is the cache, followed by level-2 cache, main memory, and so on. Here the most distant memory level is generally referred to as the Far Memory (FM). Various embodiments provide a scheme which comprises a combination of hardware and software to manage the power state level of the FM based on an analysis of these parameters.
Some embodiments provide a system solution to manage the power, performance, and latency state of the FM sub-system through a combination of system-level hardware and software solutions that create a closed loop architecture where decisions are dynamically adjusted based on current workload needs, access profiles, and system and/or device thermal state.
In some embodiments, the decision making for the state of the FM is decentralized. For example, a processor power management unit (p-unit), near memory controller (NMC), and/or far memory host controller (FMHC) makes decisions about the power and/or performance state of the FM at their respective levels.
These decisions are coordinated to provide the optimal power and/or performance state of the FM for a given time. For example, each individual component (e.g., p-unit, NMC, FMHC) makes its decisions based on the information available at its level in a pipeline and coordinates the decisions of the components below it in the pipeline. In the pipeline, the p-unit is at the lowest level while software running on an operating system (OS) is at the highest level. Software is therefore also treated as one of the “components” in this solution, as it can receive information from the applications and users directly and can provide a higher level of coordination. Other levels in the pipeline include the NMC, FMHC, firmware, and OS.
In some embodiments, an apparatus is provided which includes a plurality of processing cores. The processing cores can be symmetric or asymmetric. The apparatus further comprises a first memory controller (e.g., near memory controller) coupled to one or more first memory modules via a first link (e.g., Double Data Rate (DDR) or Low Power DDR (LPDDR)). In some embodiments, the apparatus comprises a second memory controller (e.g., a far memory controller) c