US-12619570-B2 - Memory pooling and sharing enabling scalable LLM inference over scaleup AI fabrics

US 12619570 B2

Abstract

Modern datacenters require efficient mechanisms for memory resource sharing and utilization across distributed computing environments. Some of the disclosed embodiments introduce systems and methods incorporating a Resource Provisioning Unit (RPU) that performs host-to-host physical address translations, enabling external hosts to access memory resources utilizing CXL protocols. The system includes a processor coupled to DRAM, an MMU for virtual-to-physical address mapping, and a CXL device for host communication. The RPU enables hosts to access the DRAM utilizing messages conforming to CXL protocols, including CXL.mem with Host-managed Device Memory (HDM) regions and CXL.io with Transaction Layer Packets. Some embodiments support multiple hosts and CXL memory expanders utilizing additional CXL devices and root ports. The embodiments enable flexible memory architecture, improved resource utilization, and scalable memory sharing suitable for AI workloads, cloud computing, and next-generation datacenter deployments.
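To make the abstract's central mechanism concrete, the following is a minimal sketch of host-to-host physical address translation pictured as a window-based remap table inside the RPU. The names (rpu_xlate_entry, rpu_translate), the 2 MiB window size, and the table layout are all illustrative assumptions, not the patented implementation.

```c
/* Minimal sketch of host-to-host physical address translation:
 * a window-based remap table maps addresses from an external host's
 * HPA space into the local HPA space. All names, sizes, and layouts
 * here are illustrative assumptions, not the patented design. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define WINDOW_SIZE (1ull << 21)  /* assume 2 MiB translation windows */

typedef struct {
    uint64_t remote_hpa_base;  /* window base in the external host's HPA space */
    uint64_t local_hpa_base;   /* corresponding base in the local HPA space */
    bool     valid;
} rpu_xlate_entry;

/* Translate an external host's physical address into the local HPA
 * space; reject addresses not covered by any valid window. */
static bool rpu_translate(const rpu_xlate_entry *tbl, size_t n,
                          uint64_t remote_hpa, uint64_t *local_hpa)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].valid &&
            remote_hpa - tbl[i].remote_hpa_base < WINDOW_SIZE) {
            *local_hpa = tbl[i].local_hpa_base +
                         (remote_hpa - tbl[i].remote_hpa_base);
            return true;
        }
    }
    return false;
}

int main(void)
{
    /* One hypothetical window: the external host's 4 GiB mark maps to
     * the local host's 32 GiB mark. */
    rpu_xlate_entry tbl[] = { { 0x100000000ull, 0x800000000ull, true } };
    uint64_t local;
    if (rpu_translate(tbl, 1, 0x100001000ull, &local))
        printf("local HPA = 0x%llx\n", (unsigned long long)local);
    return 0;
}
```

Note that the MMU mentioned in the abstract is unchanged by this scheme: it still performs virtual-to-physical translation within each host, while the RPU operates one level below, between the hosts' physical address spaces.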

Inventors

  • Ronen Aharon Hyatt
  • Gaya Opal Hyatt
  • Ethan Sharon Hyatt

Assignees

  • UNIFABRIX LTD.

Dates

Publication Date
2026-05-05
Application Date
2025-10-28

Claims (20)

  1. A system, comprising: a processor comprising a coherent interconnect; the processor is coupled to at least 64 GB of memory and is configured to utilize physical addresses within a Host Physical Address (HPA) space to access the memory, and to execute an operating system (OS) that utilizes a virtual address space; a memory management unit (MMU) configured to enable access to the memory based on mapping addresses within the virtual address space to physical addresses within the HPA space; a resource provisioning unit (RPU) comprising a Compute Express Link (CXL) device configured to communicate with an entity according to a protocol based on CXL; and wherein the RPU is further coupled to the coherent interconnect and configured to perform host-to-host physical address translations, whereby the host-to-host physical address translations enable the entity to access the memory via the CXL device.
  2. The system of claim 1, wherein the entity utilizes a second HPA space, and the host-to-host physical address translations translate physical addresses within the second HPA space to physical addresses within the HPA space.
  3. The system of claim 2, further comprising a CXL Root Port configured to communicate with a CXL memory expander that utilizes a Device Physical Address (DPA) space; and wherein at least one of the operating system, system firmware, or the memory expander is configured to map between physical addresses within the HPA space and physical addresses within the DPA space, which enable the entity to utilize the memory and/or the CXL memory expander.
  4. The system of claim 3, wherein the RPU further comprises a second CXL device configured to communicate with a second entity utilizing a second protocol based on CXL, whereby the second entity utilizes a third HPA space; and wherein the RPU is further configured to translate physical addresses within the third HPA space to physical addresses within the HPA space, which enable the second entity to utilize the CXL memory expander.
  5. The system of claim 2, wherein the RPU further comprises a second CXL device configured to communicate with a second entity utilizing a second protocol based on CXL, whereby the second entity utilizes a third HPA space, and the RPU is further configured to translate physical addresses within the third HPA space to physical addresses within the HPA space, which enable the second entity to utilize the memory.
  6. The system of claim 5, wherein the entity comprises a host coupled to the processor via at least one of a CXL root port or a CXL switch, and the second protocol based on CXL is different from the protocol based on CXL.
  7. The system of claim 2, wherein the processor comprises a Modified CPU or GPU (MxPU), the memory comprises dynamic random-access memory (DRAM), and the RPU enables the entity to utilize more than 250 GB of the DRAM.
  8. The system of claim 1, wherein the memory comprises dynamic random-access memory (DRAM) that is coupled via memory channels to the processor, and the CXL device comprises a Global Fabric-Attached Memory (G-FAM) Device (GFD).
  9. The system of claim 1, wherein the protocol based on CXL utilizes CXL.mem semantics, and the CXL device exposes at least one Host-managed Device Memory (HDM) address region to the entity.
  10. The system of claim 1, wherein the protocol based on CXL utilizes CXL.io semantics, and the host-to-host physical address translations translate from physical addresses carried in CXL.io UIOMRd Transaction Layer Packets (TLPs) received from the entity to physical addresses within the HPA space.
  11. The system of claim 1, wherein the processor comprises multiple cores, from which at least one is a hidden core; and wherein the RPU is further configured to utilize the hidden core for internal tasks, wherein the internal tasks comprise at least one of internal firmware processing, CXL Fabric Manager (FM) API processing, processing in memory (PIM), near-memory processing, or housekeeping tasks.
  12. The system of claim 11, wherein the hidden core is isolated from user access and visibility, providing user-infrastructure isolation.
  13. The system of claim 1, wherein the processor comprises multiple cores, from which at least one is hidden and is utilized for collection of memory telemetry.
  14. The system of claim 1, wherein the processor comprises multiple cores, from which at least one is a hidden core utilized for secure key storage and management for encrypting and decrypting data transmitted according to the protocol based on CXL, leveraging user-infrastructure isolation provided by the hidden core.
  15. The system of claim 14, further comprising a hardware-accelerated cryptographic engine, wherein the hidden core is configured to utilize the hardware-accelerated cryptographic engine for performing at least part of the cryptographic operations on the data transmitted according to the protocol based on CXL.
  16. The system of claim 14, wherein the hidden core enables support for confidential computing over memory exposed by the RPU via the CXL device; whereby confidential computing performs computation within a secure isolated environment to protect data in use.
  17. The system of claim 1, wherein the processor comprises multiple cores, from which at least one core is a hidden core; and wherein the RPU is further configured to utilize the hidden core for error handling and/or correction tasks within a memory pool comprising the memory, enhancing data integrity and reliability.
  18. The system of claim 17, wherein the error handling and/or correction tasks further comprise predictive failure analysis (PFA) operations, configured to predict and handle imminent failure of memory components within the memory pool, thereby preempting potential data loss and system downtime.
  19. The system of claim 1, wherein the memory comprises dynamic random-access memory (DRAM), and the processor comprises multiple cores, from which at least one core is a hidden core; and wherein the RPU is further configured to utilize the hidden core for controlling or managing memory access scheduling within a memory pool comprising the DRAM, to improve memory utilization and throughput.
  20. The system of claim 1, wherein the processor comprises multiple cores, from which at least one core is a hidden core; and wherein the RPU is further configured to utilize the hidden core for managing security protocols within a memory pool comprising the memory, including data encryption and/or access controls.
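Claims 2-3 recite a two-stage address chain: an external host's HPA is first translated by the RPU into the local HPA space, and an HDM decoder then maps the local HPA into the expander's Device Physical Address (DPA) space. The sketch below composes the two stages, assuming one contiguous shared region and a single HDM decoder window; all bases and sizes are made up for illustration.

```c
/* Hedged sketch of the two-stage chain in claims 2-3: an external
 * host's HPA is translated to a local HPA (RPU host-to-host
 * translation), and an HDM decoder window then maps the local HPA to
 * the expander's Device Physical Address (DPA) space. All bases and
 * sizes are made-up assumptions. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Stage 1: assume the RPU maps one 2 GiB region of the external
 * host's HPA space onto the local HPA space at a fixed offset. */
#define REMOTE_HPA_BASE 0x100000000ull  /* 4 GiB, external host */
#define LOCAL_HPA_BASE  0x800000000ull  /* 32 GiB, local host   */
#define REGION_SIZE     0x080000000ull  /* 2 GiB shared region  */

/* Stage 2: assume one HDM decoder maps that local HPA range onto the
 * CXL memory expander's DPA space starting at DPA 0. */
#define HDM_HPA_BASE    0x800000000ull
#define HDM_DPA_BASE    0x000000000ull

static bool remote_hpa_to_dpa(uint64_t remote_hpa, uint64_t *dpa)
{
    if (remote_hpa - REMOTE_HPA_BASE >= REGION_SIZE)
        return false;  /* outside the shared region */
    uint64_t local_hpa = LOCAL_HPA_BASE + (remote_hpa - REMOTE_HPA_BASE);
    *dpa = HDM_DPA_BASE + (local_hpa - HDM_HPA_BASE);
    return true;
}

int main(void)
{
    uint64_t dpa;
    if (remote_hpa_to_dpa(0x100000200ull, &dpa))
        printf("DPA = 0x%llx\n", (unsigned long long)dpa);
    return 0;
}
```

In practice the two stages would be programmed independently (the RPU via fabric-management interfaces, the HDM decoder by the OS or system firmware, per claim 3); composing them in one function is purely conceptual.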
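Claim 10 applies the same translation to CXL.io traffic: the physical address carried in an inbound UIOMRd TLP is rewritten into the local HPA space before the read is serviced. In the sketch below, the TLP structure is a deliberately simplified stand-in (the real CXL.io/PCIe header layout differs), and rpu_translate_hpa is a hypothetical stub standing in for a remap-table lookup like the one sketched after the Abstract.

```c
/* Hedged sketch for claim 10: the physical address carried in an
 * inbound CXL.io UIOMRd Transaction Layer Packet (TLP) is rewritten
 * into the local HPA space before the read is serviced. The TLP
 * struct is a simplified stand-in, not the real CXL.io/PCIe header. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    uint64_t addr;    /* physical address in the requester's HPA space */
    uint16_t length;  /* requested read length (simplified) */
} uiomrd_tlp;

/* Stand-in for the RPU's remap lookup: one hypothetical 1 GiB window. */
static bool rpu_translate_hpa(uint64_t remote_hpa, uint64_t *local_hpa)
{
    if (remote_hpa - 0x100000000ull >= 0x40000000ull)
        return false;
    *local_hpa = 0x800000000ull + (remote_hpa - 0x100000000ull);
    return true;
}

/* Rewrite the TLP's address in place; drop the request on a miss. */
static bool handle_uiomrd(uiomrd_tlp *tlp)
{
    uint64_t local;
    if (!rpu_translate_hpa(tlp->addr, &local))
        return false;  /* address not mapped for this requester */
    tlp->addr = local; /* forward the translated read toward local memory */
    return true;
}

int main(void)
{
    uiomrd_tlp tlp = { 0x100000040ull, 64 };
    if (handle_uiomrd(&tlp))
        printf("read forwarded to local HPA 0x%llx\n",
               (unsigned long long)tlp.addr);
    return 0;
}
```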

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to: U.S. Provisional Patent Application No. 63/895,053, filed Oct. 7, 2025; U.S. Provisional Patent Application No. 63/874,393, filed Sep. 2, 2025; U.S. Provisional Patent Application No. 63/856,653, filed Aug. 3, 2025; U.S. Provisional Patent Application No. 63/826,342, filed Jun. 18, 2025; U.S. Provisional Patent Application No. 63/811,859, filed May 25, 2025; U.S. Provisional Patent Application No. 63/784,089, filed Apr. 5, 2025; U.S. Provisional Patent Application No. 63/752,940, filed Feb. 3, 2025; U.S. Provisional Patent Application No. 63/743,658, filed Jan. 10, 2025; and U.S. Provisional Patent Application No. 63/734,031, filed Dec. 13, 2024.

This application is also a Continuation of U.S. patent application Ser. No. 19/017,420, filed Jan. 11, 2025, which is a Continuation-In-Part of U.S. patent application Ser. No. 18/981,443, filed Dec. 13, 2024. U.S. patent application Ser. No. 19/017,420 claims priority to: U.S. Provisional Patent Application No. 63/719,640, filed Nov. 12, 2024; U.S. Provisional Patent Application No. 63/701,554, filed Sep. 30, 2024; U.S. Provisional Patent Application No. 63/695,957, filed Sep. 18, 2024; U.S. Provisional Patent Application No. 63/678,045, filed Jul. 31, 2024; U.S. Provisional Patent Application No. 63/652,165, filed May 27, 2024; and U.S. Provisional Patent Application No. 63/641,404, filed May 1, 2024. U.S. patent application Ser. No. 18/981,443 claims priority to U.S. Provisional Patent Application No. 63/609,833, filed Dec. 13, 2023.

BACKGROUND

Modern datacenters face unprecedented challenges in memory resource utilization and sharing as workloads become increasingly memory-intensive and distributed. Applications spanning artificial intelligence (AI), machine learning (ML), Large Language Model (LLM) inference, database analytics, and virtualized environments require flexible access to large memory pools that may exceed the capacity limitations of individual servers, whether the servers are CPU-based, GPU-based, or accelerator-based. These evolving demands have driven the development of memory disaggregation technologies that decouple memory resources from compute nodes, enabling more efficient utilization of datacenter infrastructure.

Compute Express Link (CXL) has emerged as a promising interconnect technology for memory expansion and pooling, providing protocols such as CXL.io, CXL.mem, and CXL.cache that enable high-bandwidth, low-latency communication between processors and memory devices. CXL allows hosts to access memory resources beyond their local physical limitations through standardized interfaces and protocols. However, current CXL implementations face challenges when hosts need to share memory resources, particularly in scenarios requiring physical address space isolation and translation between different physical address spaces.

Traditional memory architectures bind memory resources tightly to specific processors, creating inefficiencies when workloads have varying memory requirements. While CXL enables memory expansion utilizing device attachment, existing solutions typically require each host to manage its own view of memory resources without efficient mechanisms for sharing memory pools among hosts. This limitation becomes particularly apparent in multi-tenant environments, containerized applications, and distributed computing scenarios wherein different hosts may benefit from accessing shared memory resources.
Moreover, address translation mechanisms in current systems primarily focus on virtual-to-physical translations within a single host domain through Memory Management Units (MMUs). When hosts attempt to access shared memory resources, the lack of host-to-host physical address translation capabilities creates barriers to memory sharing. Hosts operate within their own Host Physical Address (HPA) spaces, and coordinating access to shared resources across these disparate physical address spaces remains challenging.

SUMMARY

Some of the disclosed embodiments introduce novel system-level architectural solutions leveraging RPUs to enable dynamic memory sharing and pooling across multiple hosts in datacenter environments. These embodiments provide host-to-host physical address translation capabilities that allow different hosts to access shared memory resources through protocols based on CXL, overcoming traditional boundaries between isolated Host Physical Address (HPA) spaces. By implementing RPUs with CXL devices coupled to processor coherent interconnects, the embodiments enable memory sharing between hosts while maintaining address space isolation and security. Some embodiments optionally support Multi-Headed Device (MHD) configurations, enabling multiple hosts to simultaneously access the same memory resources through separate CXL Endpoints. The embodiments address challenges in memory disaggregation and resource utilization for contemporary and future workloads including AI
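As one way to picture the Multi-Headed Device (MHD) configuration mentioned in the Summary, the sketch below gives each host-facing endpoint ("head") its own translation context into a single shared pool, so two hosts can use the same remote HPA range while remaining isolated in the local HPA space. The head_context layout and all addresses are hypothetical.

```c
/* Hedged sketch of Multi-Headed Device (MHD)-style sharing: each
 * host-facing CXL endpoint ("head") carries its own translation
 * context into one shared local pool. Layout and addresses are
 * hypothetical. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    uint64_t remote_base;  /* window base in that head's host HPA space  */
    uint64_t local_base;   /* window base in the shared pool's HPA space */
    uint64_t size;
} head_context;

/* Two heads: the same remote range on each host lands in disjoint
 * (or, for shared data, deliberately overlapping) local windows. */
static const head_context heads[] = {
    { 0x100000000ull, 0x800000000ull, 0x40000000ull },  /* head 0 */
    { 0x100000000ull, 0x840000000ull, 0x40000000ull },  /* head 1 */
};

static bool head_translate(unsigned head, uint64_t remote_hpa,
                           uint64_t *local_hpa)
{
    if (head >= sizeof heads / sizeof heads[0])
        return false;
    const head_context *c = &heads[head];
    if (remote_hpa - c->remote_base >= c->size)
        return false;  /* isolation: reject addresses outside the window */
    *local_hpa = c->local_base + (remote_hpa - c->remote_base);
    return true;
}

int main(void)
{
    uint64_t a, b;
    /* The same remote address resolves differently per head. */
    if (head_translate(0, 0x100000000ull, &a) &&
        head_translate(1, 0x100000000ull, &b))
        printf("head0 -> 0x%llx, head1 -> 0x%llx\n",
               (unsigned long long)a, (unsigned long long)b);
    return 0;
}
```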