KR-20260068040-A - Heterogeneous Hybrid Computing System with a CXL-based Unified Shared Memory Pool
Abstract
The present invention discloses a heterogeneous hybrid computing system in which a CPU, a GPU, and a QPU share a unified shared memory pool via an interconnect bus based on the Compute Express Link (CXL) protocol. The QPU includes a quantum interface controller that functions as a CXL endpoint; it reads quantum work descriptors directly from the instruction staging area of the shared memory pool via the CXL.mem protocol and executes them without the intermediation of a dedicated quantum controller. The shared memory pool is logically divided into an instruction staging area, a result return area, and a quantum preprocessing area by an access control table of a CXL switch, and the preprocessing results of the GPU are passed to the QPU over a zero-copy path by means of an address pointer. The system further includes a quantum work descriptor prefetch mechanism linked to information on the remaining qubit coherence time, and a dynamic memory pool reallocation function based on the quantum computation load ratio.
Inventors
- 안범주
Assignees
- 안범주
Dates
- Publication Date: 2026-05-13
- Application Date: 2026-04-26
Claims (10)
- A heterogeneous hybrid computing system that executes quantum-classical hybrid operations based on user commands, comprising: a host central processing unit (CPU) that schedules all computation tasks; a graphics processing unit (GPU) that performs data-parallel processing; a quantum processing unit (QPU) that performs quantum operations; and a unified shared memory pool that is commonly accessible by the CPU, the GPU, and the QPU, wherein the CPU, the GPU, and the QPU share the shared memory pool through an interconnect bus based on the Compute Express Link (CXL) protocol, the CPU records, in the shared memory pool, work instructions for executing quantum operations, and the QPU performs operations by reading the work instructions directly from the shared memory pool through the CXL protocol.
- In paragraph 1, The above shared memory pool includes a Classical Memory area for the operation of the above QPU, and In the above classical control memory area, Control pulse sequence data for qubit manipulation; Weight data used for computation by the error correction analog NPU inside the above QPU; and A heterogeneous hybrid computing system characterized by the fact that the state result value of a qubit measured from the above QPU is temporarily stored.
- The system of claim 2, wherein the QPU caches the weight data received from the shared memory pool in an internal local SRAM or register, and the analog NPU, which is three-dimensionally (3D) stacked inside the QPU, corrects errors in the qubit array in real time using the cached weight data without accessing external memory.
- The system of claim 1, wherein the QPU includes a quantum interface controller that functions as a CXL endpoint, the quantum interface controller loads the work instructions by directly addressing a specific address area of the shared memory pool through the CXL.mem protocol, and no dedicated quantum controller is interposed between the quantum interface controller and the shared memory pool.
- The system of claim 4, wherein the shared memory pool includes a plurality of memory areas logically partitioned by a CXL switch, the plurality of memory areas comprising: an instruction staging area to which the CPU has exclusive write rights and in which the CPU records the work instructions; a result return area to which the QPU has exclusive write rights and in which the QPU records qubit measurement result values; and a classical preprocessing area to which the CPU and the GPU jointly have read and write rights and in which preprocessed data to be used as computational input for the QPU is stored, and wherein the quantum interface controller of the QPU detects whether a new work instruction has been written by polling only the address range of the instruction staging area.
- The system of claim 5, wherein the CXL switch includes an access control table (ACT) that enforces, in hardware, the access rights to each memory area on a device-ID basis, and when the CPU completes writing a work instruction to the instruction staging area, the CXL switch issues a cache invalidation signal for that area to the quantum interface controller of the QPU, thereby ensuring that the QPU always obtains the latest data when reading work instructions from the instruction staging area.
- The system of claim 5, wherein the work instructions that the CPU records in the instruction staging area are written as a standardized quantum task descriptor (QTD) structure comprising: gate sequence data constituting the quantum circuit to be executed; qubit index information identifying the qubit to which each gate operation is applied; a destination address within the result return area at which the result value is recorded after the operation is completed; and a completion flag field for issuing an interrupt to the CPU when the QPU completes the operation, and wherein the quantum interface controller of the QPU parses the QTD structure and applies the corresponding gate sequence directly to the qubit array.
- The system of claim 5, wherein the GPU writes its computed data directly to the classical preprocessing area, the CPU inserts an address pointer to that data within the classical preprocessing area into the QTD structure in the instruction staging area, the quantum interface controller of the QPU follows the address pointer to read the data directly from the classical preprocessing area without an additional memory copy and uses it as input for quantum computation, and data consistency between the time the GPU completes its write and the time the QPU reads is guaranteed by the coherence protocol of the CXL switch.
- The system of claim 7, wherein the CPU dynamically reorders the execution order of the gate sequence within the QTD structure recorded in the instruction staging area based on information on the remaining qubit coherence time received from the QPU, and the quantum interface controller of the QPU prefetches from the instruction staging area via the CXL.mem protocol and preloads the next QTD structure before processing of the currently executing QTD structure is completed, thereby enabling successive quantum gate operations to be performed without delay before qubit decoherence occurs.
- The system of claim 5, wherein the CPU monitors, in real time, the computational load ratio of the currently executing quantum-classical hybrid task, when the load ratio of the quantum operations exceeds a predetermined threshold, the CPU issues a memory reconfiguration command to the CXL switch to dynamically reallocate a portion of the classical preprocessing area to the instruction staging area, when the load ratio of the quantum operations falls back below the threshold, the CPU returns the reallocated portion to the classical preprocessing area, and during the dynamic reallocation, ongoing memory accesses by the GPU are protected by the transaction ordering mechanism of the CXL switch.
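As a non-limiting illustration of the claims above, the following Python sketch models in software how a per-device access control table, a QTD written to the instruction staging area, and an address pointer into the classical preprocessing area could fit together. It is not the disclosed hardware: the area offsets and sizes, the byte layout of the QTD, the read rights in the table, and every name (`AREAS`, `ACT`, `SharedPool`, `pack_qtd`, `parse_qtd`) are assumptions introduced only for this example.

```python
import struct

# Hypothetical area layout: name -> (base offset, size in bytes).
AREAS = {"staging": (0x000, 0x100), "result": (0x100, 0x100), "preproc": (0x200, 0x200)}

# Access control table keyed by (device ID, area). Write rights follow the claims;
# the read rights granted here are assumptions.
ACT = {
    ("CPU", "staging"): "rw", ("QPU", "staging"): "r",
    ("QPU", "result"): "rw", ("CPU", "result"): "r",
    ("CPU", "preproc"): "rw", ("GPU", "preproc"): "rw", ("QPU", "preproc"): "r",
}

# Hypothetical QTD layout: gate count, input pointer, input length,
# result destination address, completion flag, then (opcode, qubit index) pairs.
QTD_HEADER = struct.Struct("<HIIIB")
GATE = struct.Struct("<BB")


class SharedPool:
    """Software stand-in for the shared memory pool behind the CXL switch."""

    def __init__(self):
        self.mem = bytearray(0x400)

    def _check(self, device, addr, length, need):
        # Find the area that fully contains the access, then consult the table.
        for name, (base, size) in AREAS.items():
            if base <= addr and addr + length <= base + size:
                if need in ACT.get((device, name), ""):
                    return
                raise PermissionError(f"{device} has no '{need}' right on {name}")
        raise ValueError("access does not fall inside a single area")

    def write(self, device, addr, data):
        self._check(device, addr, len(data), "w")
        self.mem[addr:addr + len(data)] = data

    def read(self, device, addr, length):
        self._check(device, addr, length, "r")
        return memoryview(self.mem)[addr:addr + length]  # a view, not a copy


def pack_qtd(gates, input_ptr, input_len, result_addr):
    header = QTD_HEADER.pack(len(gates), input_ptr, input_len, result_addr, 0)
    return header + b"".join(GATE.pack(op, qubit) for op, qubit in gates)


def parse_qtd(buf):
    count, input_ptr, input_len, result_addr, done = QTD_HEADER.unpack_from(buf, 0)
    gates = [GATE.unpack_from(buf, QTD_HEADER.size + i * GATE.size) for i in range(count)]
    return gates, input_ptr, input_len, result_addr, done


pool = SharedPool()
pre_base, stage_base, result_base = (AREAS[n][0] for n in ("preproc", "staging", "result"))

# GPU writes preprocessed data; CPU records a QTD that points at it.
pool.write("GPU", pre_base, b"\x01\x02\x03\x04")
qtd = pack_qtd([(1, 0), (2, 1)], pre_base, 4, result_base)
pool.write("CPU", stage_base, qtd)

# QPU side: read the QTD, follow the pointer without copying, record a result.
gates, ptr, length, res_addr, _ = parse_qtd(pool.read("QPU", stage_base, len(qtd)))
inputs = pool.read("QPU", ptr, length)
pool.write("QPU", res_addr, b"\x01")  # placeholder for a measurement result
```

In this sketch a write by the GPU to the staging area raises `PermissionError`, mirroring the exclusive write right of the CPU, and `inputs` is a `memoryview` over the pool rather than a copy, which is the software analogue of the zero-copy path. Cache invalidation, polling, and coherence are left out because they have no meaningful counterpart in a single-process model.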
Description
Heterogeneous Hybrid Computing System with a CXL-based Unified Shared Memory Pool
The present invention relates to a quantum-classical hybrid computing system, and more specifically, to a heterogeneous hybrid computing system in which a central processing unit (CPU), a graphics processing unit (GPU), and a quantum processing unit (QPU) commonly access a unified memory pool through an interconnect bus based on the Compute Express Link (CXL) protocol, wherein the QPU directly reads and executes quantum work instructions from the shared memory pool via the CXL.mem protocol without the intermediation of a dedicated quantum controller.
As quantum computing technology matures, demand is growing rapidly for so-called quantum-classical hybrid computing systems that operate classical computing resources (CPUs, GPUs) and quantum processing units (QPUs) within a single integrated computing environment. These hybrid systems are attracting attention as a means of providing a potential computational advantage on problems that are difficult to solve in polynomial time using classical computing alone, such as NP-hard constraint optimization, quantum chemistry simulation, quantum machine learning, and financial risk calculation.
However, current quantum-classical hybrid computing systems suffer from fundamental structural limitations in their interconnect. Representative conventional hybrid quantum-classical computing platforms employ a structure in which a dedicated quantum system controller (e.g., a controller dedicated to microwave pulse generation) is placed as a separate hardware layer to couple the QPU to the classical computing resources, with the dedicated controller acting as an intermediary between the CPU and the QPU.
In this structure, a work instruction must traverse a multi-stage path: the CPU transmits it to the dedicated controller, which then converts it into analog control signals suitable for the QPU and applies them. A structure involving a dedicated controller therefore inevitably adds latency to the instruction path from the CPU to the QPU and increases system configuration complexity and infrastructure cost.
Furthermore, in conventional hybrid computing systems, the CPU, the GPU, and the QPU each use a separate, independent memory space. The GPU has its own dedicated device memory (DRAM or HBM), the QPU uses the local memory of the dedicated controller, and the CPU uses host DRAM. In this architecture, an explicit memory copy operation (DMA transfer or memcpy) through the CPU host memory is required to deliver data preprocessed by the GPU to the QPU as computational input. This unnecessary copying increases latency and wastes memory bandwidth, and it becomes a critical bottleneck in applications such as quantum error correction (QEC), which require a feedback loop of no more than a few microseconds between the classical decoder and the QPU.
In addition, in conventional systems, the CPU transmits the quantum circuit instructions to be applied to the QPU to the dedicated controller through software layers (driver, runtime, compiler). This software stack contains various nondeterministic sources of delay, such as operating system scheduling delays, interrupt processing overhead, and serialization costs, which make it difficult to reliably complete a quantum gate operation sequence within the physical constraint of the qubit decoherence time (on the order of several microseconds to several milliseconds).
CXL (Compute Express Link) is an open high-speed interconnect standard that implements three sub-protocols (CXL.io, CXL.cache, and CXL.mem) on top of the PCIe physical layer (PCIe 6.0 as of CXL 3.0), supporting cache-coherent memory sharing and resource pooling between CPUs and accelerators. Since the launch of the CXL Consortium, the application of CXL to classical devices such as GPUs, FPGAs, and memory buffers has been studied extensively; however, a structure in which a QPU is integrated as a CXL endpoint (EP) into the same shared memory pool as the classical devices, and in which the QPU reads quantum work instructions directly from that pool via the CXL.mem protocol without a dedicated controller, has not been specifically disclosed in the prior art.
FIG. 1 is a system block diagram showing the overall configuration of a CXL-based heterogeneous hybrid computing system according to one embodiment of the present invention.
FIG. 2 is a hierarchical diagram showing the protocol layer structure of a CXL interconnect bus and the endpoint connection relationship of each device according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the internal structure of a CXL endpoint of a QPU and the configuration of a quantum interface controller according to one embodiment of the present invention.
FIG. 4 is a memory map showing the logical area partitioni