US-12625651-B1 - Application programming interface to bind memory to shared virtual memory

Abstract

Apparatuses, systems, and techniques to facilitate memory management. In at least one embodiment, an application programming interface is performed to enable access to shared virtual memory by a plurality of processors.
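The abstract does not describe an implementation. As a rough illustration only, the following C++ sketch models the access gating recited in claim 1: a shared memory location becomes accessible once at least a required number of distinct processors have bound physical memory to it, with that number supplied as a parameter. The class and member names (`SharedMemoryObject`, `bind`, `access_enabled`) are hypothetical. They are not part of CUDA or any NVIDIA interface, and actual physical allocation is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical model (not a real API) of the gating in claim 1: a shared
// memory location becomes accessible only after at least
// `required_processors` distinct processors have bound physical memory to it.
class SharedMemoryObject {
 public:
  // `required_processors` stands in for the API input parameter that sets
  // how many processors must contribute physical memory.
  explicit SharedMemoryObject(std::size_t required_processors)
      : required_(required_processors) {
    assert(required_processors > 0 && "at least one processor is required");
  }

  // Record that `processor_id` has bound a portion of its physical memory
  // to this shared location. A repeated bind by the same processor is
  // counted once because processors are tracked in a set.
  void bind(int processor_id) { bound_.insert(processor_id); }

  // Access is enabled once the number of bound processors meets the threshold.
  bool access_enabled() const { return bound_.size() >= required_; }

  std::size_t bound_count() const { return bound_.size(); }

 private:
  std::size_t required_;
  std::set<int> bound_;
};
```

For example, an object created with a threshold of 2 stays inaccessible after processor 0 binds, and becomes accessible once a second, different processor binds. Counting distinct processors is one reasonable reading of "at least a number of processors" and is an assumption of this sketch.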

Inventors

James Christopher Beyer
Paul J. Sidenblad
Vyas Venkataraman
Chetan Gokhale
Cory Perry
Ying Liang
Harold Carter Edwards

Assignees

NVIDIA CORPORATION

Dates

Publication Date: 2026-05-12
Application Date: 2022-04-04

Claims (20)

1. One or more processors, comprising: circuitry to: in response to an application programming interface (API) call, cause each of a plurality of processors to bind a portion of physical memory to an indicator of a shared memory location provided to the API; and enable access to the shared memory location when at least a number of processors allocate physical memory for use by the plurality of processors, the number indicated by an input parameter to the API.
2. The one or more processors of claim 1, wherein the shared memory location is multicast memory.
3. The one or more processors of claim 1, wherein: a first processor of the plurality of processors is to be on a first device; and a second processor of the plurality of processors is to be on a second device.
4. The one or more processors of claim 1, wherein the location of the shared memory location is to be shared between the plurality of processors.
5. The one or more processors of claim 1, wherein the circuitry is to perform a memory manager to coordinate the shared memory location between the plurality of processors.
6. The one or more processors of claim 1, wherein at least one processor of the plurality of processors is to designate a physical memory location that corresponds to the shared memory location.
7. The one or more processors of claim 1, wherein the circuitry comprises a switch to route a location of the shared memory location to different physical memory locations located on different processors.
8. The one or more processors of claim 1, wherein at least one processor of the plurality of processors is a graphics processing unit (GPU).
9. The one or more processors of claim 1, wherein: a first processor of the plurality of processors is of a first node of a compute cluster; a second processor of the plurality of processors is of a second node of the compute cluster; and the first processor is to access memory of the second processor using the shared memory location.
10. The one or more processors of claim 1, wherein the circuitry is to further allow at least one of the processors of the plurality of processors to use a virtual memory address of the shared memory location, to which the one or more processors can write data to cause the at least one of the processors of the plurality of processors to store the data in physical memory.
11. The one or more processors of claim 1, wherein one or more parameters to the API indicate an allocation size for physical memory corresponding to the shared memory location.
12. A computer-implemented method, comprising: in response to an application programming interface (API) call comprising a handle indicative of a shared memory location: causing a plurality of processors to each bind a portion of physical memory to the handle; and enabling access to the shared memory location when at least a number of processors allocate physical memory for use by the plurality of processors, the number indicated by an input parameter to the API.
13. The computer-implemented method of claim 12, wherein the shared memory location is multicast memory.
14. The computer-implemented method of claim 12, wherein: a first processor of the plurality of processors is of a first node of a compute cluster; and a second processor of the plurality of processors is of a second node of the compute cluster.
15. The computer-implemented method of claim 12, further comprising: sharing a location of the shared memory location between the plurality of processors using the handle.
16. The computer-implemented method of claim 12, further comprising: performing a memory manager to coordinate the shared memory location between the plurality of processors.
17. The computer-implemented method of claim 12, further comprising, in response to the API call: selecting a processor of the plurality of processors; allocating a physical memory location in memory of the processor; and mapping the physical memory location to a location of the shared memory location.
18. The computer-implemented method of claim 12, wherein at least one processor of the plurality of processors is a graphics processing unit (GPU).
19. The computer-implemented method of claim 12, further comprising: routing a location of the shared memory location to different physical memory locations located on different devices.
20. The computer-implemented method of claim 12, further comprising: performing one or more compute operations using the shared memory location.
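The claims leave the routing mechanism open. The following C++ sketch is one illustrative model, under stated assumptions, of how a switch might route one shared (multicast) location to physical memory on several processors, loosely following claims 7, 10, 11, 17 and 19: each bound processor receives its own allocation of the requested size, and a write to the shared location is stored in every processor's allocation. `MulticastSwitch`, `bind_processor`, `write` and `read` are hypothetical names; processor memory is simulated with byte vectors, and no real interconnect or CUDA call is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical model (not a real API): one shared multicast location backed
// by a separate physical allocation on each bound processor.
class MulticastSwitch {
 public:
  // `allocation_size` stands in for the size parameter of claim 11.
  explicit MulticastSwitch(std::size_t allocation_size) : size_(allocation_size) {
    assert(allocation_size > 0 && "allocation size must be nonzero");
  }

  // Select a processor, allocate `size_` zeroed bytes in its (simulated)
  // memory, and map that allocation to the shared location (claim 17).
  // Binding an already-bound processor leaves its allocation unchanged.
  void bind_processor(int processor_id) {
    backing_.emplace(processor_id, std::vector<std::uint8_t>(size_, 0));
  }

  // Route a write at `offset` in the shared location to every bound
  // processor's allocation (claims 7, 10 and 19).
  void write(std::size_t offset, std::uint8_t value) {
    if (offset >= size_) {
      throw std::out_of_range("offset past end of allocation");
    }
    for (auto& entry : backing_) {
      entry.second[offset] = value;
    }
  }

  // Read one byte from a single processor's local copy. Throws
  // std::out_of_range if the processor is not bound or `offset` is too large.
  std::uint8_t read(int processor_id, std::size_t offset) const {
    return backing_.at(processor_id).at(offset);
  }

  std::size_t bound_count() const { return backing_.size(); }

 private:
  std::size_t size_;
  std::map<int, std::vector<std::uint8_t>> backing_;
};
```

Keeping one allocation per processor in a map keyed by processor ID makes the fan-out of a write explicit. In the arrangement the claims describe, that routing is attributed to switch circuitry rather than to software.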

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application incorporates by reference for all purposes the full disclosures of co-pending U.S. patent application Ser. No. 17/712,991, filed concurrently herewith, entitled “APPLICATION PROGRAMMING INTERFACE TO ALLOCATE SHARED VIRTUAL MEMORY”, co-pending U.S. patent application Ser. No. 17/715,021, filed concurrently herewith, entitled “APPLICATION PROGRAMMING INTERFACE TO ALLOCATE MEMORY FOR SHARED VIRTUAL MEMORY”, and co-pending U.S. patent application Ser. No. 17/715,054, filed concurrently herewith, entitled “APPLICATION PROGRAMMING INTERFACE TO DEALLOCATE SHARED VIRTUAL MEMORY”.

FIELD

At least one embodiment pertains to processing resources used to execute one or more CUDA programs. For example, at least one embodiment pertains to processing resources used to execute one or more CUDA programs that share virtual memory between processors, designate physical memory for shared virtual memory, allocate physical memory for shared virtual memory, enable access to shared virtual memory, undesignate physical memory associated with shared virtual memory, and deallocate physical memory associated with shared virtual memory.

BACKGROUND

Performing computational operations can use significant memory, time, or computing resources. For example, a graphics processing unit (GPU) cluster can have several nodes (e.g., servers), where each node has several GPUs. Often, a GPU can directly access only its own memory, or only the memory of other GPUs on the same node. In such cases, if a GPU needs to send memory contents to GPUs on other nodes, an expensive operation must be performed to copy memory between GPUs. This can be complicated for programmers, in addition to being an inefficient use of resources. Sharing memory between processors using virtual memory can reduce the amount of memory, time, or computing resources used to perform computational operations.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example computer system where processor memory is shared between processors, in accordance with at least one embodiment;
FIG. 2 illustrates an example computer system where processor memory is shared between nodes of a compute cluster, in accordance with at least one embodiment;
FIG. 3 illustrates an example computer system where a switch of a compute cluster node accesses memory of processors of a compute cluster node, in accordance with at least one embodiment;
FIG. 4 illustrates an example computer system where processor memory is shared between processors using a memory handle, in accordance with at least one embodiment;
FIG. 5 illustrates an example computer system where nodes of a compute cluster communicate using links, in accordance with at least one embodiment;
FIG. 6 illustrates an example computer system where memory is allocated for shared virtual memory, in accordance with at least one embodiment;
FIG. 7 illustrates an example computer system where virtual addresses are created for memory allocated for shared virtual memory, in accordance with at least one embodiment;
FIG. 8 illustrates an example computer system where a shared memory map is created for shared virtual memory, in accordance with at least one embodiment;
FIG. 9 illustrates an example computer system where processors access virtual memory, in accordance with at least one embodiment;
FIG. 10 illustrates an example computer system where processors access shared virtual memory, in accordance with at least one embodiment;
FIG. 11 illustrates an example process for sharing processor memory between processors, in accordance with at least one embodiment;
FIG. 12 illustrates an example process for determining sharing for processor memory shared between processors, in accordance with at least one embodiment;
FIG. 13 illustrates an example process for providing resources for processor memory shared between processors, in accordance with at least one embodiment;
FIG. 14 illustrates an example computer system where memory is shared between processors, in accordance with at least one embodiment;
FIG. 15 illustrates an example computer system where multicast memory is shared between processors, in accordance with at least one embodiment;
FIG. 16 illustrates an example application programming interface (API) to cause shared virtual memory to be allocated for use by a plurality of processors, in accordance with at least one embodiment;
FIG. 17 illustrates an example application programming interface (API) to cause physical memory corresponding to shared virtual memory to be designated for use by a plurality of processors, in accordance with at least one embodiment;
FIG. 18 illustrates an example application programming interface (API) to enable access to shared virtual memory by a plurality of processors, in accordance with at least one embodiment;
FIG. 19 illustrates an example application programming interface (API) to cause physical memory corresponding to shared virtual memory