
US-12619477-B2 - Application programming interface to cause graph code to wait on a semaphore


Abstract

Apparatuses, systems, and techniques to facilitate graph code synchronization between application programming interfaces. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause graph code to wait on a semaphore used by another API.

Inventors

  • David Anthony Fontaine
  • Jason David Gaiser
  • Steven Arthur Gurfinkel
  • Sally Tessa Stevenson
  • Vladislav Zhurba
  • Stephen Anthony Bernard Jones

Assignees

  • NVIDIA CORPORATION

Dates

Publication Date
2026-05-05
Application Date
2021-12-13

Claims (20)

  1. A processor, comprising: one or more circuits to perform a first application programming interface (API) to generate one or more nodes in a graph code, wherein the one or more nodes causes the graph code to wait on a semaphore used by another API.
  2. The processor of claim 1, wherein the graph code is to be performed, at least in part, by one or more graphics processing units (GPUs).
  3. The processor of claim 1, wherein the first API is to update the one or more nodes in the graph code based, at least in part, on an update signal node parameter API.
  4. The processor of claim 1, wherein the graph code is to be performed, at least in part, by one or more graphics processing units (GPUs), and the first API is to add the one or more nodes, wherein the one or more nodes is a semaphore wait node, to the graph code based, at least in part, on a parameter that specifies a graph to which to add the semaphore wait node.
  5. The processor of claim 1, wherein the semaphore is to be allocated by the other API, and the first API is to add the one or more nodes, wherein the one or more nodes is a semaphore wait node, to the graph code that is to perform a wait operation based, at least in part, on the semaphore, when the semaphore wait node is performed.
  6. The processor of claim 1, wherein the first API is to add the one or more nodes, wherein the one or more nodes is a semaphore wait node, to the graph code, the one or more circuits are to perform a second API to set one or more parameters of the semaphore wait node, and the other API is a third API.
  7. The processor of claim 1, wherein the graph code is executable graph code, and the first API is to set one or more parameters of the one or more nodes in the executable graph code.
  8. The processor of claim 1, wherein the graph code is to be performed, at least in part, by one or more graphics processing units (GPUs), the other API is a graphics rendering API, and the semaphore is a counting semaphore.
  9. A system, comprising: one or more processors to perform a first application programming interface (API) to generate one or more nodes in a graph code, wherein the one or more nodes causes the graph code to wait on a semaphore used by another API; and one or more memories to store the graph code.
  10. The system of claim 9, wherein the semaphore is to be allocated by the other API, and the first API is to set one or more parameters of the one or more nodes in the graph code that is to perform one or more wait operations based, at least in part, on the semaphore.
  11. The system of claim 9, wherein the semaphore is to be allocated by the other API.
  12. The system of claim 9, wherein the first API is to update the one or more nodes in the graph code based, at least in part, on an update signal node parameter API.
  13. The system of claim 9, wherein the other API is to allocate the semaphore and the one or more memories are to store the semaphore.
  14. The system of claim 9, wherein the graph code is to be performed, at least in part, by one or more graphics processing units (GPUs), the other API is to allocate the semaphore, and the other API is to use code not included in the graph code.
  15. A non-transitory machine-readable medium having stored thereon a first application programming interface (API), which if performed by one or more processors, is to generate one or more nodes in a graph code, wherein the one or more nodes causes the graph code to at least wait on a semaphore used by another API.
  16. The non-transitory machine-readable medium of claim 15, wherein the graph code is to be performed, at least in part, by one or more graphics processing units (GPUs), and the semaphore is a binary semaphore to be allocated by the other API.
  17. The non-transitory machine-readable medium of claim 15, wherein the graph code is to be performed, at least in part, by one or more graphics processing units (GPUs), and the semaphore is a counting semaphore to be allocated by the other API.
  18. The non-transitory machine-readable medium of claim 15, wherein the semaphore is to be allocated by the other API based, at least in part, on code not included in the graph code, and the first API is to add the one or more nodes to the graph code that is to perform a wait operation based, at least in part, on a value of the semaphore.
  19. The non-transitory machine-readable medium of claim 15, wherein the graph code is to be performed, at least in part, by one or more graphics processing units (GPUs), and the semaphore is a binary semaphore.
  20. The non-transitory machine-readable medium of claim 15, wherein the graph code is to be performed, at least in part, by one or more graphics processing units (GPUs), and the semaphore is a counting semaphore.

Description

TECHNICAL FIELD

At least one embodiment pertains to processing resources used to execute one or more programs written for a parallel computing platform and application interface. For example, at least one embodiment pertains to processors or computing systems that perform an application programming interface (API) according to various novel techniques described herein.

BACKGROUND

Performing computational operations using code from a first API and code from another API can use significant time, power, or computing resources. The amount of time, power, or computing resources can be improved.

BRIEF DESCRIPTION OF DRAWINGS

  • FIG. 1 is a block diagram that illustrates a computing environment, according to at least one embodiment;
  • FIG. 2 illustrates a diagram of a graph with semaphore nodes, according to at least one embodiment;
  • FIG. 3 illustrates a diagram of an add semaphore signal node API call, according to at least one embodiment;
  • FIG. 4 illustrates a diagram of a set semaphore signal node parameters API call, according to at least one embodiment;
  • FIG. 5 illustrates a diagram of a get semaphore signal node parameters API call, according to at least one embodiment;
  • FIG. 6 illustrates a diagram of an update executable graph semaphore signal node parameters API call, according to at least one embodiment;
  • FIG. 7 illustrates a diagram of an add semaphore wait node API call, according to at least one embodiment;
  • FIG. 8 illustrates a diagram of a set semaphore wait node parameters API call, according to at least one embodiment;
  • FIG. 9 illustrates a diagram of a get semaphore wait node parameters API call, according to at least one embodiment;
  • FIG. 10 illustrates a diagram of an update executable graph semaphore wait node parameters API call, according to at least one embodiment;
  • FIG. 11 is a flowchart of a technique of adding and updating a semaphore signal node, according to at least one embodiment;
  • FIG. 12 is a flowchart of a technique of adding and updating a semaphore wait node, according to at least one embodiment;
  • FIG. 13 illustrates an exemplary data center, in accordance with at least one embodiment;
  • FIG. 14 illustrates a processing system, in accordance with at least one embodiment;
  • FIG. 15 illustrates a computer system, in accordance with at least one embodiment;
  • FIG. 16 illustrates a system, in accordance with at least one embodiment;
  • FIG. 17 illustrates an exemplary integrated circuit, in accordance with at least one embodiment;
  • FIG. 18 illustrates a computing system, according to at least one embodiment;
  • FIG. 19 illustrates an APU, in accordance with at least one embodiment;
  • FIG. 20 illustrates a CPU, in accordance with at least one embodiment;
  • FIG. 21 illustrates an exemplary accelerator integration slice, in accordance with at least one embodiment;
  • FIGS. 22A-22B illustrate exemplary graphics processors, in accordance with at least one embodiment;
  • FIG. 23A illustrates a graphics core, in accordance with at least one embodiment;
  • FIG. 23B illustrates a GPGPU, in accordance with at least one embodiment;
  • FIG. 24A illustrates a parallel processor, in accordance with at least one embodiment;
  • FIG. 24B illustrates a processing cluster, in accordance with at least one embodiment;
  • FIG. 24C illustrates a graphics multiprocessor, in accordance with at least one embodiment;
  • FIG. 25 illustrates a graphics processor, in accordance with at least one embodiment;
  • FIG. 26 illustrates a processor, in accordance with at least one embodiment;
  • FIG. 27 illustrates a processor, in accordance with at least one embodiment;
  • FIG. 28 illustrates a graphics processor core, in accordance with at least one embodiment;
  • FIG. 29 illustrates a PPU, in accordance with at least one embodiment;
  • FIG. 30 illustrates a GPC, in accordance with at least one embodiment;
  • FIG. 31 illustrates a streaming multiprocessor, in accordance with at least one embodiment;
  • FIG. 32 illustrates a software stack of a programming platform, in accordance with at least one embodiment;
  • FIG. 33 illustrates a CUDA implementation of a software stack of FIG. 32, in accordance with at least one embodiment;
  • FIG. 34 illustrates a ROCm implementation of a software stack of FIG. 32, in accordance with at least one embodiment;
  • FIG. 35 illustrates an OpenCL implementation of a software stack of FIG. 32, in accordance with at least one embodiment;
  • FIG. 36 illustrates software that is supported by a programming platform, in accordance with at least one embodiment;
  • FIG. 37 illustrates compiling code to execute on programming platforms of FIGS. 32-35, in accordance with at least one embodiment;
  • FIG. 38 illustrates in greater detail compiling code to execute on programming platforms of FIGS. 32-35, in accordance with at least one embodiment;
  • FIG. 39 illustrates translating source code prior to compiling source code, in accordance with at least one embodiment;
  • FIG. 40A illustrates a system configured to compile and execute CUDA source code using different types of processing units, in
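FIGS. 3 through 10 enumerate a small, regular API surface around semaphore nodes: add a node to a graph, set and get its parameters, and update parameters in an already-instantiated executable graph. The following is a hypothetical, simplified sketch of that shape in C++; every type and function name here is invented for illustration and does not match the actual runtime API described in the patent.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameter block for a semaphore signal/wait node.
struct SemNodeParams {
    std::uint64_t value = 0;   // value to signal with, or to wait for
};

struct Node  { bool is_wait = false; SemNodeParams params; };
struct Graph { std::vector<Node> nodes; };

// Add a semaphore wait node; a parameter specifies the graph to which the
// node is added (compare claim 4), and a handle to the node is returned.
std::size_t addSemaphoreWaitNode(Graph& g, const SemNodeParams& p) {
    g.nodes.push_back(Node{true, p});
    return g.nodes.size() - 1;
}

// Set and get node parameters (compare FIGS. 4, 5, 8, 9).
void setWaitNodeParams(Graph& g, std::size_t n, const SemNodeParams& p) {
    g.nodes[n].params = p;
}
SemNodeParams getWaitNodeParams(const Graph& g, std::size_t n) {
    return g.nodes[n].params;
}

// An instantiated ("executable") graph keeps its own copy of the topology;
// updating it patches node parameters in place without rebuilding the graph
// (compare FIGS. 6, 10).
struct ExecGraph { Graph instantiated; };
ExecGraph instantiate(const Graph& g) { return ExecGraph{g}; }
void execUpdateWaitNodeParams(ExecGraph& e, std::size_t n, const SemNodeParams& p) {
    e.instantiated.nodes[n].params = p;
}
```

Note the design point the update calls imply: the executable graph holds its own copy of the node parameters, so patching it does not disturb the source graph from which it was instantiated.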