US-12625744-B2 - Application programming interface to cause graph code to wait on a semaphore
Abstract
Apparatuses, systems, and techniques to facilitate graph code synchronization between application programming interfaces. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause graph code to wait on a semaphore used by another API.
Inventors
- David Anthony Fontaine
- Jason David Gaiser
- Steven Arthur Gurfinkel
- Sally Tessa Stevenson
- Vladislav Zhurba
- Stephen Anthony Bernard Jones
Assignees
- NVIDIA CORPORATION
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-06-17
Claims (18)
- 1. A processor comprising: one or more core complexes, wherein the one or more core complexes include one or more central processing unit (CPU) cores; one or more graphics complexes, wherein the one or more graphics complexes include one or more compute units (CUs); an L2 cache; one or more fabric interconnects; a memory controller; and one or more input/output (I/O) interfaces; at least one comprising a peripheral component interconnect express (PCIe) interface; wherein: the processor is to perform an application program interface (API) to create an external semaphore wait node to be added to a graph, wherein the API call is to include: a graph identifier parameter; a node identifier parameter; a dependencies parameter; a number of dependencies parameter; and a node parameters parameter; and the external semaphore wait node, if performed, is to cause the graph to wait on a semaphore used by another API.
- 2. The processor of claim 1, wherein the graph identifier parameter is to specify the graph to which to add the external semaphore wait node.
- 3. The processor of claim 1, wherein the node identifier parameter is an outgoing parameter to return a pointer to a location of the external semaphore wait node in memory.
- 4. The processor of claim 1, wherein the dependencies parameter is to specify dependencies of the external semaphore wait node.
- 5. The processor of claim 1, wherein the number of dependencies parameter is to specify a number of dependencies of the external semaphore wait node.
- 6. The processor of claim 1, wherein the node parameters parameter is to specify parameters for the external semaphore wait node.
- 7. A system comprising: memory; and a processor comprising: one or more core complexes, wherein the one or more core complexes include one or more central processing unit (CPU) cores; one or more graphics complexes, wherein the one or more graphics complexes include one or more compute units (CUs); an L2 cache; one or more fabric interconnects; a memory controller; and one or more input/output (I/O) interfaces; at least one comprising a peripheral component interconnect express (PCIe) interface; wherein: the processor is to perform an application program interface (API) to create an external semaphore wait node to be added to a graph, wherein the API call is to include: a graph identifier parameter; a node identifier parameter; a dependencies parameter; a number of dependencies parameter; and a node parameters parameter; and the external semaphore wait node, if performed, is to cause the graph to wait on a semaphore used by another API.
- 8. The system of claim 7, wherein the graph identifier parameter is to specify the graph to which to add the external semaphore wait node.
- 9. The system of claim 7, wherein the node identifier parameter is an outgoing parameter to return a pointer to a location of the external semaphore wait node in memory.
- 10. The system of claim 7, wherein the dependencies parameter is to specify dependencies of the external semaphore wait node.
- 11. The system of claim 7, wherein the number of dependencies parameter is to specify a number of dependencies of the external semaphore wait node.
- 12. The system of claim 7, wherein the node parameters parameter is to specify parameters for the external semaphore wait node.
- 13. A method comprising: performing an application program interface (API) to create an external semaphore wait node to be added to a graph, wherein the API call is to include: a graph identifier parameter; a node identifier parameter; a dependencies parameter; a number of dependencies parameter; and a node parameters parameter; wherein the external semaphore wait node, if performed, is to cause the graph to wait on a semaphore used by another API; and wherein the API is to be performed by a processor comprising: one or more core complexes, wherein the one or more core complexes include one or more central processing unit (CPU) cores; one or more graphics complexes, wherein the one or more graphics complexes include one or more compute units (CUs); an L2 cache; one or more fabric interconnects; a memory controller; and one or more input/output (I/O) interfaces; at least one comprising a peripheral component interconnect express (PCIe) interface.
- 14. The method of claim 13, wherein the graph identifier parameter is to specify the graph to which to add the external semaphore wait node.
- 15. The method of claim 13, wherein the node identifier parameter is an outgoing parameter to return a pointer to a location of the external semaphore wait node in memory.
- 16. The method of claim 13, wherein the dependencies parameter is to specify dependencies of the external semaphore wait node.
- 17. The method of claim 13, wherein the number of dependencies parameter is to specify a number of dependencies of the external semaphore wait node.
- 18. The method of claim 13, wherein the node parameters parameter is to specify parameters for the external semaphore wait node.
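As an illustration only, the five-parameter call recited in the claims can be modeled on the host side as in the following sketch. Every name in it (`Graph`, `Node`, `WaitNodeParams`, `ExternalSemaphore`, `graph_add_external_semaphore_wait_node`, `wait_node_ready`) is hypothetical and does not come from the claims or from any real library. The "greater than or equal" wait condition is likewise an assumption chosen for the example. The CUDA runtime function `cudaGraphAddExternalSemaphoresWaitNode` takes a similar set of arguments, but this sketch does not call it and makes no statement about how it is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical model; these types are assumptions made for illustration.
enum class NodeKind { Generic, ExternalSemaphoreWait };
enum class Status { Success, InvalidValue };

// Stands in for a semaphore that is owned and signaled by another API.
struct ExternalSemaphore {
    std::uint64_t value = 0;
};

// The "node parameters" argument: which semaphore to wait on and for what value.
struct WaitNodeParams {
    const ExternalSemaphore* semaphore = nullptr;
    std::uint64_t wait_value = 0;
};

struct Node {
    NodeKind kind = NodeKind::Generic;
    std::vector<Node*> dependencies;
    WaitNodeParams wait_params;
};

struct Graph {
    std::vector<std::unique_ptr<Node>> nodes;  // the graph owns its nodes
};

// Arguments follow the order in which the claims list them:
// graph identifier, node identifier (outgoing), dependencies,
// number of dependencies, node parameters.
Status graph_add_external_semaphore_wait_node(Graph* graph,
                                              Node** node_out,
                                              Node* const* dependencies,
                                              std::size_t num_dependencies,
                                              const WaitNodeParams* params) {
    if (!graph || !node_out || !params || !params->semaphore) {
        return Status::InvalidValue;
    }
    if (num_dependencies > 0 && !dependencies) {
        return Status::InvalidValue;
    }
    std::unique_ptr<Node> node(new Node());
    node->kind = NodeKind::ExternalSemaphoreWait;
    for (std::size_t i = 0; i < num_dependencies; ++i) {
        node->dependencies.push_back(dependencies[i]);
    }
    node->wait_params = *params;   // parameters are copied into the node
    *node_out = node.get();        // outgoing pointer to the node's location
    graph->nodes.push_back(std::move(node));
    return Status::Success;
}

// Assumed wait condition: the node stops blocking once the other API
// has advanced the semaphore to at least wait_value.
bool wait_node_ready(const Node& node) {
    assert(node.kind == NodeKind::ExternalSemaphoreWait);
    return node.wait_params.semaphore->value >= node.wait_params.wait_value;
}
```

In this model a caller would create any upstream nodes first, pass them as the dependencies array, and keep the returned node pointer for later parameter queries or updates. Downstream graph work would be held back until `wait_node_ready` reports true.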
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is a continuation of U.S. patent application Ser. No. 17/549,620, entitled “APPLICATION PROGRAMMING INTERFACE TO CAUSE GRAPH CODE TO WAIT ON A SEMAPHORE” and filed on Dec. 13, 2021, the entire contents of which are incorporated herein by reference for all purposes.
TECHNICAL FIELD
At least one embodiment pertains to processing resources used to execute one or more programs written for a parallel computing platform and application interface. For example, at least one embodiment pertains to processors or computing systems that perform an application programming interface (API) according to various novel techniques described herein.
BACKGROUND
Performing computational operations using code from a first API and code from another API can use significant time, power, or computing resources. The amount of time, power, or computing resources can be improved.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a block diagram that illustrates a computing environment, according to at least one embodiment;
FIG. 2 illustrates a diagram of a graph with semaphore nodes, according to at least one embodiment;
FIG. 3 illustrates a diagram of an add semaphore signal node API call, according to at least one embodiment;
FIG. 4 illustrates a diagram of a set semaphore signal node parameters API call, according to at least one embodiment;
FIG. 5 illustrates a diagram of a get semaphore signal node parameters API call, according to at least one embodiment;
FIG. 6 illustrates a diagram of an update executable graph semaphore signal node parameters API call, according to at least one embodiment;
FIG. 7 illustrates a diagram of an add semaphore wait node API call, according to at least one embodiment;
FIG. 8 illustrates a diagram of a set semaphore wait node parameters API call, according to at least one embodiment;
FIG. 9 illustrates a diagram of a get semaphore wait node parameters API call, according to at least one embodiment;
FIG. 10 illustrates a diagram of an update executable graph semaphore wait node parameters API call, according to at least one embodiment;
FIG. 11 is a flowchart of a technique of adding and updating a semaphore signal node, according to at least one embodiment;
FIG. 12 is a flowchart of a technique of adding and updating a semaphore wait node, according to at least one embodiment;
FIG. 13 illustrates an exemplary data center, in accordance with at least one embodiment;
FIG. 14 illustrates a processing system, in accordance with at least one embodiment;
FIG. 15 illustrates a computer system, in accordance with at least one embodiment;
FIG. 16 illustrates a system, in accordance with at least one embodiment;
FIG. 17 illustrates an exemplary integrated circuit, in accordance with at least one embodiment;
FIG. 18 illustrates a computing system, according to at least one embodiment;
FIG. 19 illustrates an APU, in accordance with at least one embodiment;
FIG. 20 illustrates a CPU, in accordance with at least one embodiment;
FIG. 21 illustrates an exemplary accelerator integration slice, in accordance with at least one embodiment;
FIGS. 22A-22B illustrate exemplary graphics processors, in accordance with at least one embodiment;
FIG. 23A illustrates a graphics core, in accordance with at least one embodiment;
FIG. 23B illustrates a GPGPU, in accordance with at least one embodiment;
FIG. 24A illustrates a parallel processor, in accordance with at least one embodiment;
FIG. 24B illustrates a processing cluster, in accordance with at least one embodiment;
FIG. 24C illustrates a graphics multiprocessor, in accordance with at least one embodiment;
FIG. 25 illustrates a graphics processor, in accordance with at least one embodiment;
FIG. 26 illustrates a processor, in accordance with at least one embodiment;
FIG. 27 illustrates a processor, in accordance with at least one embodiment;
FIG. 28 illustrates a graphics processor core, in accordance with at least one embodiment;
FIG. 29 illustrates a PPU, in accordance with at least one embodiment;
FIG. 30 illustrates a GPC, in accordance with at least one embodiment;
FIG. 31 illustrates a streaming multiprocessor, in accordance with at least one embodiment;
FIG. 32 illustrates a software stack of a programming platform, in accordance with at least one embodiment;
FIG. 33 illustrates a CUDA implementation of a software stack of FIG. 32, in accordance with at least one embodiment;
FIG. 34 illustrates a ROCm implementation of a software stack of FIG. 32, in accordance with at least one embodiment;
FIG. 35 illustrates an OpenCL implementation of a software stack of FIG. 32, in accordance with at least one embodiment;
FIG. 36 illustrates software that is supported by a programming platform, in accordance with at least one embodiment;
FIG. 37 illustrates compiling code to execute on programming platforms of FIGS. 32-35, in accordance with at least one embodiment;
FIG. 38 illustrates in greater detail compiling code to execute o