US-12625746-B1 - Graph work submission ordering

Abstract

Apparatuses, systems, and techniques to perform graph nodes. In at least one embodiment, a processor comprises one or more circuits to perform an API to cause one or more first graph nodes to be performed independently with respect to two or more second graph nodes, which have a dependency relationship with respect to each other.

Inventors

Houston Thompson Hoffman
David Anthony Fontaine

Assignees

NVIDIA CORPORATION

Dates

Publication Date: 2026-05-12
Application Date: 2022-10-31

Claims (20)

1. One or more processors, comprising: circuitry to, in response to an application programming interface (API) call, cause one or more first graph nodes to be performed independently with respect to two or more second graph nodes, the two or more second graph nodes having a dependency relationship with respect to each other.
2. The one or more processors of claim 1, wherein performing a node independently comprises performing the node in an order determined based on one or more factors other than graph topology.
3. The one or more processors of claim 1, wherein the circuitry is to, in response to the API call, control an order in which the one or more first graph nodes are to be performed relative to the two or more second graph nodes based, at least in part, on an order in which the one or more first graph nodes were defined relative to the two or more second graph nodes.
4. The one or more processors of claim 1, wherein the one or more first graph nodes or the two or more second graph nodes correspond to operations performed by a neural network.
5. The one or more processors of claim 1, wherein the one or more first graph nodes are to be performed in an order determined based, at least in part, on an order in which the one or more first graph nodes were defined using the API.
6. The one or more processors of claim 1, wherein an API is to cause the one or more first graph nodes to be performed based, at least in part, on graph topology.
7. The one or more processors of claim 1, wherein the API comprises a function to define a graph node, and wherein the one or more first graph nodes are performed in an order determined based, at least in part, on an order in which the function to define the graph node was called.
8. A system, comprising: one or more processors to, in response to an application programming interface (API) call, cause one or more first graph nodes to be performed independently with respect to two or more second graph nodes, the two or more second graph nodes having a dependency relationship with respect to each other.
9. The system of claim 8, wherein the one or more processors are to cause one or more first graph nodes to be performed based, at least in part, on identifying order in which one or more first graph nodes were defined.
10. The system of claim 8, wherein the one or more processors cause one or more first graph nodes to be executed based, at least in part, on an order in which the one or more first graph nodes were defined using the API.
11. The system of claim 8, wherein the one or more processors are to: identify one or more nodes which may be performed based, at least in part, on determining that the one or more first graph nodes do not have unresolved dependencies; and perform the identified one or more nodes.
12. The system of claim 8, wherein the one or more processors are to determine an order for performing the one or more first graph nodes based, at least in part, on at least one of node creation order, dependencies, or heuristics.
13. The system of claim 8, wherein an order of performing graph nodes with equivalent dependencies is based, at least in part, on an order in which the graph nodes with equivalent dependencies were declared.
14. The system of claim 8, wherein the one or more processors are to use one or more APIs to determine an order to perform graph nodes and output information indicative of the order in which to perform one or more graph nodes.
15. A method, comprising: causing, in response to an application programming interface (API) call, one or more first graph nodes to be performed independently with respect to two or more second graph nodes, the two or more second graph nodes having a dependency relationship with respect to each other.
16. The method of claim 15, further comprising performing graph nodes with equivalent dependencies in an order based, at least in part, on an order in which the graph nodes with equivalent dependencies were declared.
17. The method of claim 15, further comprising: determining an order to perform the one or more first graph nodes based, at least in part on creation order and graph topology; and outputting information indicative of the order in which to perform the one or more first graph nodes.
18. The method of claim 15, further comprising performing an API to cause the one or more first graph nodes to be performed based, at least in part, on an order that the one or more first graph nodes were defined relative to the two or more second graph nodes.
19. The method of claim 15, further comprising computing output of a neural network based, at least in part, on performing the one or more first graph nodes and the two or more second graph nodes.
20. The method of claim 15, further comprising determining an order for performing the one or more first graph nodes based, at least in part, on at least one of creation order, dependencies, or heuristics.
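
The ordering behavior recited in claims 11 through 13, 17, and 20 can be illustrated with a minimal sketch. It assumes a plain topological sort (Kahn's algorithm) in which nodes with no unresolved dependencies are released in the order they were created. The names `Graph`, `add_node`, `add_dependency`, `submission_order`, and `demo` are hypothetical and are not taken from the claims or from any particular API; the sketch is one possible reading, not the claimed implementation.

```python
import heapq


class Graph:
    """Toy task graph that records the order in which nodes were defined."""

    def __init__(self):
        self.deps = {}     # node name -> set of node names it depends on
        self.created = {}  # node name -> creation index (0, 1, 2, ...)

    def add_node(self, name):
        # Hypothetical node-definition call; creation order is captured here.
        if name in self.deps:
            raise ValueError(f"node already defined: {name}")
        self.created[name] = len(self.created)
        self.deps[name] = set()

    def add_dependency(self, before, after):
        # Hypothetical call stating that `after` must wait for `before`.
        if before not in self.deps or after not in self.deps:
            raise KeyError("both nodes must be defined first")
        self.deps[after].add(before)


def submission_order(graph):
    """Return an order that respects dependencies, ties broken by creation order."""
    remaining = {n: len(d) for n, d in graph.deps.items()}
    dependents = {n: [] for n in graph.deps}
    for node, node_deps in graph.deps.items():
        for dep in node_deps:
            dependents[dep].append(node)

    # Nodes with no unresolved dependencies, keyed by creation index.
    ready = [(graph.created[n], n) for n, count in remaining.items() if count == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, node = heapq.heappop(ready)  # earliest-created ready node
        order.append(node)
        for child in dependents[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (graph.created[child], child))

    if len(order) != len(graph.deps):
        raise ValueError("graph contains a cycle")
    return order


def demo():
    g = Graph()
    g.add_node("compute")  # created first, but must wait for "load"
    g.add_node("log")      # no dependency relationship with the other two
    g.add_node("load")
    g.add_dependency("load", "compute")
    return submission_order(g)
```

In `demo`, "load" and "compute" play the role of second graph nodes with a dependency relationship, while "log" is a first graph node with no edges to either of them. Its position in the output is therefore decided by when it was created rather than by graph topology.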

Description

TECHNICAL FIELD

At least one embodiment pertains to graph work submission ordering based, at least in part, on node creation order. For example, a processor comprises one or more circuits to perform an API to cause one or more first graph nodes to be performed independently with respect to two or more second graph nodes, which have a dependency relationship with respect to each other. In at least one embodiment, a processor implements various novel techniques illustrated herein.

BACKGROUND

Ordering nodes can use significant memory, time, or computing resources. An amount of memory, time, accuracy, or computing resources used to perform graph nodes can be improved.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates an example system of graph work submission ordering based, at least in part, on node creation order, according to at least one embodiment;
FIG. 2 illustrates an example method of graph work submission ordering including enhanced dependencies, according to at least one embodiment;
FIG. 3 illustrates an example method of graph work submission ordering based, at least in part, on node creation order, according to at least one embodiment;
FIG. 4 illustrates example logic for sorting based, at least in part, on node creation and enhanced dependencies, according to at least one embodiment;
FIG. 5 illustrates an example system of graph work submission ordering to include applying topological requirements, according to at least one embodiment;
FIG. 6 illustrates an example process of performing an API to cause one or more graph nodes to be performed independently with respect to two or more other graph nodes that have a dependency relationship with respect to each other, according to at least one embodiment;
FIG. 7A illustrates logic, according to at least one embodiment;
FIG. 7B illustrates logic, according to at least one embodiment;
FIG. 8 illustrates training and deployment of a neural network, according to at least one embodiment;
FIG. 9 illustrates an example data center system, according to at least one embodiment;
FIG. 10A illustrates an example of an autonomous vehicle, according to at least one embodiment;
FIG. 10B illustrates an example of camera locations and fields of view for the autonomous vehicle of FIG. 10A, according to at least one embodiment;
FIG. 10C is a block diagram illustrating an example system architecture for the autonomous vehicle of FIG. 10A, according to at least one embodiment;
FIG. 10D is a diagram illustrating a system for communication between cloud-based server(s) and the autonomous vehicle of FIG. 10A, according to at least one embodiment;
FIG. 11 is a block diagram illustrating a computer system, according to at least one embodiment;
FIG. 12 is a block diagram illustrating a computer system, according to at least one embodiment;
FIG. 13 illustrates a computer system, according to at least one embodiment;
FIG. 14 illustrates a computer system, according to at least one embodiment;
FIG. 15A illustrates a computer system, according to at least one embodiment;
FIG. 15B illustrates a computer system, according to at least one embodiment;
FIG. 15C illustrates a computer system, according to at least one embodiment;
FIG. 15D illustrates a computer system, according to at least one embodiment;
FIGS. 15E and 15F illustrate a shared programming model, according to at least one embodiment;
FIG. 16 illustrates exemplary integrated circuits and associated graphics processors, according to at least one embodiment;
FIGS. 17A-17B illustrate exemplary integrated circuits and associated graphics processors, according to at least one embodiment;
FIGS. 18A-18B illustrate additional exemplary graphics processor logic according to at least one embodiment;
FIG. 19 illustrates a computer system, according to at least one embodiment;
FIG. 20A illustrates a parallel processor, according to at least one embodiment;
FIG. 20B illustrates a partition unit, according to at least one embodiment;
FIG. 20C illustrates a processing cluster, according to at least one embodiment;
FIG. 20D illustrates a graphics multiprocessor, according to at least one embodiment;
FIG. 21 illustrates a multi-graphics processing unit (GPU) system, according to at least one embodiment;
FIG. 22 illustrates a graphics processor, according to at least one embodiment;
FIG. 23 is a block diagram illustrating a processor micro-architecture for a processor, according to at least one embodiment;
FIG. 24 illustrates a deep learning application processor, according to at least one embodiment;
FIG. 25 is a block diagram illustrating an example neuromorphic processor, according to at least one embodiment;
FIG. 26 illustrates at least portions of a graphics processor, according to one or more embodiments;
FIG. 27 illustrates at least portions of a graphics processor, according to one or more embodiments;
FIG. 28 illustrates at least portions of a graphics processor, according to one or more embodiments;
FIG. 29 is a block diagram of a g