EP-4742030-A1 - CONCURRENT PERFORMANCE OF SOFTWARE PROGRAMS


Abstract

Apparatuses, systems, and techniques to perform substantiation of task pipelines for sequential tasks performed by simultaneous, sequential kernels. In at least one embodiment, a processor comprises one or more circuits to cause a compiler to indicate one or more portions of one or more software programs to be performed by one or more processors concurrently.

Inventors

  • SINHA, Soham
  • CHOUDHURY, Rahul
  • AZIZIAN, Mahdi

Assignees

  • Nvidia Corporation

Dates

Publication Date
2026-05-13
Application Date
2025-11-06

Claims (14)

  1. A processor comprising: one or more circuits to cause a compiler to indicate one or more portions of one or more software programs to be performed by one or more processors concurrently.
  2. The processor of claim 1, wherein the one or more portions of the one or more software programs are performed using information generated by another one or more portions of the one or more software programs.
  3. The processor of claim 1 or 2, wherein the compiler is to indicate the one or more portions of the one or more software programs based, at least in part, on an input memory requirement and an output memory requirement of the one or more software programs.
  4. The processor of claim 1, 2, or 3, wherein the compiler is to cause one or more portions of memory to be allocated to be used by the one or more software programs.
  5. The processor of any preceding claim, wherein the one or more portions of the one or more software programs are to be modified by the compiler to be performed by the one or more processors concurrently.
  6. The processor of any preceding claim, wherein the one or more portions of one or more software programs are to generate information to be asynchronously provided to one or more other portions of the one or more software programs.
  7. The processor of any preceding claim, wherein the one or more processors are to be allocated to perform the one or more portions of the one or more software programs.
  8. A system comprising one or more processors according to any preceding claim.
  9. A method comprising: causing a compiler to indicate one or more portions of one or more software programs to be performed by one or more processors concurrently.
  10. The method of claim 9, further comprising performing the one or more portions of the one or more software programs using information generated by another one or more portions of the one or more software programs.
  11. The method of claim 9 or 10, further comprising indicating, by the compiler, the one or more portions of the one or more software programs based, at least in part, on an input memory requirement and an output memory requirement of the one or more software programs.
  12. The method of claim 9, 10, or 11, further comprising causing, by the compiler, one or more portions of memory to be allocated to be used by the one or more software programs.
  13. The method of any of claims 9-12, wherein the one or more portions of the one or more software programs are to be modified by the compiler to be performed by the one or more processors concurrently.
  14. The method of any of claims 9-13, further comprising generating, by the one or more portions of one or more software programs, information to be asynchronously provided to one or more other portions of the one or more software programs.

Description

TECHNICAL FIELD

At least one embodiment pertains to generation of two or more GPU kernels to perform sequential program tasks in sequence without external controller interference. At least one embodiment pertains to construction of a 'pipeline' construct that allows input data to be computed by two or more sequential kernels without export from the pipeline. At least one embodiment pertains to causing a compiler to indicate one or more portions of one or more software programs to be performed by one or more processors concurrently.

BACKGROUND

Computation of asynchronous (e.g., not able to be performed simultaneously) kernels within a GPU requires substantiation of a kernel, processing, export of output data, instantiation of a second kernel, and import of input data. When such a task is performed many times in repetition, delay of kernels due to startup and shutdown procedures causes latency. Asynchronous kernel performance methods can be improved.

SUMMARY

The invention is defined by the claims. In order to illustrate the invention, aspects and embodiments which may or may not fall within the scope of the claims are described herein. Apparatuses, systems, and techniques to perform substantiation of task pipelines for sequential tasks performed by simultaneous, sequential kernels. In at least one embodiment, a processor comprises one or more circuits to cause a compiler to indicate one or more portions of one or more software programs to be performed by one or more processors concurrently. Any feature of one aspect or embodiment may be applied to other aspects or embodiments, in any appropriate combination. In particular, any feature of a method aspect or embodiment may be applied to an apparatus aspect or embodiment, and vice versa.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates an example architecture for a GPU kernel pipeline, in accordance with at least one embodiment;
FIG. 2 illustrates an example substantiation of a task pipeline, in accordance with at least one embodiment;
FIG. 3 illustrates an example conversion of source code into a task pipeline, in accordance with at least one embodiment;
FIG. 4 illustrates an example pair of pipeline kernels providing data between each other through lock-free buffer memory, in accordance with at least one embodiment;
FIG. 5 illustrates an example pair of pipeline kernels providing data between each other through memory buffers with a synchronous completion notification, in accordance with at least one embodiment;
FIG. 6 illustrates an example process for substantiation and operation of a kernel pipeline, in accordance with at least one embodiment;
FIG. 7 illustrates an example 'perform pipeline' operation to perform substantiation of a kernel pipeline, in accordance with at least one embodiment;
FIG. 8 illustrates an example data center system, in accordance with at least one embodiment;
FIG. 9 illustrates a system-on-a-chip (SOC), in accordance with at least one embodiment;
FIG. 10A illustrates a parallel processor, in accordance with at least one embodiment;
FIG. 10B illustrates a processing cluster, in accordance with at least one embodiment;
FIG. 10C illustrates a graphics multiprocessor, in accordance with at least one embodiment;
FIG. 11 illustrates an accelerator processor, in accordance with at least one embodiment;
FIG. 12A illustrates a central processing unit, in accordance with at least one embodiment;
FIG. 12B illustrates a core of the central processing unit in FIG. 12A, in accordance with at least one embodiment;
FIG. 13 illustrates another accelerator processor, in accordance with at least one embodiment;
FIG. 14 illustrates a neuromorphic processor, in accordance with at least one embodiment;
FIG. 15 illustrates a supercomputer, in accordance with at least one embodiment;
FIG. 16 illustrates another accelerator processor, in accordance with at least one embodiment;
FIG. 17 illustrates another processor, in accordance with at least one embodiment;
FIG. 18 illustrates another accelerator processor, in accordance with at least one embodiment;
FIG. 19 illustrates a tensor processing unit, in accordance with at least one embodiment;
FIG. 20 illustrates a RISC-V-compatible processor, in accordance with at least one embodiment;
FIGS. 21A and 21B illustrate a language processing unit, in accordance with at least one embodiment;
FIG. 22 illustrates a software stack of a programming platform, in accordance with at least one embodiment;
FIG. 23 illustrates software that is supported by a programming platform, in accordance with at least one embodiment;
FIG. 24 illustrates compiling code to execute on programming platforms of FIG. 18, in accordance with at least one embodiment;
FIG. 25 illustrates an example of an autonomous vehicle and its system architecture, in accordance with at least one embodiment;
FIG. 26A illustrates inference and/or training logic, in accordance with at least one embodiment;
FIG. 26B illustrates inference and/or training logic, in accordance with at least one embodiment