CN-122018909-A - Concurrent execution of software programs

CN122018909A

Abstract

The present disclosure relates to concurrent execution of software programs. Apparatus, systems, and techniques are described for materializing a task pipeline for sequential tasks performed by concurrently executing sequential kernels. In at least one embodiment, a processor includes one or more circuits to cause a compiler to indicate one or more portions of one or more software programs to be executed concurrently by one or more processors.

Inventors

  • S. SINHA
  • R. CHOWDHURY
  • M. AZIZIZIAN

Assignees

  • NVIDIA Corporation

Dates

Publication Date
2026-05-12
Application Date
2025-11-10
Priority Date
2024-11-12

Claims (20)

  1. A processor, comprising: one or more circuits to cause a compiler to indicate one or more portions of one or more software programs to be executed concurrently by one or more processors.
  2. The processor of claim 1, wherein the one or more portions of the one or more software programs are to be executed using information generated by another one or more portions of the one or more software programs.
  3. The processor of claim 1, wherein the compiler is to indicate the one or more portions of the one or more software programs based at least in part on input memory requirements and output memory requirements of the one or more software programs.
  4. The processor of claim 1, wherein the compiler is to cause one or more portions of memory to be allocated for use by the one or more software programs.
  5. The processor of claim 1, wherein the one or more portions of the one or more software programs are to be modified by the compiler to be executed concurrently by the one or more processors.
  6. The processor of claim 1, wherein the one or more portions of the one or more software programs are to generate information to be asynchronously provided to one or more other portions of the one or more software programs.
  7. The processor of claim 1, wherein the one or more processors are to be allocated to execute the one or more portions of the one or more software programs.
  8. A system, comprising: one or more processors to cause a compiler to indicate one or more portions of one or more software programs to be executed concurrently by the one or more processors.
  9. The system of claim 8, wherein the one or more portions of the one or more software programs are to be executed using information generated by another one or more portions of the one or more software programs.
  10. The system of claim 8, wherein the compiler is to indicate the one or more portions of the one or more software programs based at least in part on input memory requirements and output memory requirements of the one or more software programs.
  11. The system of claim 8, wherein the compiler is to cause one or more portions of memory to be allocated for use by the one or more software programs.
  12. The system of claim 8, wherein the one or more portions of the one or more software programs are to be modified by the compiler to be executed concurrently by the one or more processors.
  13. The system of claim 8, wherein the one or more portions of the one or more software programs are to generate information to be asynchronously provided to one or more other portions of the one or more software programs.
  14. The system of claim 8, wherein the one or more processors are to be allocated to execute the one or more portions of the one or more software programs.
  15. A method, comprising: causing a compiler to indicate one or more portions of one or more software programs to be executed concurrently by one or more processors.
  16. The method of claim 15, further comprising executing the one or more portions of the one or more software programs using information generated by another one or more portions of the one or more software programs.
  17. The method of claim 15, further comprising indicating, by the compiler, the one or more portions of the one or more software programs based at least in part on input memory requirements and output memory requirements of the one or more software programs.
  18. The method of claim 15, further comprising causing, by the compiler, one or more portions of memory to be allocated for use by the one or more software programs.
  19. The method of claim 15, wherein the one or more portions of the one or more software programs are to be modified by the compiler to be executed concurrently by the one or more processors.
  20. The method of claim 15, further comprising generating, by the one or more portions of the one or more software programs, information to be asynchronously provided to one or more other portions of the one or more software programs.
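Claims 3, 10, and 17 recite the compiler indicating concurrently executable portions based at least in part on the programs' input and output memory requirements. As a rough illustration only (the function, task names, and buffer names below are invented for this sketch and are not part of the disclosure), such an indication can be modeled as grouping tasks into stages whose members share no producer/consumer memory dependency:

```python
# Hypothetical sketch: group program portions ("tasks") into stages of
# tasks that may be indicated for concurrent execution, derived purely
# from each task's input and output memory requirements.

def concurrent_stages(tasks):
    """tasks: dict mapping task name -> (input_buffers, output_buffers).

    Returns a list of stages; tasks within a stage consume no buffer
    produced by another task in the same stage, so they are candidates
    for concurrent execution.
    """
    # Map each output buffer to the task that produces it.
    producers = {}
    for name, (_, outs) in tasks.items():
        for buf in outs:
            producers[buf] = name
    # A task depends on every task that produces one of its inputs.
    deps = {name: {producers[b] for b in ins
                   if b in producers and producers[b] != name}
            for name, (ins, _) in tasks.items()}
    stages, done = [], set()
    while len(done) < len(tasks):
        ready = [n for n in tasks if n not in done and deps[n] <= done]
        assert ready, "cyclic dependency between tasks"
        stages.append(ready)
        done.update(ready)
    return stages
```

For a video loop where `denoise` and `sharpen` both read decoded frames but neither reads the other's output, this grouping would place them in the same stage.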

Description

Concurrent execution of software programs

Technical Field

At least one embodiment relates to generating two or more GPU kernels for executing sequential program tasks one after another without external controller intervention. At least one embodiment involves building a "pipeline" structure that allows input data to be processed by two or more sequential kernels without leaving the pipeline. At least one embodiment relates to causing a compiler to indicate one or more portions of one or more software programs to be executed concurrently by one or more processors.

Background

Computation by asynchronous (i.e., not performed simultaneously) kernels on a GPU requires instantiating a kernel, processing, exporting output data, instantiating a second kernel, and importing input data. When such tasks are performed repeatedly many times, the kernel start-up and shut-down processes may introduce delay. Asynchronous kernel execution methods may be improved.

Drawings

  • FIG. 1 illustrates an example architecture of a GPU kernel pipeline in accordance with at least one embodiment;
  • FIG. 2 illustrates an example materialization of a task pipeline in accordance with at least one embodiment;
  • FIG. 3 illustrates an example conversion of source code into a task pipeline in accordance with at least one embodiment;
  • FIG. 4 illustrates a pair of example pipeline kernels that provide data to each other through a lock-free memory buffer in accordance with at least one embodiment;
  • FIG. 5 illustrates a pair of example pipeline kernels that provide data to each other through a memory buffer using a synchronization completion notification in accordance with at least one embodiment;
  • FIG. 6 illustrates an example process for the materialization and operation of a kernel pipeline in accordance with at least one embodiment;
  • FIG. 7 illustrates example execution pipeline operations for performing a materialization of a kernel pipeline in accordance with at least one embodiment;
  • FIG. 8 illustrates an example data center system in accordance with at least one embodiment;
  • FIG. 9 illustrates a system on a chip (SoC) in accordance with at least one embodiment;
  • FIG. 10A illustrates a parallel processor in accordance with at least one embodiment;
  • FIG. 10B illustrates a processing cluster in accordance with at least one embodiment;
  • FIG. 10C illustrates a graphics multiprocessor in accordance with at least one embodiment;
  • FIG. 11 illustrates an accelerator processor in accordance with at least one embodiment;
  • FIG. 12A illustrates a central processing unit in accordance with at least one embodiment;
  • FIG. 12B illustrates a core of the central processing unit of FIG. 12A in accordance with at least one embodiment;
  • FIG. 13 illustrates another accelerator processor in accordance with at least one embodiment;
  • FIG. 14 illustrates a neuromorphic processor in accordance with at least one embodiment;
  • FIG. 15 illustrates a supercomputer in accordance with at least one embodiment;
  • FIG. 16 illustrates another accelerator processor in accordance with at least one embodiment;
  • FIG. 17 illustrates another processor in accordance with at least one embodiment;
  • FIG. 18 illustrates another accelerator processor in accordance with at least one embodiment;
  • FIG. 19 illustrates a tensor processing unit in accordance with at least one embodiment;
  • FIG. 20 illustrates a RISC-V compatible processor in accordance with at least one embodiment;
  • FIGS. 21A and 21B illustrate a language processing unit in accordance with at least one embodiment;
  • FIG. 22 illustrates a software stack of a programming platform in accordance with at least one embodiment;
  • FIG. 23 illustrates software supported by a programming platform in accordance with at least one embodiment;
  • FIG. 24 illustrates compiled code for execution on the programming platform of FIG. 23 in accordance with at least one embodiment;
  • FIG. 25 illustrates an example of an autonomous vehicle and its system architecture in accordance with at least one embodiment;
  • FIG. 26A illustrates inference and/or training logic in accordance with at least one embodiment;
  • FIG. 26B illustrates inference and/or training logic in accordance with at least one embodiment;
  • FIG. 26C illustrates training and deployment of a neural network in accordance with at least one embodiment.

Detailed Description

In at least one embodiment, a system and method implemented in accordance with the present disclosure is used to cause a compiler to indicate one or more portions of one or more software programs to be executed concurrently by one or more processors. In at least one embodiment, when multiple asynchronous sequential tasks are to be performed repeatedly by kernels on a processor (e.g., a repeated loop of an image-enhancement process applied one frame at a time to many frames of video, with the steps performed in a specified order), a pipeline may be created to allow
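The lock-free buffer between the pair of pipeline kernels of FIG. 4 can be modeled in miniature. In the sketch below (an illustration only: `SpscRing`, the thread names, and the squared-integer workload are invented here, and ordinary Python threads stand in for GPU kernels), the producing and consuming kernels coordinate through nothing but a head index and a tail index, so data flows between them without locks or an external controller:

```python
import threading

class SpscRing:
    """Single-producer/single-consumer ring buffer: no locks, only a
    head index (advanced by the consumer) and a tail index (advanced
    by the producer)."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.cap = capacity
        self.head = 0  # next slot to read; written only by the consumer
        self.tail = 0  # next slot to write; written only by the producer

    def push(self, item):
        while self.tail - self.head >= self.cap:  # spin while full
            pass
        self.buf[self.tail % self.cap] = item
        self.tail += 1  # publish the slot only after it is written

    def pop(self):
        while self.head == self.tail:  # spin while empty
            pass
        item = self.buf[self.head % self.cap]
        self.head += 1  # free the slot for reuse by the producer
        return item

ring = SpscRing(4)
N = 100
out = []

def first_kernel():   # stands in for the producing pipeline kernel
    for i in range(N):
        ring.push(i * i)

def second_kernel():  # stands in for the consuming pipeline kernel
    for _ in range(N):
        out.append(ring.pop())

t1 = threading.Thread(target=first_kernel)
t2 = threading.Thread(target=second_kernel)
t1.start(); t2.start()
t1.join(); t2.join()
```

On actual GPU hardware the index update would need release/acquire memory ordering so the data write becomes visible before the published index; Python's global interpreter lock hides that concern in this model.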