
US-12619878-B2 - Master transform architecture for deep learning

Abstract

Apparatuses, systems, and techniques to transform input data for training neural networks. In at least one embodiment, one or more data transforms are identified in a sequence of data transforms and combined into one or more master data transforms to be performed by one or more parallel processing units in order to prepare data for training an untrained neural network.
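
For intuition only (this sketch is not part of the patent text), combining a chain of per-sample data transforms into a single "master" callable can be expressed in a few lines of Python; the fuse helper and the toy pipeline below are hypothetical names chosen for illustration:

    # Hypothetical sketch: compose a sequence of transforms into one
    # "master" transform so the whole chain runs in a single pass.
    from functools import reduce

    def fuse(transforms):
        """Compose single-argument transforms, applied left to right."""
        return reduce(lambda f, g: lambda x: g(f(x)), transforms)

    # Toy preprocessing pipeline for training data.
    pipeline = [
        lambda x: x / 255.0,        # normalize pixel intensities to [0, 1]
        lambda x: (x - 0.5) * 2.0,  # rescale to [-1, 1]
    ]

    master_transform = fuse(pipeline)
    print(master_transform(127.5))  # prints 0.0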

Inventors

  • Shekhar Dwivedi
  • Nicholas Alexander Haemel

Assignees

  • NVIDIA CORPORATION

Dates

Publication Date
2026-05-05
Application Date
2019-12-18

Claims (20)

  1. One or more processors, comprising: circuitry to implement an overall transform process with a plurality of data transforms via forming a combined data transform from at least some of the plurality to be applied to input data by hardware resources including hardware accelerators that comprise at least two or more sequentially arranged hardware accelerators based, at least in part, on memory and computational processing capabilities of the hardware accelerators available to apply the combined data transform to the hardware accelerators.
  2. The one or more processors of claim 1, wherein the circuitry is further to implement a sequence of transforms including the combined data transform in the overall transform process.
  3. The one or more processors of claim 2, wherein the sequence of transforms is to include one or more individual transforms able to be applied to the input data, respectively, via one or more hardware accelerators.
  4. The one or more processors of claim 1, wherein the combined data transform is to be usable to train one or more neural networks.
  5. The one or more processors of claim 1, wherein the output data is to be of a format distinct from the input data.
  6. The one or more processors of claim 1, wherein the overall transform process is to comprise changing dimensionality of the input data.
  7. The one or more processors of claim 1, wherein the hardware resources that include hardware accelerators comprise hardware resources that include at least two or more hardware accelerators sequentially arranged and at least one CPU.
  8. The one or more processors of claim 1, wherein the combined data transform comprises pre and post transforms, and is to prepare 3-dimensional image data for use in training a neural network.
  9. The one or more processors of claim 1, wherein the forming a combined data transform based, at least in part, on memory and computational processing capabilities of the hardware accelerators available is to result in optimized utilization of the hardware accelerators.
  10. A system, comprising: one or more processors to implement an overall transform process with a plurality of data transforms via forming a combined data transform from at least some of the plurality to be applied to input data by hardware resources including hardware accelerators that include at least two or more hardware accelerators sequentially arranged based, at least in part, on memory and computational processing capabilities of the hardware accelerators available to apply the combined data transform to the hardware accelerators.
  11. The system of claim 10, wherein the overall transform process is to include a sequence of transforms including the combined data transform.
  12. The system of claim 11, wherein the sequence of transforms is to include one or more individual transforms able to be applied to the input data, respectively, via one or more hardware accelerators.
  13. The system of claim 10, wherein the plurality of data transforms is to comprise a sequence of operations on three dimensional (3D) image data.
  14. The system of claim 10, wherein the hardware accelerators comprise two or more GPUs.
  15. The system of claim 14, wherein the available memory and computational processing capabilities are to be associated with the two or more GPUs.
  16. The system of claim 10, wherein the combined data transform is to be usable to train one or more neural networks.
  17. The system of claim 10, wherein the hardware resources including hardware accelerators comprise hardware resources including at least two or more hardware accelerators and at least one CPU.
  18. The system of claim 10, wherein one or more neural networks are to perform inference using at least a portion of the data output.
  19. The system of claim 10, wherein the combined data transform is to comprise pre and post transforms.
  20. The system of claim 10, wherein the data output is to comprise three dimensional (3D) image data for use in training one or more neural networks.
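
Claims 1, 7, and 17 turn on placing the combined data transform according to the memory and computational processing capabilities of the available hardware accelerators. A minimal sketch of one such placement decision, assuming a hypothetical Device record and a simple free-memory test (not the claimed method itself), might read:

    # Hedged illustration: pick an accelerator whose free memory can hold
    # the fused transform's working set, else fall back to the CPU.
    from dataclasses import dataclass

    @dataclass
    class Device:  # hypothetical capability record
        name: str
        free_memory_bytes: int

    def place_transform(batch_bytes, accelerators):
        """Return the first accelerator with room for the batch, else 'cpu'."""
        for dev in accelerators:
            if dev.free_memory_bytes >= batch_bytes:
                return dev.name
        return "cpu"

    gpus = [Device("gpu:0", 2 << 30), Device("gpu:1", 8 << 30)]
    print(place_transform(4 << 30, gpus))   # gpu:1
    print(place_transform(16 << 30, gpus))  # cpu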

Description

FIELD

At least one embodiment pertains to processing resources used to perform and facilitate artificial intelligence. For example, at least one embodiment pertains to processors or computing systems used to transform input data for training neural networks and inferencing using neural networks, according to various novel techniques described herein.

BACKGROUND

Training neural networks such that they can be used to perform inferencing often requires massive amounts of data. This data is often available in various formats, or otherwise needs to be modified before it can be used to train a neural network. Pre and post transforms to prepare data for training and inferencing are a key part of training neural networks to perform deep learning inferencing. While applying transforms to input data in order to prepare it to train a neural network is often necessary, it is also expensive in both memory and computation, and can encounter technical limitations (e.g., memory requirements) and formatting-based limits.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a system for training and inferencing using a neural network, according to at least one embodiment;
FIG. 2 illustrates a system for training and inferencing using a neural network with acceleration by one or more parallel processing units (PPUs), according to at least one embodiment;
FIG. 3 illustrates a sequence of example transforms to prepare data for training and inferencing using a neural network, according to at least one embodiment;
FIG. 4 illustrates a percentage of time for each example transform to process data for use in training and inferencing using a neural network, according to at least one embodiment;
FIG. 5 illustrates a sequence of example transforms to prepare data for training and inferencing using a neural network, where a subset of example transforms are performed by one or more graphics processing units (GPUs) and remaining example transforms are performed by one or more central processing units (CPUs), according to at least one embodiment;
FIG. 6 illustrates a sequence of example transforms to prepare data for training and inferencing using a neural network, where a subset of example transforms have been combined into a master transform performed by one or more GPUs, and remaining example transforms are performed individually by one or more CPUs, according to at least one embodiment;
FIG. 7 illustrates a system to determine one or more master transforms each containing two or more data transforms from a sequence of transforms to be performed on one or more parallel processing units (PPUs), such as graphics processing units (GPUs), according to at least one embodiment;
FIG. 8 illustrates a process to determine one or more master transforms each containing two or more data transforms from a sequence of transforms to be performed on one or more parallel processing units (PPUs), such as graphics processing units (GPUs), according to at least one embodiment;
FIG. 9A illustrates inference and/or training logic, according to at least one embodiment;
FIG. 9B illustrates inference and/or training logic, according to at least one embodiment;
FIG. 10 illustrates training and deployment of a neural network, according to at least one embodiment;
FIG. 11 illustrates an example data center system, according to at least one embodiment;
FIG. 12A illustrates an example of an autonomous vehicle, according to at least one embodiment;
FIG. 12B illustrates an example of camera locations and fields of view for the autonomous vehicle of FIG. 12A, according to at least one embodiment;
FIG. 12C is a block diagram illustrating an example system architecture for the autonomous vehicle of FIG. 12A, according to at least one embodiment;
FIG. 12D is a diagram illustrating a system for communication between cloud-based server(s) and the autonomous vehicle of FIG. 12A, according to at least one embodiment;
FIG. 13 is a block diagram illustrating a computer system, according to at least one embodiment;
FIG. 14 is a block diagram illustrating a computer system, according to at least one embodiment;
FIG. 15 illustrates a computer system, according to at least one embodiment;
FIG. 16 illustrates a computer system, according to at least one embodiment;
FIG. 17A illustrates a computer system, according to at least one embodiment;
FIG. 17B illustrates a computer system, according to at least one embodiment;
FIG. 17C illustrates a computer system, according to at least one embodiment;
FIG. 17D illustrates a computer system, according to at least one embodiment;
FIGS. 17E and 17F illustrate a shared programming model, according to at least one embodiment;
FIG. 18 illustrates exemplary integrated circuits and associated graphics processors, according to at least one embodiment;
FIGS. 19A and 19B illustrate exemplary integrated circuits and associated graphics processors, according to at least one embodiment;
FIGS. 20A and 20B illustrate additional exemplary graphics processor logic, according to at least one embodiment.
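
As a rough illustration of the grouping step that FIGS. 6 through 8 describe (a sketch under assumed inputs, not the patented process), consecutive GPU-capable transforms in a sequence can be merged into master transforms while CPU-bound transforms remain individual:

    # Hypothetical sketch: merge runs of consecutive GPU-capable transforms
    # into master transforms; CPU-bound transforms stay individual.
    def build_master_transforms(sequence):
        """sequence is a list of (name, gpu_capable) pairs."""
        groups, run = [], []
        for name, gpu_capable in sequence:
            if gpu_capable:
                run.append(name)
                continue
            if run:
                groups.append(("master:gpu", run))
                run = []
            groups.append(("cpu", [name]))
        if run:
            groups.append(("master:gpu", run))
        return groups

    seq = [("load", False), ("resize", True), ("normalize", True),
           ("augment", True), ("serialize", False)]
    for target, ops in build_master_transforms(seq):
        print(target, ops)
    # cpu ['load']
    # master:gpu ['resize', 'normalize', 'augment']
    # cpu ['serialize']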