CN-114270319-B - Reassigning tensor elements among machine learning computing units


Abstract

Methods, systems, and apparatus for reassigning tensor elements among computing units are described. In one aspect, a method includes distributing tensor elements of an N-dimensional tensor among a plurality of computing units of a computing system. Each computing unit reassigns the subset of the tensor elements previously assigned to that computing unit. Each computing unit accesses reassignment partition data specifying, for each computing unit, the tensor elements to be stored by the computing unit after reassignment of the tensor elements. For each tensor element previously assigned to a particular computing unit, the computing unit determines a global linearization index value for the tensor element based on the multidimensional index of the tensor element. The computing unit uses the reassignment partition data and the global linearization index value to determine a destination computing unit and sends the tensor element to the destination computing unit.
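
The index arithmetic summarized above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes row-major ordering, and it assumes the reassignment partition data is a list of contiguous ranges of global linearization index values, one range per computing unit. The names `linearize`, `destination_unit`, and `partition` are hypothetical.

```python
from typing import Sequence


def linearize(index: Sequence[int], shape: Sequence[int]) -> int:
    """Return the row-major global linear index of a multidimensional index.

    Row-major ordering is an assumption made for this sketch.
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    linear = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index value {i} is out of range for dimension size {n}")
        linear = linear * n + i
    return linear


def destination_unit(linear: int, partition: Sequence[range]) -> int:
    """Look up which computing unit stores the element after reassignment.

    `partition[u]` is the (assumed) range of linear indices stored by unit u.
    """
    for unit, owned in enumerate(partition):
        if linear in owned:
            return unit
    raise LookupError(f"no computing unit owns linear index {linear}")


# A 2 x 3 x 4 tensor (24 elements) split evenly across three units.
shape = (2, 3, 4)
partition = [range(0, 8), range(8, 16), range(16, 24)]

linear = linearize((1, 0, 2), shape)        # 1*12 + 0*4 + 2 = 14
unit = destination_unit(linear, partition)  # index 14 falls in range(8, 16)
print(linear, unit)
```

Because every computing unit can evaluate the same two functions locally, no unit needs a central lookup to decide where an element goes; that property is what this sketch is meant to show.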

Inventors

David Alexander menemer
RAVI NARAYANASWAMI
YU TONGHE
Karel Daniel killbroo

Assignees

Google LLC

Dates

Publication Date: 20260512
Application Date: 20201007
Priority Date: 20191007

Claims (20)

1. A method for reassigning tensor elements among computing units, comprising: distributing tensor elements of an N-dimensional tensor among a plurality of computing units of a computing system, wherein each computing unit performs a computation using the subset of the tensor elements distributed to that computing unit; receiving an instruction to reassign the tensor elements of the N-dimensional tensor among the computing units; and in response to receiving the instruction, reassigning, by each computing unit, the subset of tensor elements previously assigned to the computing unit to one or more computing units of the computing system, including, for each particular computing unit of the computing system: accessing reassignment partition data specifying, for each computing unit, the tensor elements to be stored by the computing unit after the tensor elements are reassigned; and, for each tensor element previously assigned to the particular computing unit: determining a global linearization index value for the tensor element based on a multidimensional index of the tensor element in the N-dimensional tensor, the multidimensional index including, for each dimension of the N-dimensional tensor, an index value corresponding to the position of the tensor element along that dimension; determining, using the reassignment partition data and the global linearization index value of the tensor element, a destination computing unit of the computing system to which the tensor element is to be reassigned; and sending the tensor element to the destination computing unit.
2. The method of claim 1, wherein the tensor elements of the N-dimensional tensor are reassigned in response to reshaping the N-dimensional tensor, the reshaping comprising adjusting a number of tensor elements in two or more dimensions of the N-dimensional tensor.
3. The method of claim 2, wherein determining, using the reassignment partition data and the global linearization index value of the tensor element, a destination computing unit of the computing system to which the tensor element is to be reassigned comprises: determining a second multidimensional index of the tensor element in the reshaped N-dimensional tensor based on the global linearization index value of the tensor element and the number of tensor elements in each dimension of the reshaped N-dimensional tensor; and determining the destination computing unit to which the tensor element is to be reassigned based on the second multidimensional index of the tensor element and the reassignment partition data.
4. The method according to claim 1, wherein: distributing the tensor elements of the N-dimensional tensor among the plurality of computing units of the computing system includes: partitioning the N-dimensional tensor into a plurality of tensor slices based on one or more chunking dimensions of the N-dimensional tensor, and assigning one or more tensor slices of the N-dimensional tensor to each computing unit; and the tensor elements of the N-dimensional tensor are reassigned in response to a change in the one or more chunking dimensions on which the N-dimensional tensor is partitioned.
5. The method of any preceding claim, wherein sending the tensor element to the destination computing unit comprises: generating header information specifying the destination computing unit for the tensor element; delivering the header information and the tensor element to a tile-to-tile network path managed by the particular computing unit; and storing, by the destination computing unit, the tensor element in a queue for the particular computing unit, wherein each computing unit includes a respective queue for each computing unit of the computing system, each respective queue storing tensor elements received from the computing unit corresponding to that queue.
6. The method of claim 5, further comprising, for each computing unit of the computing system: traversing a second subset of tensor elements being reassigned to the computing unit based on the reassignment partition data, including, for each particular tensor element in the second subset: determining the global linearization index value for the particular tensor element; determining an originating computing unit from which the particular tensor element is received, based on the global linearization index value of the particular tensor element and distribution partition data specifying, for each computing unit, the tensor elements stored by the computing unit after the distributing of the tensor elements; obtaining the particular tensor element from the queue corresponding to the originating computing unit; and storing the particular tensor element in a local memory of the computing unit.
7. The method of claim 6, wherein determining the global linearization index value for the particular tensor element comprises determining the global linearization index value based on the multidimensional index for the particular tensor element.
8. A system for reassigning tensor elements among computing units, comprising: a controller configured to: distribute tensor elements of an N-dimensional tensor among a plurality of computing units of a computing system; receive an instruction to reassign the tensor elements of the N-dimensional tensor among the computing units; and in response to receiving the instruction, cause each computing unit to reassign the subset of tensor elements previously assigned to the computing unit to one or more computing units; wherein each computing unit is configured to: perform a computation using the subset of the tensor elements assigned to the computing unit; access reassignment partition data specifying, for each computing unit, the tensor elements to be stored by the computing unit after the tensor elements are reassigned; and, for each tensor element previously assigned to the computing unit: determine a global linearization index value for the tensor element based on a multidimensional index of the tensor element in the N-dimensional tensor, the multidimensional index including, for each dimension of the N-dimensional tensor, an index value corresponding to the position of the tensor element along that dimension; determine, using the reassignment partition data and the global linearization index value of the tensor element, a destination computing unit of the computing system to which the tensor element is to be reassigned; and send the tensor element to the destination computing unit.
9. The system of claim 8, wherein the tensor elements of the N-dimensional tensor are reassigned in response to reshaping the N-dimensional tensor, the reshaping comprising adjusting a number of tensor elements in two or more dimensions of the N-dimensional tensor.
10. The system of claim 9, wherein each computing unit is configured to determine, using the reassignment partition data and the global linearization index value of the tensor element, a destination computing unit of the computing system to which the tensor element is to be reassigned by: determining a second multidimensional index of the tensor element in the reshaped N-dimensional tensor based on the global linearization index value of the tensor element and the number of tensor elements in each dimension of the reshaped N-dimensional tensor; and determining the destination computing unit to which the tensor element is to be reassigned based on the second multidimensional index of the tensor element and the reassignment partition data.
11. The system of claim 8, wherein: each computing unit is configured to distribute the tensor elements of the N-dimensional tensor among the plurality of computing units of the computing system by: partitioning the N-dimensional tensor into a plurality of tensor slices based on one or more chunking dimensions of the N-dimensional tensor, and assigning one or more tensor slices of the N-dimensional tensor to each computing unit; and each computing unit is configured to reassign the tensor elements of the N-dimensional tensor in response to a change in the one or more chunking dimensions on which the N-dimensional tensor is partitioned.
12. The system of any of claims 8 to 11, wherein each computing unit is configured to send the tensor element to the destination computing unit by: generating header information specifying the destination computing unit for the tensor element; delivering the header information and the tensor element to a tile-to-tile network path managed by the particular computing unit; and storing, by the destination computing unit, the tensor element in a queue for the particular computing unit, wherein each computing unit includes a respective queue for each computing unit of the computing system, each respective queue storing tensor elements received from the computing unit corresponding to that queue.
13. The system of claim 12, wherein each computing unit is configured to: traverse a second subset of tensor elements being reassigned to the computing unit based on the reassignment partition data, including, for each particular tensor element in the second subset: determining the global linearization index value for the particular tensor element; determining an originating computing unit from which the particular tensor element is received, based on the global linearization index value of the particular tensor element and distribution partition data specifying, for each computing unit, the tensor elements stored by the computing unit after the distributing of the tensor elements; obtaining the particular tensor element from the queue corresponding to the originating computing unit; and storing the particular tensor element in a local memory of the computing unit.
14. The system of claim 13, wherein each computing unit is configured to determine the global linearization index value for the particular tensor element by determining the global linearization index value based on the multidimensional index for the particular tensor element.
15. A computer storage medium encoded with a computer program, the program comprising instructions that, when executed by one or more data processing apparatus, cause the data processing apparatus to perform operations for reassigning tensor elements among computing units, the operations comprising: distributing tensor elements of an N-dimensional tensor among a plurality of computing units of a computing system, wherein each computing unit performs a computation using the subset of the tensor elements distributed to that computing unit; receiving an instruction to reassign the tensor elements of the N-dimensional tensor among the computing units; and in response to receiving the instruction, reassigning, by each computing unit, the subset of tensor elements previously assigned to the computing unit to one or more computing units of the computing system, including, for each particular computing unit of the computing system: accessing reassignment partition data specifying, for each computing unit, the tensor elements to be stored by the computing unit after the tensor elements are reassigned; and, for each tensor element previously assigned to the particular computing unit: determining a global linearization index value for the tensor element based on a multidimensional index of the tensor element in the N-dimensional tensor, the multidimensional index including, for each dimension of the N-dimensional tensor, an index value corresponding to the position of the tensor element along that dimension; determining, using the reassignment partition data and the global linearization index value of the tensor element, a destination computing unit of the computing system to which the tensor element is to be reassigned; and sending the tensor element to the destination computing unit.
16. The computer storage medium of claim 15, wherein the tensor elements of the N-dimensional tensor are reassigned in response to reshaping the N-dimensional tensor, the reshaping comprising adjusting a number of tensor elements in two or more dimensions of the N-dimensional tensor.
17. The computer storage medium of claim 16, wherein determining, using the reassignment partition data and the global linearization index value of the tensor element, a destination computing unit of the computing system to which the tensor element is to be reassigned comprises: determining a second multidimensional index of the tensor element in the reshaped N-dimensional tensor based on the global linearization index value of the tensor element and the number of tensor elements in each dimension of the reshaped N-dimensional tensor; and determining the destination computing unit to which the tensor element is to be reassigned based on the second multidimensional index of the tensor element and the reassignment partition data.
18. The computer storage medium of claim 15, wherein: distributing the tensor elements of the N-dimensional tensor among the plurality of computing units of the computing system includes: partitioning the N-dimensional tensor into a plurality of tensor slices based on one or more chunking dimensions of the N-dimensional tensor, and assigning one or more tensor slices of the N-dimensional tensor to each computing unit; and the tensor elements of the N-dimensional tensor are reassigned in response to a change in the one or more chunking dimensions on which the N-dimensional tensor is partitioned.
19. The computer storage medium of any of claims 15 to 18, wherein sending the tensor element to the destination computing unit comprises: generating header information specifying the destination computing unit for the tensor element; delivering the header information and the tensor element to a tile-to-tile network path managed by the particular computing unit; and storing, by the destination computing unit, the tensor element in a queue for the particular computing unit, wherein each computing unit includes a respective queue for each computing unit of the computing system, each respective queue storing tensor elements received from the computing unit corresponding to that queue.
20. The computer storage medium of claim 19, wherein the operations comprise, for each computing unit of the computing system: traversing a second subset of tensor elements being reassigned to the computing unit based on the reassignment partition data, including, for each particular tensor element in the second subset: determining the global linearization index value for the particular tensor element; determining an originating computing unit from which the particular tensor element is received, based on the global linearization index value of the particular tensor element and distribution partition data specifying, for each computing unit, the tensor elements stored by the computing unit after the distributing of the tensor elements; obtaining the particular tensor element from the queue corresponding to the originating computing unit; and storing the particular tensor element in a local memory of the computing unit.
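
The send and receive steps recited in claims 5 and 6 (and their system and storage-medium counterparts) can be simulated in a single process. The sketch below is illustrative only and rests on several assumptions: the header is reduced to a source/destination pair, the tile-to-tile network path is a plain function call, partition data is a list of global-index ranges per unit, and each sender walks its elements in ascending index order so that queue order matches the receiver's traversal order. All names (`Unit`, `owner`, `deliver`, `send_phase`, `receive_phase`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Unit:
    uid: int
    local: dict                                  # global linear index -> element value
    queues: dict = field(default_factory=dict)   # source uid -> deque of received values


def owner(linear, partition):
    """Return the unit whose (assumed) index range contains `linear`."""
    for uid, owned in enumerate(partition):
        if linear in owned:
            return uid
    raise LookupError(f"no computing unit owns linear index {linear}")


def deliver(units, header, value):
    """Stand-in for the tile-to-tile network: enqueue at the destination,
    in the queue dedicated to the sending unit."""
    units[header["destination"]].queues[header["source"]].append(value)


def send_phase(units, new_partition):
    """Each unit sends every element it holds to its post-reassignment owner."""
    for src in units:
        for linear in sorted(src.local):
            header = {"source": src.uid, "destination": owner(linear, new_partition)}
            deliver(units, header, src.local[linear])


def receive_phase(units, old_partition, new_partition):
    """Each unit walks its new subset, finds the originating unit from the
    old partition data, and pops the element from that unit's queue."""
    for dst in units:
        received = {}
        for linear in new_partition[dst.uid]:
            origin = owner(linear, old_partition)
            received[linear] = dst.queues[origin].popleft()
        dst.local = received  # stored in the unit's local memory


old_partition = [range(0, 2), range(2, 4), range(4, 6)]
new_partition = [range(0, 1), range(1, 5), range(5, 6)]
units = [
    Unit(uid, {i: 10 * i for i in owned},
         {src: deque() for src in range(len(old_partition))})
    for uid, owned in enumerate(old_partition)
]

send_phase(units, new_partition)
receive_phase(units, old_partition, new_partition)
for u in units:
    print(u.uid, u.local)
```

Keeping one queue per source unit means the receiver never has to inspect element indices in the payload: knowing the originating unit and the agreed traversal order is enough to place each value.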

Description

Reassigning tensor elements among machine learning computing units

Background

Neural networks are machine learning models that employ one or more layers of the model to generate an output, e.g., a classification, for a received input. The input to the neural network may include a multidimensional tensor that includes tensor elements. In addition to an output layer, some neural networks also include one or more hidden layers. The output of each hidden layer serves as an input to the next layer in the network, i.e., the next hidden layer or the output layer of the network. Each layer generates an output from the received input according to the current values of a respective set of parameters.

Disclosure of Invention

The present description relates generally to hardware neural network computing units and a network between the computing units that is configured to redistribute tensor elements among the computing units.

In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include: assigning tensor elements of an N-dimensional tensor among a plurality of computing units of a computing system, wherein each computing unit performs a computation using the subset of the tensor elements assigned to the computing unit; receiving an instruction to reassign the tensor elements of the N-dimensional tensor among the computing units; and, in response to receiving the instruction, reassigning, by each computing unit, the subset of tensor elements previously assigned to the computing unit to one or more computing units of the computing system. The reassigning includes, for each particular computing unit of the computing system: accessing reassignment partition data that specifies, for each computing unit, the tensor elements to be stored by the computing unit after the tensor elements are reassigned; and, for each tensor element previously assigned to the particular computing unit: determining a global linearization index value for the tensor element based on a multidimensional index of the tensor element in the N-dimensional tensor, the multidimensional index including, for each dimension of the N-dimensional tensor, an index value corresponding to the position of the tensor element along that dimension; determining, using the reassignment partition data and the global linearization index value of the tensor element, a destination computing unit of the computing system to which the tensor element is to be reassigned; and sending the tensor element to the destination computing unit.

These and other implementations can each optionally include one or more of the following features. In some aspects, the tensor elements of the N-dimensional tensor are reassigned in response to reshaping the N-dimensional tensor, the reshaping including adjusting the number of tensor elements in two or more dimensions of the N-dimensional tensor. Determining, using the reassignment partition data and the global linearization index value of the tensor element, a destination computing unit of the computing system to which the tensor element is to be reassigned may include determining a second multidimensional index of the tensor element in the reshaped N-dimensional tensor based on the global linearization index value of the tensor element and the number of tensor elements in each dimension of the reshaped N-dimensional tensor, and determining the destination computing unit to which the tensor element is to be reassigned based on the second multidimensional index of the tensor element and the reassignment partition data.

In some aspects, assigning tensor elements of the N-dimensional tensor among the plurality of computing units of the computing system includes partitioning the N-dimensional tensor into a plurality of tensor slices based on one or more chunking dimensions of the N-dimensional tensor, and assigning one or more tensor slices of the N-dimensional tensor to each computing unit. The tensor elements of the N-dimensional tensor are reassigned in response to a change in the one or more chunking dimensions on which the N-dimensional tensor is partitioned.
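
The reshape case described above can be sketched in the same spirit. The example assumes row-major ordering and assumes the reshaped tensor is split into equal-length slices along a single chunking dimension; the specification does not prescribe this representation, and the names `linearize`, `unravel`, and `destination_after_reshape` are hypothetical.

```python
import math


def linearize(index, shape):
    """Row-major global linear index of a multidimensional index (assumed ordering)."""
    linear = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index value {i} is out of range for dimension size {n}")
        linear = linear * n + i
    return linear


def unravel(linear, shape):
    """Inverse of linearize: recover the multidimensional index in `shape`."""
    index = []
    for n in reversed(shape):
        linear, i = divmod(linear, n)
        index.append(i)
    if linear:
        raise IndexError("linear index is out of range for the given shape")
    return tuple(reversed(index))


def destination_after_reshape(index, old_shape, new_shape, chunk_dim, num_units):
    """Return the element's second multidimensional index and its destination unit.

    Assumes the reshaped tensor is split into equal slices along `chunk_dim`.
    """
    if math.prod(old_shape) != math.prod(new_shape):
        raise ValueError("a reshape must preserve the total number of elements")
    linear = linearize(index, old_shape)
    new_index = unravel(linear, new_shape)
    slice_len = -(-new_shape[chunk_dim] // num_units)  # ceiling division
    return new_index, new_index[chunk_dim] // slice_len


# A 4 x 6 tensor reshaped to 8 x 3 and chunked along dimension 0 over 4 units.
new_index, unit = destination_after_reshape(
    (2, 5), old_shape=(4, 6), new_shape=(8, 3), chunk_dim=0, num_units=4
)
print(new_index, unit)
```

The global linear index is the only quantity shared between the old and new shapes, which is why the destination is derived from it rather than from the original multidimensional index.
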
In some aspects, sending the tensor element to the destination computing unit may include generating header information specifying the destination computing unit for the tensor element, passing the header information and the tensor element to a tile-to-tile network channel managed by the particular computing unit, and storing, by the destination computing unit, the tensor element in a queue for the particular computing unit, wherein each computing unit includes a respective queue for each computing unit of the computing system, each respective queue storing tensor elements received from the computing unit corresponding to that queue. Some aspects may include, for each computing unit of the computing system, traversing a second subset of the tensor elements being reassigned to the computing unit based on the reassignment partition data, including, for each particular tensor element in the second subset, determining a global linearization index value for the particular tensor element, determining an originating computing unit from which the particular tenso