CN-114761969-B - Multi-mode plane engine for a neural processor
Abstract
Embodiments relate to a neural processor including a plurality of neural engine circuits and one or more plane engine circuits. The plurality of neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The plane engine circuit is coupled to the plurality of neural engine circuits and generates an output from input data corresponding to an output of a neural engine circuit or a version of the input data of the neural processor. The plane engine circuit may be configured in multiple modes. In the pooling mode, the plane engine circuit reduces the spatial size of a version of the input data. In the element-by-element mode, the plane engine circuit performs element-by-element operations on the input data. In the reduction mode, the plane engine circuit reduces the rank of a tensor.
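For orientation, the three modes named in the abstract can be sketched in NumPy-style Python. This is an illustrative approximation, not part of the patent text: the function names, the 2x2 pooling window, and the choice of addition for the element-by-element operation are all assumptions.

```python
import numpy as np

def pooling_mode(x, k=2):
    """Pooling mode: reduce the spatial size of an (H, W) plane by k x k average pooling."""
    h, w = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def element_by_element_mode(a, b):
    """Element-by-element mode: combine two equal-shape tensors (here, addition)."""
    return a + b  # the claims also list element-wise max, min, and multiply

def reduction_mode(x):
    """Reduction mode: reduce the rank of a tensor (here, a rank-2 plane to a rank-0 scalar)."""
    return x.sum()

plane = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = pooling_mode(plane)                      # spatial size (4, 4) -> (2, 2)
combined = element_by_element_mode(plane, plane)  # same shape, element-by-element sum
scalar = reduction_mode(plane)                    # rank 2 -> rank 0
```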
Inventors
- Christopher L. Mills
- K. W. Waters
- Y. Jin
Assignees
- Apple Inc.
Dates
- Publication Date: 2026-05-05
- Application Date: 2020-09-23
- Priority Date: 2019-10-08
Claims (20)
- 1. A neural processor, comprising: a plurality of neural engine circuits, each of the plurality of neural engine circuits configured to perform a convolution operation of first input data with one or more kernels to generate a first output; a plane engine circuit coupled to the plurality of neural engine circuits and configured to operate in parallel with the plurality of neural engine circuits, the plane engine circuit operable in one of two or more modes to generate a second output, the two or more modes including a pooling mode and an element-by-element mode, the plane engine circuit comprising a programmable row buffer circuit, wherein the programmable row buffer circuit is configured to store intermediate results generated by the plane engine circuit in the pooling mode, wherein the plane engine circuit is configured to bypass storage, in the programmable row buffer circuit, of results generated by the plane engine circuit in the element-by-element mode, and wherein: in the pooling mode, the plane engine circuit is configured to reduce a spatial size of a version of second input data received by the plane engine circuit, the second input data corresponding to the first output or a version of input data of the neural processor, and in the element-by-element mode, the plane engine circuit is configured to perform an element-by-element operation on the second input data corresponding to the first output or a version of the input data of the neural processor; and a data processor circuit coupled to the plurality of neural engine circuits and the plane engine circuit, the data processor circuit configured to buffer the second output for transmission to the plurality of neural engine circuits.
- 2. The neural processor of claim 1, wherein the plane engine circuit comprises: a first filter circuit configured to reduce a first size of a first dimension of the version of the second input data in the pooling mode to generate intermediate data; and a second filter circuit configured to reduce a second size of a second dimension of the intermediate data in the pooling mode to generate a version of the second output.
- 3. The neural processor of claim 2, wherein the programmable row buffer circuit is coupled to the first filter circuit and the second filter circuit, and wherein the intermediate result is provided to the second filter circuit.
- 4. The neural processor of claim 2, wherein at least one of the first filter circuit or the second filter circuit is configured to perform the element-by-element operation on the version of the second input data in the element-by-element mode.
- 5. The neural processor of claim 2, wherein the plane engine circuit further comprises a format converter coupled to the first filter circuit, the format converter configured to perform one or more format conversions on the second input data to generate the version of the second input data.
- 6. The neural processor of claim 1, wherein the convolution operation is one of a plurality of operations for implementing a machine learning model.
- 7. The neural processor of claim 1, wherein the two or more modes include a reduction mode, and wherein the plane engine circuit is further configured to, in the reduction mode, reduce a rank of a tensor based on the first input data.
- 8. The neural processor of claim 7, wherein the plane engine circuit comprises a filter circuit configured to: reduce the spatial size of the second input data received in the pooling mode; perform the element-by-element operation on versions of one or more tensors in the element-by-element mode; and generate scalar values in the reduction mode.
- 9. The neural processor of claim 1, wherein the first input data represents data across a plurality of channels and the second input data represents data in one of the plurality of channels.
- 10. The neural processor of claim 1, wherein the element-by-element operation comprises one or more of tensor addition, element-by-element maximum, element-by-element minimum, or element-by-element multiplication.
- 11. The neural processor of claim 1, wherein circuitry of the plane engine circuit is reconfigured when switching from the pooling mode to the element-by-element mode.
- 12. A method for operating a neural processor, the method comprising: transmitting first input data to at least one of a plurality of neural engine circuits of the neural processor; performing a convolution operation of the first input data with one or more kernels using the at least one of the plurality of neural engine circuits to generate a first output; transmitting second input data to a plane engine circuit of the neural processor, the plane engine circuit coupled to the plurality of neural engine circuits and configured to operate in parallel with the plurality of neural engine circuits, the plane engine circuit comprising a programmable row buffer circuit; generating, at the plane engine circuit, a second output from the second input data, the plane engine circuit operable in one of two or more modes, the two or more modes including a pooling mode and an element-by-element mode, wherein: in the pooling mode, the plane engine circuit is configured to reduce a spatial size of a version of the second input data corresponding to the first output or a version of the input data of the neural processor and to store an intermediate result generated by the plane engine circuit in the programmable row buffer circuit, and in the element-by-element mode, the plane engine circuit is configured to perform an element-by-element operation on the second input data corresponding to the first output or a version of the input data of the neural processor, and to bypass storage of results generated in the element-by-element mode in the programmable row buffer circuit; and buffering the second output for transmission to the plurality of neural engine circuits by a data processor circuit coupled to the plurality of neural engine circuits and the plane engine circuit.
- 13. The method of claim 12, wherein reducing the spatial size of the version of the second input data received by the plane engine circuit in the pooling mode comprises: reducing a first dimension of the version of the second input data with a first filter circuit to generate intermediate data; and reducing a second dimension of the intermediate data with a second filter circuit to generate a version of the second output.
- 14. The method of claim 13, wherein the programmable row buffer circuit is coupled to the first filter circuit and the second filter circuit, the method further comprising providing the intermediate result to the second filter circuit.
- 15. The method of claim 13, wherein performing the element-by-element operation on the second input data in the element-by-element mode comprises performing the element-by-element operation with at least one of the first filter circuit or the second filter circuit.
- 16. The method of claim 12, wherein the convolution operation is one of a plurality of operations for implementing a machine learning model.
- 17. The method of claim 12, wherein the two or more modes comprise a reduction mode, and wherein the method further comprises reducing a rank of a tensor based on the first input data in the reduction mode.
- 18. An electronic device, comprising: a memory storing a machine learning model; and a neural processor, the neural processor comprising: a plurality of neural engine circuits, each of the plurality of neural engine circuits configured to perform a convolution operation of first input data with one or more kernels to generate a first output; a plane engine circuit coupled to the plurality of neural engine circuits and configured to operate in parallel with the plurality of neural engine circuits, the plane engine circuit operable in one of two or more modes to generate a second output, the two or more modes including a pooling mode and an element-by-element mode, the plane engine circuit comprising a programmable row buffer circuit, wherein the programmable row buffer circuit is configured to store intermediate results generated by the plane engine circuit in the pooling mode, wherein the plane engine circuit is configured to bypass storage, in the programmable row buffer circuit, of results generated by the plane engine circuit in the element-by-element mode, and wherein: in the pooling mode, the plane engine circuit is configured to reduce a spatial size of a version of second input data received by the plane engine circuit, the second input data corresponding to the first output or a version of input data of the neural processor, and in the element-by-element mode, the plane engine circuit is configured to perform an element-by-element operation on the second input data corresponding to the first output or a version of the input data of the neural processor; and a data processor circuit coupled to the plurality of neural engine circuits and the plane engine circuit, the data processor circuit configured to buffer the second output for transmission to the plurality of neural engine circuits.
- 19. The electronic device of claim 18, wherein the convolution operation is one of a plurality of operations for implementing the machine learning model.
- 20. A computer program product comprising a computer program which, when executed by a processor, causes the processor to perform the method according to any of claims 12-17.
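Claims 2 and 13 describe pooling as a separable pipeline: a first filter circuit reduces one dimension into intermediate data (held in the row buffer in pooling mode), and a second filter circuit reduces the other dimension. The following is a minimal software sketch of that data flow, with NumPy arrays standing in for the hardware filter circuits and row buffer; the 2x2 window and averaging are illustrative assumptions, not details from the claims.

```python
import numpy as np

def first_filter(x, k=2):
    """First filter circuit: reduce the first (row) dimension to intermediate data."""
    h = x.shape[0] - x.shape[0] % k
    return x[:h].reshape(h // k, k, -1).mean(axis=1)

def second_filter(intermediate, k=2):
    """Second filter circuit: reduce the second (column) dimension of the intermediate data."""
    w = intermediate.shape[1] - intermediate.shape[1] % k
    return intermediate[:, :w].reshape(intermediate.shape[0], w // k, k).mean(axis=2)

x = np.arange(16, dtype=np.float32).reshape(4, 4)
row_buffer = first_filter(x)     # intermediate result, stored in the row buffer in pooling mode
out = second_filter(row_buffer)  # equivalent to one 2x2 average-pooling pass
```

Because averaging is separable, reducing rows and then columns yields the same result as a full 2x2 pooling window; factoring the operation this way is what allows hardware to buffer only a few rows of intermediate data rather than the whole plane.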
Description
Multi-mode plane engine for a neural processor

Background

1. Technical Field

The present disclosure relates to a circuit for performing operations related to a neural network, and more particularly to a neural processor including a plurality of neural engine circuits and one or more multi-mode plane engine circuits.

2. Description of Related Art

An Artificial Neural Network (ANN) is a computing system or model that uses a collection of connected nodes to process input data. ANNs are typically organized into layers, where different layers perform different types of transformations on their inputs. Extensions or variants of ANNs such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Deep Belief Networks (DBNs) have received considerable attention. These computing systems or models typically involve extensive computational operations, including multiplication and accumulation. For example, a CNN is a class of machine learning technique that primarily uses convolutions between input data and kernel data, which can be decomposed into multiply-and-accumulate operations. These machine learning systems or models may be configured differently depending on the type of input data and the operations to be performed. Such varied configurations may include, for example, preprocessing operations, the number of channels in the input data, the kernel data to be used, nonlinear functions to be applied to the convolution results, and the application of various post-processing operations. It is relatively easy to instantiate and execute machine learning systems or models of various configurations using a Central Processing Unit (CPU) and its main memory, as such systems or models can be instantiated by updating code alone.
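The decomposition of a convolution into multiply-and-accumulate operations mentioned above can be made concrete with a short sketch. Plain Python is used for clarity, and the valid-padding, single-channel shapes are illustrative assumptions rather than details from the disclosure.

```python
def conv2d_mac(input_2d, kernel):
    """2-D convolution with valid padding, written as explicit multiply-accumulate steps."""
    ih, iw = len(input_2d), len(input_2d[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for oy in range(ih - kh + 1):
        for ox in range(iw - kw + 1):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    # one multiply-accumulate (MAC) operation
                    acc += input_2d[oy + ky][ox + kx] * kernel[ky][kx]
            out[oy][ox] = acc
    return out
```

Each output element costs kh * kw MAC operations, which is why hardware that parallelizes MACs, such as the neural engine circuits described here, accelerates convolution far beyond a general-purpose CPU.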
However, relying solely on the CPU to perform the various operations of these machine learning systems or models would consume a significant amount of CPU bandwidth and increase overall power consumption.

Disclosure of Invention

Embodiments relate to a neural processor including a plurality of neural engine circuits and a plane engine circuit operable in a plurality of modes and coupled to the plurality of neural engine circuits. At least one of the neural engine circuits performs a convolution operation of first input data with one or more kernels to generate a first output. The plane engine circuit generates a second output from second input data corresponding to the first output or to a version of the input data of the neural processor. The input data of the neural processor may be data received from a source external to the neural processor, or the output of a neural engine circuit or a plane engine circuit in a previous cycle. In the pooling mode, the plane engine circuit reduces the spatial size of a version of the second input data. In the element-by-element mode, the plane engine circuit performs an element-by-element operation on the second input data. In the reduction mode, the plane engine circuit reduces the rank of a tensor.

Drawings

Fig. 1 is a high-level diagram of an electronic device according to an embodiment.
Fig. 2 is a block diagram illustrating components in an electronic device, according to one embodiment.
Fig. 3 is a block diagram illustrating a neural processor circuit, according to one embodiment.
Fig. 4 is a block diagram of a neural engine in a neural processor circuit, according to one embodiment.
Fig. 5 is a conceptual diagram illustrating a loop for processing input data at a neural processor circuit, according to one embodiment.
Figs. 6A, 6B, and 6C are conceptual diagrams illustrating a pooling operation, an element-by-element operation, and a reduction operation, respectively, according to one embodiment.
Fig. 7 is a flow chart illustrating a method of operation of a neural processor, according to one embodiment.

The drawings depict various non-limiting embodiments and are for purposes of illustration only.

Detailed Description

Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. Numerous specific details are set forth in the following detailed description in order to provide a thorough understanding of the various described embodiments. However, the embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

Embodiments of the present disclosure relate to a neural processor that includes a plurality of neural engine circuits and one or more plane engine circuits that are efficient at performing different types of computations. The neural engine circuits may be efficient at performing computationally intensive operations (e.g., convolution operations), while the plane engine circuits may be efficient at performing computationall