CN-116415640-B - Neural network reasoning chip, neural network reasoning method and terminal
Abstract
The application provides a neural network reasoning chip, a neural network reasoning method and a terminal. The chip comprises a control unit, an operation unit array and a partial-sum accumulation unit array, and the operation unit array comprises a first operation unit sub-array and a second operation unit sub-array located in the same column. The control unit inputs first feature data and second feature data into the first operation unit sub-array and the second operation unit sub-array respectively, determines a target selection branch of the second operation unit sub-array, and enables the target selection branch. The first operation unit sub-array obtains a first sub-array operation result based on the first feature data and first weight data. The second operation unit sub-array obtains a second sub-array intermediate operation result based on the second feature data and second weight data, operates on the second sub-array intermediate operation result and the first sub-array operation result through the target selection branch to obtain a second sub-array operation result, and outputs the second sub-array operation result to the partial-sum accumulation unit sub-array.
Inventors
- BU HAIXIANG
- ZHANG HUI
- WANG ZHIHUI
Assignees
- 哲库科技(上海)有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20230327
Claims (12)
- 1. A neural network reasoning chip, characterized by comprising a control unit, an operation unit array and a partial-sum accumulation unit array, wherein the operation unit array comprises a first operation unit sub-array and a second operation unit sub-array located in the same column; the control unit is configured to control first feature data and second feature data corresponding to an input channel to be input into the first operation unit sub-array and the second operation unit sub-array respectively; the first operation unit sub-array is configured to obtain a first sub-array operation result based on the first feature data and first weight data, and to output the first sub-array operation result to the second operation unit sub-array; the second operation unit sub-array is configured to obtain a second sub-array intermediate operation result based on the second feature data and second weight data, to operate on the second sub-array intermediate operation result and the first sub-array operation result through a target selection branch to obtain a second sub-array operation result, and to output the second sub-array operation result to the partial-sum accumulation unit sub-array that, within the partial-sum accumulation unit array, is located in the same row as the second operation unit sub-array, so as to perform an accumulation operation; the target selection branch comprises a first selection branch, wherein the control unit is configured to control the first feature data and the second feature data, which correspond to the same input channel and differ in byte position, to be input into the first operation unit sub-array and the second operation unit sub-array respectively, to determine the first selection branch, and to enable the first selection branch; and the second operation unit sub-array is configured to obtain the second sub-array intermediate operation result based on the second feature data and the second weight data, to shift the second sub-array intermediate operation result through the first selection branch, to sum the shifted second sub-array intermediate operation result and the first sub-array operation result to obtain the second sub-array operation result, and to output the second sub-array operation result to the partial-sum accumulation unit sub-array located in the same column, so as to perform the accumulation operation.
- 2. The neural network inference chip of claim 1, wherein the first selection branch comprises a selector, a shifter, and an accumulator, wherein the control unit is configured to control the selector to select the first selection branch; the shifter is configured to, once a target shift bit count for the operation result obtained from the second feature data has been determined based on the number of bytes of the first feature data of the same input channel, shift the second sub-array intermediate operation result to the left by the target shift bit count to obtain the shifted second sub-array intermediate operation result; and the accumulator is configured to sum the shifted second sub-array intermediate operation result and the first sub-array operation result to obtain the second sub-array operation result.
- 3. The neural network inference chip of claim 1, wherein the target selection branch comprises a second selection branch, wherein the control unit is configured to control the first feature data and the second feature data corresponding to different input channels to be input into the first operation unit sub-array and the second operation unit sub-array respectively, to determine the second selection branch, and to enable the second selection branch; and the second operation unit sub-array is configured to obtain the second sub-array intermediate operation result based on the second feature data and the second weight data, to sum the second sub-array intermediate operation result and the first sub-array operation result through the second selection branch to obtain the second sub-array operation result, and to output the second sub-array operation result to the partial-sum accumulation unit sub-array located in the same column, so as to perform the accumulation operation.
- 4. The neural network inference chip of claim 3, wherein the second selection branch includes a selector and an accumulator, wherein, The control unit is used for controlling the selector to select the second selection branch; And the accumulator is used for summing the second subarray intermediate operation result and the first subarray operation result to obtain the second subarray operation result.
- 5. The neural network inference chip of any of claims 1 to 4, wherein the operation unit sub-array comprises at least two merged operation units located in different columns, each merged operation unit comprises at least a first operation unit and a second operation unit located in the same column, and the feature data comprises at least third feature data and fourth feature data, wherein the control unit is configured to control the third feature data and the fourth feature data corresponding to different input channels to be input into the first operation unit and the second operation unit respectively; the first operation unit is configured to multiply the third feature data and third weight data element-wise to obtain a first product, and to output the first product to the second operation unit; and the second operation unit is configured to multiply the fourth feature data and fourth weight data element-wise to obtain a second product, to sum the first product and the second product to obtain a partial-sum accumulation result, to store the partial-sum accumulation result in a beat register, and to accumulate the partial-sum accumulation results through an adder tree to obtain an initial sub-array operation result.
- 6. A neural network reasoning method, characterized in that the method is applied to a neural network reasoning chip comprising a control unit, an operation unit array and a partial-sum accumulation unit array, the operation unit array comprising a first operation unit sub-array and a second operation unit sub-array located in the same column, wherein the control unit controls first feature data and second feature data corresponding to an input channel to be input into the first operation unit sub-array and the second operation unit sub-array respectively; the first operation unit sub-array obtains a first sub-array operation result based on the first feature data and first weight data, and outputs the first sub-array operation result to the second operation unit sub-array; the second operation unit sub-array obtains a second sub-array intermediate operation result based on the second feature data and second weight data, operates on the second sub-array intermediate operation result and the first sub-array operation result through a target selection branch to obtain a second sub-array operation result, and outputs the second sub-array operation result to the partial-sum accumulation unit sub-array that, within the partial-sum accumulation unit array, is located in the same row as the second operation unit sub-array, so as to perform an accumulation operation; the target selection branch comprises a first selection branch, wherein the control unit controls the first feature data and the second feature data, which correspond to the same input channel and differ in byte position, to be input into the first operation unit sub-array and the second operation unit sub-array respectively, determines the first selection branch, and enables the first selection branch; and the second operation unit sub-array obtains the second sub-array intermediate operation result based on the second feature data and the second weight data, shifts the second sub-array intermediate operation result through the first selection branch, sums the shifted second sub-array intermediate operation result and the first sub-array operation result to obtain the second sub-array operation result, and outputs the second sub-array operation result to the partial-sum accumulation unit sub-array located in the same column, so as to perform the accumulation operation.
- 7. The neural network reasoning method of claim 6, wherein the first selection branch includes a selector, a shifter, and an accumulator, wherein, The control unit controls the selector to select the first selection branch; The shifter shifts the second subarray intermediate operation result to the left by the target shift bit number under the condition of determining the target shift bit number based on the first characteristic data and the second characteristic data, so as to obtain a shifted second subarray intermediate operation result; And the accumulator sums the shifted second subarray intermediate operation result with the first subarray operation result to obtain the second subarray operation result.
- 8. The neural network reasoning method of claim 6, wherein the target selection branch comprises a second selection branch, wherein the control unit controls the first feature data and the second feature data corresponding to different input channels to be input into the first operation unit sub-array and the second operation unit sub-array respectively, determines the second selection branch, and enables the second selection branch; and the second operation unit sub-array obtains the second sub-array intermediate operation result based on the second feature data and the second weight data, sums the second sub-array intermediate operation result and the first sub-array operation result through the second selection branch to obtain the second sub-array operation result, and outputs the second sub-array operation result to the partial-sum accumulation unit sub-array located in the same column, so as to perform the accumulation operation.
- 9. The neural network reasoning method of claim 8, wherein the second selection branch includes a selector and an accumulator, wherein, The control unit controls the selector to select the second selection branch; and the accumulator sums the second subarray intermediate operation result and the first subarray operation result to obtain the second subarray operation result.
- 10. The neural network reasoning method of any of claims 6 to 9, wherein the operation unit sub-array comprises at least two merged operation units located in different columns, each merged operation unit comprises at least a first operation unit and a second operation unit located in the same column, and the feature data comprises at least third feature data and fourth feature data, wherein the control unit controls the third feature data and the fourth feature data corresponding to different input channels to be input into the first operation unit and the second operation unit respectively; the first operation unit multiplies the third feature data and third weight data element-wise to obtain a first product, and outputs the first product to the second operation unit; and the second operation unit multiplies the fourth feature data and fourth weight data element-wise to obtain a second product, sums the first product and the second product to obtain a partial-sum accumulation result, stores the partial-sum accumulation result in a beat register, and accumulates the partial-sum accumulation results through an adder tree to obtain an initial sub-array operation result.
- 11. A terminal comprising the neural network inference chip of any one of claims 1 to 5.
- 12. A terminal comprising a neural network inference chip, a processor, a memory storing executable instructions that when executed implement the method of any one of claims 6 to 10.
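The two selection branches recited in claims 1 to 4 can be illustrated with a small numerical sketch. It is not the claimed hardware: the function names `dot` and `second_subarray_output`, the unsigned 16-bit features, the 8-bit shift, and the choice to send the low byte to the first sub-array and the high byte to the second are all assumptions made for illustration.

```python
def dot(xs, ws):
    """Multiply-accumulate over paired features and weights."""
    return sum(x * w for x, w in zip(xs, ws))


def second_subarray_output(first_result, intermediate, branch, shift_bits=0):
    """Combine the second sub-array's intermediate result with the first
    sub-array's result through the selected branch (hypothetical model).

    "first":  shift the intermediate result left, then sum
              (same input channel, different byte positions).
    "second": plain sum (different input channels).
    """
    if branch == "first":
        return (intermediate << shift_bits) + first_result
    if branch == "second":
        return intermediate + first_result
    raise ValueError(f"unknown branch: {branch}")


# First branch: assumed unsigned 16-bit features split into two bytes.
weights = [3, -2, 5]
wide = [0x1234, 0x00FF, 0xABCD]
low_bytes = [x & 0xFF for x in wide]   # assumed input to the first sub-array
high_bytes = [x >> 8 for x in wide]    # assumed input to the second sub-array
shifted_sum = second_subarray_output(
    dot(low_bytes, weights), dot(high_bytes, weights), "first", shift_bits=8
)
# Shift-and-sum reproduces the full-precision dot product.
assert shifted_sum == dot(wide, weights)

# Second branch: the two sub-arrays hold different input channels.
ch_a, w_a = [1, 2, 3], [3, -2, 5]
ch_b, w_b = [4, 5, 6], [1, 0, -1]
plain_sum = second_subarray_output(dot(ch_a, w_a), dot(ch_b, w_b), "second")
assert plain_sum == dot(ch_a + ch_b, w_a + w_b)
```

Under these assumptions, the first branch lets two narrower sub-arrays cooperate on one higher-precision operand, while the second branch lets them cover twice as many input channels.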
Description
Neural network reasoning chip, neural network reasoning method and terminal
Technical Field
The embodiments of the application relate to the field of chip design, and in particular to a neural network reasoning chip, a neural network reasoning method and a terminal.
Background
In recent years, with the continuous development of computer science and chip architecture, neural network algorithms, backed by the computing power of modern computers, have shown advantages far exceeding those of conventional algorithms in fields such as computer vision, audio and video processing, and automatic driving. Neural networks are both computationally intensive and memory intensive, which makes deploying the algorithms on hardware more challenging. Current neural network reasoning chip architectures mainly include the graphics processing unit (GPU), the field programmable gate array (FPGA), and the application-specific integrated circuit (ASIC). In a neural network reasoning chip architecture in the related art, the computational parallelism of the architecture is determined by an operation unit array connected as a systolic array (SA). The computational parallelism comprises input channel (IC) parallelism and output channel (OC) parallelism, and convolution is performed by the operation unit array at that parallelism. However, when the number of input channels is not an integer multiple of the IC parallelism of the architecture, the utilization of the operation unit array is low.
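The utilization problem can be made concrete with a little arithmetic. The sketch below assumes that input channels are processed in passes of a fixed number of IC lanes and that the final pass is padded; the function name `ic_utilization` and the example numbers are illustrative and are not taken from the application.

```python
import math


def ic_utilization(num_input_channels: int, ic_parallelism: int) -> float:
    """Fraction of IC lanes doing useful work, assuming channels are
    processed in passes of `ic_parallelism` lanes and the last pass is padded."""
    passes = math.ceil(num_input_channels / ic_parallelism)
    return num_input_channels / (passes * ic_parallelism)


print(ic_utilization(64, 32))  # 1.0
print(ic_utilization(48, 32))  # 0.75
print(ic_utilization(3, 32))   # 0.09375
```

With an assumed IC parallelism of 32, a layer with 48 input channels leaves a quarter of the lanes idle, and a 3-channel input layer leaves more than 90% of them idle.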
Disclosure of Invention
The embodiments of the application provide a neural network reasoning chip, a neural network reasoning method and a terminal, and provide a chip architecture that supports network layers with various numbers of input channels and different quantization precision requirements. The technical scheme of the embodiments of the application is realized as follows: In a first aspect, an embodiment of the present application provides a neural network inference chip, including a control unit, an operation unit array, and a partial-sum accumulation unit array, where the operation unit array includes a first operation unit sub-array and a second operation unit sub-array that are located in the same column. The control unit is configured to control first feature data and second feature data corresponding to an input channel to be input into the first operation unit sub-array and the second operation unit sub-array respectively. The first operation unit sub-array is configured to obtain a first sub-array operation result based on the first feature data and first weight data, and to output the first sub-array operation result to the second operation unit sub-array. The second operation unit sub-array is configured to obtain a second sub-array intermediate operation result based on the second feature data and second weight data, to operate on the second sub-array intermediate operation result and the first sub-array operation result through a target selection branch to obtain a second sub-array operation result, and to output the second sub-array operation result to the partial-sum accumulation unit sub-array that, within the partial-sum accumulation unit array, is located in the same column as the second operation unit sub-array, so as to perform an accumulation operation.
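The data flow of the first aspect can be sketched in software as follows. This is a minimal model under stated assumptions, not the chip's implementation: the function names `dot` and `column_output`, the parameter `lanes` (input channels handled by each sub-array per pass), and the plain-sum combination of the two results are all hypothetical.

```python
def dot(xs, ws):
    """Multiply-accumulate over paired features and weights."""
    return sum(x * w for x, w in zip(xs, ws))


def column_output(features, weights, lanes):
    """Model one column: two stacked sub-arrays feeding one accumulator.

    Each pass, the first sub-array handles `lanes` input channels and hands
    its result to the second sub-array, which handles the next `lanes`
    channels and combines both results (plain sum assumed here). The
    combined result is added to the partial-sum accumulator.
    """
    accumulator = 0
    per_pass = 2 * lanes
    for start in range(0, len(features), per_pass):
        mid, end = start + lanes, start + per_pass
        first_result = dot(features[start:mid], weights[start:mid])
        intermediate = dot(features[mid:end], weights[mid:end])
        second_result = intermediate + first_result
        accumulator += second_result  # partial-sum accumulation
    return accumulator


features = list(range(1, 13))  # 12 input channels, one value each
weights = [1] * 12
total = column_output(features, weights, lanes=4)
# Accumulating over passes matches a single dot product over all channels.
assert total == dot(features, weights)
```

In this model the accumulator only ever sees the second sub-array's output, which mirrors the described path from the first sub-array through the second sub-array to the partial-sum accumulation unit sub-array.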
In a second aspect, an embodiment of the present application provides a neural network reasoning method applied to a neural network reasoning chip, the neural network reasoning chip including a control unit, an operation unit array, and a partial-sum accumulation unit array, the operation unit array including a first operation unit sub-array and a second operation unit sub-array located in the same column. The control unit controls first feature data and second feature data corresponding to an input channel to be input into the first operation unit sub-array and the second operation unit sub-array respectively. The first operation unit sub-array obtains a first sub-array operation result based on the first feature data and first weight data, and outputs the first sub-array operation result to the second operation unit sub-array. The second operation unit sub-array obtains a second sub-array intermediate operation result based on the second feature data and second weight data, operates on the second sub-array intermediate operation result and the first sub-array operation result through a target selection branch to obtain a second sub-array operation result, and outputs the second sub-array operation result to the partial-sum accumulation unit sub-array that, within the partial-sum accumulation unit array, is located in the same column as the second operation unit sub-array, so as to perform acc