CN-116050488-B - Data processing method and device, neural network accelerator and storage medium
Abstract
The embodiments of the application disclose a data processing method, a data processing device, a neural network accelerator and a storage medium. The device comprises at least one convolution processing unit and a conversion processing module connected with the at least one convolution processing unit. For each channel in a feature image, the conversion processing module performs the first matrix conversion operation of the Winograd algorithm on the corresponding image data and the second matrix conversion operation of the Winograd algorithm on the corresponding convolution kernel, obtaining the corresponding converted image data and converted convolution kernel. The at least one convolution processing unit comprises a number of calculation groups matched to the number of channels in the feature image, and each calculation group supports the Winograd dot multiplication operation on the converted image data and convolution kernel of one channel, producing the corresponding dot multiplication data. The conversion processing module further accumulates the dot multiplication data of the different channels in the feature image and performs the third matrix conversion operation of the Winograd algorithm on the accumulated data, obtaining the convolution processing result for the feature image.
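The flow summarized in the abstract (per-channel first and second matrix conversions, dot multiplication, accumulation across channels, then one third matrix conversion) can be illustrated in software. The sketch below is a minimal, non-authoritative model for a single output tile. It assumes the common Winograd F(2×2, 3×3) tile size and the standard Lavin-Gray transform matrices B^T, G and A^T; the patent does not state which tile size or matrices it uses, and the function names (`first_conversion`, `winograd_tile`, etc.) are hypothetical. `Fraction` keeps the arithmetic exact so the result can be compared directly against a direct convolution.

```python
from fractions import Fraction

# Assumed Winograd F(2x2, 3x3) matrices (Lavin-Gray form); not taken from the patent.
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[Fraction(1), Fraction(0), Fraction(0)],
     [Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(-1, 2), Fraction(1, 2)],
     [Fraction(0), Fraction(0), Fraction(1)]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def first_conversion(d):
    """V = B^T d B for a 4x4 input tile d (hypothetical name)."""
    return matmul(matmul(BT, d), transpose(BT))


def second_conversion(g):
    """U = G g G^T for a 3x3 kernel g (hypothetical name)."""
    return matmul(matmul(G, g), transpose(G))


def third_conversion(M):
    """Y = A^T M A, giving a 2x2 output tile (hypothetical name)."""
    return matmul(matmul(AT, M), transpose(AT))


def winograd_tile(tiles, kernels):
    """tiles[c]: 4x4 input tile of channel c; kernels[c]: 3x3 kernel of channel c."""
    acc = [[0] * 4 for _ in range(4)]
    for d, g in zip(tiles, kernels):
        V, U = first_conversion(d), second_conversion(g)
        # Dot (element-wise) multiplication, accumulated across channels.
        for i in range(4):
            for j in range(4):
                acc[i][j] += U[i][j] * V[i][j]
    # A single third conversion on the channel-accumulated data.
    return third_conversion(acc)


def direct_tile(tiles, kernels):
    """Reference: direct 3x3 convolution (CNN-style cross-correlation) summed over channels."""
    return [[sum(d[i + u][j + v] * g[u][v]
                 for d, g in zip(tiles, kernels) for u in range(3) for v in range(3))
             for j in range(2)] for i in range(2)]
```

For any multi-channel input, `winograd_tile(tiles, kernels)` should equal `direct_tile(tiles, kernels)`; for example, a single all-ones 4×4 tile with an all-ones 3×3 kernel gives `[[9, 9], [9, 9]]` both ways.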
Inventors
- ZHU YEHUA
- SUN WEI
- BU HAIXIANG
Assignees
- Zeku Technology (Shanghai) Corp., Ltd. (哲库科技(上海)有限公司)
Dates
- Publication Date: 2026-05-05
- Application Date: 2021-10-21
Claims (9)
- 1. A data processing device, characterized by comprising at least one convolution processing unit and a conversion processing module connected with the at least one convolution processing unit; the conversion processing module is used for performing, for each channel in a feature image, a first matrix conversion operation of a Winograd algorithm on the corresponding image data and a second matrix conversion operation of the Winograd algorithm on the corresponding convolution kernel, to obtain the corresponding converted image data and converted convolution kernel; the at least one convolution processing unit comprises calculation groups whose number matches the number of channels in the feature image, and each calculation group supports performing the dot multiplication (element-wise multiplication) operation of the Winograd algorithm on the converted image data and convolution kernel corresponding to one channel in the feature image, to obtain the corresponding dot multiplication data; the conversion processing module is further used for accumulating the dot multiplication data corresponding to different channels in the feature image and performing a third matrix conversion operation of the Winograd algorithm on the accumulated data, to obtain a convolution processing result corresponding to the feature image; each convolution processing unit comprises at least one multiply-accumulate tree, and each multiply-accumulate tree comprises at least one dot multiplication operator; each calculation group comprises one dot multiplication operator from each multiply-accumulate tree in a convolution processing unit; and each calculation group supports using the dot multiplication operators in the group to multiply the data at the same position in the converted image data and the converted convolution kernel corresponding to one channel in the feature image, to obtain the corresponding dot multiplication data.
- 2. The device of claim 1, wherein the conversion processing module comprises an accumulation unit; the accumulation unit is connected with the at least one convolution processing unit and is used for accumulating the dot multiplication data corresponding to different channels in the feature image to obtain accumulated data.
- 3. The device of claim 2, wherein the conversion processing module further comprises a first conversion unit disposed in the at least one convolution processing unit; the first conversion unit is configured to perform, for each channel in the feature image, the first matrix conversion operation on the corresponding image data and the second matrix conversion operation on the corresponding convolution kernel, to obtain the corresponding converted image data and converted convolution kernel; and the accumulation unit is further used for performing the third matrix conversion operation on the accumulated data to obtain the convolution processing result.
- 4. The device of claim 2, wherein the conversion processing module further comprises a second conversion unit connected with the at least one convolution processing unit and the accumulation unit; the second conversion unit is used for performing, for each channel in the feature image, the first matrix conversion operation on the corresponding image data and the second matrix conversion operation on the corresponding convolution kernel, to obtain the corresponding converted image data and converted convolution kernel, and for performing the third matrix conversion operation on the accumulated data to obtain the convolution processing result.
- 5. The device of claim 1, wherein the device comprises a plurality of sensors, and the at least one convolution processing unit is arranged in a column.
- 6. A data processing method, comprising: acquiring the number of channels in a feature image; selecting, based on the number of channels, at least one convolution processing unit from a convolution processing unit array, wherein the at least one convolution processing unit comprises calculation groups whose number matches the number of channels; performing, by a conversion processing module and for each channel in the feature image, a first matrix conversion operation of a Winograd algorithm on the corresponding image data and a second matrix conversion operation of the Winograd algorithm on the corresponding convolution kernel, to obtain the corresponding converted image data and converted convolution kernel; performing, by different calculation groups in the at least one convolution processing unit, the dot multiplication operation of the Winograd algorithm on the converted image data and convolution kernels corresponding to different channels in the feature image, to obtain the corresponding dot multiplication data; and accumulating, by the conversion processing module, the dot multiplication data corresponding to different channels in the feature image, and performing a third matrix conversion operation of the Winograd algorithm on the accumulated data, to obtain a convolution processing result corresponding to the feature image; wherein, in the at least one convolution processing unit, each convolution processing unit comprises at least one multiply-accumulate tree, each multiply-accumulate tree comprises at least one dot multiplication operator, and each calculation group comprises one dot multiplication operator from each multiply-accumulate tree in a convolution processing unit; and wherein performing the dot multiplication operation by different calculation groups comprises: using each calculation group to multiply the converted image data and converted convolution kernel corresponding to one channel in the feature image, with the dot multiplication operators in the group multiplying the data at the same position, to obtain the corresponding dot multiplication data.
- 7. A neural network accelerator comprising a data processing apparatus as claimed in any one of claims 1 to 5.
- 8. A data processing device, characterized by comprising a processor, a memory and a communication bus; the communication bus is used for realizing a communication connection between the processor and the memory; and the processor is configured to execute one or more programs stored in the memory to implement the data processing method of claim 6.
- 9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the data processing method of claim 6.
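Claims 1 and 6 describe how calculation groups are formed (one dot multiplication operator taken from each multiply-accumulate tree of a unit) and how units are selected from the array based on the channel count. The sketch below is one possible software model of that bookkeeping, not the patented hardware. It assumes, hypothetically, that every unit has the same number of trees (`num_trees`, e.g. 16 to match a 4×4 transformed tile of F(2×2, 3×3)) and operators per tree (`mults_per_tree`), so one unit offers `mults_per_tree` calculation groups; it also reads "matched" as "at least as many groups as channels". `ConvUnit`, `select_units` and `dot_multiply` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ConvUnit:
    """Hypothetical convolution processing unit with num_trees multiply-accumulate
    trees, each holding mults_per_tree dot multiplication operators."""
    num_trees: int
    mults_per_tree: int

    def groups(self):
        # Calculation group k = operator k taken from every multiply-accumulate tree,
        # so one unit offers mults_per_tree groups of num_trees operators each.
        return [[(tree, k) for tree in range(self.num_trees)]
                for k in range(self.mults_per_tree)]


def select_units(array, num_channels):
    """Take units from the array until their calculation groups cover every channel."""
    chosen, available = [], 0
    for unit in array:
        if available >= num_channels:
            break
        chosen.append(unit)
        available += len(unit.groups())
    if available < num_channels:
        raise ValueError("not enough calculation groups for this channel count")
    return chosen


def dot_multiply(units, U_list, V_list):
    """Channel c runs on the c-th calculation group of the selected units.
    Operator (tree t, slot k) multiplies flattened position t of the
    transformed kernel U_c and the transformed tile V_c."""
    slots = [(u, k) for u, unit in enumerate(units) for k in range(len(unit.groups()))]
    results = []
    for (u, k), U, V in zip(slots, U_list, V_list):
        flat_u = [x for row in U for x in row]
        flat_v = [x for row in V for x in row]
        if len(flat_u) != units[u].num_trees:
            raise ValueError("group size must equal the transformed tile size")
        # Same-position products, one per operator in the group.
        results.append(((u, k), [a * b for a, b in zip(flat_u, flat_v)]))
    return results
```

Under these assumptions, an array of units with 16 trees and 4 operators per tree needs three units for a 10-channel feature image, and channel 5 lands on group 1 of the second selected unit. Choosing the group size equal to the transformed tile size is what lets every operator in a group handle exactly one tile position.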
Description
Technical Field
The embodiments of the application relate to the technical field of communication, and in particular to a data processing method, a data processing device, a neural network accelerator and a storage medium.
Background
Artificial intelligence accelerator architectures are mainly divided into convolution processing acceleration units and vector processing acceleration units. Convolution accounts for most of the computation in an artificial intelligence network, and its core operation is multiply-accumulate. Hardware architectures currently implement convolution mainly as direct convolution, which involves a large number of multiplications and additions; these consume more hardware resources and power and lower the computational efficiency.
Disclosure of Invention
The embodiments of the application provide a data processing method, a data processing device, a neural network accelerator and a storage medium that improve the efficiency of convolution while offering high flexibility and scalability.
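The efficiency argument against direct convolution can be made concrete by counting multiplications per output tile and per channel. The sketch below compares direct convolution with the element-wise stage of Winograd F(m×m, r×r). It is a simplified count that ignores the additions and scaling inside the three matrix conversions, and the tile sizes shown are common examples, not values stated in the patent.

```python
def direct_mults(m, r):
    """Multiplications for an m x m output tile with an r x r kernel, one channel."""
    return (m * r) ** 2


def winograd_mults(m, r):
    """Element-wise products in Winograd F(m x m, r x r): one per transformed-tile entry."""
    return (m + r - 1) ** 2


for m, r in [(2, 3), (4, 3)]:
    d, w = direct_mults(m, r), winograd_mults(m, r)
    print(f"F({m}x{m}, {r}x{r}): direct {d}, Winograd {w}, ratio {d / w:.2f}")
```

This prints 36 versus 16 multiplications (ratio 2.25) for F(2×2, 3×3) and 144 versus 36 (ratio 4.00) for F(4×4, 3×3). Larger tiles save more multiplications but need larger transform matrices.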
The technical solutions of the embodiments of the application are implemented as follows.
An embodiment of the application provides a data processing device comprising at least one convolution processing unit and a conversion processing module connected with the at least one convolution processing unit. For each channel in a feature image, the conversion processing module performs a first matrix conversion operation of a Winograd algorithm on the corresponding image data and a second matrix conversion operation of the Winograd algorithm on the corresponding convolution kernel, to obtain the corresponding converted image data and converted convolution kernel. The at least one convolution processing unit comprises calculation groups whose number matches the number of channels in the feature image, and each calculation group supports performing the dot multiplication operation of the Winograd algorithm on the converted image data and convolution kernel corresponding to one channel in the feature image, to obtain the corresponding dot multiplication data. The conversion processing module is further configured to accumulate the dot multiplication data corresponding to different channels in the feature image and to perform a third matrix conversion operation of the Winograd algorithm on the accumulated data, to obtain a convolution processing result corresponding to the feature image.
In the above device, the conversion processing module comprises an accumulation unit, which is connected with the at least one convolution processing unit and is used for accumulating the dot multiplication data corresponding to different channels in the feature image to obtain accumulated data.
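Accumulating the dot multiplication data across channels before the third matrix conversion, as the accumulation unit does, is valid because that conversion is linear. The derivation below assumes the conventional Winograd notation, which the patent does not use: $B^{\mathsf T}$, $G$ and $A^{\mathsf T}$ stand for the first, second and third matrix conversions, $d_c$ and $g_c$ are the input tile and kernel of channel $c$, $C$ is the number of channels, and $\odot$ is element-wise multiplication.

```latex
U_c = G\, g_c\, G^{\mathsf T}, \qquad V_c = B^{\mathsf T} d_c\, B, \qquad c = 1, \dots, C

Y = \sum_{c=1}^{C} A^{\mathsf T} \left[ U_c \odot V_c \right] A
  = A^{\mathsf T} \left[ \sum_{c=1}^{C} U_c \odot V_c \right] A
```

The second equality follows from distributivity of matrix multiplication over addition, so each output tile needs a single third conversion rather than one per channel.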
In the above device, the conversion processing module further comprises a first conversion unit disposed in the at least one convolution processing unit. For each channel in the feature image, the first conversion unit performs the first matrix conversion operation on the corresponding image data and the second matrix conversion operation on the corresponding convolution kernel, to obtain the corresponding converted image data and converted convolution kernel; the accumulation unit further performs the third matrix conversion operation on the accumulated data to obtain the convolution processing result.
In the above device, the conversion processing module further comprises a second conversion unit connected with the at least one convolution processing unit and the accumulation unit. For each channel in the feature image, the second conversion unit performs the first matrix conversion operation on the corresponding image data and the second matrix conversion operation on the corresponding convolution kernel, to obtain the corresponding converted image data and converted convolution kernel; it also performs the third matrix conversion operation on the accumulated data to obtain the convolution processing result.
In the above device, each of the at least one convolution processing unit comprises at least one multiply-accumulate tree, and each multiply-accumulate tree comprises at least one dot multiplication operator; each calculation group comprises one dot multiplication operator from each multiply-accumulate tree in a convolution processing unit, and each calculation group supports using its dot multiplication operators to multiply the data at the same position in the converted image data and the converted convolution kernel corresponding to one channel in the feature image, to obtain the corresponding dot multiplication data.
In the above de