EP-4053746-B1 - WINOGRAD CONVOLUTION OPERATION METHOD, APPARATUS, AND DEVICE, AND STORAGE MEDIUM
Inventors
- ZHANG, Yingnan
- GAO, Yufeng
- ZENG, Hongbo
- ZHANG, Yao
- LIU, Shaoli
- HUANG, Di
- ZHOU, Shiyi
- ZHANG, Xishan
- LIU, Chang
- GUO, Jiaming
Dates
- Publication Date: 2026-05-06
- Application Date: 2020-09-03
Claims (9)
- A winograd convolution operation method carried out by a winograd convolution operation apparatus, comprising: splitting data in a winograd convolution operation into a plurality of sub-tensors during a transformation process of the winograd convolution operation, including parsing the data to obtain the plurality of sub-tensors, where the data is a sum of the plurality of sub-tensors, and the number of the plurality of sub-tensors is the same as the number of non-zero elements in the data, and each sub-tensor has a single non-zero element, and the non-zero element in each sub-tensor is the same as the non-zero element in a corresponding position in the data; transforming the plurality of sub-tensors to obtain winograd transformation results of the plurality of sub-tensors, and summing the winograd transformation results of the plurality of sub-tensors to obtain a winograd transformation result of the data, wherein a summation operation of the winograd transformation results of the plurality of sub-tensors is completed by a plurality of operation sub-apparatuses based on set strategies, including determining the number of non-zero elements in each transformed matrix corresponding to each sub-tensor; according to the number of the non-zero elements, determining an order of summing the winograd transformation results of the plurality of sub-tensors, so as to enable the plurality of operation sub-apparatuses to perform the summation operation in a manner of load balancing; and according to the determined order of summing winograd transformation results of the plurality of sub-tensors and a preset mapping relationship between element positions in a result matrix and operation sub-apparatuses, distributing addition tasks to the plurality of operation sub-apparatuses; and completing the winograd convolution operation according to the winograd transformation result of the data.
- The method of claim 1, wherein before distributing the addition tasks to the plurality of operation sub-apparatuses, the method further comprises: determining the number of addition tasks required for the summation operation according to the winograd transformation results of the plurality of sub-tensors; determining the number of clocks according to a size of the sub-tensors or the result matrix; and determining the number of the operation sub-apparatuses according to the number of the clocks and the number of the addition tasks required for the summation operation.
- The method of any one of claims 1-2, wherein transforming the plurality of sub-tensors to obtain the winograd transformation results of the plurality of sub-tensors includes: obtaining a winograd transformation result of a meta-tensor corresponding to each sub-tensor, wherein the meta-tensor is a tensor that sets a non-zero element of a sub-tensor as 1; setting a non-zero element value of the sub-tensor as a coefficient to be multiplied by the winograd transformation result of the meta-tensor corresponding to the sub-tensor to obtain a winograd transformation result of the sub-tensor.
- A winograd convolution operation apparatus, comprising means for carrying out the method of any one of claims 1 to 3, wherein the winograd convolution operation apparatus comprises: a splitting unit configured to split data in a winograd convolution operation into a plurality of sub-tensors during a transformation process of the winograd convolution operation; a transformation and summation operation unit configured to transform the plurality of sub-tensors to obtain winograd transformation results of the plurality of sub-tensors and sum the winograd transformation results of the plurality of sub-tensors to obtain a winograd transformation result of the data, wherein a summation operation of the winograd transformation results of the plurality of sub-tensors is completed by a plurality of operation sub-apparatuses based on set strategies; and a convolution operation unit configured to complete the winograd convolution operation according to the winograd transformation result of the data.
- A computer-readable storage medium, on which an instruction is stored, wherein the instruction, when run on a computer, enables the computer to execute all the method steps of any one of claims 1-3.
- A computer program comprising instructions which, when executed by a computer, cause the computer to carry out all the method steps of any of claims 1-3.
- An artificial intelligence chip, comprising the winograd convolution operation apparatus of claim 4.
- An electronic device comprising the artificial intelligence chip of claim 7.
- A board card, comprising a storage component, an interface apparatus, a control component and the artificial intelligence chip of claim 7, wherein the artificial intelligence chip is connected to the storage component, the control component, and the interface apparatus, respectively; the storage component is configured to store data; the interface apparatus is configured to implement data transfer between the artificial intelligence chip and an external device; and the control component is configured to monitor a state of the artificial intelligence chip.
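The splitting and transformation of claim 1 can be sketched in a few lines of NumPy. This is a minimal illustration of the claimed decomposition, not the claimed hardware implementation: the 4x4 tile size and the F(2x2, 3x3) input-transform matrix B^T are assumptions for the example, and the summation is done serially here rather than by parallel operation sub-apparatuses.

```python
import numpy as np

# Input-transform matrix B^T for Winograd F(2x2, 3x3); its entries are 0 or +/-1,
# so transforming a single-non-zero sub-tensor needs no true multiplications.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
B = BT.T

def split_into_subtensors(data):
    """Split `data` into one sub-tensor per non-zero element; each sub-tensor
    keeps that element in its original position and is zero elsewhere, so the
    sub-tensors sum back to `data`."""
    subtensors = []
    for (i, j), v in np.ndenumerate(data):
        if v != 0:
            s = np.zeros_like(data)
            s[i, j] = v
            subtensors.append(s)
    return subtensors

tile = np.array([[1., 2., 0., 3.],
                 [0., 4., 5., 0.],
                 [6., 0., 7., 8.],
                 [0., 9., 0., 1.]])

subs = split_into_subtensors(tile)
assert len(subs) == np.count_nonzero(tile)   # one sub-tensor per non-zero element
assert np.allclose(sum(subs), tile)          # the sub-tensors sum to the data

# Summing the per-sub-tensor transforms reproduces the direct transform B^T d B,
# by linearity of the winograd transformation.
summed = sum(BT @ s @ B for s in subs)
assert np.allclose(summed, BT @ tile @ B)
```

Because the transform is linear, splitting changes nothing mathematically; the gain claimed is that each per-sub-tensor transform degenerates into sign flips and additions.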
Description
CROSS REFERENCE TO RELATED APPLICATION
This application claims priority to Chinese patent application No. 201911061091.9, filed on November 1, 2019 and entitled "WINOGRAD CONVOLUTION OPERATION METHOD, APPARATUS, AND DEVICE, AND STORAGE MEDIUM".
TECHNICAL FIELD
This disclosure relates to the technical field of artificial intelligence, and in particular to a winograd convolution operation method, apparatus, device, and storage medium.
BACKGROUND
With the development of artificial intelligence technologies, the convolutional neural network model has emerged. A convolutional neural network is a feed-forward neural network model with a deep structure that relies on convolution calculations, and it is one of the representative models of deep learning. In a convolutional neural network model, convolution operations between neurons and convolution kernels are required. In the prior art, implementing these convolution operations in hardware requires a large number of multipliers on a chip. Since the overheads of multiplier implementations are much higher than those of adder implementations in terms of timing, power consumption, and area, the processing efficiency of the chip may be greatly reduced if the convolution operations are implemented in hardware as in the prior art. The patent publication US2019042923A1 discloses a Winograd Convolution Accelerator Architecture.
SUMMARY
Based on this, in order to solve the above technical problem, the present disclosure provides a winograd convolution operation method, apparatus, device, and storage medium. A first aspect of embodiments of the present disclosure provides a winograd convolution operation method.
The method may be applied to winograd convolution transformations, and the method may include: splitting data in a winograd convolution operation into a plurality of sub-tensors during a transformation process of the winograd convolution operation; transforming the plurality of sub-tensors to obtain winograd transformation results of the plurality of sub-tensors and summing the winograd transformation results of the plurality of sub-tensors to obtain a winograd transformation result of the data, where a summation operation of the winograd transformation results of the plurality of sub-tensors is completed by a plurality of operation sub-apparatuses based on set strategies; and completing the winograd convolution operation according to the winograd transformation result of the data. A second aspect of embodiments of the present disclosure provides a winograd convolution operation apparatus, comprising: a splitting unit configured to split data in a winograd convolution operation into a plurality of sub-tensors during a transformation process of the winograd convolution operation; a transformation and summation operation unit configured to transform the plurality of sub-tensors to obtain winograd transformation results of the plurality of sub-tensors and sum the winograd transformation results of the plurality of sub-tensors to obtain a winograd transformation result of the data, where a summation operation of the winograd transformation results of the plurality of sub-tensors is completed by a plurality of operation sub-apparatuses based on set strategies; and a convolution operation unit configured to complete the winograd convolution operation according to the winograd transformation result of the data.
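The transformation step above can be made multiplication-light via the meta-tensor of claim 3: for each non-zero position, the transform of a tensor holding a single 1 at that position is computed once, and every sub-tensor transform is then just that precomputed matrix scaled by the sub-tensor's non-zero value. A minimal sketch, again assuming the F(2x2, 3x3) transform matrix as an example:

```python
import numpy as np

BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
B = BT.T

def meta_transform(i, j, shape=(4, 4)):
    """Winograd transform of the meta-tensor: a tensor whose only non-zero
    element is a 1 at position (i, j). This can be computed once, offline."""
    e = np.zeros(shape)
    e[i, j] = 1.0
    return BT @ e @ B

# A sub-tensor with value v at (i, j) transforms to v * meta_transform(i, j):
# the full matrix transform is replaced by a single scalar scaling.
v, i, j = 5.0, 1, 2
sub = np.zeros((4, 4))
sub[i, j] = v
assert np.allclose(v * meta_transform(i, j), BT @ sub @ B)
```

With the meta-tensor transforms cached, the online work per sub-tensor reduces to one scaling followed by the additions distributed across the operation sub-apparatuses.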
A third aspect of embodiments of the present disclosure provides a winograd convolution operation apparatus, including processors and a memory; where the memory is configured to store a program code; and the processors are configured to call the program code stored in the memory and execute the method of the first aspect. A fourth aspect of embodiments of the present disclosure provides a computer-readable storage medium, on which an instruction is stored, where the instruction, when run on a computer, enables the computer to execute the method of the first aspect. A fifth aspect of embodiments of the present disclosure provides an artificial intelligence chip, including the apparatus of the second aspect or the third aspect. A sixth aspect of embodiments of the present disclosure provides an electronic device, including the artificial intelligence chip of the fifth aspect. A seventh aspect of embodiments of the present disclosure provides a board card, including a storage component, an interface apparatus, a control component and the artificial intelligence chip of the fifth aspect, where the artificial intelligence chip is connected to the storage component, the control component, and the interface apparatus, respectively; the storage component is configured to store data; the interface apparatus is configured to implement data transfer between the artificial intelligence chip and an external device; and the control component is configured to monitor a state of the artificial intelligence chip.
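The load-balanced summation of claim 1 could be planned along the following lines. This is only a sketch under stated assumptions: the claim fixes the ordering criterion (non-zero-element counts per transformed matrix) but not the concrete schedule, and the position-to-adder mapping used here (flat element index modulo the number of adders) is a hypothetical stand-in for the claimed preset mapping between result-matrix positions and operation sub-apparatuses.

```python
import numpy as np

def plan_addition_tasks(transformed, num_adders):
    """Order the sub-tensor transforms by their non-zero-element counts and
    distribute the element-wise addition tasks over `num_adders` adders.

    `transformed` is a list of already-transformed matrices, one per
    sub-tensor. Each non-zero element (i, j) of each matrix becomes one
    addition task, routed to an adder by a fixed position->adder rule."""
    # Process the densest transforms first so the adder queues fill evenly.
    order = sorted(range(len(transformed)),
                   key=lambda k: -np.count_nonzero(transformed[k]))
    tasks = [[] for _ in range(num_adders)]
    for k in order:
        rows, cols = np.nonzero(transformed[k])
        for i, j in zip(rows, cols):
            # Hypothetical preset mapping: flat index modulo adder count.
            adder = (i * transformed[k].shape[1] + j) % num_adders
            tasks[adder].append((k, i, j, transformed[k][i, j]))
    return order, tasks

# Six sparse 4x4 "transformed" matrices distributed over four adders.
rng = np.random.default_rng(0)
mats = [np.where(rng.random((4, 4)) < 0.4, rng.random((4, 4)), 0.0)
        for _ in range(6)]
order, tasks = plan_addition_tasks(mats, num_adders=4)
total_tasks = sum(len(t) for t in tasks)
assert total_tasks == sum(np.count_nonzero(m) for m in mats)  # nothing dropped
```

Tying each result-matrix position to a fixed adder means the partial sums for any given output element always accumulate in one place, so no cross-adder synchronization is needed during the summation.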