
US-12626130-B2 - Method and device for compressing neural network

US 12626130 B2

Abstract

A method for compressing a neural network includes: obtaining a neural network including J operation layers; compressing a jth operation layer with Kj compression ratios to generate Kj operation branches; obtaining Kj weighting factors; replacing the jth operation layer with the Kj operation branches weighted by the Kj weighting factors to generate a replacement neural network; performing forward propagation to the replacement neural network, a weighted sum operation being performed on Kj operation results generated by the Kj operation branches with the Kj weighting factors and a result of the weighted sum operation being used as an output of the jth operation layer; performing backward propagation to the replacement neural network, updated values of the Kj weighting factors being calculated based on a model loss; and determining an operation branch corresponding to the maximum value of the updated values of the Kj weighting factors as a compressed jth operation layer.
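
The abstract describes replacing each operation layer with several differently compressed branches whose outputs are mixed by learnable weighting factors. The following is a minimal, illustrative PyTorch-style sketch of such a weighted-branch layer; it is not code from the patent, the names (WeightedBranchLayer, selected_branch) are invented for illustration, and the softmax normalization is only one possible way to normalize the weighting factors (cf. claim 6).

    # Illustrative sketch only (not from the patent): one operation layer is
    # replaced by K candidate branches, each a differently compressed copy,
    # and their outputs are mixed with learnable weighting factors.
    import torch
    import torch.nn as nn

    class WeightedBranchLayer(nn.Module):
        def __init__(self, branches):
            super().__init__()
            # branches: K compressed variants of the original layer (e.g.
            # quantized or pruned convolutions); purely illustrative.
            self.branches = nn.ModuleList(branches)
            # one learnable weighting factor per branch, updated by backpropagation
            self.alpha = nn.Parameter(torch.zeros(len(branches)))

        def forward(self, x):
            # normalize the weighting factors, then use the weighted sum of the
            # K branch outputs as the output of this layer
            w = torch.softmax(self.alpha, dim=0)
            return sum(w[k] * branch(x) for k, branch in enumerate(self.branches))

        def selected_branch(self):
            # after training, keep the branch with the largest weighting factor
            return self.branches[int(torch.argmax(self.alpha))]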

Inventors

  • Zhen Dong
  • Yuanfei NIE
  • Huan FENG

Assignees

  • MONTAGE TECHNOLOGY CO., LTD.

Dates

Publication Date
2026-05-12
Application Date
2021-11-19
Priority Date
2020-11-20

Claims (17)

  1. A method for compressing a neural network, comprising: obtaining an original neural network to be compressed, the original neural network comprising J operation layers to be compressed, where J is an integer greater than 1; compressing a jth operation layer of the J operation layers with Kj different compression ratios to generate Kj operation branches, where j and Kj are integers, 1≤j≤J, and Kj≥1; obtaining, for the jth operation layer, a set of Kj weighting factors corresponding to the respective Kj operation branches, where the Kj weighting factors have respective initial values; replacing the jth operation layer with the Kj operation branches weighted by the set of Kj weighting factors, to generate a replacement neural network; performing forward propagation to the replacement neural network based on a preset dataset to generate Kj operation results, where a weighted sum operation is performed on the Kj operation results with the Kj weighting factors and a result of the weighted sum operation is used as an output of the jth operation layer, and where the Kj operation results are generated by: compressing input data from the preset dataset to generate Kj compressed input data corresponding to the Kj operation branches respectively, where the Kj compressed input data have Kj different accuracies corresponding to the Kj different compression ratios of the Kj operation branches respectively; and performing the forward propagation on the Kj replaced operation branches and the Kj compressed input data to generate the Kj operation results; performing backward propagation to the replacement neural network based on the preset dataset, where, for the jth operation layer, updated values of the Kj weighting factors are calculated based on a model loss of the replacement neural network relative to the original neural network, wherein the model loss is determined based on a product of a loss function determined based on an application type of the original neural network, and a performance index related to a hardware index of a hardware platform on which the original neural network is to be deployed; and determining, for the jth operation layer, an operation branch corresponding to a maximum value of the updated values of the Kj weighting factors as a compressed jth operation layer.
  2. The method of claim 1, wherein the application type of the original neural network comprises: classification, positioning, detection or segmentation.
  3. The method of claim 1, wherein the forward propagation and the backward propagation to the replacement neural network are iteratively performed for multiple times based on the preset dataset, and in the iteration process of the forward propagation and the backward propagation, the updated values of the Kj weighting factors obtained in a backward propagation are assigned to the Kj weighting factors to be used in a forward propagation next to the backward propagation.
  4. The method of claim 3, further comprising: calculating, in the iteration process, a model size of the replacement neural network based on the operation branch corresponding to the maximum value of the updated values of the Kj weighting factors; obtaining a change in the model size of the replacement neural network calculated after each iteration of the iteration process; and stopping the iteration process when the change is within a preset range.
  5. The method of claim 1, wherein the jth operation layer to be compressed comprises a convolutional layer, an activation layer, a batch normalization layer, a pooling layer, or a fully connected layer.
  6. The method of claim 1, further comprising: normalizing values of the Kj weighting factors before performing the weighted sum operation on the Kj operation results generated by the Kj operation branches with the Kj weighting factors.
  7. The method of claim 1, wherein a parameter of the jth operation layer is represented by an N0j-bit binary number, parameters of the Kj operation branches are represented by N1j-bit, N2j-bit, ..., NKjj-bit binary numbers respectively, N0j, N1j, N2j, ..., NKjj are integers greater than or equal to 1, and N1j, N2j, ..., NKjj are less than or equal to N0j.
  8. The method of claim 1, wherein the hardware index of a hardware platform on which the original neural network is to be deployed comprises: a storage space, a number of floating-point operations, a delay time or a power consumption.
  9. A device for compressing a neural network, comprising: a processor; and a memory, wherein the memory stores program instructions that are executable by the processor, and when executed by the processor, the program instructions cause the processor to perform: obtaining an original neural network to be compressed, the original neural network comprising J operation layers to be compressed, where J is an integer greater than 1; compressing a jth operation layer of the J operation layers with Kj different compression ratios to generate Kj operation branches, where j and Kj are integers, 1≤j≤J, and Kj≥1; obtaining, for the jth operation layer, a set of Kj weighting factors corresponding to the respective Kj operation branches, where the Kj weighting factors have respective initial values; replacing the jth operation layer with the Kj operation branches weighted by the set of Kj weighting factors, to generate a replacement neural network; performing forward propagation to the replacement neural network based on a preset dataset to generate Kj operation results, where a weighted sum operation is performed on the Kj operation results with the Kj weighting factors and a result of the weighted sum operation is used as an output of the jth operation layer, and where the Kj operation results are generated by: compressing input data from the preset dataset to generate Kj compressed input data corresponding to the Kj operation branches respectively, where the Kj compressed input data have Kj different accuracies corresponding to the Kj different compression ratios of the Kj operation branches respectively; and performing the forward propagation on the Kj replaced operation branches and the Kj compressed input data to generate the Kj operation results; performing backward propagation to the replacement neural network based on the preset dataset, where, for the jth operation layer, updated values of the Kj weighting factors are calculated based on a model loss of the replacement neural network relative to the original neural network, wherein the model loss is determined based on a product of a loss function determined based on an application type of the original neural network, and a performance index related to a hardware index of a hardware platform on which the original neural network is to be deployed; and determining, for the jth operation layer, an operation branch corresponding to a maximum value of the updated values of the Kj weighting factors as a compressed jth operation layer.
  10. The device of claim 9, wherein the application type of the original neural network comprises: classification, positioning, detection or segmentation.
  11. The device of claim 9, wherein the forward propagation and the backward propagation to the replacement neural network are iteratively performed for multiple times based on the preset dataset, and in the iteration process of the forward propagation and the backward propagation, the updated values of the Kj weighting factors obtained in a backward propagation are assigned to the Kj weighting factors to be used in a forward propagation next to the backward propagation.
  12. The device of claim 11, wherein, when executed by the processor, the program instructions further cause the processor to perform: calculating, in the iteration process, a model size of the replacement neural network based on the operation branch corresponding to the maximum value of the updated values of the Kj weighting factors; obtaining a change in the model size of the replacement neural network calculated after each iteration of the iteration process; and stopping the iteration process when the change is within a preset range.
  13. The device of claim 9, wherein the jth operation layer to be compressed comprises a convolutional layer, an activation layer, a batch normalization layer, a pooling layer, or a fully connected layer.
  14. The device of claim 9, wherein, when executed by the processor, the program instructions further cause the processor to perform: normalizing values of the Kj weighting factors before performing the weighted sum operation on the Kj operation results generated by the Kj operation branches with the Kj weighting factors.
  15. The device of claim 9, wherein a parameter of the jth operation layer is represented by an N0j-bit binary number, parameters of the Kj operation branches are represented by N1j-bit, N2j-bit, ..., NKjj-bit binary numbers respectively, N0j, N1j, N2j, ..., NKjj are integers greater than or equal to 1, and N1j, N2j, ..., NKjj are less than or equal to N0j.
  16. The device of claim 9, wherein the hardware index of a hardware platform on which the original neural network is to be deployed comprises: a storage space, a number of floating-point operations, a delay time or a power consumption.
  17. A non-transitory computer-readable storage medium having stored therein instructions that, when executed by a processor, cause the processor to perform a method for compressing a neural network, the method comprising: obtaining an original neural network to be compressed, the original neural network comprising J operation layers to be compressed, where J is an integer greater than 1; compressing a jth operation layer of the J operation layers with Kj different compression ratios to generate Kj operation branches, where j and Kj are integers, 1≤j≤J, and Kj≥1; obtaining, for the jth operation layer, a set of Kj weighting factors corresponding to the respective Kj operation branches, where the Kj weighting factors have respective initial values; replacing the jth operation layer with the Kj operation branches weighted by the set of Kj weighting factors, to generate a replacement neural network; performing forward propagation to the replacement neural network based on a preset dataset to generate Kj operation results, where a weighted sum operation is performed on the Kj operation results with the Kj weighting factors and a result of the weighted sum operation is used as an output of the jth operation layer, and where the Kj operation results are generated by: compressing input data from the preset dataset to generate Kj compressed input data corresponding to the Kj operation branches respectively, where the Kj compressed input data have Kj different accuracies corresponding to the Kj different compression ratios of the Kj operation branches respectively; and performing the forward propagation on the Kj replaced operation branches and the Kj compressed input data to generate the Kj operation results; performing backward propagation to the replacement neural network based on the preset dataset, where, for the jth operation layer, updated values of the Kj weighting factors are calculated based on a model loss of the replacement neural network relative to the original neural network, wherein the model loss is determined based on a product of a loss function determined based on an application type of the original neural network, and a performance index related to a hardware index of a hardware platform on which the original neural network is to be deployed; and determining, for the jth operation layer, an operation branch corresponding to a maximum value of the updated values of the Kj weighting factors as a compressed jth operation layer.
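
To illustrate how the search described in claims 1 and 4 might be driven, the sketch below shows a hypothetical training loop in which the model loss is the product of a task loss and a hardware-related performance index, the weighting factors are updated by backward propagation, and iteration stops once the change in the implied model size falls within a preset range. The helpers task_loss_fn, hardware_index, estimate_model_size, tol and max_iters are assumptions made for illustration, not APIs defined by the patent, and the replaced layers are assumed to expose a selected_branch() method like the sketch after the abstract.

    # Hypothetical sketch of the branch-selection search; not the patent's code.
    import torch

    def estimate_model_size(net):
        # count parameters of the currently dominant branch in each replaced
        # layer (layers exposing selected_branch()); purely illustrative
        total = 0
        for m in net.modules():
            if hasattr(m, "selected_branch"):
                total += sum(p.numel() for p in m.selected_branch().parameters())
        return total

    def search_compression(replacement_net, dataset, optimizer,
                           task_loss_fn, hardware_index, tol=0, max_iters=100):
        prev_size = None
        for _ in range(max_iters):
            for inputs, targets in dataset:
                outputs = replacement_net(inputs)          # forward propagation
                # model loss = task loss x hardware-related performance index
                loss = task_loss_fn(outputs, targets) * hardware_index(replacement_net)
                optimizer.zero_grad()
                loss.backward()                            # backward propagation
                optimizer.step()                           # update weighting factors
            size = estimate_model_size(replacement_net)
            if prev_size is not None and abs(size - prev_size) <= tol:
                break                                      # change within preset range
            prev_size = size
        return replacement_net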

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese patent application No. 202011308961.0 filed on Nov. 20, 2020, the entire content of which is incorporated herein by reference.

TECHNICAL FIELD

This application relates to the field of neural networks, and in particular, to a method and a device for compressing a neural network.

BACKGROUND

Nowadays, neural networks have been widely used in many technical fields, such as image recognition, voice recognition, autonomous driving, and medical imaging. For example, the convolutional neural network (CNN) is a representative network structure and algorithm of neural network technology, and has achieved great success in image processing applications. However, a neural network has many computation layers and parameters, which take up a large amount of storage and computing resources, thereby limiting its application on a hardware platform with limited resources (for example, an embedded system).

SUMMARY

An objective of the present application is to provide a method for compressing a neural network, which can obtain a higher compression ratio with less accuracy loss.

In an aspect of the application, a method for compressing a neural network is provided. The method may include: obtaining an original neural network to be compressed, the original neural network including J operation layers to be compressed, where J is an integer greater than 1; compressing a jth operation layer of the J operation layers with Kj different compression ratios to generate Kj operation branches, where j and Kj are integers, 1≤j≤J, and Kj≥1; obtaining, for the jth operation layer, a set of Kj weighting factors corresponding to the respective Kj operation branches, where the Kj weighting factors have respective initial values; replacing the jth operation layer with the Kj operation branches weighted by the set of Kj weighting factors, to generate a replacement neural network; performing forward propagation to the replacement neural network based on a preset dataset, where a weighted sum operation is performed on Kj operation results generated by the Kj operation branches with the Kj weighting factors and a result of the weighted sum operation is used as an output of the jth operation layer; performing backward propagation to the replacement neural network based on the preset dataset, where, for the jth operation layer, updated values of the Kj weighting factors are calculated based on a model loss of the replacement neural network relative to the original neural network; and determining, for the jth operation layer, an operation branch corresponding to a maximum value of the updated values of the Kj weighting factors as a compressed jth operation layer.

In another aspect of the application, a device for compressing a neural network is provided.
The device may include: a processor; and a memory, wherein the memory stores program instructions that are executable by the processor, and when executed by the processor, the program instructions cause the processor to perform: obtaining an original neural network to be compressed, the original neural network including J operation layers to be compressed, where J is an integer greater than 1; compressing a jth operation layer of the J operation layers with Kj different compression ratios to generate Kj operation branches, where j and Kj are integers, 1≤j≤J, and Kj≥1; obtaining, for the jth operation layer, a set of Kj weighting factors corresponding to the respective Kj operation branches, where the Kj weighting factors have respective initial values; replacing the jth operation layer with the Kj operation branches weighted by the set of Kj weighting factors, to generate a replacement neural network; performing forward propagation to the replacement neural network based on a preset dataset, where a weighted sum operation is performed on Kj operation results generated by the Kj operation branches with the Kj weighting factors and a result of the weighted sum operation is used as an output of the jth operation layer; performing backward propagation to the replacement neural network based on the preset dataset, where, for the jth operation layer, updated values of the Kj weighting factors are calculated based on a model loss of the replacement neural network relative to the original neural network; and determining, for the jth operation layer, an operation branch corresponding to a maximum value of the updated values of the Kj weighting factors as a compressed jth operation layer. In another aspect of the application, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium has stored therein instructions that, when executed by a processor, cause the processor to perform a method for compressing a neural network, the method including: obtaining an original neural network to be compressed, the original neural network including J operation layers to be compressed
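
Claims 7 and 15 describe operation branches whose parameters are represented with fewer bits than the original layer. One common way to obtain such branches, shown below purely as an assumed example (the patent does not prescribe a particular quantization scheme), is symmetric uniform quantization of a layer's weights to n bits.

    # Minimal sketch (an assumption, not the patent's prescribed scheme) of
    # building lower-bit branches by symmetric uniform quantization to n bits.
    import torch

    def quantize_to_n_bits(weights: torch.Tensor, n: int) -> torch.Tensor:
        # map float weights onto a signed n-bit integer grid and back
        qmax = 2 ** (n - 1) - 1
        scale = weights.abs().max() / qmax
        if scale == 0:
            return weights.clone()
        q = torch.clamp(torch.round(weights / scale), -qmax - 1, qmax)
        return q * scale

    # e.g. 8-bit, 4-bit and 2-bit branches derived from one layer's weights:
    # branches = [quantize_to_n_bits(w, n) for n in (8, 4, 2)]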