
CN-116108913-B - Pruning method and device of neural network, electronic equipment and storage medium


Abstract

The embodiment of the application discloses a pruning method and apparatus for a neural network, an electronic device, and a storage medium. The method comprises: determining an initial measure value of each output channel of an i-th layer in a neural network to be pruned and a target parameter value of the i-th layer; normalizing the initial measure value of each output channel of the i-th layer according to the target parameter value of the i-th layer to obtain a target measure value of each output channel of the i-th layer; and pruning the i-th layer according to the target measure value of each output channel of the i-th layer to obtain a pruned neural network, where i is a positive integer. Because the i-th layer is pruned using the target measure value of each of its output channels, global pruning with cross-layer comparison can be realized, so that the inference benefit of the pruned small model (that is, the pruned neural network) is more significant, and the accuracy of the pruned small model is also improved.

Inventors

  • LI XUECHEN

Assignees

  • Zeku Technology (Shanghai) Co., Ltd. (哲库科技(上海)有限公司)

Dates

Publication Date
2026-05-05
Application Date
2023-02-23

Claims (14)

  1. A pruning method of a neural network, applied to an electronic device, the method comprising: determining an initial measure value of each output channel of an i-th layer in a neural network to be pruned and a target parameter value of the i-th layer; normalizing the initial measure value of each output channel of the i-th layer according to the target parameter value of the i-th layer to obtain a target measure value of each output channel of the i-th layer; and pruning the i-th layer according to the target measure value of each output channel of the i-th layer to obtain a pruned neural network, wherein i is a positive integer, and the pruned neural network is used for determining an output result corresponding to input data in at least one of the machine vision field, the natural language processing field, the automatic driving field, and the robot field; wherein pruning the i-th layer according to the target measure value of each output channel of the i-th layer to obtain the pruned neural network comprises: determining a target sparsity and a parameter number of the neural network to be pruned; determining a target threshold of the i-th layer according to the target sparsity, the parameter number, and the target parameter value of the i-th layer; and pruning the i-th layer according to the target threshold of the i-th layer and the target measure value of each output channel of the i-th layer to obtain the pruned neural network.
  2. The method of claim 1, wherein the target parameter value comprises a number of output channels.
  3. The method of claim 1, wherein determining the target threshold of the i-th layer according to the target sparsity, the parameter number, and the target parameter value of the i-th layer comprises: determining a global threshold according to the target sparsity and the parameter number; determining an initial threshold according to the target sparsity and the target parameter value of the i-th layer; and taking the minimum of the global threshold and the initial threshold as the target threshold of the i-th layer.
  4. The method according to claim 3, wherein determining the global threshold according to the target sparsity and the parameter number comprises: determining target measure values of all output channels of all layers in the neural network to be pruned, wherein all layers in the neural network to be pruned include the i-th layer; multiplying the target sparsity by the parameter number to determine a first product; and determining the global threshold according to the target measure values of all output channels of all layers and the first product.
  5. The method of claim 4, wherein determining the global threshold according to the first product and the target measure values of all output channels of all layers comprises: sorting the target measure values of all output channels of all layers from small to large to obtain a first sorting result; and taking the p-th target measure value in the first sorting result as the global threshold, wherein p represents the first product and p is a positive integer.
  6. The method according to claim 3, wherein determining the initial threshold according to the target sparsity and the target parameter value of the i-th layer comprises: multiplying the target sparsity by the target parameter value of the i-th layer to determine a second product; and determining the initial threshold according to the target measure value of each output channel of the i-th layer and the second product.
  7. The method of claim 6, wherein determining the initial threshold according to the second product and the target measure value of each output channel of the i-th layer comprises: sorting the target measure values of the output channels of the i-th layer from small to large to obtain a second sorting result; and taking the q-th target measure value in the second sorting result as the initial threshold, wherein q represents the second product and q is a positive integer.
  8. The method according to claim 1, wherein pruning the i-th layer according to the target threshold of the i-th layer and the target measure value of each output channel of the i-th layer comprises: comparing the target threshold of the i-th layer with the target measure value of each output channel of the i-th layer; and if j target measure values among the target measure values of the output channels of the i-th layer are smaller than the target threshold of the i-th layer, deleting the output channels corresponding to the j target measure values, wherein j is an integer not greater than the number of output channels of the i-th layer.
  9. The method of claim 1, wherein determining the initial measure value of each output channel of the i-th layer in the neural network to be pruned comprises: determining a filter vector of each output channel of the i-th layer; and determining the initial measure value of each output channel of the i-th layer according to the filter vector of each output channel of the i-th layer.
  10. The method of claim 9, wherein determining the initial measure value of each output channel of the i-th layer according to the filter vector of each output channel of the i-th layer comprises: calculating the distance between the filter vector of a first output channel and the filter vector of each output channel other than the first output channel in the i-th layer; and accumulating the calculated distances to determine the initial measure value of the first output channel, wherein the first output channel is any one of the output channels of the i-th layer.
  11. A pruning device of a neural network, characterized in that the pruning device comprises a determining unit, a normalizing unit, and a pruning unit, wherein: the determining unit is configured to determine an initial measure value of each output channel of an i-th layer in a neural network to be pruned and a target parameter value of the i-th layer; the normalizing unit is configured to normalize the initial measure value of each output channel of the i-th layer according to the target parameter value of the i-th layer to obtain a target measure value of each output channel of the i-th layer; the pruning unit is configured to prune the i-th layer according to the target measure value of each output channel of the i-th layer to obtain a pruned neural network, wherein i is a positive integer, and the pruned neural network is used for determining an output result corresponding to input data in at least one of the machine vision field, the natural language processing field, the automatic driving field, and the robot field; the determining unit is further configured to determine a target sparsity and a parameter number of the neural network to be pruned, and to determine a target threshold of the i-th layer according to the target sparsity, the parameter number, and the target parameter value of the i-th layer; and the pruning unit is further configured to prune the i-th layer according to the target threshold of the i-th layer and the target measure value of each output channel of the i-th layer to obtain the pruned neural network.
  12. An electronic device comprising a processor and a memory, wherein the memory is used for storing a computer program, and the processor is used for calling and running the computer program stored in the memory to perform the method according to any one of claims 1 to 10.
  13. A chip comprising a processor for calling and running a computer program from a memory, so that a device on which the chip is mounted performs the method according to any one of claims 1 to 10.
  14. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program which, when executed by at least one processor, implements the method according to any one of claims 1 to 10.
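The threshold selection of claims 3 to 8 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the products p and q are truncated to integers as one plausible reading of the claims, and the layer's target parameter value is taken to be its number of output channels (claim 2).

```python
def target_threshold(layer_measures, layer_index, sparsity, total_params):
    """Target threshold of one layer: min(global threshold, per-layer initial threshold)."""
    # Global threshold (claims 4-5): sort the target measure values of all
    # output channels of all layers and take the p-th smallest, where
    # p is the product of target sparsity and parameter number.
    all_measures = sorted(m for layer in layer_measures for m in layer)
    p = max(int(sparsity * total_params), 1)
    global_thr = all_measures[min(p, len(all_measures)) - 1]
    # Per-layer initial threshold (claims 6-7): take the q-th smallest
    # measure in this layer, where q is the product of target sparsity
    # and the layer's target parameter value (its channel count here).
    this_layer = sorted(layer_measures[layer_index])
    q = max(int(sparsity * len(this_layer)), 1)
    init_thr = this_layer[min(q, len(this_layer)) - 1]
    # Claim 3: the target threshold is the minimum of the two.
    return min(global_thr, init_thr)


def prune_layer(measures, threshold):
    """Claim 8: keep only channels whose target measure reaches the threshold."""
    return [c for c, m in enumerate(measures) if m >= threshold]
```

Because the per-layer threshold is capped by the global one, a layer whose channels are uniformly important is pruned less aggressively than the layer-by-layer scheme would dictate, which is the cross-layer comparison the abstract refers to.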

Description

Pruning method and device of neural network, electronic equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a pruning method and apparatus for a neural network, an electronic device, and a storage medium.

Background

Deep learning algorithms based on neural networks are widely applied in fields such as machine vision, natural language processing, automatic driving, and robotics. However, their complex structure and huge number of parameters place high demands on the computing power, memory, and other resources of a computer, which severely limits the deployment of neural networks on resource-constrained edge devices. Related technologies propose a series of model compression algorithms; for example, a structured pruning algorithm is used to reduce the model size and thereby reduce the demands of the neural network on resources such as computing power and memory. However, although a structured pruning algorithm may prune the neural network layer by layer, the inference benefit of the pruned small model obtained in this manner is not significant, and the accuracy of the pruned small model still needs to be improved.

Disclosure of Invention

The embodiment of the application provides a pruning method and apparatus for a neural network, an electronic device, and a storage medium, which can make the inference benefit of the pruned small model more significant and improve the accuracy of the pruned small model.
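The per-channel measure and the normalization that enables cross-layer comparison can be sketched as follows. This is an illustrative reading of claims 9, 10, and 2, not the patented implementation: the Euclidean distance and the division by the channel count are assumptions, since the claims do not fix a distance metric or a normalization formula.

```python
import math


def initial_measures(filters):
    """Claim 10: the initial measure value of a channel is the accumulated
    distance between its filter vector and the filter vector of every
    other output channel in the same layer."""
    def dist(a, b):
        # Euclidean distance between two flattened filter vectors
        # (an assumed choice of metric).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return [sum(dist(f, g) for j, g in enumerate(filters) if j != k)
            for k, f in enumerate(filters)]


def normalize(measures, target_param_value):
    """Divide each initial measure by the layer's target parameter value
    (e.g. its number of output channels, claim 2), so that target
    measure values of layers of different widths are comparable."""
    return [m / target_param_value for m in measures]
```

Without this normalization, a wide layer accumulates distances over more channels and its measures dominate a global sort, which is why layer-by-layer pruning cannot simply be replaced by one global threshold.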
In order to achieve the above purpose, the technical scheme of the application is realized as follows.

In a first aspect, an embodiment of the present application provides a pruning method of a neural network, the method comprising: determining an initial measure value of each output channel of an i-th layer in a neural network to be pruned and a target parameter value of the i-th layer; normalizing the initial measure value of each output channel of the i-th layer according to the target parameter value of the i-th layer to obtain a target measure value of each output channel of the i-th layer; and pruning the i-th layer according to the target measure value of each output channel of the i-th layer to obtain a pruned neural network, wherein i is a positive integer.

In a second aspect, an embodiment of the present application provides a pruning device of a neural network, comprising a determining unit, a normalizing unit, and a pruning unit, wherein: the determining unit is configured to determine an initial measure value of each output channel of an i-th layer in a neural network to be pruned and a target parameter value of the i-th layer; the normalizing unit is configured to normalize the initial measure value of each output channel of the i-th layer according to the target parameter value of the i-th layer to obtain a target measure value of each output channel of the i-th layer; and the pruning unit is configured to prune the i-th layer according to the target measure value of each output channel of the i-th layer to obtain a pruned neural network, wherein i is a positive integer.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory.
The memory is used for storing a computer program, and the processor is used for calling and running the computer program stored in the memory to execute the pruning method of the neural network.

In a fourth aspect, an embodiment of the present application provides a chip configured to implement the pruning method of the neural network described in the first aspect. Specifically, the chip comprises a processor for calling and running a computer program from a memory, so that a device provided with the chip executes the pruning method of the neural network.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program is executed by at least one processor to implement the pruning method of the neural network according to the first aspect.

In a sixth aspect, an embodiment of the present application provides a computer program product, including computer program instructions, wherein the computer program instructions cause a computer to execute the pruning method of the neural network according to the first aspect.

In a seventh aspect, an embodiment of the present application provides a computer program which, when executed on a computer, causes the computer to execute the pruning method of the neural network described in the first aspect.

The embodiment of the application provides a pruning method of a neural network, which comprises the steps of firstly determining an initial measure value of each output chan