US-12619877-B2 - Neural network method and apparatus
Abstract
A method and apparatus for pruning a neural network are provided. The method sets a weight threshold value based on a weight distribution of layers included in the neural network, predicts a change in inference accuracy of the neural network caused by pruning each layer based on the weight threshold value, determines, among the layers included in the neural network, a current subject layer to be pruned with the weight threshold value, and prunes the determined current subject layer.
Inventors
- Minkyoung CHO
- Wonjo Lee
- Seungwon Lee
Assignees
- SAMSUNG ELECTRONICS CO., LTD.
Dates
- Publication Date
- 20260505
- Application Date
- 20240624
- Priority Date
- 20190916
Claims (18)
- 1 . A processor-implemented neural network method of one or more processors pruning a neural network, the method comprising: generating a resultant pruned neural network, for reducing overfitting corresponding to the neural network, through a performing of plural pruning iterations by the one or more processors until a number of all layers of the neural network or at least all of the layers have been pruned, the plural pruning iterations respectively including: determining, based on a weight distribution of layers included in the neural network, a weight threshold value to prune the neural network to a target pruning rate; pruning plural layers of the neural network by the one or more processors based on the determined weight threshold value; calculating a sensitivity of each of the layers corresponding to a change in inference accuracy of the neural network based on an input pruning data set; determining, based on the calculated sensitivities, a current subject layer to be pruned among each of the layers of the neural network; and generating a pruned neural network by pruning the determined current subject layer to reduce a number of weight-based computations compared to the determined current subject layer in the neural network, wherein the generating of the pruned neural network comprises pruning the current subject layer by adjusting a pruning rate of weights of the current subject layer by updating the weight threshold value until a corresponding inference accuracy of the generated pruned neural network is decreased to a threshold accuracy.
- 2 . The method of claim 1 , wherein the generating of the resultant pruned neural network comprises performing the plural pruning iterations until an inference accuracy of the generated pruned neural network is determined to meet a target inference accuracy threshold.
- 3 . The method of claim 1 , wherein the determining of the current subject layer to be pruned comprises determining a subject layer that has a lowest sensitivity among the calculated sensitivities as the current subject layer to be pruned.
- 4 . The method of claim 3 , wherein the lowest sensitivity represents that the determined subject layer has a least effect on a decrease in an inference accuracy of a previously pruned neural network compared to an inference accuracy of a currently trained neural network.
- 5 . The method of claim 1 , wherein the change in the inference accuracy is predicted based on a difference between an inference accuracy for each of the layers before pruning on each layer is performed and an inference accuracy for each of the layers after pruning on each of the layers is performed.
- 6 . The method of claim 1 , wherein the determining of the weight threshold value comprises determining a weight value corresponding to the target pruning rate to be the weight threshold value when the weight distribution corresponds to a standard normal distribution.
- 7 . The method of claim 1 , wherein the updating of the weight threshold value comprises increasing a current weight threshold value when the corresponding inference accuracy of the generated pruned neural network that includes weights pruned to the current weight threshold value is not decreased to the threshold accuracy.
- 8 . The method of claim 1 , wherein the input pruning data set comprises one of a data set generated by randomly extracting a predetermined number of data sources for each class included in a given data set, or a data set generated by selecting valid classes from the given data set and randomly extracting a predetermined number of data sources for each selected valid class.
- 9 . The method of claim 1 , wherein the generating of the pruned neural network is performed without retraining of the pruned neural network using the input pruning data set.
- 10 . A neural network apparatus comprising: one or more processors configured to execute computer-readable instructions; and one or more memories storing the computer-readable instructions, which when executed by the one or more processors configure the one or more processors to generate a resultant pruned neural network, for reducing overfitting corresponding to the neural network, through a performing of plural pruning iterations by the one or more processors until a number of all layers of the neural network or at least all of the layers have been pruned, the plural pruning iterations respectively including: a determination, based on a weight distribution of layers included in the neural network, of a weight threshold value to prune the neural network to a target pruning rate; a pruning of plural layers of the neural network by the one or more processors based on the determined weight threshold value; a calculation of a sensitivity of each of the layers corresponding to a change in inference accuracy of the neural network based on an input pruning data set; a determination, based on the calculated sensitivities, of a current subject layer to be pruned among each of the layers of the neural network; and a generation of a pruned neural network by pruning the determined current subject layer, to reduce a number of weight-based computations compared to the determined current subject layer in the neural network, wherein the generating of the pruned neural network comprises pruning the current subject layer by adjusting a pruning rate of weights of the current subject layer by updating the weight threshold value until a corresponding inference accuracy of the generated pruned neural network is decreased to a threshold accuracy.
- 11 . The apparatus of claim 10 , wherein the plural pruning iterations are repeated until an inference accuracy of the generated pruned neural network is determined to meet a target inference accuracy threshold.
- 12 . The apparatus of claim 10 , wherein the one or more processors are further configured to determine a subject layer that has a lowest sensitivity among the calculated sensitivities as the current subject layer to be pruned.
- 13 . The apparatus of claim 12 , wherein the lowest sensitivity represents that the determined subject layer has a least effect on a decrease in an inference accuracy of a previously pruned neural network compared to an inference accuracy of a currently trained neural network.
- 14 . The apparatus of claim 10 , wherein the change in the inference accuracy is predicted based on a difference between an inference accuracy for each of the layers before pruning on each layer is performed and an inference accuracy for each of the layers after pruning on each of the layers is performed.
- 15 . The apparatus of claim 10 , wherein the one or more processors are further configured to determine a weight value corresponding to the target pruning rate to be the weight threshold value when the weight distribution corresponds to a standard normal distribution.
- 16 . The apparatus of claim 10 , wherein the one or more processors are further configured to increase a current weight threshold value when the corresponding inference accuracy of the generated pruned neural network that includes weights pruned to the current weight threshold value is not decreased to the threshold accuracy.
- 17 . The apparatus of claim 10 , wherein the input pruning data set comprises one of a data set generated by randomly extracting a predetermined number of data sources for each class included in a given data set, or a data set generated by selecting valid classes from the given data set and randomly extracting a predetermined number of data sources for each selected valid class.
- 18 . The apparatus of claim 10 , wherein the generation of the pruned neural network is performed without retraining of the pruned neural network using the input pruning data set.
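Read as an algorithm, the iterative procedure of the claims above can be sketched in code. The sketch below is illustrative only: the function names, the dictionary-of-arrays network representation, the `eval_accuracy` callback, and the multiplicative threshold-update step are all assumptions, since the patent does not specify an implementation.

```python
import numpy as np

def prune_network(layers, eval_accuracy, pruning_data,
                  target_rate, accuracy_threshold):
    """Illustrative sketch of the claimed iterative pruning loop.

    `layers` maps layer names to weight arrays; `eval_accuracy(layers, data)`
    returns the network's inference accuracy on the pruning data set.
    Both are hypothetical stand-ins; the patent does not define this API.
    """
    pruned = set()
    while len(pruned) < len(layers):  # iterate until all layers are pruned
        # Weight threshold from the weight distribution: the magnitude
        # below which `target_rate` of all weights fall.
        all_w = np.concatenate([w.ravel() for w in layers.values()])
        threshold = np.quantile(np.abs(all_w), target_rate)

        # Sensitivity of each remaining layer: the accuracy drop observed
        # when only that layer is pruned at the current threshold.
        base_acc = eval_accuracy(layers, pruning_data)
        sensitivity = {}
        for name, w in layers.items():
            if name in pruned:
                continue
            trial = dict(layers)
            trial[name] = np.where(np.abs(w) < threshold, 0.0, w)
            sensitivity[name] = base_acc - eval_accuracy(trial, pruning_data)

        # The current subject layer is the least-sensitive one (claims 3, 12).
        subject = min(sensitivity, key=sensitivity.get)

        # Raise this layer's threshold until inference accuracy is decreased
        # to the threshold accuracy (claims 1 and 7). The 1.1 factor is an
        # arbitrary choice; the patent leaves the update rule open.
        t = threshold
        while True:
            w = layers[subject]
            candidate = np.where(np.abs(w) < t, 0.0, w)
            trial = dict(layers, **{subject: candidate})
            if eval_accuracy(trial, pruning_data) <= accuracy_threshold:
                break  # accuracy would drop to the limit: keep prior weights
            layers[subject] = candidate
            if not np.any(candidate):
                break  # nothing left to prune in this layer
            t *= 1.1
        pruned.add(subject)
    return layers
```

Note that no retraining step appears anywhere in the loop, consistent with claims 9 and 18.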
Description
CROSS-REFERENCE TO RELATED APPLICATION
This application is a Continuation Application of U.S. application Ser. No. 16/835,532, filed on Mar. 31, 2020, which claims the benefit under 35 USC § 119(a) of Korean Patent Application No. 10-2019-0113527, filed on Sep. 16, 2019, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
BACKGROUND
1. Field
The following description relates to neural network methods and apparatuses.
2. Description of Related Art
A neural network is a processor-implemented computing system implemented by referring to a computational architecture. An apparatus processing a neural network may perform a large number of complex operations on input data. As the input data and the training operations of a neural network increase, the connectivity of the architecture forming the neural network may become complicated; accuracy on past training data may increase while an overfitting problem arises, in which the reliability of prediction values for new data is lowered rather than improved. Furthermore, the increase in the complexity of a neural network may cause an excessive increase in memory usage, resulting in inefficient performance that hinders the miniaturization and commercialization of the related device.
SUMMARY
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one general aspect, a processor-implemented neural network method includes setting a weight threshold value to prune the neural network to a target pruning rate, based on a determined weight distribution; pruning plural layers of the neural network based on the weight threshold value; predicting a change in inference accuracy of the pruned plural layers of the neural network based on an input pruning data set; determining a current subject layer to be pruned among each of the layers of the neural network, based on the predicted change in inference accuracy; and generating a pruned neural network by pruning the determined current subject layer.

The pruning data set may be a predetermined number of data sources that are randomly extracted from each class included in a given data set.

The method may further include determining a weight distribution of the layers of the neural network.

The current subject layer may be determined to be a layer that is predicted to have a lowest sensitivity to the predicted change in inference accuracy among layers other than a previously pruned layer.

The predicting of the change in inference accuracy may include calculating a sensitivity for each of the plural layers based on a difference between an inference accuracy before pruning on each layer is performed, and an inference accuracy after pruning on each of the plural layers is performed.

The layer that is predicted to have the lowest sensitivity may correspond to a layer that is predicted to have a least effect on a decrease in the inference accuracy of the neural network.

The setting of the weight threshold value may include setting a weight value corresponding to the target pruning rate to be the weight threshold value when the determined weight distribution corresponds to a standard normal distribution.
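The standard-normal case mentioned above admits a closed form: if the weights follow a standard normal distribution, pruning all weights with magnitude below t removes a fraction 2Φ(t) − 1 of them, so the threshold for a target pruning rate p is Φ⁻¹((1 + p)/2). A minimal sketch under this reading follows; the function name and the use of Python's standard-library normal distribution are our assumptions, not part of the disclosure.

```python
from statistics import NormalDist

def threshold_for_rate(target_rate: float) -> float:
    """Weight threshold t such that pruning |w| < t removes `target_rate`
    of the weights, assuming a standard normal weight distribution.
    Solves P(|W| < t) = 2*Phi(t) - 1 = target_rate for t."""
    return NormalDist().inv_cdf((1.0 + target_rate) / 2.0)
```

For example, a 90% target pruning rate gives t ≈ 1.645, the familiar 95th percentile of the standard normal.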
The pruning of the current subject layer may include pruning the current subject layer by adjusting a pruning rate of weights of the current subject layer by updating the weight threshold value until the inference accuracy of the neural network based on the pruning data set is decreased to a threshold accuracy.

The updating of the weight threshold value may include increasing a current weight threshold value when the inference accuracy of the neural network that includes weights pruned to the current weight threshold value is not decreased to the threshold accuracy.

The determining of the current subject layer and the pruning of the determined current subject layer may be repeatedly performed until a number of all layers or at least all of the plural layers have been pruned.

The pruning data set may include one of a data set generated by randomly extracting a predetermined number of data sources for each class included in the given data set, or a data set generated by selecting valid classes from the given data set and randomly extracting a predetermined number of data sources for each selected valid class.

The providing of the pruning data set may include randomly extracting samples of the predetermined number of data sources from each class included in the given data set; determining a label corresponding to each of the randomly extracted samples by performing inference on the randomly extracted samples