US-12626092-B2 - Method for learning synaptic weight values of a neural network, related data processing method, computer program, calculator and processing system
Abstract
The invention relates to a method for training synaptic weight values of at least one layer of an artificial neural network. The method is computer-implemented, and comprises training the weight values from training data, each weight value from said training being a quantized weight value belonging to a set of quantized values. The set of quantized values consists of values encoded with a predefined number B of bits, and with a quantization step P between two successive quantized values that satisfies: P = 1/⌊(2^B − 1)/2⌋, where ⌊·⌋ represents the integer part function; the quantized values also being included in a predefined interval chosen from the interval [−1−P; 1] and the interval [−1; 1].
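For illustration (this sketch is not part of the patent text, and the function and variable names are ours), the quantization step P of the abstract and the two candidate sets of quantized values, corresponding to the sets E_QA and E_QS of claims 3 and 4 below, can be computed in a few lines of Python:

```python
import numpy as np

def quantization_step(B: int) -> float:
    """P = 1 / floor((2**B - 1) / 2), as defined in the abstract."""
    return 1.0 / ((2**B - 1) // 2)

B = 4
P = quantization_step(B)   # floor(15 / 2) = 7, so P = 1/7

# Quantized values over [-1 - P; 1]: 2**B values (the asymmetric variant).
EQA = np.array([i * P for i in range(-2**(B - 1), 2**(B - 1))])
# Quantized values over [-1; 1]: 2**B - 1 values, including 0 (symmetric).
EQS = np.array([i * P for i in range(-(2**(B - 1) - 1), 2**(B - 1))])

print(EQA.min(), EQA.max())   # -1.142857... = -(1 + P), and 1.0
print(EQS.min(), EQS.max())   # -1.0 and 1.0
```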
Inventors
- Inna KUCHER
- David Briand
- Olivier Bichler
Assignees
- Commissariat à l'énergie atomique et aux énergies alternatives
Dates
- Publication Date: 2026-05-12
- Application Date: 2022-11-23
- Priority Date: 2021-11-24
Claims (20)
- 1 . A method, implemented by at least one processor of an electronic calculator, for training synaptic weight values of at least one layer of an artificial neural network, each artificial neuron of a respective layer being adapted to perform a weighted sum of input value(s) and then to apply an activation function to the weighted sum to provide an output value, each input value being received from a respective element connected to an input of said neuron and multiplied by a synaptic weight associated with the connection between said neuron and the respective element, the respective element being an input variable of the neural network or a neuron of a preceding layer of the neural network, the method comprising: training the weight values of the neural network from training data, each weight value obtained from said training being a quantized weight value belonging to a set of quantized values; wherein the set of quantized values consists of values encoded with a predefined number B of bits, and with a quantization step P between two successive quantized values that satisfies: P = 1/⌊(2^B − 1)/2⌋ where ⌊·⌋ represents the integer part function; the quantized values also being included in a predefined interval, also called quantization interval, the quantization interval being chosen from the interval [−1−P; 1] and the interval [−1; 1], the method further comprising inferring the previously trained artificial neural network, for the processing, in particular the classification, of data received at input of the electronic calculator, the inferring including performing the weighted sum of input value(s) and then applying the activation function to the weighted sum via integer and/or fixed-point operators and integer and/or fixed-point registers.
- 2 . The method according to claim 1 , wherein the set of quantized values includes the null value.
- 3 . The method according to claim 1 , wherein the set of quantized values satisfies the following equation: E_QA = {−1 − P + i·P ; i ∈ [0; 2^B − 1]} = {i·P ; i ∈ [−2^(B−1); 2^(B−1) − 1]}, the quantization interval being then equal to the interval [−1−P; 1].
- 4 . The method according to claim 1 , wherein the set of quantized values satisfies the following equation: E_QS = {−1 + i·P ; i ∈ [0; 2^B − 2]} = {i·P ; i ∈ [−(2^(B−1) − 1); 2^(B−1) − 1]}, the quantization interval being then equal to the interval [−1; 1].
- 5 . The method according to claim 1 , wherein the method further comprises the following step: initial training of the weight values of the neural network from the training data, each learned weight value being furthermore converted, via a transpose function, into a bounded weight value belonging to a predefined interval, also called bounding interval; the training step being carried out after the initial training step and from the bounded weight values obtained during the initial training; the initial training step forming a first training of the neural network, and the training step forming a second training of the neural network, subsequent to the first training.
- 6 . The method according to claim 5 , wherein the bounding interval is equal to the quantization interval.
- 7 . The method according to claim 5 , wherein the transpose function satisfies the following equation: F_CA(W_i,j) = (1 + P/2) · (tanh(W_i,j) / max_(r,s) |tanh(W_r,s)|) − P/2 where F_CA represents a first transpose function, also called asymmetric transpose function; W_i,j represents a weight value from a matrix W of weight values; P represents the quantization step; tanh represents the hyperbolic tangent function; |·| represents the absolute value function; ⌊·⌋ represents the integer part function; max represents the maximum function; the quantization interval being then equal to the interval [−1−P; 1].
- 8 . The method according to claim 5 , wherein the transpose function satisfies the following equation: F_CS(W_i,j) = tanh(W_i,j) / max_(r,s) |tanh(W_r,s)| where F_CS represents a second transpose function, also called symmetric transpose function; W_i,j represents a weight value from a matrix W of weight values; tanh represents the hyperbolic tangent function; |·| represents the absolute value function; max represents the maximum function; the quantization interval being then equal to the interval [−1; 1].
- 9 . The method according to claim 1 , wherein in the training step, each trained weight value is converted via a quantization function into the respective quantized weight value belonging to the quantization interval.
- 10 . The method according to claim 9 , wherein the quantization function satisfies the following equation: F_Q(W) = P · round(W/P) where F_Q represents the quantization function; W represents a respective weight value; P represents the quantization step; and round represents a rounding operation.
- 11 . The method according to claim 1 , wherein the predefined number B of bits is less than or equal to 8.
- 12 . The method according to claim 11 , wherein the predefined number of bits B is between 3 and 5.
- 13 . The method according to claim 1 , wherein the artificial neural network is configured to process data.
- 14 . The method according to claim 13 , wherein the artificial neural network is configured to classify data.
- 15 . The method according to claim 1 , wherein the artificial neural network is configured to be implemented by an electronic calculator connected to a sensor, for processing at least one object from the sensor.
- 16 . A non-transitory computer-readable medium including a computer program comprising software instructions which, when executed by a computer, implement a training method according to claim 1 .
- 17 . An electronic calculator for processing data, via the implementation of a network of artificial neurons, each artificial neuron of a respective layer of the neural network being adapted to perform a weighted sum of input value(s) and then to apply an activation function to the weighted sum to provide an output value, each input value being received from a respective element connected to an input of said neuron and multiplied by a synaptic weight associated with the connection between said neuron and the respective element, the respective element being an input variable of the neural network or a neuron of a preceding layer of the neural network, the calculator comprising: at least one processor configured to: train the weight values of the neural network from training data, each weight value obtained from said training being a quantized weight value belonging to a set of quantized values, wherein the set of quantized values consists of values encoded with a predefined number B of bits, and with a quantization step P between two successive quantized values that satisfies: P = 1/⌊(2^B − 1)/2⌋ where ⌊·⌋ represents the integer part function, the quantized values also being included in a predefined interval, also called quantization interval, the quantization interval being chosen from the interval [−1−P; 1] and the interval [−1; 1]; infer the previously trained artificial neural network, for the processing, in particular the classification, of data received at input of the electronic calculator; and perform the weighted sum of input value(s) and then apply an activation function to the weighted sum via integer and/or fixed-point operators and integer and/or fixed-point registers.
- 18 . The calculator according to claim 17 , wherein the registers are registers of up to 8 bits.
- 19 . The calculator according to claim 17 , wherein trained quantized weight values are multiplied by an integer multiple equal to ⌊(2^B − 1)/2⌋ for the inference of the neural network, with B representing the predefined number of bits used for encoding the quantized weight values and ⌊·⌋ representing the integer part function.
- 20 . The calculator according to claim 17 , wherein the activation function is applied according to the following equation: A(Q) = (α_2/I_2) · round( (I_2/α_2) · (α_1/I_1) · γ · (1/⌊I_w/2⌋) · clip( (Q*n) + (β/γ) · (I_1/α_1) · ⌊I_w/2⌋ ; 0 ; (I_2/γ) · (I_1/α_1) · ⌊I_w/2⌋ ) ) where A represents a global activation function for the fusion of a convolution layer and a subsequent batch normalization layer; Q are the weights, belonging to the interval [−⌊I_w/2⌋; ⌊I_w/2⌋]; I_1 is an integer equal to 2^Bc − 1, with Bc representing a predefined number of bits used for encoding the previous batch normalization layer; I_2 is an integer equal to 2^Bn − 1, with Bn representing a predefined number of bits used for encoding the current batch normalization layer; I_w is an integer equal to 2^Bw − 1, with Bw representing a predefined number of bits used for encoding the weights of the convolution layer; n is an integer, corresponding to the output of the rounding operation of the previous layer; β and γ are parameters of the current batch normalization layer; α_1, α_2 are parameters of the clip activation function defined below, α_1 being associated with the activation function of the previous batch normalization layer and α_2 with that of the current batch normalization layer; ⌊·⌋ represents the integer part function; round represents a rounding operation; clip represents an activation function that satisfies the following equation: clip(x; 0; α_i) = (1/2) · (|x| − |x − α_i| + α_i).
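To make the quantization scheme of the claims above concrete, the following NumPy sketch (illustrative only, not part of the claims; all function and variable names are ours) applies the asymmetric and symmetric transpose functions of claims 7 and 8, the quantization function F_Q of claim 10 and the integer rescaling of claim 19, and checks that the clip expression of claim 20 coincides with a standard bounded ReLU:

```python
import numpy as np

def step(B):
    """Quantization step P = 1 / floor((2**B - 1) / 2) of claim 1."""
    return 1.0 / ((2**B - 1) // 2)

def f_ca(W, P):
    """Asymmetric transpose function of claim 7, with values in [-1 - P; 1]."""
    t = np.tanh(W)
    return (1 + P / 2) * t / np.max(np.abs(t)) - P / 2

def f_cs(W):
    """Symmetric transpose function of claim 8, with values in [-1; 1]."""
    t = np.tanh(W)
    return t / np.max(np.abs(t))

def f_q(W, P):
    """Quantization function of claim 10: F_Q(W) = P * round(W / P)."""
    return P * np.round(W / P)

def clip(x, alpha):
    """Clip activation of claim 20: (|x| - |x - alpha| + alpha) / 2."""
    return 0.5 * (np.abs(x) - np.abs(x - alpha) + alpha)

B = 4
P = step(B)                      # floor(15 / 2) = 7, so P = 1/7
W = np.random.randn(3, 3)        # float weights after a first training

Wq_sym = f_q(f_cs(W), P)         # quantized weights in E_QS (claim 4)
Wq_asym = f_q(f_ca(W, P), P)     # quantized weights in E_QA (claim 3)

# Claim 19: integer weights for inference, scaled by floor((2**B - 1) / 2).
W_int = np.round(Wq_sym * ((2**B - 1) // 2)).astype(int)   # in [-7; 7]

# The claim-20 clip identity equals min(max(x, 0), alpha).
x = np.linspace(-2.0, 3.0, 11)
assert np.allclose(clip(x, 1.0), np.clip(x, 0.0, 1.0))
```

The scaled weights W_int take at most 2^B − 1 integer values and therefore fit in B-bit integer registers, which is what lets the inference of claims 17 and 18 run on integer and/or fixed-point operators.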
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a U.S. non-provisional application claiming the benefit of French Application No. 21 12441, filed on Nov. 24, 2021, which is incorporated herein by reference in its entirety.
FIELD
The present invention relates to a method for training synaptic weight values of at least one layer of an artificial neural network, each artificial neuron of a respective layer being adapted to perform a weighted sum of input value(s) and then to apply an activation function to the weighted sum to provide an output value, each input value being received from a respective element connected to an input of said neuron and multiplied by a synaptic weight associated with the connection between said neuron and the respective element, the respective element being an input variable of the neural network or a neuron of a preceding layer of the neural network. The method is computer-implemented, and comprises training the weight values of the neural network from training data, each weight value from said training being a quantized weight value belonging to a set of quantized values.
The invention further relates to a data processing method, in particular for classifying data, the method being implemented by an electronic calculator implementing such an artificial neural network. A further object of the invention is a non-transitory computer-readable medium including a computer program comprising software instructions which, when executed by a computer, implement such a training method. The invention also relates to an electronic calculator for processing data, in particular for classifying data, via the implementation of such an artificial neural network; as well as an electronic system for processing object(s), comprising a sensor and such an electronic calculator connected to the sensor, the calculator being configured to process each object from the sensor.
The invention relates to the field of training artificial neural networks, also known as ANNs. Examples of artificial neural networks are convolutional neural networks, also known as CNNs; recurrent neural networks, such as Long Short-Term Memory (LSTM) networks; or Transformer neural networks, typically used in the field of automatic language processing (ALP). The invention further relates to the field of electronic calculators, also known as chips, for implementing such neural networks, these electronic calculators making it possible to use the neural network during an inference phase, after a prior phase of training the neural network from training data, the training phase typically being implemented by computer.
BACKGROUND
A known technique for significantly reducing a memory footprint during the training phase is based on network quantization. Quantization involves reducing the number of bits used to encode each synaptic weight, so that the total memory footprint is reduced by the same factor. The article "Towards Efficient Training for Neural Network Quantization" by Q. Jin et al. describes a training method of the above type, with quantization of synaptic weight values, also known as Scale-Adjusted Training (SAT), which allows the compression of weights and activations to a reduced number of state levels that can be represented in a predefined number of bits, typically no more than 8 bits.
During training, weights and activations are represented in floating point, on the interval [−1; 1] for weights and on the interval [0; +∞[ for activations when the activation function is of the rectified linear unit (ReLU) type (or on the interval [0; α] for activations quantized with the SAT method). The weight quantization algorithm used by the SAT method is described in the article "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients" by S. Zhou et al., and is also called the DoReFa algorithm. However, the results obtained with such a method are insufficient when the neural network is implemented with integer and/or fixed-point operators and integer and/or fixed-point registers.
SUMMARY
The aim of the invention is then to propose a method of training a neural network which thereafter allows inference of said network with integer and/or fixed-point operators and integer and/or fixed-point registers. To this end, the invention relates to a method for training synaptic weight values of at least one layer of an artificial neural network, each artificial neuron of a respective layer being adapted to perform a weighted sum of input value(s) and then to apply an activation function to the weighted sum to provide an output value, each input value being received from a respective element connected to an input of said neuron and multiplied by a synaptic weight associated with the connection between said neuron and the respective element, the respective element being an input variable of the neural network or a neuron of a preceding layer of the neural network, the method being computer-implemented and comprising training the weight values of the neural network from training data, each weight value obtained from said training being a quantized weight value belonging to a set of quantized values.
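For context, the prior-art weight quantization used by the SAT method, mentioned in the background above, can be sketched as follows, transcribing the equations of the cited DoReFa article (this is an illustrative reimplementation of that published algorithm, not the method of the invention):

```python
import numpy as np

def quantize_k(x, k):
    """Uniform k-bit quantizer on [0, 1] from the DoReFa paper:
    quantize_k(x) = round((2**k - 1) * x) / (2**k - 1)."""
    n = 2**k - 1
    return np.round(n * x) / n

def dorefa_weights(w, k):
    """DoReFa k-bit weight quantization (forward pass; during training the
    gradient passes through the rounding via a straight-through estimator):
    w_q = 2 * quantize_k(tanh(w) / (2 * max|tanh(w)|) + 1/2) - 1."""
    t = np.tanh(w)
    x = t / (2 * np.max(np.abs(t))) + 0.5    # map weights into [0, 1]
    return 2 * quantize_k(x, k) - 1          # map back into [-1, 1]

w = np.random.randn(4, 4)
print(dorefa_weights(w, 3))   # 2**3 = 8 levels of the form 2*i/7 - 1
```

Note that the resulting levels, of the form 2·i/(2^k − 1) − 1, never include the null value, unlike the sets of quantized values of claims 2 to 4 above.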