CN-116721305-B - Mixed-precision quantization-aware training method based on neural architecture search
Abstract
A mixed-precision quantization-aware training method based on neural architecture search comprises the steps of: inputting an original image dataset and dividing it into a training dataset and a validation dataset; obtaining gradients of the super-network on the training set and updating the super-network weights; obtaining gradients of the super-network on the validation set and updating the bit-importance parameters of the super-network; storing the current mixed-precision configuration; repeating until the set number of iterations is completed or the complexity of the current mixed-precision configuration falls below the expected complexity; obtaining a set of mixed-precision configurations of the target network under different constraints; and, starting from the flatness of the minimum region of the model loss function, performing quantization-aware training on the mixed-precision networks under the different constraints. By means of parameter sharing and the computational equivalence of the convolution operator, the method searches for the optimal mixed-precision configuration of the model under constraint conditions at low computational cost, and further improves the generalization ability of a low-bit quantized model, or of a mixed-precision model containing low-bit quantized layers, by simultaneously minimizing the target loss value and the quantization loss sharpness.
Inventors
- Shang Fanhua
- Chen Fei
- Liu Hongying
- Liu Yuanyuan
- Ren Yan
- Wan Liang
Assignees
- Tianjin University (天津大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2023-04-11
Claims (5)
- 1. A mixed-precision quantization-aware training method based on neural architecture search, characterized by comprising the following specific steps: (1) Processing the original image dataset and dividing it into a training dataset Dtrain and a validation dataset Dval; (2) Constructing a super-network, wherein each quantizable layer in the super-network adopts a composite convolution module; each composite convolution module comprises a plurality of quantized branches, different branches have independent learnable quantization step sizes, and all branches share the same full-precision weight tensor; during forward propagation, the outputs of the quantized branches are aggregated and then subjected to only one convolution operation, utilizing the computational equivalence of the convolution operator; sampling a batch of data samples from the training dataset Dtrain, inputting them into the super-network for forward inference, and obtaining the current target loss value; (3) Updating the weights of the super-network, namely updating the weights of the current super-network model by gradient descent; (4) Obtaining gradients of the super-network on the validation set, namely sampling a batch of data samples from the validation dataset Dval, inputting them into the super-network for forward inference, and obtaining the current target loss value; (5) Updating the bit-importance parameters of the super-network, namely updating the weight bit-importance parameters and the activation-value bit-importance parameters of the super-network by gradient descent; (6) Storing the current mixed-precision configuration, namely taking, from the weight bit-importance parameters and the activation-value bit-importance parameters of the super-network, the quantization bit corresponding to the item of maximum probability as the mixed-precision configuration of the current target network, and storing this configuration in a file; (7) Repeating steps (2) to (5) until the set number of iterations is reached or the complexity of the current mixed-precision configuration falls below the expected complexity; (8) Acquiring the set of mixed-precision configurations of the target network under different constraints by reading the mixed-precision configuration file stored in step (6); (9) Starting from the flatness of the minimum region of the model loss function, performing quantization-aware training on the mixed-precision networks under the different constraints.
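The branch aggregation in step (2) of claim 1 relies on the convolution operator being linear in its kernel: summing the outputs of several quantized branches equals convolving once with the correspondingly aggregated quantized kernels. A minimal NumPy sketch of this equivalence (the uniform quantizer below is a simplified stand-in for the patent's learnable-step-size quantizer; all names are illustrative):

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric weight quantization to the given bit width.
    (Simplified stand-in for a learnable-step-size quantizer.)"""
    qmax = 2 ** (bits - 1) - 1
    step = np.abs(w).max() / qmax            # per-tensor step size
    return np.clip(np.round(w / step), -qmax, qmax) * step

rng = np.random.default_rng(0)
x = rng.standard_normal(16)                  # input signal
w = rng.standard_normal(5)                   # shared full-precision kernel
bits = [2, 4, 8]                             # candidate bit widths, one branch each
p = np.array([0.2, 0.3, 0.5])                # branch (bit-importance) probabilities

# Naive super-network: one convolution per quantized branch, then aggregate.
naive = sum(pi * np.convolve(x, quantize(w, b), mode="valid")
            for pi, b in zip(p, bits))

# Composite module: aggregate quantized kernels first, then a single convolution.
w_mix = sum(pi * quantize(w, b) for pi, b in zip(p, bits))
composite = np.convolve(x, w_mix, mode="valid")

assert np.allclose(naive, composite)         # convolution is linear in the kernel
```

Because both paths agree, the super-network can keep one shared full-precision weight tensor and pay for a single convolution per layer, regardless of how many candidate bit widths the search space contains.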
- 2. The mixed-precision quantization-aware training method based on neural architecture search of claim 1, wherein the expected complexity in step (7) is a complexity set by the user according to the model's computational-cost requirements in the actual application scenario.
- 3. The mixed-precision quantization-aware training method based on neural architecture search of claim 1, wherein the different constraints in steps (8) and (9) comprise different model sizes and different model computational costs, i.e., computational complexity.
- 4. The mixed-precision quantization-aware training method based on neural architecture search of claim 1, wherein the specific method of step (9) is as follows: (9-1) reading, by a program, the mixed-precision configuration file obtained in step (6), mapping it to the bit allocation of the target neural network model, and setting the maximum perturbation coefficient and the training configuration; (9-2) performing end-to-end quantization-aware training on the learnable parameters of the quantized model, namely the weights and the quantization step sizes, wherein the weights are updated with a loss-sharpness-based optimization method and the quantization step sizes are updated with standard gradient descent.
- 5. The method of claim 4, wherein, in the loss-sharpness-based optimization method of step (9-2), the perturbation magnitude decays as the number of training iterations increases.
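The sharpness-aware weight update of claims 4 and 5 can be illustrated on a toy quadratic loss: take an ascent step of bounded radius toward the locally sharpest point, then descend using the gradient at the perturbed weights. This is a hedged sketch, not the patent's implementation: the real method perturbs the quantized network's weights during training, and the geometric decay schedule below is an assumed one.

```python
import numpy as np

def loss(w):
    """Toy quadratic loss standing in for the quantized network's target loss."""
    return 0.5 * np.sum((w - 1.0) ** 2)

def grad(w):
    return w - 1.0

w = np.array([4.0, -2.0])                    # initial weights
lr, rho0, decay, steps = 0.1, 0.5, 0.9, 200  # assumed hyperparameters

for t in range(steps):
    rho = rho0 * (decay ** t)                # claim 5: perturbation decays with t
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent step to the sharpest point
    g_adv = grad(w + eps)                    # gradient at the perturbed weights
    w -= lr * g_adv                          # sharpness-aware descent update

assert loss(w) < 1e-6                        # converges to the flat-region minimum
```

Decaying the perturbation radius lets early iterations favor flat minima while later iterations refine the solution with near-standard gradient descent.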
Description
Mixed-precision quantization-aware training method based on neural architecture search

Technical Field

The invention belongs to the technical field of computer vision, mainly relates to model quantization of deep neural networks, and particularly relates to a mixed-precision quantization-aware training method based on neural architecture search.

Background

Model quantization is an important research direction for the industrialization of deep learning. Most existing quantization methods employ fixed-precision quantization (also known as uniform-precision quantization), i.e., the weights and activation values of all layers in the network are quantized with the same bit width. Fixed-precision quantized network models are favored because they are well supported on conventional hardware such as CPUs and FPGAs. However, fixed-precision quantization ignores attributes of a network layer such as its position, structure, number of parameters, and FLOPs, and under the same parameter count and computational complexity it may lead to suboptimal performance. Mixed-precision quantization was therefore developed; it aims to allocate different quantization bit widths to the weights and activation values of different layers, solving the above limitations to some extent. Compared with fixed-precision quantization, mixed-precision quantization is more flexible and can further save memory and computational cost without sacrificing network performance. In addition, hardware supporting mixed-precision inference (e.g., the Apple A12, NVIDIA Turing GPUs) also accelerates the real-world deployment of mixed-precision models. Existing mixed-precision quantization techniques can be divided into rule-based methods and learning-based methods.
Rule-based methods use specific metrics to determine the optimal quantization bit width of each layer; for example, the HAWQ method uses the Hessian matrix as a metric to determine the layer-wise quantization bit widths of the network. Such rule-based metrics typically rely on heuristics provided by domain experts and therefore have limited scalability in practice. Inspired by neural architecture search (NAS) techniques, researchers have proposed learning-based methods that automatically search for the optimal bit width of each network layer; these algorithms are built on deep reinforcement learning (DRL) or on differentiable NAS methods. Although existing mixed-precision quantization methods achieve certain results, they still suffer from low search efficiency, high computational cost, and other shortcomings.

Disclosure of the Invention

To overcome the shortcomings of the prior art, the invention aims to provide a mixed-precision quantization-aware training method based on neural architecture search that searches for the optimal mixed-precision configuration of a model under constraint conditions. Parameter sharing reduces the memory requirement of the search process; by exploiting the computational equivalence of the convolution operator, a composite convolution module replaces an expensive parallel convolution module, decoupling the size of the search space from the computation of the super-network while keeping the super-network's computational complexity unchanged, so that the mixed-precision configuration of a large-scale network can be searched directly without resorting to a proxy task. After the mixed-precision configuration of the target model is obtained, the generalization ability of the quantized model is improved by simultaneously minimizing the target loss value and the quantization loss sharpness, alleviating the training difficulty and significant performance degradation caused by low-bit quantization and further improving the generalization ability of a mixed-precision model containing low-bit quantized layers.

To achieve the above purpose, the invention adopts the following technical means: A mixed-precision quantization-aware training method based on neural architecture search comprises the following specific steps: (1) Processing the original image dataset and dividing it into a training dataset Dtrain and a validation dataset Dval; (2) Acquiring gradients of the super-network on the training set, namely sampling a batch of data samples from the training dataset Dtrain of step (1), inputting them into the super-network for forward inference, and obtaining the current target loss value; (3) Updating the weights of the super-network, namely updating the weights of the current super-network model by gradient descent; (4) Sampling a batch of data samples from the validation dataset Dval of step (1), inputting the data samples into the
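The alternating bi-level search of steps (2)-(5), together with the argmax configuration extraction of step (6) and the complexity-based stopping rule of step (7), can be sketched as follows. This is a hedged NumPy illustration, not the patent's implementation: the validation-set gradient of the supernet loss is replaced by a hypothetical stand-in (`val_grad`), the shared-weight update is elided, and average bits per layer serve as a proxy complexity measure.

```python
import numpy as np

rng = np.random.default_rng(0)
bits = np.array([2, 4, 8])                   # candidate bit widths per layer
n_layers = 4
# Bit-importance logits (weights side; activation-value logits are analogous),
# initialized to favor the highest bit width, i.e., near full precision.
alpha = np.tile(np.array([0.0, 0.0, 3.0]), (n_layers, 1))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def val_grad(alpha):
    """Hypothetical stand-in for the validation-set gradient w.r.t. the
    bit-importance logits: it favors high bits in early layers, low in late ones.
    The real gradient comes from backpropagating the supernet validation loss."""
    pref = np.linspace(1.0, -1.0, n_layers)[:, None] * np.log2(bits)[None, :]
    return softmax(alpha) - softmax(pref)

lr, target_complexity = 0.5, 5.0
for step in range(500):
    # Steps (2)-(3): sample a train batch, forward, update shared full-precision
    # weights (omitted -- they live inside the composite convolution modules).
    # Steps (4)-(5): sample a val batch, update bit-importance logits by descent.
    alpha -= lr * val_grad(alpha)
    # Step (6): derive the current mixed-precision config by per-layer argmax.
    config = bits[softmax(alpha).argmax(axis=1)]
    # Step (7): stop once the config is cheap enough (avg bits as proxy cost).
    if config.mean() <= target_complexity:
        break

print(config.tolist())
```

Under this stand-in gradient, the search drifts from an all-8-bit configuration toward lower bits in the later layers until the proxy complexity budget is met, mirroring how the real search trades accuracy for cost layer by layer.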