
CN-116956990-B - Model quantization method, device, equipment and storage medium

CN116956990B

Abstract

An embodiment of the application provides a model quantization method, device, equipment, and storage medium, relating to the technical field of model quantization. The method comprises: searching for optimal parameters of an original model layer by layer according to a preset model quantization order, determining the optimal scale parameter of each layer of the original model; and quantizing the original model based on the optimal scale parameters of each layer to obtain a final quantized model. Because the optimal parameters of each layer of the original model are searched layer by layer in the preset order, a quantized model fitting the actual application requirements can be obtained rapidly and accurately.

Inventors

  • HUANG YAFENG
  • YANG HAO

Assignees

  • 苏州轻舟智航智能技术有限公司

Dates

Publication Date
2026-05-05
Application Date
2023-08-09

Claims (8)

  1. A model quantization method, characterized in that it is applied to a deep-learning-model picture inference scenario and comprises the following steps: performing an optimal-parameter search on an original model layer by layer based on a model quantization order set by actual application requirements, and determining the optimal scale parameter of each layer of the original model, wherein the optimal scale parameters of the layers are the scale parameters of all layers that suit the actual application requirements; quantizing the original model based on the optimal scale parameters of each layer of the original model to obtain a final quantized model; finally evaluating the quantized model according to the actual application requirements; and converting the RGB format of an input picture into the format required by the model input, and saving all intermediate-layer results with a calibration module during inference so that the intermediate results can be compared with the subsequent quantized results; wherein performing the optimal-parameter search on the original model layer by layer and determining the optimal scale parameter of each layer of the original model comprises: acquiring a plurality of candidate scale parameters for a current layer of the original model; determining a to-be-measured quantization model corresponding to each candidate scale parameter, and obtaining a to-be-measured precision result corresponding to each to-be-measured quantization model; and calculating the degree of difference between each to-be-measured precision result and the full-precision result of the original model, and determining the optimal scale parameter of the current layer based on the degrees of difference; and wherein acquiring the plurality of candidate scale parameters of the current layer of the original model comprises: calculating, for each of a plurality of preset percentile parameters corresponding to the current layer of the original model, the scale parameter corresponding to that percentile parameter, thereby obtaining the plurality of candidate scale parameters of the current layer of the original model.
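The patent does not publish reference code, but the percentile-based candidate-scale step of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation; the specific percentile values and the int8 level count of 127 are assumptions:

```python
import numpy as np

def candidate_scales(activations: np.ndarray,
                     percentiles=(99.0, 99.9, 99.99, 100.0),
                     n_levels: int = 127) -> list:
    """For each preset percentile parameter, take that percentile of the
    absolute activation values as the clipping threshold and derive an
    int8 scale parameter from it (threshold / n_levels)."""
    abs_vals = np.abs(activations).ravel()
    return [float(np.percentile(abs_vals, p)) / n_levels for p in percentiles]

# Toy calibration activations for one layer of the original model.
acts = np.random.default_rng(0).normal(size=10_000).astype(np.float32)
scales = candidate_scales(acts)  # one candidate scale per preset percentile
```

Each candidate scale then corresponds to one to-be-measured quantization model, per the search procedure in the claim.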
  2. The model quantization method according to claim 1, wherein calculating the degree of difference between each to-be-measured precision result and the full-precision result of the original model, and determining the optimal scale parameter of the current layer based on the degrees of difference, comprises: calculating the degree of difference between each to-be-measured precision result and the full-precision result of the original model based on a preset loss function; and determining the candidate scale parameter corresponding to the to-be-measured precision result with the smallest degree of difference as the optimal scale parameter of the current layer.
  3. The model quantization method according to claim 2, wherein calculating the degree of difference between each to-be-measured precision result and the full-precision result of the original model based on the preset loss function comprises: calculating the degree of difference between each to-be-measured precision result and the full-precision result of the original model in combination, based on at least two preset loss functions and their corresponding weights.
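The weighted multi-loss combination of claim 3 can be sketched as below. The choice of mean squared error and cosine distance as the two loss functions, and the equal weights, are assumptions for illustration only; the patent only requires at least two preset loss functions with corresponding weights:

```python
import numpy as np

def combined_difference(full_precision: np.ndarray,
                        quantized: np.ndarray,
                        w_mse: float = 0.5,
                        w_cos: float = 0.5) -> float:
    """Weighted combination of two difference measures between the
    full-precision output and a candidate quantized output:
    mean squared error and cosine distance."""
    mse = float(np.mean((full_precision - quantized) ** 2))
    cos_sim = float(np.dot(full_precision.ravel(), quantized.ravel()) /
                    (np.linalg.norm(full_precision) * np.linalg.norm(quantized)
                     + 1e-12))
    return w_mse * mse + w_cos * (1.0 - cos_sim)

a = np.ones(4, dtype=np.float64)
d_same = combined_difference(a, a)    # identical outputs -> near zero
d_flip = combined_difference(a, -a)   # opposite outputs -> large
```

The candidate scale whose quantized output minimizes this combined difference would be chosen as the optimal scale parameter, per claim 2.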
  4. The model quantization method according to claim 1, wherein performing the optimal-parameter search on the original model layer by layer and determining the optimal scale parameter of each layer of the original model comprises: during the optimal-parameter search for the current layer of the original model, configuring each layer of the original model whose optimal scale parameter has already been determined with that determined optimal scale parameter.
  5. The model quantization method according to claim 1, wherein performing the optimal-parameter search on the original model layer by layer and determining the optimal scale parameter of each layer of the original model comprises: during the optimal-parameter search for the current layer of the original model, configuring each layer of the original model whose optimal scale parameter has already been determined with its initial scale parameter from the original model.
  6. A model quantization device, characterized in that it is applied to a deep-learning-model picture inference scenario and comprises: a parameter determining module, configured to perform an optimal-parameter search on an original model layer by layer based on a model quantization order set by actual application requirements and to determine the optimal scale parameter of each layer of the original model, wherein the optimal scale parameters of the layers are the scale parameters of all layers that suit the actual application requirements; and a model quantization module, configured to quantize the original model based on the optimal scale parameters of each layer of the original model to obtain a final quantized model, the model quantization module being further configured to finally evaluate the quantized model according to the actual application requirements; the parameter determining module being configured, when performing the optimal-parameter search on the original model layer by layer, to convert the RGB format of an input picture into the format required by the model input and to save all intermediate-layer results with a calibration module during inference so that the intermediate results can be compared with the subsequent quantized results; the parameter determining module being specifically configured to: acquire a plurality of candidate scale parameters for a current layer of the original model; determine a to-be-measured quantization model corresponding to each candidate scale parameter, and obtain a to-be-measured precision result corresponding to each to-be-measured quantization model; calculate the degree of difference between each to-be-measured precision result and the full-precision result of the original model, and determine the optimal scale parameter of the current layer based on the degrees of difference; and calculate, for each of a plurality of preset percentile parameters corresponding to the current layer of the original model, the scale parameter corresponding to that percentile parameter, thereby obtaining the plurality of candidate scale parameters of the current layer of the original model.
  7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the model quantization method according to any one of claims 1 to 5.
  8. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, performs the model quantization method according to any one of claims 1 to 5.

Description

Model quantization method, device, equipment and storage medium

Technical Field

The present application relates to the field of model quantization technologies, and in particular, to a method, an apparatus, a device, and a storage medium for model quantization.

Background

Quantization is a technique for reducing the size of a neural network model by converting the floating-point parameters in the model to smaller integers or fixed-point numbers. Typically, this technique can reduce the model size severalfold without significantly reducing the model accuracy. NVIDIA provides TensorRT, a software development kit for high-performance deep learning inference, which accelerates deep-learning-model inference and supports various hardware platforms such as GPU and Tegra systems. TensorRT supports a variety of optimization techniques, including model-compression quantization techniques, which reduce the size of the model and improve inference performance. However, at present, TensorRT only supports every layer of the model sharing the same scale parameter, so the quantized model is rigid, the trade-off between model size and precision is not balanced, and a quantized model fitting the actual application requirements cannot be obtained.

Disclosure of Invention

The embodiments of the application aim to provide a model quantization method, device, equipment, and storage medium that can rapidly and accurately obtain a quantized model fitting the actual application requirements.
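For readers unfamiliar with scale parameters, the role a per-layer scale plays in int8 quantization can be sketched as follows. This is a generic illustration of symmetric int8 quantization, not the patented method itself:

```python
import numpy as np

def quantize_int8(x: np.ndarray, scale: float) -> np.ndarray:
    """Map float values to int8 codes using a scale parameter:
    q = clip(round(x / scale), -128, 127)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the int8 codes."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.0], dtype=np.float32)
scale = 3.0 / 127          # here: derived from the max absolute value
q = quantize_int8(x, scale)
x_hat = dequantize_int8(q, scale)   # close to x, within ~scale/2 per element
```

The choice of scale controls the trade-off the background section describes: a larger scale covers a wider value range but loses resolution, which is why a single scale shared by all layers can be suboptimal.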
In a first aspect, an embodiment of the present application provides a model quantization method, comprising: performing an optimal-parameter search on an original model layer by layer based on a preset model quantization order, and determining the optimal scale parameter of each layer of the original model; and quantizing the original model based on the optimal scale parameters of each layer of the original model to obtain a final quantized model. Because the optimal parameters of each layer of the original model are searched layer by layer in the preset order, a quantized model fitting the actual application requirements can be obtained rapidly and accurately. In some possible embodiments, performing the optimal-parameter search on the original model layer by layer and determining the optimal scale parameter of each layer of the original model includes: acquiring a plurality of candidate scale parameters for a current layer of the original model; determining a to-be-measured quantization model corresponding to each candidate scale parameter, and obtaining a to-be-measured precision result corresponding to each to-be-measured quantization model; and calculating the degree of difference between each to-be-measured precision result and the full-precision result of the original model, and determining the optimal scale parameter of the current layer based on the degrees of difference. In this embodiment, the optimal scale parameter of each layer is obtained by acquiring a plurality of candidate parameters for the layer, computing the corresponding model output result with each candidate parameter, and then comparing the degree of difference with the full-precision result, which further improves the accuracy of the obtained optimal scale parameters.
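The layer-by-layer search described above can be sketched end-to-end on a toy two-layer model. This is an illustrative reconstruction under stated assumptions: the layers are plain matrix multiplies with ReLU, the candidate scales come from weight percentiles, and the difference measure is mean squared error; none of these specifics are fixed by the patent:

```python
import numpy as np

def quantize(w: np.ndarray, scale: float) -> np.ndarray:
    """Symmetric int8 fake-quantization of a weight matrix."""
    return np.clip(np.round(w / scale), -128, 127) * scale

def forward(weights: dict, scales: dict, x: np.ndarray) -> np.ndarray:
    """Toy forward pass: layers with a chosen scale run quantized,
    the remaining layers run in full precision."""
    for name, w in weights.items():
        w_eff = quantize(w, scales[name]) if name in scales else w
        x = np.maximum(x @ w_eff, 0.0)  # ReLU
    return x

def layerwise_search(weights: dict, x: np.ndarray, candidates: dict) -> dict:
    """Greedy layer-by-layer search: for each layer, in the preset order,
    keep the candidate scale whose quantized-model output differs least
    (MSE) from the full-precision output, then freeze it and move on."""
    full = forward(weights, {}, x)  # full-precision reference result
    best_scales = {}
    for name in weights:  # dict insertion order = preset quantization order
        best, best_diff = None, float("inf")
        for s in candidates[name]:
            out = forward(weights, {**best_scales, name: s}, x)
            diff = float(np.mean((full - out) ** 2))
            if diff < best_diff:
                best, best_diff = s, diff
        best_scales[name] = best  # freeze before searching the next layer
    return best_scales

rng = np.random.default_rng(0)
weights = {"fc1": rng.normal(size=(8, 8)), "fc2": rng.normal(size=(8, 4))}
candidates = {n: [float(np.percentile(np.abs(w), p)) / 127
                  for p in (99.0, 99.9, 100.0)]
              for n, w in weights.items()}
scales = layerwise_search(weights, rng.normal(size=(16, 8)), candidates)
```

Freezing each layer's chosen scale before moving to the next corresponds to the configuration strategy of claim 4; claim 5 describes the alternative of leaving already-searched layers at their initial scale during subsequent searches.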
In some possible embodiments, calculating the degree of difference between each to-be-measured precision result and the full-precision result of the original model, and determining the optimal scale parameter of the current layer based on the degrees of difference, includes: calculating the degree of difference between each to-be-measured precision result and the full-precision result of the original model based on a preset loss function; and determining the candidate scale parameter corresponding to the to-be-measured precision result with the smallest degree of difference as the optimal scale parameter of the current layer. In this embodiment, the difference between the prediction results of the quantized model and the original model is characterized by the preset loss function, which further improves the accuracy of the obtained optimal scale parameter. In some possible embodiments, calculating, based on a preset loss function, the degree of difference between each to-be-measured precision result and the full-precision result of the original model includes: calculating the degree of difference between each to-be-measured precision result and the full-precision result of the original model in combination, based on at least two preset loss functions and their corresponding weights. In this embodiment, the degree of difference between the outputs of the quantized model and the original model is calculated in combination by setting at least two loss functions and respectively setting corresponding weights.