CN-117492839-B - Low-bit quantization processing method based on simd

Abstract

The invention provides a SIMD-based low-bit quantization processing method comprising the following steps. S1: convert the input data sum into 64-bit integer data, convert mul into 64-bit integer data, and set max_precision = 15. S2: multiply sum by mul and shift the product right by (max_precision - left_shift) to obtain res1, where res1 is 32-bit integer data, i.e. res1 = (sum × mul) >> (max_precision - left_shift). S3: shift res1 right by right_shift to obtain res2. S4: optimize the formula corresponding to res1. S5: clip res2 to obtain res3. S6: depending on whether the required output is signed or unsigned, convert the data using res3 and bitw to obtain the final result. The method achieves SIMD optimization of the quantization step and improves speed.
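The unoptimized steps S1 to S3, S5 and S6 can be sketched in scalar C as follows. This is only an illustrative reference under stated assumptions, not the patented implementation: the function name quantize_ref and the is_signed flag are hypothetical, the clip range [0, 2^bitw - 1] and the signed offset 2^(bitw-1) are taken from claims 4 and 5, left_shift is restricted to at most max_precision so the shift amount stays non-negative, and right shifts of negative values are assumed to be arithmetic (which the C standard leaves implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_PRECISION = 15 }; /* fixed shift value set in step S1 */

/* Hypothetical scalar reference for steps S1-S3, S5 and S6.
 * Assumes arithmetic right shift for negative values. */
static int32_t quantize_ref(int32_t sum, int16_t mul, int left_shift,
                            int right_shift, int bitw, int is_signed)
{
    /* Restrictions of this sketch, not statements from the patent. */
    assert(left_shift >= 0 && left_shift <= MAX_PRECISION);
    assert(right_shift >= 0 && right_shift <= 16);
    assert(bitw >= 1 && bitw <= 8);

    /* S1: widen both operands to 64-bit integers. */
    int64_t sum64 = sum;
    int64_t mul64 = mul;

    /* S2: res1 = (sum * mul) >> (max_precision - left_shift), kept as 32-bit. */
    int32_t res1 = (int32_t)((sum64 * mul64) >> (MAX_PRECISION - left_shift));

    /* S3: res2 = res1 >> right_shift. */
    int32_t res2 = res1 >> right_shift;

    /* S5: clip to [0, 2^bitw - 1]. */
    int32_t max_val = (1 << bitw) - 1;
    int32_t res3 = res2 < 0 ? 0 : (res2 > max_val ? max_val : res2);

    /* S6: for signed output, subtract 2^(bitw-1) (assumption from claim 5). */
    return is_signed ? res3 - (1 << (bitw - 1)) : res3;
}
```

For instance, with sum = 2000, mul = 8192, left_shift = 1, right_shift = 2 and bitw = 8, the product 16384000 shifted right by 14 gives res1 = 1000, res2 = 250, and the unsigned result is 250.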

Inventors

YU XIAOJING
TIAN FENGBIN

Assignees

北京君正集成电路股份有限公司

Dates

Publication Date: 20260508
Application Date: 20220725

Claims (6)

1. A low-bit quantization processing method based on SIMD, characterized in that the data input parameters required by the quantization processing are: preprocessing data sum, fixed shift value max_precision, model parameter multiplication data mul, model parameter left-shift data left_shift, model parameter right-shift data right_shift, and output feature map bit number bitw; the method comprises the following steps:
S1, taking the convolution calculation result data as the input data sum of the quantization processing, converting sum into 64-bit integer data, converting mul into 64-bit integer data, and setting max_precision = 15;
S2, shifting the product of sum and mul right by (max_precision - left_shift) to obtain a result res1, wherein res1 is 32-bit integer data, as shown in the formula: res1 = (sum × mul) >> (max_precision - left_shift);
S3, shifting res1 right by right_shift to obtain res2, as shown in the formula: res2 = res1 >> right_shift;
S4, optimizing the formula corresponding to res1, as shown in formula (1):
res2 = [sum × (mul << (left_shift + 16))] >> (max_precision + 16)   (1)
the left-shifted mul data is denoted mul32, i.e.
mul32 = mul << (left_shift + 16)   (2)
max_precision + 16 is denoted max_precision31, i.e. max_precision31 = max_precision + 16, which gives
max_precision31 = 31   (3)
where max_precision31 is the number of bits shifted right, here 31 bits; because the SIMD instruction sumv = ingenic_mulq_h(sumv, mulv) multiplies two registers holding 32-bit data and then shifts the product by 31 bits, the processing is arranged around this 31-bit shift; from (2) and (3):
res2 = [(sum × mul32) >> max_precision31]   (4)
S5, clipping res2, wherein values greater than 255 are set to 255, values less than 0 are set to 0, and values in between are unchanged, to obtain a result res3;
S6, depending on whether the required output is signed or unsigned, performing data conversion using res3 and bitw to obtain the final result.
2. The SIMD-based low-bit quantization processing method according to claim 1, wherein mul is 16-bit integer data, and left_shift and right_shift are each less than or equal to 16.
3. The method of claim 1, wherein, in step S2, the function cannot be realized with a single existing instruction and more instructions would be needed to realize it, so the parameters of the method need to be changed: when the model parameters are loaded, the parameter mul32 is regenerated according to step S2 so that the parameters meet the requirements of the instruction; at the same time, when the model is loaded, the right_shift data of the model is converted into 32-bit integer data; loading the model thus means loading the converted mul32 and right_shift data arrays; in step S2, mul32 and right_shift each denote a type of data, namely a data array, rather than a single number.
4. A low-bit quantization processing method based on SIMD according to claim 3, wherein the SIMD algorithm design includes setting the register that loads the mul32 data as mulv, the register for convolution accumulation as sumv, and the register that loads the right_shift data as shiftv;
step S4 further includes:
(1) loading the mul32 data into register mulv;
(2) using a SIMD instruction to compute formula (1), the instruction providing the 31-bit right shift after multiplication, specifically: sumv = ingenic_mulq_h(sumv, mulv); this instruction multiplies the two registers and shifts the product right by 31 bits;
(3) using a shift instruction with banker's rounding: sumv = ingenic_srar_h(sumv, shiftv); this instruction shifts each datum in sumv right by the amount given by the corresponding datum in shiftv, and the shifted result is rounded using banker's rounding;
step S5 further includes:
(4) clipping sumv between a maximum and a minimum value: values greater than 2^bitw - 1 are set to 2^bitw - 1 and values less than 0 are set to 0, where bitw is the bit width, with a value range of 4 to 6; if the generated data is 8-bit, bitw = 8; 2^bitw - 1 is stored in register v8_max as the maximum value and 0 is stored in register v8_min as the minimum value; the specific instructions are: sum_0 = ingenic_maxs_h(sum_0, v8_min); which takes the larger of sum_0 and v8_min and stores it in sum_0; and sum_0 = ingenic_mins_h(sum_0, v8_max); which takes the smaller of sum_0 and v8_max and stores it in sum_0;
(5) repeating steps (1) to (4), i.e. steps S4 and S5, to obtain four sets of sum_0 data, wherein each sum_0 holds 4 values, each an 8-bit value stored in 32 bits; the four sets of sum_0 data are stored together in a register sum_1, which therefore holds 16 values in total.
5. The method of claim 4, wherein in step S6 the result is converted to signed or unsigned as required; if signed, v_8 is set to 2^(bitw-1), and the specific SIMD instruction is sum_1 = ingenic_sub_b(sum_1, v_8); this instruction computes the difference between sum_1 and v_8 and stores it in sum_1.
6. The SIMD-based low-bit quantization processing method according to claim 1, wherein the method processes an output feature map with a bit number of 4 to 6 bits, and the convolution accumulation result, i.e. the convolution calculation result data used as the input data sum of the quantization processing, is 32-bit integer data.
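The optimized path in claims 1, 4 and 5 can be illustrated one lane at a time in scalar C. This is a hedged sketch, not the Ingenic intrinsics themselves: make_mul32, mulq_lane, srar_lane and quantize_lane are hypothetical names, and they model only what the claims state (multiply then shift right by 31 bits; right shift with banker's rounding; clip; subtract 2^(bitw-1) for signed output). The real instructions may differ in saturation, rounding and lane layout. The sketch also assumes arithmetic right shift of negative values and that mul32 fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_PRECISION = 15, MAX_PRECISION31 = MAX_PRECISION + 16 }; /* formula (3) */

/* Formula (2), done once at model-load time: mul32 = mul << (left_shift + 16).
 * Multiplying by a power of two avoids left-shifting a negative value. */
static int32_t make_mul32(int16_t mul, int left_shift)
{
    assert(left_shift >= 0 && left_shift <= MAX_PRECISION);
    int64_t wide = (int64_t)mul * ((int64_t)1 << (left_shift + 16));
    assert(wide >= INT32_MIN && wide <= INT32_MAX); /* must fit a 32-bit lane */
    return (int32_t)wide;
}

/* Hypothetical one-lane stand-in for the multiply step, formula (4):
 * (sum * mul32) >> 31. Saturation is not modeled. */
static int32_t mulq_lane(int32_t sum, int32_t mul32)
{
    return (int32_t)(((int64_t)sum * mul32) >> MAX_PRECISION31);
}

/* Hypothetical one-lane stand-in for the rounding shift:
 * right shift with ties rounded to the nearest even value. */
static int32_t srar_lane(int32_t x, int shift)
{
    assert(shift >= 0 && shift <= 16);
    if (shift == 0) return x;
    int32_t q = x >> shift;                      /* floor(x / 2^shift) */
    int32_t rem = x - q * ((int32_t)1 << shift); /* 0 <= rem < 2^shift */
    int32_t half = (int32_t)1 << (shift - 1);
    if (rem > half || (rem == half && (q & 1))) q += 1;
    return q;
}

/* One lane of steps S4-S6: multiply, rounding shift, clip, optional offset. */
static int32_t quantize_lane(int32_t sum, int32_t mul32, int right_shift,
                             int bitw, int is_signed)
{
    assert(bitw >= 1 && bitw <= 8);
    int32_t v = srar_lane(mulq_lane(sum, mul32), right_shift);
    int32_t max_val = (1 << bitw) - 1;
    if (v < 0) v = 0;             /* maxs step: lower bound 0 */
    if (v > max_val) v = max_val; /* mins step: upper bound 2^bitw - 1 */
    return is_signed ? v - (1 << (bitw - 1)) : v;
}
```

Because multiplying by 2^(left_shift+16) is exact in 64-bit arithmetic, mulq_lane(sum, make_mul32(mul, left_shift)) equals (sum × mul) >> (max_precision - left_shift) whenever mul32 fits in 32 bits. The final value can still differ from a plain res1 >> right_shift, since the shift here rounds instead of truncating.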

Description

Low-bit quantization processing method based on simd

Technical Field

The invention belongs to the technical field of image processing, and in particular relates to a SIMD-based low-bit quantization processing method.

Background

In integrated circuit technology, chip manufacturers develop their own chips as the field advances, and each chip design brings its own problems in application. For example, chips produced by Beijing Ingenic Semiconductor Co., Ltd. (Ingenic for short), such as the T-series and X-series chips including the Ingenic T30 and T31, have a SIMD instruction set. The optimization algorithm here is designed for the SIMD instruction set of T-series chips such as the Ingenic T30 and T31, and is suited to vector instructions. However, the registers of the T30 and T31 chips are 128-bit registers and are limited in number, so the register count has to be considered in the optimized design; the SIMD instruction set is also limited, and some operations can only be realized by combining several instructions. In addition, running a plain C program directly on an Ingenic chip is relatively slow.

Common terminology in the prior art is as follows:

1. SIMD instruction: single instruction, multiple data, i.e. one instruction operates on multiple data elements at once, which raises the operation speed of a program. It is commonly understood as vector computation. The specific instruction set differs from chip to chip.
2. Convolution kernel: a matrix used in image processing, i.e. the parameters that are operated with the original image. The convolution kernel is typically a matrix of rows and columns (e.g. a 3*3 matrix) with a weight for each cell in the region.
The matrix shape is typically 1×1, 3×3, 5×5, 7×7, 1×3, 3×1, 2×2, 1×5 or 5×1.
3. Convolution: the center of the convolution kernel is placed on the pixel to be calculated, each element of the kernel is multiplied by the image pixel value it covers, and the products are summed; the result is the new pixel value at that location. This process is called convolution.
4. Feature map: the result of the convolution calculation on the input data is called a feature map (or output data), and the result of a fully connected operation on the data is also called a feature map (or output data). The feature map size is generally expressed as length × width × depth, or 1 × depth.

Disclosure of Invention

In order to solve the above problems, the application aims to achieve SIMD optimization of quantization and to improve speed.

Specifically, the invention provides a SIMD-based low-bit quantization processing method in which the data input parameters required by the quantization processing are: preprocessing data sum, a fixed shift value max_precision, model parameter multiplication data mul, model parameter left-shift data left_shift, model parameter right-shift data right_shift, and output feature map bit number bitw. The method comprises the following steps:

S1, taking the convolution calculation result data as the input data sum of the quantization processing, converting sum into 64-bit integer data, converting mul into 64-bit integer data, and setting max_precision = 15;
S2, shifting the product of sum and mul right by (max_precision - left_shift) to obtain a result res1, wherein res1 is 32-bit integer data, as shown in the formula: res1 = (sum × mul) >> (max_precision - left_shift);
S3, shifting res1 right by right_shift to obtain res2, as shown in the formula: res2 = res1 >> right_shift;
S4, optimizing the formula corresponding to res1, as shown in formula (1):
res2 = [sum × (mul << (left_shift + 16))] >> (max_precision + 16)   (1)
The left-shifted mul data is denoted mul32, i.e.
mul32 = mul << (left_shift + 16)   (2)
max_precision + 16 is denoted max_precision31, i.e. max_precision31 = max_precision + 16, that is,
max_precision31 = 31   (3)
where max_precision31 is the number of bits shifted right, here 31 bits. Because the SIMD instruction sumv = ingenic_mulq_h(sumv, mulv) multiplies two registers holding 32-bit data and then shifts the product by 31 bits, the processing is arranged around this 31-bit shift. From (2) and (3):
res2 = [(sum × mul32) >> max_precision31]   (4)
S5, clipping res2 to obtain a result res3;
S6, depending on whether the required output is signed or unsigned, performing data conversion using res3 and bitw to obtain the final result.

The mul is 16-bit integer data, and left_shift and right_shift are less than or equal to 16. The function of the method can not be realized by using th