CN-122024012-A - Low-bit quantization method, system, equipment and medium for YOLO target detection model
Abstract
The invention provides a low-bit quantization method, system, equipment and medium for a YOLO target detection model. The method comprises: performing sparse quantization inference on the activation distribution of the YOLO target detection model through a sparse quantization strategy to obtain a target output; after the sparse quantization inference is completed, performing low-bit compression on the large number of convolution weights in the Backbone-Neck region through a task regularization strategy; and functionally decomposing the detection head of the YOLO target detection model through a head quantization strategy according to a scale × task path paradigm. The method achieves high-precision low-bit quantization of YOLO-series models without retraining, reduces inference latency and energy consumption while keeping detection accuracy close to that of the full-precision model, and is particularly suited to edge equipment and real-time detection scenarios.
Inventors
- JIANG JINGFEI
- ZHU MINGHUA
- XU JINWEI
- LI LIANGWEI
- ZHOU SHUNAN
- LV QIANRU
- NIU DI
Assignees
- National University of Defense Technology, Chinese People's Liberation Army (中国人民解放军国防科技大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-10
Claims (10)
- 1. A low-bit quantization method for a YOLO target detection model, characterized by comprising the following steps: performing sparse quantization inference on the activation distribution of the YOLO target detection model through a sparse quantization strategy to obtain a target output; after the sparse quantization inference is completed, performing low-bit compression on the large number of convolution weights in the Backbone-Neck region through a task regularization strategy; and functionally decomposing the detection head of the YOLO target detection model through a head quantization strategy according to a scale × task path paradigm.
- 2. The low-bit quantization method for a YOLO target detection model according to claim 1, wherein performing sparse quantization inference on the activation distribution of the YOLO target detection model through a sparse quantization strategy to obtain a target output specifically comprises: adaptively selecting a truncation threshold based on the activation distribution; dividing the activation distribution into a main activation portion and a long-tail activation portion; performing low-bit quantized convolution on the main activation portion to obtain a first result; performing sparse convolution on the long-tail activation portion to obtain a second result; and obtaining the target output from the first result and the second result.
- 3. The low-bit quantization method for a YOLO target detection model according to claim 1, wherein, after the sparse quantization inference is completed, performing low-bit compression on the large number of convolution weights in the Backbone-Neck region through a task regularization strategy specifically comprises: constructing a task loss at the detection head level; and performing low-bit compression on the large number of convolution weights in the Backbone-Neck region according to the task loss.
- 4. The low-bit quantization method for a YOLO target detection model according to claim 3, wherein constructing the task loss at the detection head level specifically comprises: constructing the task loss at the detection head level from a bounding box regression loss, a classification loss, and a target confidence loss, the task loss $L_{task}$ being calculated as: $L_{task} = \frac{1}{N_{pos}}\left(\lambda_{box} L_{box} + \lambda_{cls} L_{cls} + \lambda_{obj} L_{obj}\right)$; wherein $N_{pos}$ is the number of positive sample detection frames; $L_{box}$ represents the bounding box regression loss; $L_{cls}$ represents the classification loss; $L_{obj}$ represents the target confidence loss; $\lambda_{box}$ represents the weight coefficient of the bounding box regression loss; $\lambda_{cls}$ represents the weight coefficient of the classification loss; and $\lambda_{obj}$ represents the weight coefficient of the target confidence loss.
- 5. The low-bit quantization method for a YOLO target detection model according to claim 4, wherein the classification loss and the target confidence loss are both optimized using a cross-entropy loss.
- 6. The low-bit quantization method for a YOLO target detection model according to claim 4, wherein the bounding box regression loss is obtained through the CIoU loss and is calculated as: $L_{box} = 1 - \mathrm{IoU}(b_{fp}, b_{q}) + \frac{\rho^{2}(b_{fp}, b_{q})}{c^{2}} + \alpha v$; wherein $b_{fp}$ represents the bounding box predicted by the full-precision model; $b_{q}$ represents the output of the sparse quantization inference; $\rho(b_{fp}, b_{q})$ represents the Euclidean distance between their center points; $c$ represents the diagonal length of the smallest enclosing rectangle covering both boxes; $v$ quantifies the consistency of the aspect ratios; and $\alpha$ represents a positive trade-off parameter.
- 7. The low-bit quantization method for a YOLO target detection model according to claim 1, wherein functionally decomposing the detection head of the YOLO target detection model through a head quantization strategy according to a scale × task path paradigm specifically comprises: dividing the detection head of the YOLO target detection model into six sub-modules according to the scale × task path paradigm; and constructing a sub-module-oriented optimization target according to the reconstruction error of each sub-module's final output.
- 8. A low-bit quantization system for a YOLO target detection model, characterized by comprising a quantization inference module, a compression module, and a decomposition module; the quantization inference module is configured to perform sparse quantization inference on the activation distribution of the YOLO target detection model through a sparse quantization strategy to obtain a target output; the compression module is configured to perform low-bit compression on the large number of convolution weights in the Backbone-Neck region through a task regularization strategy after the sparse quantization inference is completed; and the decomposition module is configured to functionally decompose the detection head of the YOLO target detection model through a head quantization strategy according to a scale × task path paradigm.
- 9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, the memory storing a computer program executable by the at least one processor, the computer program, when executed by the at least one processor, enabling the at least one processor to perform the method of any one of claims 1-7.
- 10. A computer readable storage medium, characterized in that the storage medium is for storing a computer program for causing a computer to perform the method of any one of claims 1-7.
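The sparse quantization inference steps of claim 2 can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The percentile-based threshold, the symmetric uniform 4-bit quantizer, the use of a 1-D cross-correlation in place of a convolution layer, and every function name below are assumptions made for illustration. The weights are kept in full precision here so that only the activation path is shown.

```python
import numpy as np

def split_activation(x, percentile=99.0):
    """Split x into a clipped main part and a sparse long-tail residual.

    The percentile rule is an assumed stand-in for the adaptive threshold.
    """
    t = float(np.percentile(np.abs(x), percentile))
    main = np.clip(x, -t, t)   # values limited to [-t, t]
    tail = x - main            # nonzero only where |x| > t
    return main, tail, t

def quantize_uniform(x, t, bits=4):
    """Symmetric uniform fake-quantization on [-t, t] (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = t / qmax if t > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale, scale

def dense_conv1d(x, w):
    """1-D 'valid' cross-correlation, standing in for a convolution layer."""
    return np.correlate(x, w, mode="valid")

def sparse_conv1d(tail, w):
    """Same operator, but visiting only the nonzero long-tail entries."""
    out = np.zeros(len(tail) - len(w) + 1)
    for i in np.flatnonzero(tail):
        for j, wj in enumerate(w):
            o = i - j          # output position that input i feeds via tap j
            if 0 <= o < len(out):
                out[o] += tail[i] * wj
    return out

def sparse_quantized_conv(x, w, bits=4, percentile=99.0):
    """Low-bit path on the main part plus sparse path on the long tail."""
    main, tail, t = split_activation(x, percentile)
    q_main, _ = quantize_uniform(main, t, bits)
    first = dense_conv1d(q_main, w)    # first result (low-bit quantized)
    second = sparse_conv1d(tail, w)    # second result (sparse, full precision)
    return first + second              # target output

# Synthetic heavy-tailed pre-activations passed through SiLU.
rng = np.random.default_rng(0)
z = rng.standard_t(df=3, size=4096)
x = z / (1.0 + np.exp(-z))
w = rng.standard_normal(5)

y_ref = dense_conv1d(x, w)
y_sq = sparse_quantized_conv(x, w)
print("mean abs error vs full precision:", np.mean(np.abs(y_ref - y_sq)))
```

Because the stand-in operator is linear, summing the two results recovers the full-precision output exactly before quantization; the only error introduced comes from rounding the main portion, whose range is bounded by the threshold rather than by the rare outliers.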
Description
Low-bit quantization method, system, equipment and medium for YOLO target detection model
Technical Field
The invention relates to the technical field of artificial intelligence and computer vision, and in particular to a low-bit quantization method, system, equipment and medium for a YOLO target detection model.
Background
Target detection is widely used in autonomous driving, intelligent security, industrial quality inspection, medical image analysis, and similar fields. Its core task is to locate and classify targets quickly and accurately, and model performance directly determines the real-time behavior and reliability of the system. Owing to their end-to-end, one-stage design and good balance between speed and accuracy, YOLO-series models have become the dominant choice in these scenarios. However, as edge devices and mobile terminals become widespread, models face severe constraints on compute, storage, and energy consumption, which makes model compression key to the efficient deployment and wide applicability of target detection models.
When an existing target detection model is deployed on resource-constrained edge equipment, efficient inference is usually achieved through model compression and quantization. However, mainstream post-training quantization (PTQ) methods are mostly designed for image classification tasks and do not account for the characteristics of detection tasks.
When such methods are applied directly to YOLO-series end-to-end detection models, performance degrades noticeably, mainly for three reasons: (1) because of task heterogeneity and the central role of intersection over union (IoU) in bounding box overlap calculation, detection models are far more sensitive to quantization error than classification models; (2) the activation distribution of YOLO models has a pronounced long tail, which the SiLU activation function aggravates, further complicating low-bit quantization; and (3) the detection head has a complex structure with parallel multi-scale, multi-task branches, and a conventional uniform quantization granularity cannot accommodate the heterogeneous characteristics of the different sub-modules, so quantization accuracy is unbalanced across them.
In view of this, how to achieve both accuracy and efficiency under low-bit quantization is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
To solve the above problems, the invention provides a low-bit quantization method, system, equipment and medium for a YOLO target detection model, which achieve high-precision low-bit quantization of YOLO-series models without retraining, reduce inference latency and energy consumption while keeping detection accuracy close to that of the full-precision model, and are particularly suited to edge equipment and real-time detection scenarios.
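The IoU sensitivity noted in item (1) of the Background can be made concrete with the CIoU loss named in claim 6, evaluated between a full-precision box and a perturbed box standing in for a quantized prediction. The sketch below uses the standard published CIoU definition; the corner-format boxes `(x1, y1, x2, y2)`, the `eps` stabilizer, and the function name `ciou_loss` are illustrative assumptions, not details taken from the patent.

```python
import math

def ciou_loss(box_a, box_b, eps=1e-9):
    """Standard CIoU loss between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = wa * ha + wb * hb - inter + eps
    iou = inter / union

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4

    # Squared diagonal of the smallest rectangle enclosing both boxes.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v

# The same 2-pixel shift applied to a small and a large box.
small_fp, small_q = (0, 0, 10, 10), (2, 2, 12, 12)
large_fp, large_q = (0, 0, 100, 100), (2, 2, 102, 102)
print("small box loss:", ciou_loss(small_fp, small_q))
print("large box loss:", ciou_loss(large_fp, large_q))
```

The same absolute coordinate error costs a small box far more overlap than a large one, which is one way to see why a quantization error that is tolerable for classification can be damaging for detection.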
The first object of the present invention is to provide a low-bit quantization method for a YOLO target detection model.
The technical scheme provided by the invention is as follows:
A low-bit quantization method for a YOLO target detection model comprises the following steps: performing sparse quantization inference on the activation distribution of the YOLO target detection model through a sparse quantization strategy to obtain a target output; after the sparse quantization inference is completed, performing low-bit compression on the large number of convolution weights in the Backbone-Neck region through a task regularization strategy; and functionally decomposing the detection head of the YOLO target detection model through a head quantization strategy according to a scale × task path paradigm.
Preferably, performing sparse quantization inference on the activation distribution of the YOLO target detection model through a sparse quantization strategy to obtain a target output specifically comprises: adaptively selecting a truncation threshold based on the activation distribution; dividing the activation distribution into a main activation portion and a long-tail activation portion; performing low-bit quantized convolution on the main activation portion to obtain a first result; performing sparse convolution on the long-tail activation portion to obtain a second result; and obtaining the target output from the first result and the second result.
Preferably, after the sparse quantization inference is completed, performing low-bit compression on the large number of convolution weights in the Backbone-Neck region through a task regularization strategy comprises: Constructing tas