CN-121982490-A - Method and system for BSD algorithm end-side deployment based on YOLO model and capable of running stably


Abstract

The invention discloses a stably running end-side deployment method for a BSD (blind spot detection) algorithm based on a YOLO model, comprising steps that begin with S1, model structure optimization. Through full-link co-design of model structure optimization, three-stage training, export and quantization, and end-side deployment, the invention builds a lightweight, high-precision detection framework, BSD_YOLO. It uses MobileNet with structured pruning, BiFPN feature fusion, and a decoupled detection head to reduce computation; combines distillation training, multi-illumination robustness training, and quantization fine-tuning to preserve accuracy; and achieves stable deployment and real-time inference on the end side through a standardized preprocessing flow and CPU/NPU cooperative post-processing. The method suits the BSD blind-zone monitoring scenario of truck electronic rearview mirrors: it rapidly detects 9 classes of targets on compute-limited edge devices, meets engineering targets such as detection frequency, response time, and missed-alarm rate, and keeps the deployment pipeline stable and reproducible.

Inventors

YU JUNBO
FU GUANGLAI

Assignees

江门市宏力后视镜实业有限公司

Dates

Publication Date: 20260505
Application Date: 20260327

Claims (10)

1. A BSD algorithm end-side deployment method based on a YOLO model and capable of running stably, characterized by comprising the following steps: S1, model structure optimization: constructing a modular Backbone + Neck + Head detection framework, wherein the Backbone replaces the original YOLO Darknet with MobileNet and is made lightweight through two-stage structured pruning, the Neck adopts a BiFPN structure to perform weighted bidirectional cross-scale feature fusion, and the Head adopts a decoupled structure that separately outputs a continuous bounding-box regression feature map and a category-quality feature map; S2, three-stage training: S201, training a high-capacity YOLO teacher model and constraining student model training with distillation losses, wherein the distillation losses comprise a classification distillation loss, a continuous-box distillation loss, and a quality distillation loss; S202, decomposing an input image into a reflectance component and an illumination component based on Retinex theory, generating multi-illumination synthetic images through illumination adjustment, and performing robustness training with a consistency loss; S203, performing quantization and fine-tuning with a PTQ calibration + QAT fine-tuning flow, introducing learnable quantization parameters to reduce quantization error, and retaining a distillation constraint to stabilize the model output distribution; and S3, model export and end-side deployment: exporting the trained YOLO teacher model in an end-side-compatible inference format, compiling and deploying it to the end-side NPU, and, during end-side inference, preprocessing camera input data with a fixed operator sequence, running inference on the NPU, and finally performing CPU/NPU cooperative post-processing to output the target detection result.
2. The BSD algorithm end-side deployment method based on a YOLO model and capable of running stably according to claim 1, wherein the two-stage structured pruning of S1 is specifically: S101, in the first stage, performing sparsification training based on the BN scaling factor γ, with an L1 sparsity term added to the total detection loss, wherein the loss formula is: [formula not reproduced]; where L is the set of pruned convolution layers and λ is the sparsification coefficient, and after training the channels are globally sorted and the 30% most redundant channels are pruned; S102, in the second stage, pruning redundant filters of the PWConv layers with FPGM, which computes redundancy based on the geometric median, wherein the geometric median formula is: [formula not reproduced]; the redundancy score of each filter is: [formula not reproduced]; and the 10% most redundant filters are pruned; to ensure the DWConv/PWConv coupling consistency of MobileNet, structured pruning is applied only to PWConv, the preceding and following channel dimensions are updated synchronously, and the DWConv channels are kept consistent with the corresponding PWConv inputs, thereby avoiding irregular tensor shapes and extra rearrangement overhead on the end side.
3. The BSD algorithm end-side deployment method based on a YOLO model and capable of running stably according to claim 1, wherein the BiFPN feature fusion of S1 uses fast normalized fusion, with the fusion formula: [formula not reproduced]; where [symbol not reproduced] are learnable weights, kept non-negative by ReLU activation, used for the weighted summation of the input features; BiFPN up-sampling uses the nearest-neighbor algorithm; before fusion, a [symbol not reproduced] alignment convolution unifies the features to [symbol not reproduced]; and down-sampling uses a depthwise convolution with stride [symbol not reproduced].
4. The BSD algorithm end-side deployment method based on a YOLO model and capable of running stably according to claim 1, wherein the decoupled detection head of S1 operates as follows: S1001, regression branch output: [formula not reproduced]; four non-negative distances [symbol not reproduced] are regressed directly and activated via ReLU: [formula not reproduced]; and decoded into a pixel-domain bounding box: [formula not reproduced]; where (x, y) = ((j + 0.5)·s_k, (i + 0.5)·s_k) is the center of the grid cell; S1002, category-quality branch output: [formula not reproduced]; where the first [symbol not reproduced] dimensions are the category logits, activated by sigmoid: p = σ(z), and the target ranking score is: [formula not reproduced]; where [symbol not reproduced] is the category logit and [symbol not reproduced] is the quality logit.
5. The BSD algorithm end-side deployment method according to claim 1, wherein the formulas for the classification distillation loss, continuous-box distillation loss, and quality distillation loss in S201 are: S2011, classification distillation loss, a KL divergence over temperature-scaled logits: [formula not reproduced]; S2012, continuous-box distillation loss, an L1 + IoU constraint on the decoded teacher and student boxes [symbol not reproduced]: [formula not reproduced]; S2013, quality distillation loss, a BCE or MSE on the quality branch with the teacher quality output [symbol not reproduced] as the target: [formula not reproduced]; the total loss of stage one is: [formula not reproduced].
6. The BSD algorithm end-side deployment method based on a YOLO model and capable of running stably according to claim 1, wherein the multi-illumination synthesis and consistency training of S202 proceed as follows: S2021, decomposing the input image [symbol not reproduced] with a pre-trained decomposition network [symbol not reproduced]: [formula not reproduced]; where [symbol not reproduced] is the reflectance and [symbol not reproduced] is the illumination; S2022, constructing multiple illumination versions [symbol not reproduced] based on [symbol not reproduced], by means including power-law adjustment, local illumination-field perturbation, and noise injection: [formula not reproduced]; and reconstructing the synthetic image: [formula not reproduced]; where [symbol not reproduced] simulates brightening and [symbol not reproduced] simulates darkening; when necessary, an ambient-light shift can be simulated, or noise matching the sensor characteristics can be injected into [symbol not reproduced] or [symbol not reproduced] to enhance realism; S2023, the consistency loss comprises a classification consistency KL loss, a continuous-box consistency L1 loss, and a quality consistency L2 loss; classification consistency (KL on logits): [formula not reproduced]; continuous-box consistency (L1 on decoded boxes): [formula not reproduced]; quality consistency (L2 on quality probabilities): [formula not reproduced]; the total loss of stage two is: [formula not reproduced].
7. The BSD algorithm end-side deployment method based on a YOLO model and capable of running stably according to claim 1, wherein the quantization fine-tuning of S203 proceeds as follows: S2031, symmetric quantization of a real-valued tensor [symbol not reproduced]: [formula not reproduced]; where [symbol not reproduced] is the quantization step size (scale); S2032, computing the detection loss on the fake-quantized graph with an LSQ (Learned Step Size Quantization) learnable step size or a PACT (Parameterized Clipping Activation) learnable clipping parameter, and introducing a self-distillation constraint that takes the floating-point student model trained in stage one as the teacher reference; in the QAT stage, the formula used for computation on the fake-quantized graph is: [formula not reproduced]; the self-distillation constraint requires no feature distillation and keeps only output distillation to reduce computation cost, with the formula: [formula not reproduced].
8. The method according to claim 1, wherein in S3 the end-side preprocessing comprises scaling + Letterbox padding to a fixed resolution (640×640 or 736×736), pixel normalization: [formula not reproduced], and NHWC→NCHW layout conversion, and the preprocessing introduces no data-dependent conditional branches.
9. The BSD algorithm end-side deployment method based on a YOLO model and capable of running stably according to claim 1, wherein in S3 the target detection result covers 9 classes of blind-zone targets in truck driving scenes, the detection performance satisfies a single-frame inference latency of at most 26 ms, meeting the real-time and engineering requirements of BSD blind-zone monitoring, and the loss function combines VarifocalLoss + CIoU loss with BCE quality supervision; where VarifocalLoss is originally defined as: [formula not reproduced]; where [symbol not reproduced] is constructed so that, for each positive sample position, its GT class dimension takes the IoU between the predicted box and the GT, the other class dimensions are 0, and negative samples are all 0; the CIoU loss is: [formula not reproduced]; the BCE quality supervision is: [formula not reproduced]; and the final single-scale loss is: [formula not reproduced].
10. A BSD algorithm end-side deployment system based on a YOLO model and capable of running stably, for implementing the method of any one of claims 1-9, comprising a model optimization module, a training module, an end-side deployment module, and a post-processing module; the model optimization module implements the Backbone lightweight pruning, BiFPN cross-scale feature fusion, and decoupled detection head design, and outputs the BSD_YOLO modular detection framework adapted to the end side, supporting three-scale feature extraction and efficient feature fusion; the training module executes the three-stage training flow, integrating distillation loss computation, Retinex image decomposition and synthesis, and quantization fine-tuning, and outputs a high-precision, quantization-compatible detection model; the end-side deployment module comprises a model export unit, a compiling unit, and an inference unit, wherein the export unit handles model format conversion, the compiling unit performs NPU-adapted compilation, and the inference unit receives the preprocessed input data and executes real-time inference on the end-side NPU; the post-processing module processes the inference results through CPU/NPU cooperation and outputs target box information for upper-layer warning, display, and log services; the NPU is an edge processor integrated on the embedded end side and supports INT8 quantized inference; the detection model is the BSD_YOLO framework optimized from YOLO and outputs three-scale feature layers P3, P4, and P5 with corresponding strides [formula not reproduced].
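The global channel selection in S101 of claim 2 can be illustrated with a minimal sketch. It assumes that redundancy is ranked by the absolute value of each BN scaling factor γ across all prunable layers, which is the usual reading of BN-based sparsity pruning. The function name, the layer names, and the integer `prune_percent` parameter are hypothetical and do not come from the claims.

```python
def select_channels_to_prune(gammas_per_layer, prune_percent=30):
    """Return the (layer, channel) pairs with the smallest |gamma| globally.

    gammas_per_layer maps a layer name to its list of BN scaling factors.
    Assumption: a small |gamma| marks a redundant channel.
    """
    ranked = sorted(
        (abs(gamma), layer, channel)
        for layer, gammas in gammas_per_layer.items()
        for channel, gamma in enumerate(gammas)
    )
    # Integer arithmetic avoids floating-point surprises in the cut-off count.
    n_prune = len(ranked) * prune_percent // 100
    return {(layer, channel) for _, layer, channel in ranked[:n_prune]}


# Hypothetical BN factors for two layers after sparsification training.
example = {
    "conv1": [0.9, 0.01, 0.5, 0.7, 0.6],
    "conv2": [0.02, 0.8, 0.03, 0.4, 0.3],
}
pruned = select_channels_to_prune(example)
```

A practical implementation would also need a per-layer floor so that no layer loses all of its channels; that safeguard is omitted here for brevity.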
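The fast normalized fusion named in claim 3 can be sketched as follows. The sketch assumes the form commonly used in BiFPN, where each output is the sum of inputs weighted by w_i / (ε + Σ w_j) after the weights pass through ReLU; whether the claim's omitted formula matches this exactly is an assumption. Features are flattened to plain lists, and the function name and `eps` default are hypothetical.

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse equally sized feature vectors with ReLU-clamped, normalized weights."""
    if len(features) != len(raw_weights):
        raise ValueError("one weight is required per input feature")
    # ReLU keeps the learnable weights non-negative.
    weights = [max(0.0, w) for w in raw_weights]
    denom = eps + sum(weights)
    fused = [0.0] * len(features[0])
    for feature, weight in zip(features, weights):
        for index, value in enumerate(feature):
            fused[index] += (weight / denom) * value
    return fused


# Two inputs with equal weights average out; a negative weight is clamped to 0.
averaged = fast_normalized_fusion([[1.0, 1.0], [3.0, 3.0]], [1.0, 1.0], eps=0.0)
clamped = fast_normalized_fusion([[2.0], [100.0]], [1.0, -5.0], eps=0.0)
```

Compared with a softmax over the weights, this form needs only a ReLU, an addition, and a division, which is one reason it is attractive on operator-limited NPUs.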
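The decoding logic of claim 4 can be sketched for a single grid cell. Only the grid-center expression ((j + 0.5)·s_k, (i + 0.5)·s_k), the ReLU on the four distances, and the sigmoid on the logits come from the claim. Two points are assumptions made for illustration: that the distances are expressed in units of the stride, and that the ranking score is the product of the class probability and the quality probability. All function and parameter names are hypothetical.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def decode_cell(i, j, stride, raw_ltrb, class_logits, quality_logit):
    """Decode one grid cell into (box, best_class, ranking_score)."""
    # Grid-cell center in pixels, as given in claim 4.
    center_x = (j + 0.5) * stride
    center_y = (i + 0.5) * stride
    # ReLU makes the distances non-negative; scaling by stride is an assumption.
    left, top, right, bottom = (max(0.0, d) * stride for d in raw_ltrb)
    box = (center_x - left, center_y - top, center_x + right, center_y + bottom)
    # Assumed ranking score: class probability times quality probability.
    quality = sigmoid(quality_logit)
    scores = [sigmoid(z) * quality for z in class_logits]
    best_class = max(range(len(scores)), key=scores.__getitem__)
    return box, best_class, scores[best_class]


# Row 2, column 3 on a feature map with a hypothetical stride of 8.
box, best_class, score = decode_cell(2, 3, 8, (1.0, 0.5, -2.0, 1.5), [0.0, 2.0], 0.0)
```

Because the distances are regressed directly, decoding needs no softmax or integral over a discrete distribution, which keeps the post-processing graph short.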
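The symmetric quantization in S2031 of claim 7 and the fake-quantization used during QAT can be sketched as below. The sketch assumes the common form q = clip(round(x / s), q_min, q_max) followed by dequantization x̂ = q · s, with a signed INT8 range of [-128, 127]; the claim's own formula is not reproduced in the source, so the clipping range and rounding mode are assumptions. In LSQ the scale `s` would be a learnable parameter; here it is a fixed argument.

```python
def quantize_symmetric(values, scale, num_bits=8):
    """Map real values to signed integers with a zero-point of 0."""
    q_max = 2 ** (num_bits - 1) - 1
    q_min = -q_max - 1
    # Python's round() rounds halves to even; hardware may round differently.
    return [min(q_max, max(q_min, round(v / scale))) for v in values]


def fake_quantize(values, scale, num_bits=8):
    """Quantize then dequantize, which is what a fake-quantized graph computes."""
    return [q * scale for q in quantize_symmetric(values, scale, num_bits)]


codes = quantize_symmetric([0.0, 0.26, -0.26, 100.0, -100.0], scale=0.1)
restored = fake_quantize([0.26], scale=0.1)
```

The difference between `0.26` and the restored value is the quantization error that the learnable step size and the self-distillation constraint are meant to keep small.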
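The preprocessing chain of claim 8 (scale, Letterbox padding, normalization, NHWC→NCHW) can be sketched in two small helpers. The geometry depends only on the source and target sizes, so for a fixed camera it can be computed once, consistent with the requirement of no data-dependent branches. The normalization x / 255 is an assumption, since the claim's formula is not reproduced; centered padding and the function names are also hypothetical.

```python
def letterbox_params(src_w, src_h, dst_w=640, dst_h=640):
    """Return (scale, new_w, new_h, pad_left, pad_top) for aspect-preserving resize."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Assumption: padding is split evenly, with any odd pixel going to the far side.
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def hwc_to_chw_normalized(image):
    """Convert one HWC image (nested lists, 0-255) to CHW floats in [0, 1]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


params = letterbox_params(1280, 720)
tensor = hwc_to_chw_normalized([[[255, 0, 51]]])
```

For a 1280×720 frame and a 640×640 target, the image is halved to 640×360 and padded with 140 rows above and below.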

Description

Method and system for BSD algorithm end-side deployment based on YOLO model and capable of running stably

Technical Field

The invention relates to the technical field of target detection and edge-computing end-side deployment, and in particular to a BSD algorithm end-side deployment method and system based on a YOLO model and capable of running stably.

Background

With the development of intelligent driving technology, electronic rearview mirror BSD blind-zone monitoring systems have become a key feature for improving truck driving safety. The core requirement is accurate, real-time detection of multi-class targets in driving scenes on edge devices such as embedded end-side hardware. Existing approaches face the following problems: 1. The edge NPU has limited compute and limited operator support, while the Backbone of the original YOLO model (such as Darknet) has many parameters and high computational redundancy, so single-frame inference latency after direct deployment is too long (about 200 ms/frame) to meet real-time requirements. 2. Model quantization is prone to accuracy loss, cross-scale feature fusion is inefficient, and detection robustness is insufficient under complex illumination. In addition, existing deployment pipelines suffer from redundant computation graphs and complex post-processing flows, which undermine the stability and reproducibility of model export, compilation, and inference. In the prior art, optimization for YOLO end-side deployment focuses on a single dimension, such as Backbone lightweighting alone or simple quantization, and does not co-optimize the model structure, training strategy, and deployment pipeline, so it is difficult to meet the combined requirements of end-side real-time performance, detection accuracy, and deployment stability.
Disclosure of Invention

In view of the technical problems described in the background, the invention provides a BSD algorithm end-side deployment method based on a YOLO model and capable of running stably, comprising the following steps: S1, model structure optimization: constructing a modular Backbone + Neck + Head detection framework, wherein the Backbone replaces the original YOLO Darknet with MobileNet and is made lightweight through two-stage structured pruning, the Neck adopts a BiFPN structure to perform weighted bidirectional cross-scale feature fusion, and the Head adopts a decoupled structure that separately outputs a continuous bounding-box regression feature map and a category-quality feature map; S2, three-stage training: S201, training a high-capacity YOLO teacher model and constraining student model training with distillation losses, wherein the distillation losses comprise a classification distillation loss, a continuous-box distillation loss, and a quality distillation loss; S202, decomposing an input image into a reflectance component and an illumination component based on Retinex theory, generating multi-illumination synthetic images through illumination adjustment, and performing robustness training with a consistency loss; S203, performing quantization and fine-tuning with a PTQ calibration + QAT fine-tuning flow, introducing learnable quantization parameters to reduce quantization error, and retaining a distillation constraint to stabilize the model output distribution; and S3, model export and end-side deployment: exporting the trained YOLO teacher model in an end-side-compatible inference format, compiling and deploying it to the end-side NPU, and, during end-side inference, preprocessing camera input data with a fixed operator sequence, running inference on the NPU, and finally performing CPU/NPU cooperative post-processing to output the target detection result. Preferably, the two-stage structured pruning of S1 is specifically: S101, in the first stage, performing sparsification training based on the BN scaling factor γ, with an L1 sparsity term added to the total detection loss, wherein the loss formula is: [formula not reproduced]; where L is the set of pruned convolution layers and λ is the sparsification coefficient, and after training the channels are globally sorted and the 30% most redundant channels are pruned; S102, in the second stage, pruning redundant filters of the PWConv layers with FPGM, which computes redundancy based on the geometric median, wherein the geometric median formula is: [formula not reproduced]; the redundancy score of each filter is: [formula not reproduced]; and the 10% most redundant filters are pruned; to ensure the DWConv/PWConv coupling consistency of MobileNet, structured pruning is applied only to PWConv, the preceding and following channel dimensions are updated synchronously, and the DWConv channels are kept consistent with the corresponding PWConv inputs, so that irregular tensor shapes and extra rearrangement overheads on the end side