CN-121982485-A - Rapid target detection method based on RK3588 platform


Abstract

The invention relates to a rapid target detection method based on an RK3588 platform. It addresses the problem that, when a target detection algorithm is run directly on the device, the frame rate is often below 10 fps and real-time requirements are difficult to meet. Based on the YOLOv s target detection model, the method modifies the model structure and activation function, quantizes the model, and uses a thread pool, among other measures. This greatly improves the efficiency of the target detection algorithm on the RK3588 platform, and the measured frame rate can exceed 100 fps. High-frame-rate target detection on edge computing devices is thereby achieved. The method specifically comprises: 1) modifying the model structure, i.e. replacing the activation function and removing the bounding-box decoding operation from the YOLOv model so that decoding is executed on the CPU; 2) quantizing the trained model for the RK3588 platform to make better use of the NPU, specifically comprising ① numerical precision reduction and ② linear transformation; 3) thread pool acceleration, using a thread pool to accelerate model inference; and 4) target detection.

Inventors

Shi Kuanyu
Dang Chenguang
Feng Huiyong
Li Chao
Wei Xuemei
Wei Zhuo
Gao Liming
Lin Wei
Sun Yanxin
Gu Yanan

Assignees

中国兵器工业试验测试研究院

Dates

Publication Date: 20260505
Application Date: 20251211

Claims (5)

1. A rapid target detection method based on an RK3588 platform, characterized in that the method is based on a YOLOv s target detection model and improves the computational efficiency of the target detection algorithm on the RK3588 platform by modifying the model structure and activation function, quantizing the model, and using a thread pool, so that the measured frame rate can exceed 100 fps and high-frame-rate target detection is achieved on edge computing equipment; the method specifically comprises the following steps:
1) Model structure modification
Replacing the activation function: the SiLU activation function in the YOLOv model is changed to the ReLU function, whose formula is $\mathrm{ReLU}(x) = \max(0, x)$;
Removing the bounding-box decoding operation from the YOLOv model: the improved model directly outputs 3 pairs of branches [cls1, reg1, cls2, reg2, cls3, reg3], i.e. 6 output results, on which post-processing is performed manually, with the bounding-box decoding executed on the CPU;
2) Model quantization
Quantizing the trained model for the RK3588 platform to make full use of the NPU, specifically comprising:
① Numerical precision reduction: converting the floating-point model parameters and activation values to low-precision integer representations to reduce computation and storage requirements;
② Linear transformation: mapping floating-point numbers into an integer range by a linear transformation involving a scaling factor and a zero point, so that the numerical range and distribution are kept as consistent as possible; the scaling factor is a floating-point number used to scale the numerical range when converting floating-point numbers to integers; the zero point is an integer representing the integer value that corresponds to the floating-point value zero, and is used to calibrate the alignment of zero between the floating-point and integer representations;
3) Thread pool acceleration
A thread pool is used to accelerate model inference;
4) Target detection
S1, acquiring image data: capturing images of different scenes with a camera array carried by a ground-based electro-optical system to obtain the required real background images or video streams, or modeling and simulating the required targets with 3ds Max and Unreal Engine (UE) to obtain high-fidelity simulated target images from multiple angles;
S2, preprocessing the images: performing target segmentation, color-block extraction and target labeling on the images;
S3, inputting the image data into the target detection model deployed on the RK3588 platform, where the program automatically reads the target classes and input resolution parameters stored in the model and calls the model for inference;
S4, drawing the detection results on the original image and pushing them out for display as an RTMP video stream.
2. The rapid target detection method based on the RK3588 platform according to claim 1, wherein model deployment uses Rockchip's RKNN-Toolkit suite, which supports model conversion, model inference, performance evaluation and model quantization.
3. The rapid target detection method based on the RK3588 platform according to claim 2, wherein the model quantization process is as follows: S1, exporting the PyTorch weight file obtained by training to an intermediate file in ONNX format using the ONNX tools; S2, initializing an RKNN object; S3, configuring the RKNN model, the configuration including the per-channel mean values, RGB2BGR conversion of the quantization images, and the quantization type; S4, loading the ONNX model; S5, building the RKNN model, with a number of pictures selected from the training set or validation set as the calibration dataset (50-200 pictures are recommended); S6, exporting the RKNN model to complete quantization.
4. The rapid target detection method based on the RK3588 platform according to claim 3, wherein 50-200 pictures are selected as the calibration dataset.
5. The rapid target detection method based on the RK3588 platform according to claim 1, wherein the specific workflow of the thread pool is as follows: S1, creating a thread A responsible for continuously reading the video stream from the pod and acquiring images; S2, creating a thread C responsible for continuously fetching results from the thread pool; S3, calling the setUP() function to initialize a thread pool B, with the number of threads set to 12; S4, after an image is acquired, calling the submitTask() function to submit a task to the task queue; S5, the thread pool manager monitors the task queue in real time, dispatches tasks as required, and performs image preprocessing, model inference and post-processing; S6, cyclically calling the getTargetImgResult() function in thread C to obtain the model inference and post-processing results.
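Claim 1 replaces SiLU with ReLU. The two formulas can be compared directly in a few lines of plain Python. This only illustrates the arithmetic and the sparsity argument (ReLU returns exactly 0 for every non-positive input, SiLU does not); it is not the model change itself. In a PyTorch model the change would presumably amount to swapping the `torch.nn.SiLU` modules in the convolution blocks for `torch.nn.ReLU` and retraining; where those modules sit depends on the YOLO code base and is not given in the source.

```python
import math


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x) = x / (1 + e^(-x)), evaluated without overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe for large negative x, where exp(-x) would overflow
    return x * e / (1.0 + e)


def relu(x: float) -> float:
    """ReLU(x) = max(0, x): a single comparison, no exponential."""
    return max(0.0, x)


if __name__ == "__main__":
    inputs = [-3.0, -1.0, 0.0, 1.0, 3.0]
    for x in inputs:
        print(f"x={x:+.1f}  SiLU={silu(x):+.4f}  ReLU={relu(x):+.4f}")
    # Count outputs that are exactly zero: the "sparse activation" property.
    print("exact zeros: SiLU", sum(silu(x) == 0.0 for x in inputs),
          "ReLU", sum(relu(x) == 0.0 for x in inputs))
```

For the five inputs above, SiLU is exactly zero only at 0, while ReLU is zero for all three non-positive inputs; SiLU's small negative outputs are the non-sparse behaviour the description criticizes.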
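Claim 1 moves bounding-box decoding off the NPU: the modified model returns the six raw tensors [cls1, reg1, cls2, reg2, cls3, reg3] and the CPU does the rest. The source does not specify how the reg values are encoded, so the sketch below is built on hypothetical assumptions: an anchor-free layout in which each grid cell of a reg tensor holds four distances (left, top, right, bottom) from the cell centre in grid units, each cls tensor already holds per-class scores in [0, 1], and the three branches use strides 8, 16 and 32 (giving 80×80, 40×40 and 20×20 grids for a 640×640 input). The thresholds 0.25 and 0.45 are placeholder defaults, and all function names are illustrative.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, score, class_id) in input-image pixels
Box = Tuple[float, float, float, float, float, int]
Grid = List[List[List[float]]]  # [channel][row][col], i.e. one NCHW tensor without N


def decode_branch(cls: Grid, reg: Grid, stride: int, conf_thres: float) -> List[Box]:
    """Turn one (cls, reg) pair into pixel-space boxes scoring at least conf_thres."""
    boxes: List[Box] = []
    num_classes = len(cls)
    height, width = len(cls[0]), len(cls[0][0])
    for i in range(height):
        for j in range(width):
            scores = [cls[c][i][j] for c in range(num_classes)]
            class_id = max(range(num_classes), key=scores.__getitem__)
            if scores[class_id] < conf_thres:
                continue
            cx, cy = j + 0.5, i + 0.5  # cell centre in grid units (assumption)
            left, top, right, bottom = (reg[k][i][j] for k in range(4))
            boxes.append(((cx - left) * stride, (cy - top) * stride,
                          (cx + right) * stride, (cy + bottom) * stride,
                          scores[class_id], class_id))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thres: float) -> List[Box]:
    """Greedy per-class non-maximum suppression, highest score first."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda bx: bx[4], reverse=True):
        if all(k[5] != box[5] or iou(k, box) < iou_thres for k in kept):
            kept.append(box)
    return kept


def postprocess(outputs: Sequence[Grid], strides=(8, 16, 32),
                conf_thres: float = 0.25, iou_thres: float = 0.45) -> List[Box]:
    """outputs = [cls1, reg1, cls2, reg2, cls3, reg3] from the modified model."""
    boxes: List[Box] = []
    for k, stride in enumerate(strides):
        boxes.extend(decode_branch(outputs[2 * k], outputs[2 * k + 1], stride, conf_thres))
    return nms(boxes, iou_thres)


if __name__ == "__main__":
    # Toy single-class tensors with tiny grids instead of 80x80 / 40x40 / 20x20.
    cls1 = [[[0.9, 0.1], [0.1, 0.1]]]
    reg1 = [[[0.5, 0.5], [0.5, 0.5]]] * 4         # left = top = right = bottom = 0.5
    low_cls, low_reg = [[[0.05]]], [[[1.0]]] * 4  # below conf_thres, dropped
    print(postprocess([cls1, reg1, low_cls, low_reg, low_cls, low_reg]))
    # -> [(0.0, 0.0, 8.0, 8.0, 0.9, 0)]
```

If the exported head encodes the regression differently (for example as a distribution over bins), only `decode_branch` would need to change; the per-class NMS does not depend on the encoding. A deployed version would likely vectorize these loops rather than iterate cell by cell in Python.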
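Claim 1 describes the linear (affine) mapping between floating-point values and integers through a floating-point scaling factor $s$ and an integer zero point $z$: $s = (x_{\max} - x_{\min})/(q_{\max} - q_{\min})$, $z = \mathrm{round}(q_{\min} - x_{\min}/s)$, $q = \mathrm{clamp}(\mathrm{round}(x/s) + z)$ and $\hat{x} = s\,(q - z)$. A minimal sketch of that arithmetic for signed 8-bit integers follows. It assumes the simplest calibration rule, taking the observed minimum and maximum of a tensor; RKNN-Toolkit derives its own parameters from the calibration pictures of claims 3 and 4, and its exact algorithm is not described in the source. Function names are illustrative.

```python
from typing import Tuple

QMIN, QMAX = -128, 127  # signed INT8 range (assumption: 8-bit asymmetric quantization)


def compute_qparams(x_min: float, x_max: float) -> Tuple[float, int]:
    """Scale (float) and zero point (int) mapping [x_min, x_max] onto [QMIN, QMAX]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # range must contain 0.0
    scale = (x_max - x_min) / (QMAX - QMIN)
    if scale == 0.0:  # all-zero tensor: any positive scale works
        scale = 1.0
    zero_point = round(QMIN - x_min / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    """q = clamp(round(x / scale) + zero_point, QMIN, QMAX)."""
    return max(QMIN, min(QMAX, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """x_hat = scale * (q - zero_point)."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    values = [-1.0, 0.0, 0.5, 3.0]
    scale, zp = compute_qparams(min(values), max(values))
    print(f"scale={scale:.6f} zero_point={zp}")
    for v in values:
        q = quantize(v, scale, zp)
        print(f"{v:+.3f} -> {q:4d} -> {dequantize(q, scale, zp):+.5f}")
```

Widening the range to include 0.0 guarantees that floating-point zero maps exactly to the integer $z$, which is the zero-point alignment the claim refers to. For the range [-1, 3] this gives $s = 4/255$ and $z = -64$, and every in-range value is reconstructed to within $s/2$.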
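Claim 5 describes a producer/consumer arrangement: thread A reads frames, thread pool B runs preprocessing, inference and post-processing, and thread C collects results. The source names setUP(), submitTask() and getTargetImgResult() but not their implementation, so the following is a hypothetical Python sketch built on the standard library's `concurrent.futures.ThreadPoolExecutor`. Here `detect()` is a stand-in for the real per-frame work, and frames are plain integers. Keeping the futures in a FIFO queue is one way (an assumption, not stated in the source) to return results in capture order even though workers finish out of order, which matters when results are drawn back onto the video stream.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def detect(frame: int) -> str:
    """Hypothetical stand-in for preprocessing + NPU inference + CPU post-processing."""
    time.sleep(0.01)  # pretend one frame takes about 10 ms
    return f"result-{frame}"


class DetectorPool:
    """Method names follow claim 5; the bodies are an illustrative sketch."""

    def setUP(self, num_threads: int = 12) -> None:
        self._executor = ThreadPoolExecutor(max_workers=num_threads)
        # Futures kept in submission order; bounded so a slow consumer
        # applies back-pressure to the reader instead of piling up frames.
        self._pending: queue.Queue = queue.Queue(maxsize=2 * num_threads)

    def submitTask(self, frame: int) -> None:
        self._pending.put(self._executor.submit(detect, frame))

    def getTargetImgResult(self) -> str:
        # Block on the oldest task so results leave in frame order,
        # even when a later frame happens to finish first.
        return self._pending.get().result()

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


def run_demo(num_frames: int = 24) -> list:
    pool = DetectorPool()
    pool.setUP(num_threads=12)  # thread pool B
    results = []

    def reader() -> None:  # thread A: "read" frames and submit them
        for frame in range(num_frames):
            pool.submitTask(frame)

    def collector() -> None:  # thread C: fetch results in a loop
        for _ in range(num_frames):
            results.append(pool.getTargetImgResult())

    threads = [threading.Thread(target=reader), threading.Thread(target=collector)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    pool.shutdown()
    return results


if __name__ == "__main__":
    print(run_demo()[:3])  # -> ['result-0', 'result-1', 'result-2']
```

The throughput gain in this sketch comes from overlapping frames: with 12 workers and roughly 10 ms per frame, results can arrive far more often than once per 10 ms, while each individual frame still takes about 10 ms. On a real device the achievable speed-up also depends on how the NPU and CPU are shared among the workers.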

Description

Rapid target detection method based on RK3588 platform

Technical Field

The invention belongs to the technical field of computer vision and edge computing and is mainly used for rapidly detecting and identifying targets against complex backgrounds. In particular, it relates to a high-frame-rate target detection method optimized for the RK3588 edge computing platform, suitable for scenarios with strict real-time requirements such as surveillance, unmanned aerial vehicle (UAV) reconnaissance, and security inspection gates.

Background

With the development of computer vision, target detection is increasingly applied in daily life, for example in surveillance, UAV reconnaissance, security inspection and gate scenarios. Running the target detection algorithm only on a PC cannot meet these everyday needs. The RK3588 is an edge computing platform from Rockchip whose NPU provides 6 TOPS of computing power. When a target detection algorithm is run directly on the device, however, the frame rate is often below 10 fps, which makes real-time requirements difficult to meet.

Disclosure of Invention

The invention provides a rapid target detection method based on the RK3588 platform. Starting from the YOLOv s target detection model, it modifies the model structure and activation function, quantizes the model, and uses a thread pool, among other measures. This greatly improves the efficiency of the target detection algorithm on the RK3588 platform, and the measured frame rate can exceed 100 fps. High-frame-rate target detection on edge computing devices is thereby achieved.

(1) Model structure modification

The invention is an improvement on the YOLOv target recognition model. As shown in fig. 1, the YOLOv structure can be broadly divided into four main parts: the input layer, the backbone network (Backbone), the feature pyramid network (FPN, Feature Pyramid Network), and the detection head (Head).
Improvement I: replacing the activation function

In the YOLOv model, the convolution modules use the SiLU (Sigmoid Linear Unit) activation function, whose formula is:

$$\mathrm{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$

SiLU is smooth over the entire input range, which makes optimization more stable. However, because it involves a Sigmoid computation, its computational cost is high and it slows both training and inference. In addition, SiLU produces non-zero outputs for negative inputs and therefore lacks sparse activation, which can increase computation and storage overhead and hinders subsequent deployment of the target detection model on the airborne computing platform. To solve this problem, SiLU is replaced with the ReLU (Rectified Linear Unit) function, whose formula is:

$$\mathrm{ReLU}(x) = \max(0, x)$$

Compared with SiLU, ReLU has the following advantages:
① Simple computation: ReLU requires only a single comparison, so it is very fast in both training and inference;
② Mitigated vanishing gradients: the gradient of ReLU is constantly 1 in the positive interval, so unlike Sigmoid and Tanh it does not approach 0 for large inputs;
③ Sparse activation: when the input is less than 0, ReLU outputs 0, so neurons are activated sparsely, which helps reduce computation and storage overhead.
As can be seen from fig. 2, ReLU has a lower running overhead, which favors subsequent lightweight deployment and improves the real-time performance of the target detection task.

Improvement II: splitting the model output

In the original YOLOv model, the output includes the bounding-box decoding operation.
However, this operation is complex and executes inefficiently on the NPU of the airborne computing platform, which can seriously reduce the model's running efficiency. The decoding operation is therefore removed from the model and executed on the CPU instead. The improved model structure is shown in fig. 3. Taking a 640×640 model input as an example, once the decode operation is removed, the model's original 6 outputs are returned directly. The final model outputs and their tensor sizes are shown in fig. 4. The improved model directly outputs 3 pairs of branches ([cls1, reg1, cls2, reg2, cls3, reg3]), i.e. the class scores and the corresponding box regression values. After these 6 outputs are obtained, post-processing is performed manually and the bounding-box decoding is carried out on the CPU, which greatly improves the inference efficiency of the model on the airborne computing platform.

(2) Model quantization

Model quantization is a technique that reduces the computational complexity and memory requirements of deep learning models by using