CN-121982293-A - High-altitude operation safety belt detection method based on improved YOLOv network

Abstract

The invention discloses a high-altitude operation safety belt detection method based on an improved YOLOv network. Building on the YOLOv network, DynamicConv is fused with C3k2 in the feature extraction network to construct C3k2_DC, which strengthens the network's adaptive feature extraction for multi-scale targets. In the feature fusion network, a multi-level feature fusion module SDI and a weighted bidirectional feature pyramid network Bi-FPN are integrated to form a new feature fusion structure, which strengthens the fusion of deep semantic features with shallow detail features and improves detection precision for small targets such as safety belts. Finally, the magnitude-based layer-adaptive sparse pruning algorithm LAMP is applied to reduce redundant model parameters while maintaining detection precision, yielding a lightweight design. The invention enables accurate detection of high-altitude operation safety belts, meets the dual requirements of real-time performance and light weight in practical engineering applications, and shows good prospects for engineering application.

Inventors

DU QILIANG
Min Zhizhu
TIAN LIANFANG
YUAN LING

Assignees

South China University of Technology (华南理工大学)

Dates

Publication Date: 20260505
Application Date: 20260212

Claims (5)

1. A high-altitude operation safety belt detection method based on an improved YOLOv network, characterized in that the improved YOLOv network is obtained by improving the feature extraction network BackBone and the feature fusion network Neck of the original YOLOv network, and by applying structured pruning to the improved network with the magnitude-based layer-adaptive sparse pruning algorithm LAMP; wherein the improvement to BackBone is that every multi-branch multi-scale feature fusion module C3k2 in the network is replaced by a multi-branch multi-scale dynamic convolution feature fusion module, called C3k2_DC; and the improvement to Neck is that the original network structure is replaced by a weighted bidirectional feature pyramid network, called Bi-FPN, and the feature concatenation module Concat is replaced by a feature concatenation module incorporating the multi-level feature fusion module SDI, called Fusion; the method comprises the following steps:
S1, acquiring image data and preprocessing it to obtain image data annotated with three types of labels (safety belt, aerial worker and ground worker) as the training dataset for subsequent network training;
S2, inputting the training dataset into the improved YOLOv network for training, wherein during training BackBone first extracts features of the safety belt target in the high-altitude operation scene, the resulting feature information is then input to Neck for multi-level information fusion, and the detection output module of the improved YOLOv network generates the corresponding safety belt detection result;
S3, performing network pruning on the trained reference model with the LAMP algorithm, by computing importance scores for the convolution-layer weights and pruning low-scoring channels until a preset global sparsity is reached;
S4, deploying the optimal model obtained by pruning onto an edge GPU device through the inference acceleration framework TensorRT, which optimizes and converts the model structure and parameters to generate an acceleration model suited to real-time inference on the edge GPU device; the edge GPU device acquires real-time images from an on-site high-definition surveillance dome camera, inputs the images into the acceleration model for inference, and finally obtains and outputs the high-altitude operation safety belt detection result.
2. The high-altitude operation safety belt detection method based on the improved YOLOv network according to claim 1, characterized in that in step S1, low-quality samples in the original image data (blurred, overexposed or underexposed, missing the target, or with unidentifiable key features) are first removed; the remaining valid samples are then converted to a standard image format compatible with the network and resized to the preset uniform size of 640 x 640, so that all samples input to the network have consistent dimensions, and the resized valid samples form the training dataset; finally, geometric transformation, pixel-level adjustment and scene fusion operations are applied to the training dataset, wherein the geometric transformation comprises random cropping, flipping, rotation and scaling, the pixel-level adjustment comprises color jittering, grayscale conversion, slight noise addition and exposure adjustment, and the scene fusion comprises Mosaic and the cut-and-paste operation CutMix, which mixes two images over rectangular regions and mixes their labels accordingly; the labels are of three types: safety belt, aerial worker and ground worker.
3. The high-altitude operation safety belt detection method based on the improved YOLOv network according to claim 1, characterized in that BackBone comprises a basic convolution module Conv, C3k2_DC, a multi-scale enhancement module SPPF and a feature extraction module C2PSA, wherein Conv is formed by concatenating a convolution operation, batch normalization BN and a SiLU activation function; C3k2_DC comprises two branches; the first branch adopts a cascade structure of 1 basic convolution module and 1 dynamic convolution module DynamicConv: the basic convolution module performs dimension adjustment and local feature enhancement on the split feature, which is then input to DynamicConv with 2 parallel convolution kernels; DynamicConv encodes global information of the input feature with a lightweight attention network to generate 2 corresponding weight vectors, and after the weight vectors are normalized by Softmax, the output features of the 2 convolution kernels are fused by weighting to form the dynamic convolution output, calculated as follows:

$$ y = \sum_{k=1}^{2} \pi_k(x) \left( W_k \ast x + b_k \right) $$

where $\pi_k(x)$ is the attention weight of the k-th convolution kernel, $x$ is the input feature, $W_k$ and $b_k$ are respectively the weight and bias of the k-th convolution kernel, and $\ast$ denotes the convolution operation; a cross-layer residual connection is introduced in the first branch, and the input and output of DynamicConv are finally added element by element; the second branch adopts a cascade structure of 2 basic convolution modules and 1 DynamicConv, wherein the 2 basic convolution modules form a local feature extraction submodule for extracting and stacking multi-scale local features; SPPF consists of 2 basic convolution modules, 3 consecutive max-pooling modules MaxPool2d and Concat; and C2PSA consists of 2 basic convolution modules, a feature splitting module Split, a polarized self-attention module PSABlock and Concat;

Neck fuses the four feature maps of different scales input from BackBone, integrating the feature information extracted by BackBone at different stages and enlarging the receptive field of the network; in Neck, the original network structure is replaced by Bi-FPN, which comprises Conv, C3k2_DC, an upsampling module Upsample and Fusion, wherein Upsample upsamples high-level semantic features to align them with low-level features in spatial resolution; Fusion encodes, for each level of input feature, the global information of that feature through a lightweight attention sub-network to generate the scale-dependent weight of that level, and the scale-dependent weights of all levels are normalized by a single softmax to constrain the weight distribution across levels; the output feature of Fusion is calculated as follows:

$$ F_{out} = \sum_{i} \alpha_i F_i $$

where $F_{out}$ is the output of Fusion, $F_i$ is the input feature of the i-th level, and $\alpha_i$ is the scale-dependent weight of the i-th level, satisfying $\sum_i \alpha_i = 1$;

Bi-FPN constructs bidirectional feature fusion paths, namely a top-down path and a bottom-up path, wherein the top-down path starts from the deepest-level feature, successively upsamples the current-level feature and fuses it by weighting with the previous-level feature to generate the intermediate features of the path; the bottom-up path starts from the shallow-level output feature of the top-down path, successively downsamples the current-level feature and fuses it by weighting with the next-level feature to generate the final multi-scale fused features; Bi-FPN processes the multi-scale features through the bidirectional fusion paths and a learnable weighted fusion mechanism, wherein the weighted feature fusion is calculated as follows:

$$ O = \sum_{i} \frac{w_i}{\sum_{j} w_j} I_i $$

where $O$ is the fused output, $I_i$ is the input of the i-th layer, and $w_i$ and $w_j$ are the i-th and j-th learnable weights respectively; finally, a convolution layer further processes the normalized fused feature to generate the feature map that is output or passed to the next layer.
4. The high-altitude operation safety belt detection method based on the improved YOLOv network according to claim 1, characterized in that in step S3, all convolution layers to be pruned in the reference model are first traversed, and the convolution kernel weights of each layer are treated as an independent set; the weight tensor of each layer is then flattened into a one-dimensional vector and sorted in descending order of absolute value, and on this basis the pruning score of each weight, namely the LAMP score, is calculated by the following formula:

$$ \mathrm{score}(u; W) = \frac{(W[u])^2}{\sum_{v \le u} (W[v])^2} $$

where $u$ and $v$ denote the u-th and v-th indices, $\mathrm{score}(u; W)$ is the LAMP score, and $W[u]$ is the weight mapped by index $u$, so that under the descending ordering the denominator sums the squares of all weights of the layer whose magnitude is not smaller than that of $W[u]$; the LAMP scores calculated for all convolution layers are gathered into a single global list, forming a complete set of all weights to be pruned and their corresponding scores; all weights are then sorted in ascending order of LAMP score to obtain a sorted list, the number of weights to prune is calculated from the preset global sparsity, and the lowest-scoring 50% of the weights in the sorted list are selected, judged to be the globally least important parameters, and set to zero, thereby realizing structured pruning of the model;
in the fine-tuning process, the model is first initialized, wherein the weights of the convolution kernels and channels judged important by the LAMP score are fully retained as the starting point of optimization, and parameters newly added or adjusted in adjacent layers to resolve connection-dimension mismatches caused by pruning are randomly initialized at a small scale; as the training strategy, a warm-up schedule with a low initial learning rate is adopted, namely a very low learning rate of 1e-4 is used in the initial training stage together with 10 epochs of linear warm-up (Warmup), so that the model can be optimized stably from its pruned state; after warm-up ends, the learning rate is gradually adjusted according to a cosine annealing strategy, and the pruned optimal model is finally obtained after 300 epochs of training.
5. The high-altitude operation safety belt detection method based on the improved YOLOv network according to claim 1, characterized in that in step S4, the optimal model obtained by training in step S3 is first exported to the general intermediate representation format ONNX to obtain an ONNX model; the ONNX model is then parsed and optimized with the inference acceleration framework TensorRT to generate an efficient TRT inference model for the edge GPU device, called the acceleration model, completing deployment of the model on the edge GPU device; in the actual prediction stage, the original images collected by the on-site high-definition surveillance dome camera are preprocessed with OpenCV, including image graying, scaling and region cropping, to convert each input image into an image to be detected whose size matches the images in the training dataset; the preprocessed image is then input to the acceleration model for accelerated inference, the non-maximum suppression NMS algorithm is applied to the model output to remove redundant detection boxes, and finally the high-altitude operation safety belt targets that meet the requirement are screened by a set confidence threshold to obtain and output the safety belt detection result.
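
As an illustration of the two weighting schemes described in claim 3, the following minimal Python sketch fuses flat feature vectors in two ways: with softmax-normalized weights (the scheme claim 3 describes for Fusion and for combining the DynamicConv kernel outputs) and with sum-normalized learnable weights (the Bi-FPN scheme). The function names, the flat-list feature representation, the non-negative clamping of weights and the `eps` term are assumptions made for this sketch and are not taken from the patent.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_fusion(features, logits):
    """Return sum_i alpha_i * F_i, where alpha = softmax(logits).

    `features` is a list of equally long flat feature vectors; `logits` are
    the raw per-level (or per-kernel) scores an attention network would emit.
    """
    alphas = softmax(logits)
    return [sum(a * f[j] for a, f in zip(alphas, features))
            for j in range(len(features[0]))]


def normalized_fusion(features, weights, eps=1e-4):
    """Return sum_i (w_i / (eps + sum_j w_j)) * I_i.

    Assumption: weights are clamped to be non-negative and `eps` guards
    against division by zero; neither detail is stated in the claim.
    """
    clamped = [max(0.0, w) for w in weights]
    denom = eps + sum(clamped)
    return [sum(w / denom * f[j] for w, f in zip(clamped, features))
            for j in range(len(features[0]))]


levels = [[1.0, 2.0], [3.0, 4.0]]
print(softmax_fusion(levels, [0.0, 0.0]))              # equal weights: [2.0, 3.0]
print(normalized_fusion(levels, [1.0, 3.0], eps=0.0))  # weights 1/4 and 3/4: [2.5, 3.5]
```

In both schemes the weights sum to (approximately) one, so the fused feature stays on the same scale as its inputs; the softmax variant additionally allows negative raw scores.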
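
The LAMP scoring and global selection of claim 4 can be sketched as follows. Each layer's weights are ranked by absolute value, every weight is scored as its square divided by the sum of squares of all weights in the same layer that are at least as large, and the lowest-scoring fraction across all layers is zeroed. This is a simplified per-weight sketch under stated assumptions: the layer names, the dictionary layout and the helper names are hypothetical, and the channel-level (structured) removal and fine-tuning described in the claim are not modelled.

```python
def lamp_scores(weights):
    """Return the LAMP score of each weight of one layer, in input order."""
    # Visit weights from largest to smallest magnitude (descending order).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    scores = [0.0] * len(weights)
    running = 0.0  # sum of squares of the weights visited so far
    for idx in order:
        running += weights[idx] ** 2
        scores[idx] = weights[idx] ** 2 / running if running > 0 else 0.0
    return scores


def lamp_prune(layers, sparsity):
    """Zero the `sparsity` fraction of weights with the lowest LAMP scores.

    `layers` maps a (hypothetical) layer name to its flattened weight list.
    """
    entries = []
    for name, weights in layers.items():
        for i, score in enumerate(lamp_scores(weights)):
            entries.append((score, name, i))
    entries.sort(key=lambda e: e[0])            # ascending LAMP score
    n_prune = int(len(entries) * sparsity)      # number of weights to zero
    pruned = {name: list(weights) for name, weights in layers.items()}
    for _, name, i in entries[:n_prune]:
        pruned[name][i] = 0.0
    return pruned


model = {"conv1": [3.0, -1.0], "conv2": [0.5, 0.1]}
print(lamp_prune(model, sparsity=0.5))
# {'conv1': [3.0, 0.0], 'conv2': [0.5, 0.0]}
```

Because the largest weight of every layer always scores 1, no layer can be emptied completely before the others, which is what lets a single global threshold act layer-adaptively.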
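
The fine-tuning schedule in claim 4 (a low learning rate of 1e-4 at the start, 10 epochs of linear warm-up, cosine annealing afterwards, 300 epochs in total) could look like the sketch below. The claim does not state the learning rate reached at the end of warm-up or the final learning rate, so `peak_lr` and `final_lr` are assumed placeholder values, and the function name is hypothetical.

```python
import math


def learning_rate(epoch, total_epochs=300, warmup_epochs=10,
                  start_lr=1e-4, peak_lr=1e-3, final_lr=1e-5):
    """Learning rate for a 0-based epoch: linear warm-up, then cosine annealing.

    start_lr, warmup_epochs and total_epochs follow claim 4; peak_lr and
    final_lr are assumptions, since the claim does not specify them.
    """
    if epoch < warmup_epochs:
        # Linear ramp from start_lr towards peak_lr.
        return start_lr + (peak_lr - start_lr) * epoch / warmup_epochs
    # Cosine decay from peak_lr down to final_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 5, 10, 155, 300):
    print(epoch, learning_rate(epoch))
```

Starting the pruned model at a small step size limits the risk of destabilising the retained weights before the optimizer statistics have settled.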
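
The post-processing in claim 5 (non-maximum suppression to remove redundant boxes, followed by screening with a confidence threshold) is sketched below in plain Python. The detection tuple layout, the class-wise suppression, the threshold values and the label strings are assumptions made for illustration; a deployed pipeline would more likely rely on the NMS routine of its inference framework.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(detections, conf_threshold=0.5, iou_threshold=0.45):
    """Apply class-wise NMS, then keep detections above the confidence threshold.

    `detections` is a list of (box, score, label) tuples. Both thresholds
    are assumed values; the claim only says a threshold is "set".
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, label = det
        # Keep a box only if no higher-scoring kept box of the same class overlaps it.
        if all(label != k[2] or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return [d for d in kept if d[1] >= conf_threshold]


raw = [
    ((0, 0, 10, 10), 0.9, "safety_belt"),
    ((1, 1, 10, 10), 0.8, "safety_belt"),    # overlaps the first box, suppressed
    ((20, 20, 30, 30), 0.3, "safety_belt"),  # below the confidence threshold
    ((0, 0, 10, 10), 0.7, "aerial_worker"),  # other class, not suppressed
]
print([(label, score) for _, score, label in postprocess(raw)])
# [('safety_belt', 0.9), ('aerial_worker', 0.7)]
```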

Description

High-altitude operation safety belt detection method based on improved YOLOv network

Technical Field

The invention relates to the technical field of safety monitoring for power engineering construction, and in particular to a high-altitude operation safety belt detection method based on an improved YOLOv network.

Background

In high-altitude construction scenes of electric power infrastructure, real-time and accurate identification of the position, state and behavior of operators is a core link in ensuring construction safety and reducing the risk of falls. At present, safety monitoring relies mainly on manual inspection and fixed monitoring equipment. These approaches are easily disturbed by complex operating backgrounds and struggle with the long-distance, small-size targets common in power infrastructure scenes; low detection precision, insufficient real-time performance and low model efficiency are common, so they cannot adapt to changeable outdoor high-altitude operation environments.

In recent years, the rapid development of computer vision and deep learning has promoted automatic monitoring methods based on target detection models in the field of personnel safety identification, providing a new technical path for safety supervision of high-altitude operations. However, existing algorithms still have several shortcomings in practical engineering application, mainly the following:

1. In high-altitude electric power scenes such as towers and power transformation equipment, the working environment is strongly affected by terrain, illumination and climate, so the image data contain strong light, shadow, occlusion and complex background interference. These problems weaken target features and blur boundaries, and they severely challenge the feature extraction and discrimination capability of detection models, directly causing unstable detection and low accuracy for operators and safety belts.

2. Low detection accuracy for small targets such as safety belts. In high-altitude scenes, key safety components such as safety belts and safety hooks occupy a very small proportion of the image and are often hard to identify because of the shooting angle, partial occlusion or similarity to the background. General detection models have insufficient feature extraction capability for such small targets; their effective receptive field struggles to cover the targets and they cannot learn sufficiently discriminative features, so serious missed and false detections occur, the correct wearing state of the safety belt cannot be judged accurately, and the reliability of the safety monitoring system is seriously weakened.

3. Low computational efficiency and insufficient real-time performance. Existing deep learning detection models have complex structures and large parameter counts, so their computational complexity is high and they occupy a large amount of computing resources in the inference stage. In practical deployment, this high latency makes it difficult to meet the real-time early-warning requirement of high-altitude safety monitoring and severely restricts the timeliness and practical value of the safety protection system.

4. Excessive model complexity and no lightweight design for edge computing. Current mainstream detection models are designed with precision as the priority and introduce a large number of redundant parameters and complex network layers. When such structures are deployed on the edge side, they prove badly unsuited: the large model volume occupies scarce storage resources, and the complicated computation flow raises power consumption and inference latency, so real-time detection is ultimately difficult to achieve.

In summary, existing methods for monitoring high-altitude operation safety belts in power infrastructure fall short in three key dimensions: coping with interference in complex power environments, accurately capturing small-size targets, and achieving efficient real-time computation. This technical bottleneck means that conventional methods cannot meet the real-time, accurate and highly reliable detection requirements for safety-critical equipment (especially safety belts) in power high-altitude operation scenes. Therefore, an innovative solution is needed in the art, and the solution must be capable of effectively balancing detection precision, speed and model complexity, with strong environmental adaptability and engineering practicability, so as to realize intelligent and real-time monitoring of the we