CN-121982591-A - Unmanned aerial vehicle scene-oriented electric power pole tower part detection method


Abstract

The invention discloses a method for detecting electric power tower components in unmanned aerial vehicle (UAV) scenes. The method addresses key technical problems in UAV power inspection by constructing an object detection model comprising an adaptive multi-receptive field module, a hierarchical wavelet interaction unit and a distribution-aware confidence optimization head. In the feature extraction stage, parallel multi-dilation depthwise convolutions and a channel attention mechanism enhance multi-scale feature perception. In the feature fusion stage, wavelet decomposition and cross-scale channel interaction achieve semantic alignment while preserving edge details. In the prediction stage, the classification confidence is adaptively calibrated based on a statistical analysis of the bounding-box distribution, which improves the consistency between localization accuracy and classification score. The invention addresses the missed and false detections caused by large scale differences between components and by complex backgrounds, improves the accuracy and reliability of the detection results, and provides technical support for UAV power inspection.
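
As an illustration of the wavelet decomposition used in the feature fusion stage, the following minimal sketch splits a single-channel feature map into one low-frequency sub-band and three directional high-frequency sub-bands, then reconstructs it exactly. The Haar wavelet, the plain Python list representation and the function names `haar_dwt2` and `haar_idwt2` are assumptions made for illustration; the source does not state which wavelet basis is used.

```python
def haar_dwt2(x):
    """Single-level 2D Haar decomposition of a 2D list with even height and width.

    Returns (low, horiz, vert, diag), each with half the input height and width.
    The Haar basis is an assumption; the source does not name the wavelet.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, horiz, vert, diag = (
        [[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4)
    )
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r, s = i // 2, j // 2
            low[r][s] = (a + b + c + d) / 2    # scaled local sum (low frequency)
            horiz[r][s] = (a + b - c - d) / 2  # change between rows
            vert[r][s] = (a - b + c - d) / 2   # change between columns
            diag[r][s] = (a - b - c + d) / 2   # diagonal change
    return low, horiz, vert, diag


def haar_idwt2(low, horiz, vert, diag):
    """Inverse of haar_dwt2: rebuilds the full-resolution 2D list."""
    h, w = len(low), len(low[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for s in range(w):
            l, p, q, t = low[r][s], horiz[r][s], vert[r][s], diag[r][s]
            x[2 * r][2 * s] = (l + p + q + t) / 2
            x[2 * r][2 * s + 1] = (l + p - q - t) / 2
            x[2 * r + 1][2 * s] = (l - p + q - t) / 2
            x[2 * r + 1][2 * s + 1] = (l - p - q + t) / 2
    return x


# Toy 4x4 "feature map" (illustrative values only).
feature = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
low, horiz, vert, diag = haar_dwt2(feature)
restored = haar_idwt2(low, horiz, vert, diag)
```

Each sub-band has half the height and width of the input, so in a design like the one described the low-frequency component of a large-scale feature would presumably have the same spatial size as the adjacent small-scale feature it interacts with.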

Inventors

ZHANG LUQI
ZHANG YUNZUO
GAO KANG
ZHAO XINYU

Assignees

Shijiazhuang Tiedao University (石家庄铁道大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-29
Priority Date: 2025-11-24

Claims (8)

1. A method for detecting electric power tower components in unmanned aerial vehicle scenes, characterized by comprising the following steps: S1, acquiring a dataset of power tower images captured by unmanned aerial vehicle aerial photography; S2, preprocessing the power tower dataset; S3, establishing an object detection model for power tower components in unmanned aerial vehicle scenes; S3.1, in the feature extraction stage, an adaptive multi-receptive field module is adopted, which performs multi-scale feature extraction through an internal parallel multi-dilation depthwise convolution submodule, the submodule using several preset groups of convolution branches with fixed dilation rates to cover receptive fields of different scales in the same channel dimension, so as to enhance context modeling of the target; S3.2, in the feature fusion stage, a hierarchical wavelet interaction unit performs wavelet decomposition, detail enhancement and channel interaction on features of adjacent scales: the large-scale feature is decomposed by wavelet transform into a low-frequency component and high-frequency components in the horizontal, vertical and diagonal directions, the high-frequency details are fused and enhanced, the low-frequency component and the small-scale feature are interactively compressed in the channel dimension, and the results are reconstructed by inverse wavelet transform into a fused feature of unified scale, so that semantic alignment is achieved against complex backgrounds and edge details are preserved; S3.3, in the prediction stage, a distribution-aware confidence optimization head is adopted, which statistically analyzes the predicted bounding-box distribution and adjusts the initial classification score based on the Top-k probabilities and their mean, so as to optimize the confidence of the target; S4, training the model with the training set of the dataset to obtain an optimal model; and S5, testing the test-set images of the dataset with the optimal model to obtain the final detection results.
2. The method for detecting electric power tower components in unmanned aerial vehicle scenes, characterized in that the InsPLAD remote sensing dataset is cropped into 640x640 image blocks with a sliding window whose blocks do not overlap, regions smaller than 640 pixels are zero-padded to 640x640, the label files are converted into the YOLO detection format, and coordinate mapping and normalization are completed according to the scaling and padding relations.
3. The method for detecting electric power tower components in unmanned aerial vehicle scenes according to claim 1, wherein the adaptive multi-receptive field module is composed of a multi-dilation depthwise convolution submodule, a channel attention submodule and a dimension transformation submodule.
4. The method for detecting electric power tower components in unmanned aerial vehicle scenes according to claim 3, wherein the adaptive multi-receptive field module is an improvement on the C3k2 structure in which the bottleneck unit is replaced by the multi-dilation depthwise convolution submodule; the submodule is configured according to the convolution kernel size: when the kernel size is greater than or equal to 7, multi-branch dilated depthwise convolution is adopted with a dilation-rate combination adapted to the kernel size; when the kernel size is 7, a multi-branch convolution structure with dilation rates [1, 2, 3] is adopted; and when the kernel size is 13, the dilation-rate combination [1, 2, 3, 4, 5] is adopted, so as to meet the detection requirements of power tower components of different sizes.
5. The method for detecting electric power tower components in unmanned aerial vehicle scenes according to claim 3, wherein the adaptive multi-receptive field module further comprises a channel attention submodule and a dimension transformation submodule; the channel attention submodule is configured to adaptively generate channel weights and apply them to the input feature map: it compresses the spatial dimensions of the feature map through a global average pooling layer, generates a weight vector representing the importance of each channel through a convolution operation and a Sigmoid activation function, and multiplies the weight vector with the original input feature map channel by channel; the dimension transformation submodule converts the feature tensor from batch-channel-height-width format to batch-height-width-channel format through an NCHW-to-NHWC layout conversion, expands the feature dimension through a linear layer followed by GELU activation, applies a global response normalization layer to the expanded features to stabilize the feature distribution, compresses the feature dimension through a linear layer, and finally restores the features to batch-channel-height-width format through the inverse NHWC-to-NCHW conversion.
6. The method for detecting electric power tower components in unmanned aerial vehicle scenes according to claim 1, wherein the hierarchical wavelet interaction unit is configured to fuse a large-scale feature with a small-scale feature: it decomposes the large-scale feature by wavelet transform into a low-frequency component and high-frequency components in the horizontal, vertical and diagonal directions, enhances the high-frequency components, performs channel interaction between the low-frequency component and the small-scale feature, and finally reconstructs the processed components into a fused feature of unified scale through the inverse wavelet transform.
7. The method for detecting electric power tower components in unmanned aerial vehicle scenes, characterized in that, in the hierarchical wavelet interaction unit, the wavelet transform decomposes the large-scale feature into an approximate low-frequency component and high-frequency components in the horizontal, vertical and diagonal directions; the three directional high-frequency components are concatenated and enhanced by high-frequency fusion through a residual block containing consecutive 3x3 convolutions; the approximate low-frequency component and the small-scale feature are concatenated in the channel dimension, undergo cross-channel interaction through a channel transformation module formed by a 1x1 convolution, and are compressed to 3 times the large-scale feature dimension to match the input dimension required by the inverse wavelet transform; and the enhanced high-frequency features are concatenated with the interactively compressed low-frequency and small-scale features, and the inverse wavelet transform is applied to obtain the fused feature.
8. The method for detecting electric power tower components in unmanned aerial vehicle scenes, characterized in that the distribution-aware confidence optimization head is configured to process the input features in parallel: a first convolution branch, comprising two consecutive 3x3 standard convolution layers and a final 1x1 convolution layer, generates the discrete coordinate distribution features of the bounding box, and a second convolution branch generates the initial classification score; the discrete coordinate distribution features are reshaped into a tensor of shape [B, M, 4, H, W], where M is the regression maximum value, and processed by a Softmax function to obtain the probability distribution of the bounding-box coordinates; the K largest probability values in each coordinate dimension are extracted from the probability distribution, their mean is calculated, and the K probability values are concatenated with the mean to form the distribution statistical features; a multilayer perceptron composed of linear transformation layers applies a nonlinear transformation to the distribution statistical features to generate a quality score adjustment value; and the quality score adjustment value is added to the initial classification score from the second convolution branch to obtain the optimized classification score.
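
The distribution statistics described in the last claim can be sketched for a single feature-map location as follows. This is a minimal illustration, not the patented implementation: the function names, the two-layer perceptron with ReLU, the toy logits and the weights are all hypothetical, and a real head would operate on the full [B, M, 4, H, W] tensor.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def distribution_stats(side_logits, k):
    """Build the statistical features for one location.

    side_logits: 4 lists (one per box coordinate), each holding M bin logits.
    Returns 4 * (k + 1) values: per coordinate, the top-k probabilities
    followed by their mean.
    """
    feats = []
    for logits in side_logits:
        if k > len(logits):
            raise ValueError("k must not exceed the number of bins M")
        probs = softmax(logits)
        topk = sorted(probs, reverse=True)[:k]
        feats.extend(topk + [sum(topk) / k])
    return feats


def adjust_score(initial_score, feats, w1, b1, w2, b2):
    """Add a perceptron-predicted adjustment to the initial class score.

    The two-layer shape and the ReLU nonlinearity are assumptions.
    """
    hidden = [
        max(0.0, sum(w * f for w, f in zip(row, feats)) + b)
        for row, b in zip(w1, b1)
    ]
    delta = sum(w * h for w, h in zip(w2, hidden)) + b2
    return initial_score + delta


M, K = 8, 4  # illustrative bin count and top-k size
sharp = [[10.0] + [0.0] * (M - 1)] * 4  # peaked distribution on every coordinate
flat = [[0.0] * M] * 4                  # uniform distribution on every coordinate
sharp_feats = distribution_stats(sharp, K)
flat_feats = distribution_stats(flat, K)
```

A peaked distribution yields a top-1 probability close to 1, while a uniform one yields equal small values, so the statistics give the perceptron a signal about how certain the box regression is at that location.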

Description

Unmanned aerial vehicle scene-oriented electric power pole tower part detection method

Technical Field

The invention relates to a method for detecting electric power tower components in unmanned aerial vehicle (UAV) scenes, and belongs to the field of computer vision.

Background

Electric energy is the core energy source on which modern society runs, and its stable supply depends directly on the safe operation of high-voltage transmission lines. As key infrastructure of transmission lines, power towers are exposed to complex natural environments for long periods and must be inspected and maintained regularly to ensure the reliability of the power grid. Traditional manual inspection has inherent drawbacks such as high labor intensity, low detection efficiency and high safety risk, and is especially difficult to carry out in areas with complex terrain such as mountains and forests. In recent years, with the rapid development of UAV technology, power inspection based on UAV platforms has shown clear advantages. With flexible maneuverability, a wide monitoring field of view and an efficient mode of operation, UAVs provide a new technical means for power inspection that both raises operating efficiency and reduces safety risks to personnel, and this approach has become an important direction for the inspection of power facilities. In UAV power inspection, accurately identifying the key components on a tower, including insulators, hardware fittings and bolts, is the basis for evaluating the condition of the equipment.
Compared with detecting the tower as a whole, component-level detection faces more serious technical challenges. First, component scales differ markedly: insulator strings several meters long and centimeter-level fasteners coexist in the same scene, which places extremely high demands on the multi-scale feature perception of the detection model. Second, components take many forms and, from an aerial viewpoint, are easily affected by changes in angle, partial occlusion and illumination conditions. In addition, complex environmental backgrounds such as mountain forests, fields and buildings, together with factors such as changing weather, further increase the difficulty of accurate detection. Together these factors form the technical bottleneck of power component detection. Currently, object detection methods based on deep learning are widely used in power component detection, and researchers have proposed various improvement strategies, including introducing attention mechanisms to enhance feature representation, adopting feature pyramid networks to fuse multi-scale information, and optimizing detection performance by improving network structures and loss functions.
However, existing methods still have obvious shortcomings in several key links. In feature extraction, the receptive field design of traditional convolutional neural networks is relatively fixed and is difficult to adapt to the feature learning needs of components of different sizes. This limitation prevents a model from effectively capturing, at the same time, target features that differ markedly, such as large insulators and tiny bolts, which results in a high miss rate for small components and insufficient feature representation for large ones. Tan et al. adopt oriented object detection and a deformable convolution structure to achieve high-precision identification of transmission towers in remote sensing images. Li et al. propose a cross-scale spatial attention detector that effectively identifies fine components in transmission line mechanical connections, but it does not involve adaptive adjustment of the receptive field. In feature fusion, existing multi-scale fusion methods mostly use simple upsampling and feature concatenation, lack deep mining of the semantic associations among cross-scale features, and are prone to feature mismatch in complex scenes, so that key detail information is lost during propagation, which directly affects the ability to localize component boundaries accurately. At the prediction output level, there is an obvious mismatch between classification confidence and localization quality, and existing methods lack an accurate mechanism for assessing the quality of predicted boxes, so that correct detections with accurate localization but low confidence are filtered out of the detection results, and incorrect detection with inacc