
CN-122023787-A - Electric power image semantic segmentation and encoding and decoding fusion method and system based on deep learning

CN122023787A

Abstract

The invention discloses a deep learning-based power image semantic segmentation and encoding/decoding fusion method and system, and relates to the technical field of power image processing. The method sequentially denoises, enhances and normalizes the power inspection image; segments the image with an improved UNet model comprising depthwise separable convolutions and an attention module, outputting segmentation masks for key areas and background areas; trains the model using the segmentation masks to achieve accurate segmentation of real-time power images; constructs a semantics-fusing codec and a dynamic code-rate allocation strategy, completing compression and reconstruction of the different areas with differentiated coding; and verifies the quality of the reconstructed image through subjective and objective evaluation. The invention achieves precise segmentation and efficient encoding and decoding of power images, reduces model complexity and data storage and transmission costs, adapts to the requirements of power inspection scenes, and provides reliable support for equipment fault detection.

Inventors

  • WANG JIAXIN
  • FU LIXIANG
  • WAN ZHENJUN
  • YU ZHONGSHU
  • GONG ZHENZHOU
  • ZHOU LU
  • ZHOU HAIPING

Assignees

  • 国网江西省电力有限公司南昌供电分公司

Dates

Publication Date
20260512
Application Date
20251211

Claims (10)

  1. A deep learning-based electric power image semantic segmentation and encoding and decoding fusion method, characterized by comprising the following steps: S1, power image preprocessing, namely sequentially denoising, enhancing and normalizing an input power inspection image and outputting a preprocessed image; S2, power image semantic segmentation, namely constructing an improved UNet semantic segmentation model for segmenting the preprocessed image and outputting a segmentation mask comprising a key area and a background area, wherein the key area comprises towers, cables and insulators, and the background area comprises leaves, grassland and sky; the improved UNet semantic segmentation model comprises an encoder, an attention module, a decoder and an output layer, wherein the encoder adopts depthwise separable convolution layers, the attention module is added between the encoder and the decoder, the decoder adopts up-sampling layers to up-sample the feature map output by the encoder and restore its resolution, and the output layer adopts a 1×1 convolution layer to convolve the feature map output by the decoder; S3, training the improved UNet semantic segmentation model with the preprocessed power images output by S1, and inputting real-time power images into the trained improved UNet semantic segmentation model to obtain segmentation masks; S4, semantics-fusing encoding and decoding, namely constructing an encoder, a decoder and a dynamic code-rate allocation strategy that fuse semantic information, inputting the segmentation mask of the real-time power image obtained in S3, and adopting a differentiated coding strategy for the key area and the background area to complete image compression and reconstruction; and S5, evaluating the quality of the reconstructed image by combining subjective evaluation and objective evaluation, wherein the objective evaluation indexes comprise peak signal-to-noise ratio, structural similarity index and compression ratio.
  2. The deep learning-based power image semantic segmentation and encoding and decoding fusion method according to claim 1, wherein in step S2: the encoder is composed of a plurality of convolution blocks, each comprising two depthwise separable convolution layers, a batch normalization layer and a ReLU activation function, and downsamples the feature map through pooling layers so that the resolution of the feature map gradually decreases while its semantic information increases; the attention module, added between the encoder and the decoder, highlights the feature information of the key region to improve segmentation precision, combining channel attention and spatial attention: channel attention first weights the feature importance of different channels, spatial attention then weights the feature importance of different spatial positions, and the weighted feature map is finally input into the decoder; the attention module comprises a channel attention sub-module and a spatial attention sub-module; the channel attention sub-module performs global average pooling on the input feature map to obtain a global feature vector for each channel, transforms the global feature vector through two fully connected layers, applies a Sigmoid activation function to obtain channel attention weights, and multiplies the weights with the input feature map to obtain a channel-weighted feature map; the spatial attention sub-module obtains two feature maps, splices them together, performs channel fusion through a 1×1 convolution layer, and applies a Sigmoid activation function to obtain spatial attention weights; the decoder is composed of a plurality of convolution blocks, each comprising two depthwise separable convolution layers, a batch normalization layer and a ReLU activation function, and fuses feature maps of the same resolution in the encoder with feature maps in the decoder through skip connections, supplementing detail information and improving the edge precision of the segmentation result; the output layer convolves the feature map output by the decoder with a 1×1 convolution layer, adjusts the number of channels of the feature map to the number of categories, and then applies a softmax activation function to obtain the category probability distribution of each pixel and output the segmentation mask; the depthwise separable convolution is decomposed into a depthwise convolution and a pointwise convolution: the depthwise convolution performs a convolution operation on each input channel independently to extract per-channel feature information, and the pointwise convolution applies a 1×1 convolution kernel to fuse the channels of the depthwise convolution output into the final feature map; the computation and parameter count of the depthwise separable convolution are far smaller than those of a conventional convolution operation, effectively reducing model complexity and improving computational efficiency.
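The efficiency argument in claim 2, that a depthwise separable convolution needs far fewer parameters and operations than a standard convolution, can be checked with a minimal NumPy sketch; the shapes and function names here are illustrative, not from the patent:

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise conv (one k x k kernel per input channel) followed by a
    1x1 pointwise conv that fuses channels, as described in claim 2.
    x: (C_in, H, W); dw_kernels: (C_in, k, k); pw_weights: (C_out, C_in)."""
    c_in, h, w = x.shape
    k = dw_kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    dw = np.empty_like(x, dtype=np.float64)
    for c in range(c_in):                       # per-channel spatial conv
        for i in range(h):
            for j in range(w):
                dw[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * dw_kernels[c])
    return np.einsum('oc,chw->ohw', pw_weights, dw)   # 1x1 channel fusion

def param_counts(k, c_in, c_out):
    """Parameter count of a standard k x k conv vs. its separable form."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable

x = np.random.rand(8, 16, 16)
y = depthwise_separable_conv(x, np.random.rand(8, 3, 3), np.random.rand(32, 8))
print(y.shape)                  # (32, 16, 16)
print(param_counts(3, 8, 32))   # (2304, 328)
```

For a 3×3 kernel mapping 8 to 32 channels the separable form uses roughly 7× fewer parameters, which is the complexity reduction the claim relies on.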
  3. The deep learning-based power image semantic segmentation and codec fusion method according to claim 1, wherein in S3 the training comprises: data set construction, namely collecting power inspection images shot in different scenes, under different weather and of different equipment, annotating the images with the Labelme tool, classifying targets into 4 categories of towers, cables, insulators and background, and generating corresponding segmentation masks; and loss function selection, namely adopting a combined loss function of weighted cross-entropy loss and Dice loss, wherein the weighted cross-entropy loss alleviates class imbalance by assigning weights to different categories, and the Dice loss improves the segmentation precision of small targets by computing the overlap between the predicted region and the real region; the combined loss function formula is Loss = α·L_WCE + (1 − α)·DiceLoss, wherein α is the weight coefficient, L_WCE is the weighted cross-entropy loss, and DiceLoss is the Dice loss.
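The combined loss of claim 3 can be sketched in NumPy as below; α, the class weights and the ε smoothing terms are illustrative choices, since the patent's concrete values are not recoverable from this text:

```python
import numpy as np

def weighted_ce(probs, target, class_weights, eps=1e-12):
    """Weighted cross-entropy over flattened pixels; probs is (C, N)."""
    n = target.size
    p = probs[target, np.arange(n)]            # prob of the true class
    return float(-(class_weights[target] * np.log(p + eps)).mean())

def dice_loss(probs, target, eps=1e-6):
    """1 - mean per-class Dice overlap of prediction vs. ground truth."""
    c = probs.shape[0]
    onehot = np.eye(c)[target].T               # (C, N)
    inter = (probs * onehot).sum(axis=1)
    denom = probs.sum(axis=1) + onehot.sum(axis=1)
    return float(1 - ((2 * inter + eps) / (denom + eps)).mean())

def combined_loss(probs, target, class_weights, alpha=0.5):
    """Loss = alpha * L_WCE + (1 - alpha) * DiceLoss, per claim 3."""
    return alpha * weighted_ce(probs, target, class_weights) + \
           (1 - alpha) * dice_loss(probs, target)

# a perfect prediction drives both terms toward zero
target = np.array([0, 1, 2, 3, 1, 0])
perfect = np.eye(4)[target].T
print(combined_loss(perfect, target, np.ones(4)))  # ~0.0
```

Raising a class weight in `class_weights` penalizes errors on that class more, which is how the weighted term counters class imbalance.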
  4. The deep learning-based power image semantic segmentation and encoding/decoding fusion method according to claim 1 or 3, wherein in step S3 the training parameters are set as follows: an Adam optimizer with a set initial learning rate is adopted, the learning rate decays by a factor of 0.5 every 5 epochs, the batch size is set to 8, the number of training epochs is set to 50, and an early-stopping strategy halts training when the validation set loss does not decrease for 5 consecutive epochs, preventing the model from overfitting.
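The schedule and early-stopping rule of claim 4 reduce to two small helpers; the initial learning rate value is garbled in this text, so it is left as a parameter, and the helper names are illustrative:

```python
def lr_at_epoch(initial_lr, epoch, decay_every=5, factor=0.5):
    """Step decay: the learning rate is halved every `decay_every` epochs."""
    return initial_lr * factor ** (epoch // decay_every)

def should_stop(val_losses, patience=5):
    """Early stop once the validation loss has not improved for
    `patience` consecutive epochs (claim 4's overfitting guard)."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before

print(lr_at_epoch(1.0, 12))   # 0.25 (two decays have fired by epoch 12)
print(should_stop([0.9, 0.5, 0.6, 0.6, 0.6, 0.6, 0.6]))  # True
```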
  5. The deep learning-based power image semantic segmentation and encoding and decoding fusion method according to claim 1, wherein in step S4, when the encoder is constructed, the code-rate allocation strategy dynamically adjusts the code-rate ratio between the key area and the background area based on the semantic segmentation result, with the following allocation rule: code-rate proportioning, namely setting the total code rate, obtaining the key-area pixel ratio by statistics over the segmentation mask, and allocating the code rates of the key area and the background area according to a code-rate weight of the key region; and a dynamic adjustment mechanism, namely, when the key region contains a defect, a defect detection module identifies the defect region based on a pre-trained ResNet-50 model and the code-rate weight of the defect region is increased to 0.9, further improving the reconstruction quality of the defect area and providing support for subsequent defect analysis.
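The concrete allocation formulas of claim 5 are lost in extraction; the sketch below is only a plausible reconstruction, assuming the key region simply receives a fraction `key_weight` of the total rate, raised to 0.9 when the defect-detection module fires. The function name and defaults are hypothetical:

```python
def allocate_bitrate(total_rate, key_weight=0.7, defect_detected=False):
    """Split the total code rate between key and background regions.
    Assumption (not confirmed by the source): the key region gets a flat
    fraction `key_weight` of the total rate, 0.9 when a defect is found."""
    w = 0.9 if defect_detected else key_weight
    key_rate = w * total_rate
    return key_rate, total_rate - key_rate

print(allocate_bitrate(1000))                        # (700.0, 300.0)
print(allocate_bitrate(1000, defect_detected=True))  # (900.0, 100.0)
```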
  6. The deep learning-based power image semantic segmentation and codec fusion method according to claim 1, wherein in S4 constructing the encoder comprises: basing the encoder on an improved auto-encoder, taking the segmentation mask output by the semantic segmentation module as an additional input, and integrating semantic information into the feature encoding process through a semantic embedding layer; the semantic embedding layer maps the segmentation mask, through a 1×1 convolution layer, into a semantic feature map with the same number of channels as the image feature map, and fuses the semantic feature map with the image feature map through a residual connection, associating semantic information with image features; a hierarchical coding structure, namely adopting a hierarchical coding strategy that divides the semantics-fused feature map into 3 layers: shallow features, containing the detail information of the image, for which fine quantization is adopted in the corresponding key area to preserve details; middle-layer features, containing the structural information of the image, for which the key area and the background area adopt quantization strategies with respective quantization step sizes; and deep features, containing the global semantic information of the image, for which unified quantization is adopted to ensure overall semantic consistency; and entropy coding, namely entropy-coding the quantized feature map with arithmetic coding to further compress the data volume and generate the final compressed code stream.
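The semantic embedding layer of claim 6 is a 1×1 convolution over the (one-hot) mask plus a residual add into the image features. A minimal NumPy sketch, with illustrative shapes:

```python
import numpy as np

def semantic_embed(feat, mask, w):
    """Map a segmentation mask into the feature space with a 1x1 conv
    and fuse it into the image features via a residual connection.
    feat: (C, H, W) image features; mask: (H, W) integer class labels;
    w: (C, K) 1x1-conv weights, K = number of classes."""
    k = w.shape[1]
    onehot = np.eye(k)[mask].transpose(2, 0, 1)   # (K, H, W)
    sem = np.einsum('ck,khw->chw', w, onehot)     # 1x1 convolution
    return feat + sem                             # residual fusion

feat = np.zeros((16, 8, 8))
mask = np.random.randint(0, 4, size=(8, 8))
out = semantic_embed(feat, mask, np.random.rand(16, 4))
print(out.shape)  # (16, 8, 8)
```

With zero features, each output pixel is exactly the embedding column of its mask class, which makes the semantic/feature association of the claim easy to inspect.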
  7. The deep learning-based power image semantic segmentation and codec fusion method according to claim 1, wherein in S4 constructing the decoder comprises: a semantic de-embedding layer, which resolves the semantic information from the compressed code stream and maps it into a semantic guidance feature map through a structure symmetric to the semantic embedding layer of the encoder; hierarchical decoding and attention fusion, namely reconstructing feature maps of different levels with a hierarchical decoding strategy and strengthening feature recovery of the key region through a semantic attention module: deep feature decoding upsamples the deep features and fuses them with the quantized residuals of the middle-layer features to generate preliminary structural features; middle-layer feature decoding upsamples the fused middle-layer features, focuses on the key region through the semantic attention module, and supplements structural details; shallow feature decoding upsamples the shallow features and, combined with the semantic guidance feature map, reconstructs the key-region details with high fidelity and the background region plausibly; and image reconstruction, namely mapping the final feature map, through a 3×3 convolution layer and a Sigmoid activation function, into a reconstructed image consistent with the size of the input image.
  8. The deep learning-based power image semantic segmentation and encoding/decoding fusion method according to claim 1, wherein in S4 the decoder resolves semantic information through a semantic de-embedding layer, adopts a hierarchical decoding strategy combined with a semantic attention module to strengthen key-region feature recovery, and finally outputs the reconstructed image through a 3×3 convolution layer and a Sigmoid activation function.
  9. The deep learning-based power image semantic segmentation and codec fusion method according to claim 1, wherein in S5: peak signal-to-noise ratio: assume the original image is I and the reconstructed image is K, both single-channel images of size M×N (a color image is computed per channel and the results are averaged); the mean square error of the two images is calculated as MSE = (1/(M·N)) · Σᵢ Σⱼ (I(i,j) − K(i,j))²; the PSNR is calculated as PSNR = 10·log₁₀(MAX² / MSE), where MAX is the maximum possible pixel value; structural similarity index: modeled from the three dimensions of luminance, contrast and structural correlation: luminance comparison l(I,K) = (2·μ_I·μ_K + C₁) / (μ_I² + μ_K² + C₁); contrast comparison c(I,K) = (2·σ_I·σ_K + C₂) / (σ_I² + σ_K² + C₂); structure comparison s(I,K) = (σ_IK + C₃) / (σ_I·σ_K + C₃); SSIM comprehensive formula: SSIM(I,K) = l(I,K)·c(I,K)·s(I,K), with C₃ = C₂/2 by default, where C₁, C₂ and C₃ are small constants that avoid a zero denominator; compression ratio (Compression Ratio, CR): compression ratio = original image file size ÷ compressed image file size; evaluation flow: subjective evaluation, namely scoring on a 5-point scale and calculating the average subjective score; objective evaluation, namely calculating PSNR, SSIM and compression ratio; and result fusion, namely weighted fusion of the subjective scores and objective indexes into a final quality rating.
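The objective metrics of claim 9 are standard and can be sketched in NumPy; for brevity this SSIM uses global image statistics rather than the usual sliding window, and the function names are illustrative:

```python
import numpy as np

def psnr(orig, recon, max_val=255.0):
    """Peak signal-to-noise ratio from the mean square error."""
    mse = np.mean((orig.astype(np.float64) - recon.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    """SSIM from global image statistics (the standard definition
    averages a windowed version; global stats keep the sketch short)."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def compression_ratio(original_bytes, compressed_bytes):
    """CR = original file size / compressed file size."""
    return original_bytes / compressed_bytes

a = np.zeros((4, 4)); b = np.full((4, 4), 255.0)
print(psnr(a, b))             # 0.0 (worst case: MSE equals MAX**2)
print(ssim_global(a, a))      # 1.0 (identical images)
print(compression_ratio(1_000_000, 50_000))  # 20.0
```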
  10. A deep learning-based electric power image semantic segmentation and encoding and decoding fusion system, characterized by comprising a preprocessing module, a semantic segmentation module, a model training module, an encoding and decoding fusion module and a quality evaluation module, the modules achieving linked control through a data bus and a control unit; the preprocessing module sequentially performs denoising, enhancing and normalizing operations on the input power inspection image and outputs a preprocessed image; the semantic segmentation module contains a built-in improved UNet semantic segmentation model for performing semantic segmentation on the preprocessed image and outputting a segmentation mask comprising a key area and a background area, wherein the key area comprises towers, cables and insulators, and the background area comprises leaves, grassland and sky; the model training module trains the improved UNet semantic segmentation model with the preprocessed power images so that the trained model can output segmentation masks for power images input in real time; the encoding and decoding fusion module constructs an encoder, a decoder and a dynamic code-rate allocation strategy that fuse semantic information, receives the segmentation mask output by the model training module, and adopts a differentiated coding strategy for the key area and the background area to complete image compression and reconstruction; the quality evaluation module performs quality evaluation on the reconstructed image by combining subjective evaluation and objective evaluation, the objective evaluation indexes comprising peak signal-to-noise ratio, structural similarity index and compression ratio.

Description

Electric power image semantic segmentation and encoding and decoding fusion method and system based on deep learning

Technical Field

The invention belongs to the technical field of intelligent processing of power system images, and particularly relates to a deep learning-based power image semantic segmentation and encoding/decoding fusion method and system.

Background

With the rapid development of digital power grids, the power system generates massive data in links such as power generation, transmission, transformation, distribution and consumption, among which the image data in power inspection scenes is particularly prominent. As an important means of maintaining and managing the power grid, power inspection has gradually moved toward intelligence and automation, and technologies such as unmanned aerial vehicles and machine vision are widely applied, generating a large number of power inspection images. Efficient transmission and storage of these image data is one of the key issues in the intelligent construction of power systems. When traditional image compression technology processes power images, it cannot effectively distinguish key areas (such as towers and cables) from background areas (such as leaves, grassland and sky), and a unified compression strategy either yields a compression ratio too low to meet the storage and transmission requirements of massive data, or damages the image quality of the key areas and impairs subsequent intelligent analysis tasks such as defect detection. Image semantic segmentation technology can accurately classify the different areas in an image, making differentiated processing of the image possible.
Fusing image semantic segmentation technology with grid-scene image encoding and decoding technology is expected to overcome the shortcomings of traditional compression technology and achieve efficient compression and high-quality recovery of power images. At present, power image coding and decoding technology is mainly based on traditional image compression standards such as JPEG and JPEG2000. The JPEG standard adopts the Discrete Cosine Transform (DCT) and related techniques; it compresses quickly, but produces blocking artifacts at high compression ratios, degrading image quality and potentially affecting subsequent defect-recognition accuracy, especially for key areas in power images. The JPEG2000 standard adopts wavelet transform technology, improves compression performance to some extent and supports progressive transmission, but its higher computational complexity hinders application on terminal equipment with limited computing power, such as power-inspection unmanned aerial vehicles. In recent years, image codec technology based on deep learning has developed rapidly. Deep learning models can learn complex image features through training on large amounts of data and achieve more efficient image compression. For example, auto-encoder-based image compression models map an image to a low-dimensional feature space with an encoder and then reconstruct the image with a decoder, outperforming conventional methods in compression ratio and image quality. However, existing deep learning-based image coding and decoding technology is rarely applied to power scenes and does not fully consider the scene characteristics of power images, so differentiated compression of key areas and background areas cannot be realized.
In the area of semantic segmentation of power images, scholars at home and abroad have carried out extensive research. Early semantic segmentation methods were based on traditional computer vision techniques such as threshold segmentation, edge detection and region growing, but these methods adapt poorly to complex backgrounds, achieve low segmentation precision, and struggle to meet the accurate segmentation requirements of key regions in power images. With the development of deep learning technology, semantic segmentation methods based on Convolutional Neural Networks (CNNs) have become a research hotspot. The FCN (fully convolutional network) replaces the fully connected layers of a traditional CNN with convolution layers, achieving end-to-end image semantic segmentation and outputting a segmentation result of the same size as the input image. UNet builds on the FCN with an encoder-decoder structure, fuses feature information of different levels through skip connections, improves segmentation precision, and is widely applied in fields such as medical image segmentation. Aiming at the characteristics of power images, researchers have improved models such as UNet, for examp