CN-122023432-A - Semantic segmentation method and system for transmission line inspection image

Abstract

A semantic segmentation model is first constructed based on a multi-scale adaptive network and state space attention: within an encoder-decoder framework, CSSP modules are embedded in sequence to adaptively fuse the features of adjacent levels, and E-LASS modules process local details and content-aware global context in parallel. The model is then trained with a multi-scale adaptive focal loss (MS-AFL) function. In the invention, the CSSP module strengthens multi-scale feature fusion, the E-LASS module efficiently models long-range structural dependence with approximately linear complexity, and the MS-AFL loss dynamically optimizes the training process. Together these address the difficulties of segmenting insulators against complex power transmission backgrounds, namely variable target scale, strong structural dependence and easily lost detail, and significantly improve segmentation accuracy and robustness.

Inventors

ZHANG KEFEI
Ai Xun

Assignees

Hubei University of Technology (湖北工业大学)

Dates

Publication Date: 20260512
Application Date: 20260126

Claims (8)

1. A semantic segmentation method for transmission line inspection images, characterized by comprising the following steps: acquiring a transmission line inspection image containing an insulator, and performing pixel-level labeling of the insulator region in the transmission line inspection image to obtain a training sample; constructing a MANSA-based semantic segmentation network model, wherein the semantic segmentation network model takes an encoder-decoder as its backbone and embeds at least one CSSP module and at least one E-LASS module; iteratively training the semantic segmentation network model on the training sample with a multi-scale adaptive focal loss function to obtain a trained target semantic segmentation model; and inputting a transmission line inspection image to be segmented into the target semantic segmentation model, which outputs an insulator pixel-level semantic segmentation result map corresponding to the transmission line inspection image to be segmented.
2. The semantic segmentation method for transmission line inspection images according to claim 1, wherein constructing the MANSA-based semantic segmentation network model comprises: extracting multi-level features of an input image with a pre-trained convolutional neural network backbone to obtain a first-level feature map, a second-level feature map, a third-level feature map and a fourth-level feature map; inputting the second-level feature map into a first CSSP module, and fusing and enhancing it with the first-level feature map to obtain a first fused feature map; inputting the third-level feature map into a second CSSP module, and fusing and enhancing it with the first fused feature map to obtain a second fused feature map; inputting the fourth-level feature map into a third CSSP module, and fusing and enhancing it with the second fused feature map to obtain a third fused feature map; inputting the first fused feature map into a first E-LASS module for processing to obtain a first enhanced feature map; inputting the second fused feature map into a second E-LASS module for processing to obtain a second enhanced feature map; inputting the third fused feature map into a third E-LASS module for processing to obtain a third enhanced feature map; and, based on the encoder-decoder architecture and taking the third enhanced feature map as the starting point, gradually restoring the resolution of the input image through stepwise upsampling, with skip connections and feature fusion with the enhanced feature map of the corresponding level, and finally outputting, through a convolution layer and a Softmax function, a prediction probability map of each pixel belonging to the insulator or to the background.
3. The semantic segmentation method for transmission line inspection images according to claim 1, wherein the at least one CSSP module takes a current-stage feature map and a previous-stage feature map as inputs and performs the following operations to obtain an output feature map: performing bilinear upsampling on the previous-stage feature map to align its spatial size with that of the current-stage feature map, thereby obtaining a spatially aligned feature; performing a 1×1 convolution on the current-stage feature map to adjust the number of channels, thereby obtaining a channel-aligned feature; applying a 1×1 convolution to the spatially aligned feature to unify the number of channels, and then concatenating it with the channel-aligned feature along the channel dimension to obtain a concatenated feature; flattening the concatenated feature along the spatial dimensions into a feature sequence, inputting the feature sequence into a lightweight structured state space sequence model for long-range dependency modeling, outputting an enhanced target feature sequence, and reshaping the target feature sequence into a state space modeling feature map; generating a spatially adaptive weight map G from the concatenated feature through a gating network, and using the weight map G to perform weighted fusion of the concatenated feature and the state space modeling feature map to obtain a fusion result [the fusion expression and its symbol definitions are missing from the source text]; and applying a 1×1 convolution to the fusion result to adjust the number of channels, thereby obtaining the final output feature map of the at least one CSSP module.
4. The semantic segmentation method for transmission line inspection images according to claim 1, wherein the at least one E-LASS module takes an input feature map as input and obtains an output feature map through parallel processing in a dual-branch structure followed by cross-branch fusion, specifically comprising the following steps: processing the input feature map in parallel with S depthwise separable convolution kernels having different dilation rates to obtain S multi-scale local feature maps; analyzing the input feature map with a lightweight subnetwork to generate S spatial attention maps of the same size as the input; multiplying the S multi-scale local feature maps element by element with the corresponding spatial attention maps and summing the products to obtain a local detail enhancement feature map; analyzing the input feature map with a lightweight routing network to generate a content importance map I; flattening the input feature map into an input sequence, performing a differentiable reordering of the elements of the input sequence in descending order of their values in the content importance map I, scanning the reordered input sequence along a plurality of preset spatial directions to form a plurality of subsequences, inputting each subsequence into an independent state space sequence model for processing, and recombining and adding the outputs of all directions to obtain a preliminary global context feature; processing the input feature map with a neighborhood attention module to obtain a local supplementary feature, and fusing the preliminary global context feature with the local supplementary feature through a gating mechanism to obtain a global context feature map; and performing adaptive weighted fusion of the local detail enhancement feature map and the global context feature map through a final gating network to obtain the final output feature map of the at least one E-LASS module.
5. The semantic segmentation method for transmission line inspection images according to claim 1, wherein the multi-scale adaptive focal loss function computes the loss L of a single sample image during training through the following steps: computing the area of each insulator target instance k in the sample image from the ground-truth segmentation mask of the sample image, taking the maximum instance area in the batch, and assigning a scale weight to all pixels within each insulator target instance k [formula missing from the source text], where ε is a very small constant; computing a dynamic focusing parameter from the current training epoch t and the total number of epochs T, and computing a dynamic balance factor from the mean intersection over union (mIoU) of the semantic segmentation network model on the most recent validation set [formulas missing from the source text], the parameters of which are a base balance factor, a maximum focusing parameter for the late stage of training, and a minimum focusing parameter for the initial stage of training; for a pixel i, given the ground-truth label of pixel i, the probability with which the semantic segmentation network model predicts pixel i to be an insulator, and the scale weight of the instance to which pixel i belongs, computing the loss contribution of pixel i [formula missing from the source text], in which the focusing parameter is modulated by the scale weight and the strength of the modulation is set by a modulation intensity coefficient; and summing the loss contributions of all pixels of the sample image to obtain the total loss L of the sample image.
6. A semantic segmentation system for transmission line inspection images, characterized by comprising: an acquisition module configured to acquire a transmission line inspection image containing an insulator, and to perform pixel-level labeling of the insulator region in the transmission line inspection image to obtain a training sample; a construction module configured to construct a MANSA-based semantic segmentation network model, wherein the semantic segmentation network model takes an encoder-decoder as its backbone and embeds at least one CSSP module and at least one E-LASS module; a training module configured to iteratively train the semantic segmentation network model on the training sample with a multi-scale adaptive focal loss function to obtain a trained target semantic segmentation model; and an output module configured to input a transmission line inspection image to be segmented into the target semantic segmentation model, which outputs an insulator pixel-level semantic segmentation result map corresponding to the transmission line inspection image to be segmented.
7. An electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method of any one of claims 1 to 5.
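
As an illustration of the content-ordered scanning recited in claim 4, the following minimal Python sketch flattens a feature map, reorders its elements in descending order of a content importance map, scans the reordered sequence, and writes the results back to the original pixel positions. It is a sketch under stated assumptions, not the claimed implementation: the names `ema_scan` and `importance_ordered_scan` are hypothetical, a simple exponential moving average stands in for the state space sequence model, a hard sort stands in for the differentiable reordering, and a forward and a backward pass over the reordered sequence stand in for the plurality of preset spatial directions.

```python
from typing import List, Sequence


def ema_scan(seq: Sequence[float], decay: float = 0.5) -> List[float]:
    """Stand-in sequence model (assumption): h_t = decay * h_{t-1} + (1 - decay) * x_t."""
    out, h = [], 0.0
    for x in seq:
        h = decay * h + (1.0 - decay) * x
        out.append(h)
    return out


def importance_ordered_scan(
    feature: List[List[float]],
    importance: List[List[float]],
    decay: float = 0.5,
) -> List[List[float]]:
    """Scan a single-channel feature map in descending order of importance."""
    rows, cols = len(feature), len(feature[0])
    flat_x = [v for row in feature for v in row]
    flat_i = [v for row in importance for v in row]

    # Hard descending sort (assumption: replaces the differentiable reordering).
    # sorted() is stable, so pixels of equal importance keep raster order.
    order = sorted(range(rows * cols), key=lambda p: -flat_i[p])
    seq = [flat_x[p] for p in order]

    # Two scan directions over the reordered sequence (assumption).
    forward = ema_scan(seq, decay)
    backward = ema_scan(seq[::-1], decay)[::-1]

    # Scatter the summed outputs back to the original pixel positions.
    out_flat = [0.0] * (rows * cols)
    for rank, p in enumerate(order):
        out_flat[p] = forward[rank] + backward[rank]
    return [out_flat[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    feature = [[1.0, 2.0], [3.0, 4.0]]
    importance = [[0.1, 0.9], [0.5, 0.3]]
    print(importance_ordered_scan(feature, importance))
    # → [[2.5, 3.3125], [4.625, 5.25]]
```

Because the scan order follows the importance map rather than raster order, the pixel with the highest importance is visited first in the forward pass, and its value propagates to every later position in that pass.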

Description

Semantic segmentation method and system for transmission line inspection image
Technical Field
The invention belongs to the technical field of computer vision and intelligent inspection of power equipment, and in particular relates to a semantic segmentation method and system for transmission line inspection images.
Background
The safe and stable operation of transmission lines is vital to the national economy and to everyday life. The insulator is a key component of the line, and its condition (for example breakage, contamination or absence) directly affects the reliability of the power supply. With the widespread use of unmanned aerial vehicle (UAV) inspection, massive numbers of inspection images are generated. Automatically and accurately identifying and segmenting the insulators in these images, which is a semantic segmentation task, is the first step toward subsequent defect detection and condition assessment.
Existing semantic segmentation schemes mainly have the following problems:
Models based on the encoder-decoder architecture (such as U-Net) fuse shallow details with deep semantics through skip connections, but feature interaction between the different stages inside the encoder is usually insufficient and the ability to model long-distance context dependence is limited, so targets with pronounced long-range structure, such as insulator strings, are difficult to handle.
Models that maintain high-resolution feature maps throughout the network (such as HRNet) preserve detail, but they capture global context information inefficiently and are susceptible to interference from complex backgrounds.
Transformer-based models can model global dependence effectively through the self-attention mechanism, but their computational complexity grows with the square of the image size, which makes real-time processing of high-resolution inspection images challenging, and they require a large amount of training data.
State space models (SSMs), an emerging class of sequence models, offer the potential of linear computational complexity, but those based on a fixed scanning order (such as raster order) cannot adapt the order to the image content and are therefore poorly suited to insulator segmentation scenes with uneven information distribution and variable target scale.
In summary, existing general-purpose segmentation models struggle to meet, all at once, the stringent requirements of a transmission line inspection scene: robustness to multi-scale targets, modeling of long-range structural dependence, detail retention and computational efficiency. There is therefore a need for a semantic segmentation method designed specifically for insulator segmentation that adaptively fuses multi-scale information, models long-range context efficiently, and balances detail against efficiency.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a semantic segmentation method and system for transmission line inspection images. Through a novel network architecture and loss function design, the method achieves accurate and efficient segmentation of multi-scale insulator targets against complex backgrounds.
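The complexity contrast drawn in the background between self-attention and state space models can be made concrete with a short calculation. The sketch below is illustrative only: the function names, the 16-pixel patch size and the two image resolutions are assumptions chosen for the example, and the counts ignore channel width and constant factors.

```python
def token_count(height: int, width: int, patch: int) -> int:
    """Number of tokens when an image is split into patch x patch tiles."""
    return (height // patch) * (width // patch)


def attention_pairs(n_tokens: int) -> int:
    """Token pairs scored by full self-attention: quadratic in sequence length."""
    return n_tokens * n_tokens


def scan_steps(n_tokens: int) -> int:
    """Recurrent updates in one state space scan: linear in sequence length."""
    return n_tokens


if __name__ == "__main__":
    # Hypothetical square inspection images, tokenized with 16-pixel patches.
    for side in (1024, 2048):
        n = token_count(side, side, patch=16)
        print(side, n, attention_pairs(n), scan_steps(n))
    # prints:
    # 1024 4096 16777216 4096
    # 2048 16384 268435456 16384
```

Doubling the side length of the image quadruples the number of tokens and the number of scan steps, but multiplies the number of attention pairs by sixteen.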
In a first aspect, the present invention provides a semantic segmentation method for transmission line inspection images, comprising: acquiring a transmission line inspection image containing an insulator, and performing pixel-level labeling of the insulator region in the transmission line inspection image to obtain a training sample; constructing a MANSA-based semantic segmentation network model, wherein the semantic segmentation network model takes an encoder-decoder as its backbone and embeds at least one CSSP module and at least one E-LASS module; iteratively training the semantic segmentation network model on the training sample with a multi-scale adaptive focal loss function to obtain a trained target semantic segmentation model; and inputting a transmission line inspection image to be segmented into the target semantic segmentation model, which outputs an insulator pixel-level semantic segmentation result map corresponding to the transmission line inspection image to be segmented.
In a second aspect, the present invention provides a semantic segmentation system for transmission line inspection images, comprising: an acquisition module configured to acquire a transmission line inspection image containing an insulator, and to perform pixel-level labeling of the insulator region in the transmission line inspection image to obtain a training sample; a construction module configured to construct a MANSA-based semantic segmentation network model, wherein the semantic segmentation network model takes an encoder-decoder as its backbone and embeds at least one CSSP