CN-121999228-A - Street maintenance segmentation method based on full-scale perception and pool-associated attention
Abstract
The invention discloses a street maintenance segmentation method based on full-scale perception and pool-associated attention in the technical field of street-scene semantic segmentation, which comprises the following steps: S1, acquiring a street-scene data set and preprocessing it; S2, constructing a GAPA-Seg segmentation model, wherein the GAPA-Seg segmentation model comprises a feature extraction backbone network and a GAPA decoder, and the GAPA decoder comprises a full-scale perception module FSAM, a pool-associated attention module PAAM, and a dual decoding head; S3, training the GAPA-Seg segmentation model by using the preprocessed data set; and S4, segmenting the street image to be processed by using the trained GAPA-Seg segmentation model and judging facility-category abnormality. Through the decoder based on the full-scale perception module FSAM and the pool-associated attention module PAAM, the segmentation method realizes full fusion among features of different scales and models the salient features together with the surrounding hidden features, which improves the segmentation precision of the model, enhances its understanding of street scenes, and improves the maintenance efficiency of street facilities.
Inventors
- WANG WENHAO
- DAI LINHONG
- LIU LUPING
- AN MENGKE
- CAO ZHENGYANG
- ZHANG YANRU
- ZHU HONG
Assignees
- 淮阴工学院 (Huaiyin Institute of Technology)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-02-10
Claims (9)
- 1. A street maintenance segmentation method based on full-scale perception and pool-associated attention, characterized by comprising the following steps: S1, acquiring a street-scene data set and preprocessing it; S2, constructing a GAPA-Seg segmentation model, wherein the GAPA-Seg segmentation model comprises a feature extraction backbone network and a GAPA decoder, the feature extraction backbone network is used for extracting multi-scale features from an input image and inputting them into the GAPA decoder, the GAPA decoder is used for outputting a segmentation map according to the input multi-scale features, and the GAPA decoder comprises a full-scale perception module FSAM, a pool-associated attention module PAAM, and a dual decoding head; S3, training the GAPA-Seg segmentation model by using the preprocessed data set; S4, segmenting the street image to be processed by using the trained GAPA-Seg segmentation model, and judging facility-category abnormality.
- 2. The street maintenance segmentation method as set forth in claim 1, wherein the preprocessing in S1 comprises performing random scaling, fixed-size cropping, and color jittering on the image.
- 3. The street maintenance segmentation method according to claim 1, wherein in S2 the feature extraction backbone network uses an STDC-Net segmentation network backbone to extract the multi-scale feature maps of the second, third, and fourth stages and inputs them to the GAPA decoder.
- 4. The street maintenance segmentation method according to claim 1, wherein the full-scale perception module FSAM in S2 comprises a detail perception module DAM and a semantic perception module SAM connected in parallel, followed in series by a progressive scale interaction module PSIM, wherein the DAM applies spatial attention weighting to shallow downsampled features to strengthen details, the SAM applies channel attention weighting to deep upsampled features to strengthen semantics, and the PSIM realizes progressive fusion from the global scale to the local scale.
- 5. The street maintenance segmentation method according to claim 4, wherein the progressive scale interaction module PSIM generates a plurality of pooled feature maps by average pooling with different kernel sizes and strides; upsampling and fusion proceed step by step starting from the pooled feature map with the smallest spatial dimensions, a 3 × 3 depthwise convolution is applied after each fusion step, and finally channel interaction is realized through a 1 × 1 convolution and a residual connection is constructed.
- 6. The street maintenance segmentation method according to claim 5, wherein the pool-associated attention module PAAM in S2 comprises a cascade pooled attention CPA and a feedforward neural network FFN connected in series, wherein the CPA respectively captures the average and maximum feature information along the horizontal and vertical directions of the feature map and performs reverse-axis modeling, enhancing the influence of the salient information on the local features, and the FFN enhances the expression of the latent features through dimension-raising activation.
- 7. The street maintenance segmentation method according to claim 6, wherein the dual decoding head in S2 comprises an online hard example mining auxiliary decoding head OHEMA and a class-weighted online hard example mining decoding head WOHEMD, wherein the OHEMA directly predicts a segmentation map from the output of the FSAM, dynamically identifies pixels with high loss values during training, and computes gradients and performs back-propagation on them, and the WOHEMD connects a pool-associated attention module PAAM in series to process the final output of the GAPA-Seg segmentation network.
- 8. The street maintenance segmentation method according to claim 1, wherein the facility-category abnormality judgment in S4 comprises judging bending or falling of poles for the pole category by combining the minimum circumscribed rectangle method with multivariate time-series analysis, and judging degradation for the road and pavement categories by univariate time-series analysis.
- 9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the street maintenance segmentation method as claimed in any one of claims 1-8 when the program is executed.
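The preprocessing of claim 2 (random scaling, fixed-size cropping, color jittering) can be sketched in NumPy as follows. This is a minimal illustrative sketch, not the patented implementation; the crop size, scale range, and jitter strength are assumed values that the patent does not specify, and nearest-neighbour resampling stands in for whatever interpolation the authors use.

```python
import numpy as np

def preprocess(img, rng, crop=64, scale_range=(0.5, 2.0), jitter=0.2):
    """Illustrative sketch of the S1 preprocessing of claim 2: random
    scaling, fixed-size cropping, and color jittering. All parameter
    values are assumptions, not taken from the patent.

    img: (H, W, 3) float array in [0, 1]; rng: np.random.Generator.
    """
    h, w, _ = img.shape
    # Random scaling by nearest-neighbour index resampling.
    s = rng.uniform(*scale_range)
    nh, nw = max(crop, int(h * s)), max(crop, int(w * s))
    ys = (np.arange(nh) * h / nh).astype(int)
    xs = (np.arange(nw) * w / nw).astype(int)
    img = img[ys][:, xs]
    # Fixed-size random crop.
    y0 = rng.integers(0, nh - crop + 1)
    x0 = rng.integers(0, nw - crop + 1)
    img = img[y0:y0 + crop, x0:x0 + crop]
    # Color jitter: random per-channel brightness factor.
    img = img * rng.uniform(1 - jitter, 1 + jitter, size=3)
    return np.clip(img, 0.0, 1.0)
```

In practice a library transform pipeline would be used instead; the sketch only shows the order of the three operations named in the claim.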
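The parallel DAM/SAM branches of claim 4 amount to spatial attention on shallow features and channel attention on deep features. A hedged NumPy sketch is given below; the learned convolutions that the patent would use to produce the attention maps are replaced by a plain sum of average and max statistics, so this illustrates only the weighting pattern, not the actual trained modules.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dam_spatial_attention(shallow):
    """Sketch of the detail perception module (DAM) idea of claim 4:
    gate shallow (C, H, W) features with a spatial attention map built
    from channel-wise average and max statistics. The patent's learned
    layers are replaced by a plain sum for illustration."""
    desc = shallow.mean(axis=0) + shallow.max(axis=0)   # (H, W)
    return shallow * sigmoid(desc)[None, :, :]

def sam_channel_attention(deep):
    """Sketch of the semantic perception module (SAM) idea: gate deep
    features with a channel attention vector from global pooling."""
    desc = deep.mean(axis=(1, 2)) + deep.max(axis=(1, 2))   # (C,)
    return deep * sigmoid(desc)[:, None, None]
```

The two outputs would then feed the PSIM for progressive global-to-local fusion.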
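The cascade pooled attention (CPA) of claim 6 pools the feature map along both spatial axes with both average and max statistics. The sketch below, a rough NumPy illustration and not the patented module, shows how the four directional pooled profiles can be broadcast back over the map and used as a sigmoid gate so that salient responses also lift their neighbours; the patent's learned fusion is replaced here by a plain sum.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cascade_pooled_attention(feat):
    """Sketch of the CPA idea from claim 6 for a (C, H, W) feature map:
    capture average and maximum information along the horizontal and
    vertical directions, then gate the input with the combined profiles
    (a stand-in for the patent's learned fusion)."""
    avg_h = feat.mean(axis=2, keepdims=True)   # (C, H, 1)
    max_h = feat.max(axis=2, keepdims=True)    # (C, H, 1)
    avg_v = feat.mean(axis=1, keepdims=True)   # (C, 1, W)
    max_v = feat.max(axis=1, keepdims=True)    # (C, 1, W)
    attn = sigmoid((avg_h + max_h) + (avg_v + max_v))
    return feat * attn
```

In the full PAAM, the feedforward network FFN of claim 6 would follow, expanding the channel dimension and applying an activation to strengthen latent features.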
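The online hard example mining behind the two decoding heads of claim 7 selects the highest-loss pixels for the backward pass. A minimal NumPy sketch of that selection is shown below; `keep_ratio` is an assumed hyper-parameter, and the class-weighted variant (WOHEMD) would simply scale each pixel's loss by a per-class weight before ranking.

```python
import numpy as np

def ohem_pixel_loss(logits, labels, keep_ratio=0.25):
    """Sketch of pixel-level online hard example mining (claim 7):
    compute a per-pixel cross-entropy, keep only the hardest
    `keep_ratio` fraction, and average over that subset.

    logits: (K, H, W) class scores; labels: (H, W) integer class ids.
    `keep_ratio` is illustrative, not taken from the patent.
    """
    k, h, w = logits.shape
    # Numerically stable softmax cross-entropy per pixel.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    pixel_loss = -log_prob[labels, np.arange(h)[:, None], np.arange(w)[None, :]]
    # Rank pixels by loss and keep only the hardest ones.
    flat = np.sort(pixel_loss.ravel())[::-1]
    n_keep = max(1, int(keep_ratio * flat.size))
    return flat[:n_keep].mean()
```

With uniform logits every pixel has loss log K, so the mined loss equals log K regardless of `keep_ratio`.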
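For the pole-anomaly judgment of claim 8, one per-frame measurement that the minimum circumscribed rectangle yields is the tilt of the pole's long axis. The sketch below approximates the rotated rectangle with the PCA-aligned bounding box of the segmented pixels, a simplification of the claim's method; the tilt threshold is an assumed value, and the multivariate time-series analysis over repeated observations is out of scope here.

```python
import numpy as np

def pole_tilt_from_mask(mask, tilt_thresh_deg=15.0):
    """Sketch for the pole check of claim 8: estimate the tilt of a
    pole's principal axis from its (H, W) boolean segmentation mask and
    flag it when the axis leans away from vertical. PCA stands in for a
    true minimum circumscribed rectangle; the threshold is assumed.

    Returns (tilt_deg, is_bent_or_fallen).
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    # Principal axis of the pixel cloud = long side of the fitted box.
    cov = pts.T @ pts / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]        # (dx, dy) of long axis
    # Angle between the long axis and the image vertical (y axis).
    cos_v = np.clip(abs(major[1]) / np.linalg.norm(major), 0.0, 1.0)
    tilt = np.degrees(np.arccos(cos_v))
    return tilt, tilt > tilt_thresh_deg
```

In the claimed method this per-frame measurement would be one variable in the multivariate time series tracked across maintenance inspections.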
Description
Street maintenance segmentation method based on full-scale perception and pool-associated attention

Technical Field

The invention relates to the technical field of street-scene semantic segmentation, and in particular to a street maintenance segmentation method based on full-scale perception and pool-associated attention.

Background

Street image segmentation is a core application direction of image segmentation technology, aimed at the semantic analysis of complex street scenes. By classifying objects at the pixel level in natural street images, the technology accurately extracts specific semantic regions from varied and changeable real scenes, and provides important support for key fields such as automatic driving and the intelligent maintenance of urban facilities. In recent years, deep-learning-driven segmentation models have made breakthrough progress: while continuously improving the feature extraction capability and multi-scale feature fusion efficiency of their models, researchers have realized efficient deployment of high-precision segmentation models on edge devices by optimizing network structures and reducing parameter counts, pushing the technology from the laboratory into practical application. Although strip convolution can effectively reduce the parameter count and deployment difficulty of a model, the prior art still has the following defects. First, the modeling of salient features and latent features is inadequate. The salient region of a feature map corresponds to a high-matching visual pattern of the target object, appears as an activation peak, and is the core basis for recognition and positioning. Around it there are a large number of latent features that are not fully attended to, implying contextual semantics, local structural cues, and texture transition information.
As the encoder downsamples and passes repeatedly through nonlinear activation functions, these latent features are progressively compressed, normalized, or suppressed, forming a representational isolation from the salient features. This isolation results in local information loss, impairing the modeling of object boundaries, fine deformations, and complex contexts, and thereby degrading the model's performance in fine-grained recognition and occlusion scenarios. Secondly, the association between features of different levels is not tight. Shallow features contain rich detail information such as object edges and textures, while deep features carry high-level semantics. Classical multi-scale fusion methods generally fuse semantic information in the deep layers of a network and lose the matching relations between that semantic information and the corresponding detail information. In addition, the multiple upsampling steps of a layer-by-layer decoding structure reduce inference speed, and independently processing features of different levels before concatenating them easily causes insufficient feature interaction, which affects segmentation accuracy.

Disclosure of Invention

The application provides a street maintenance segmentation method based on full-scale perception and pool-associated attention, which solves the problems of insufficient object segmentation precision in existing lightweight networks caused by insufficient mining of the association between salient features and local latent features, insufficient fusion of features of different scales, and poor attention effect. The method realizes full fusion of features of different scales, models the salient features together with the surrounding hidden features, and improves the segmentation precision of the model, thereby enhancing the model's understanding of street scenes and improving the maintenance efficiency of facilities.
An embodiment of the application provides a street maintenance segmentation method based on full-scale perception and pool-associated attention, which comprises the following steps: S1, acquiring a street-scene data set and preprocessing it; S2, constructing a GAPA-Seg segmentation model, wherein the GAPA-Seg segmentation model comprises a feature extraction backbone network and a GAPA decoder, the feature extraction backbone network is used for extracting multi-scale features from an input image and inputting them into the GAPA decoder, the GAPA decoder is used for outputting a segmentation map according to the input multi-scale features, and the GAPA decoder comprises a full-scale perception module FSAM, a pool-associated attention module PAAM, and a dual decoding head; S3, training the GAPA-Seg segmentation model by using the preprocessed data set; S4, segmenting the street image to be processed by using the trained GAPA-Seg segmentation model, and judging facility-category abnormality. The street maintenance segmentation method has the beneficial effects that, through the decoder based on the full-scale perception module FSAM and the pool-associated attention module PAAM, full fusion among features of different scales is realized, the salient features and the surrounding hidden features are modeled, the segmentation precision of the model is improved, the understanding of street scenes is enhanced, and the maintenance efficiency of street facilities is improved.
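The progressive scale interaction described in claim 5 (pooled pyramids fused coarse-to-fine with a final residual connection) can be sketched as follows. This is an illustrative NumPy skeleton under stated simplifications, not the patented module: the learned 3 × 3 depthwise and 1 × 1 convolutions are replaced by plain averaging, and the kernel sizes are assumed values.

```python
import numpy as np

def avg_pool(x, k):
    """Average pooling with kernel and stride k (H, W divisible by k)."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample_nearest(x, k):
    """Nearest-neighbour upsampling by integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def psim(feat, kernels=(8, 4, 2)):
    """Sketch of the progressive scale interaction module (PSIM) of
    claim 5: build pooled maps at several scales, fuse them step by
    step from the smallest spatial size upward, and close with a
    residual connection. Learned convolutions are replaced by plain
    averaging; `kernels` is an assumed setting.

    feat: (C, H, W) with H, W divisible by every kernel size.
    """
    pyramid = [avg_pool(feat, k) for k in kernels]   # coarsest first
    fused = pyramid[0]
    for pooled, (k_prev, k_cur) in zip(pyramid[1:], zip(kernels, kernels[1:])):
        # Upsample the coarser map to the next scale and fuse (the mean
        # stands in for the 3x3 depthwise convolution after each step).
        fused = (upsample_nearest(fused, k_prev // k_cur) + pooled) / 2.0
    # Back to full resolution, plus the residual connection (which
    # stands in for the final 1x1 convolution + residual of the claim).
    return feat + upsample_nearest(fused, kernels[-1])
```

The coarse-to-fine order matters: each fusion step injects global context into a progressively more local scale, which is the global-to-local behaviour claim 4 attributes to the PSIM.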