CN-122023830-A - Post-fusion feature enhancement decoding method for semantic segmentation of remote sensing image
Abstract
The invention discloses a post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images, and belongs to the technical field of intelligent interpretation of remote sensing images and computer vision semantic segmentation. The method comprises: inputting a remote sensing image; extracting multi-scale features through a Transformer encoder; performing channel unified mapping on the multi-scale features through a decoder to unify the channel dimensions; performing an upsampling operation on the channel-mapped features of S3; splicing the outputs of S4 along the channel dimension and obtaining fused features through a fusion convolution; obtaining enhanced features through local modeling, channel recalibration, and residual fusion in a constructed fusion feature enhancement module; and inputting the enhanced features into a classification layer for classification prediction, outputting segmentation logits. By introducing a post-fusion enhancement module between the fusion convolution and the classification layer, the method gives the decoder a post-fusion local refinement capability, improving the segmentation quality of boundaries and small targets.
Inventors
- CHEN GANG
- LIAN DONGJIE
- CHEN YILEI
- LI ZHIHAO
- WU BIAO
- XIE SIYU
- LIU ZONGYI
- SHANG LIHAO
- DONG QUANYI
Assignees
- Nanjing University (南京大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260408
Claims (8)
- 1. A post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images, characterized by comprising the following steps: S1, inputting a remote sensing image; S2, extracting multi-scale features through a Transformer encoder; S3, performing channel unified mapping on each scale feature through a decoder so as to unify the channel dimensions; S4, performing an upsampling operation on the features after the channel unified mapping of S3, aligning them to the same spatial scale; S5, splicing the outputs of S4 along the channel dimension, and obtaining fused features through a fusion convolution; S6, obtaining enhanced features through local modeling, channel recalibration, and residual fusion in a constructed fusion feature enhancement module; and S7, inputting the enhanced features into a classification layer for classification prediction, and outputting segmentation logits.
- 2. The post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images according to claim 1, wherein in S1 the specific implementation is: the remote sensing image is a satellite remote sensing image or an aerial remote sensing image; the remote sensing image is stored in raster form and comprises a plurality of spectral bands, the spectral bands comprising visible light bands and near-infrared bands; and the remote sensing image is derived from public remote sensing data, commercial remote sensing data, or self-collected remote sensing data.
- 3. The post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images according to claim 1, wherein in S3 the specific implementation is: channel unified mapping is performed on each scale feature so that the number of channels is unified as C. Let the feature map of the i-th scale be F_i; the feature map F_i′ obtained by channel unified mapping of F_i is computed as: F_i′ = Conv_{1×1}(F_i); wherein i denotes the index of the feature scale/level, and Conv_{1×1} denotes a 1×1 convolution operation.
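As an illustration of the channel unified mapping in this claim, a 1×1 convolution can be sketched in NumPy as a per-pixel linear map over channels (the channel counts and spatial sizes below are illustrative, not taken from the patent):

```python
import numpy as np

def conv1x1(feature, weight):
    """1x1 convolution = per-pixel linear map over channels.
    feature: (C_in, H, W); weight: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', weight, feature)

# two toy scale features with different channel counts (sizes are illustrative)
np.random.seed(0)
feats = [np.random.randn(32, 8, 8), np.random.randn(64, 4, 4)]
C = 16  # unified channel number
mapped = [conv1x1(f, np.random.randn(C, f.shape[0])) for f in feats]
```

After the mapping, every scale carries the same number of channels C while its spatial size is untouched, which is what makes the later splicing step possible.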
- 4. The post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images according to claim 1, wherein in S4 the specific implementation is: the feature map F_i′ obtained by channel unified mapping of the i-th scale is upsampled to the target spatial size (H, W), obtaining the aligned feature U_i, computed as: U_i = Up(F_i′, (H, W)); wherein Up denotes an upsampling operation, H denotes the height of the target spatial size, and W denotes the width of the target spatial size.
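The claim does not fix a particular upsampling operator; a minimal sketch using nearest-neighbour upsampling (one common choice, assumed here for illustration) is:

```python
import numpy as np

def upsample_nearest(x, H, W):
    """Nearest-neighbour upsampling of a (C, h, w) feature map to (C, H, W).
    Assumes H and W are integer multiples of h and w."""
    C, h, w = x.shape
    return x.repeat(H // h, axis=1).repeat(W // w, axis=2)

x = np.arange(4, dtype=float).reshape(1, 2, 2)   # one channel: [[0, 1], [2, 3]]
u = upsample_nearest(x, 4, 4)
```

Each input pixel is simply replicated into a 2×2 block, so `u[0]` is [[0,0,1,1],[0,0,1,1],[2,2,3,3],[2,2,3,3]]; bilinear interpolation would be an equally valid choice of `Up`.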
- 5. The post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images according to claim 1, wherein in S5 the specific implementation is: the aligned features U_1, …, U_N produced by S4 are spliced along the channel dimension to obtain the spliced feature F_cat, computed as: F_cat = Concat(U_1, …, U_N); wherein Concat denotes a splicing operation along the channel dimension; subsequently, a 1×1 fusion convolution is applied to the spliced feature F_cat to perform channel fusion, obtaining the fused feature F_fuse, computed as: F_fuse = Conv_{1×1}(F_cat); wherein Conv_{1×1} denotes a 1×1 convolution operation.
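The splice-then-fuse step of this claim can be sketched as follows (four scales and C = 4 channels are illustrative assumptions):

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution over channels; x: (C_in, H, W), w: (C_out, C_in)
    return np.einsum('oc,chw->ohw', w, x)

np.random.seed(1)
C, H, W = 4, 8, 8
aligned = [np.random.randn(C, H, W) for _ in range(4)]   # four aligned scales
f_cat = np.concatenate(aligned, axis=0)                  # splice: (4*C, H, W)
f_fuse = conv1x1(f_cat, np.random.randn(C, 4 * C))       # fuse back to C channels
```

The fusion convolution compresses the 4C spliced channels back to C, so every output channel is a learned mixture of all scales at each pixel.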
- 6. The post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images according to claim 1, wherein in S6 the specific implementation is: S601, channel expansion: a 1×1 convolution expands the channels from C to rC, obtaining the channel-expanded intermediate feature map X_1; S602, local spatial mixing: a 3×3 depthwise convolution is applied to the channel-expanded intermediate feature map X_1, obtaining the locally modeled intermediate feature map X_2 and capturing local boundary and texture context; S603, channel mixing: a 1×1 point-wise convolution is applied to the locally modeled intermediate feature map X_2, obtaining the channel-mixed intermediate feature map X_3 and realizing cross-channel semantic interaction; S604, channel reduction: a 1×1 convolution reduces the channel-mixed intermediate feature map X_3 back to C channels, obtaining the channel-reduced intermediate feature map X_4; S605, channel recalibration: channel attention is applied to the channel-reduced intermediate feature map X_4, obtaining the recalibrated weighted feature map X_w, which adaptively emphasizes the more informative channels; S606, residual fusion: the output is Y = F_fuse + X_w; wherein Y denotes the enhanced feature map output by the post-fusion feature enhancement module, F_fuse denotes the fused feature, and X_w denotes the weighted feature map after channel recalibration.
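The six sub-steps S601–S606 can be sketched in NumPy as below. This is a minimal illustration, not the patented implementation: the expansion ratio r = 2, the 3×3 depthwise kernel size, the ECA kernel size 3, and all weights are assumptions for the sketch.

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution over channels; x: (C_in, H, W), w: (C_out, C_in)
    return np.einsum('oc,chw->ohw', w, x)

def depthwise3x3(x, k):
    # per-channel 3x3 convolution with zero padding; k: (C, 3, 3)
    C, H, W = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[:, i, j][:, None, None] * p[:, i:i + H, j:j + W]
    return out

def eca(x, k):
    # channel attention: global average pool -> 1-D conv -> sigmoid -> reweight
    z = x.mean(axis=(1, 2))
    zp = np.pad(z, len(k) // 2, mode='edge')
    s = np.array([zp[c:c + len(k)] @ k for c in range(len(z))])
    return x * (1.0 / (1.0 + np.exp(-s)))[:, None, None]

def enhance(f, p):
    x1 = conv1x1(f, p['expand'])        # S601: expand C -> r*C
    x2 = depthwise3x3(x1, p['dw'])      # S602: local spatial mixing
    x3 = conv1x1(x2, p['pw'])           # S603: cross-channel mixing
    x4 = conv1x1(x3, p['reduce'])       # S604: reduce r*C -> C
    xw = eca(x4, p['eca'])              # S605: channel recalibration
    return f + xw                       # S606: residual fusion

np.random.seed(2)
C, r, H, W = 4, 2, 6, 6
f_fuse = np.random.randn(C, H, W)
params = {'expand': 0.1 * np.random.randn(r * C, C),
          'dw':     0.1 * np.random.randn(r * C, 3, 3),
          'pw':     0.1 * np.random.randn(r * C, r * C),
          'reduce': 0.1 * np.random.randn(C, r * C),
          'eca':    0.1 * np.random.randn(3)}
y = enhance(f_fuse, params)
```

The residual connection in S606 means the module can only add a refinement on top of F_fuse, never destroy it, which is why the enhanced output keeps the fused feature's shape.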
- 7. The post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images according to claim 6, wherein in S605 the channel attention module is an efficient channel attention (ECA) module, specifically comprising global pooling, one-dimensional convolution, Sigmoid mapping, and channel weighting; the channel-reduced intermediate feature map X_4 is globally average-pooled to obtain a channel description vector, a one-dimensional convolution operation is applied to the channel description vector, the channel weights are obtained through a Sigmoid function, and the channel-reduced intermediate feature map X_4 is weighted channel by channel based on the channel weights to obtain the recalibrated weighted feature map X_w.
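The four ECA sub-steps can be traced with concrete numbers. In this sketch the identity kernel for the 1-D convolution is an assumption chosen purely so that the intermediate values are checkable by hand; a trained ECA module would learn its kernel:

```python
import numpy as np

# toy channel-reduced map X4: three channels holding constants 1, 2, 3 on a 2x2 grid
X4 = np.stack([np.full((2, 2), v) for v in (1.0, 2.0, 3.0)])

# (1) global average pooling over each channel -> channel description vector
z = X4.mean(axis=(1, 2))                     # -> [1., 2., 3.]

# (2) 1-D convolution across the channel descriptor (kernel size 3, edge padding)
k = np.array([0.0, 1.0, 0.0])                # identity kernel, for a checkable result
zp = np.pad(z, 1, mode='edge')
s = np.array([zp[c:c + 3] @ k for c in range(3)])

# (3) Sigmoid mapping -> channel weights; (4) channel-by-channel weighting
w = 1.0 / (1.0 + np.exp(-s))
Xw = X4 * w[:, None, None]
```

With the identity kernel, s equals z, so the weights are sigmoid(1), sigmoid(2), sigmoid(3) ≈ 0.731, 0.881, 0.953: the channel with the largest average response is emphasized the most, which is the "adaptively emphasizing informative channels" behaviour of S605.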
- 8. The post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images as claimed in claim 7, wherein the specific calculation by which the channel-reduced intermediate feature map X_4 is globally average-pooled to obtain the channel description vector is: z_c = (1 / (H_4 · W_4)) · Σ_{h=1..H_4} Σ_{w=1..W_4} X_4(c, h, w); wherein z_c denotes the channel description value obtained by globally average-pooling all spatial positions of the c-th channel of X_4; H_4 and W_4 respectively denote the height and width of the channel-reduced intermediate feature map X_4 in the spatial dimensions; X_4 denotes the channel-reduced intermediate feature map; c denotes the channel index of X_4; h denotes the position index of X_4 in the height direction; and w denotes the position index of X_4 in the width direction.
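Putting the claims together, the whole S1–S7 pipeline can be sketched end-to-end in NumPy. Channel counts, strides, and the number of classes below are illustrative assumptions, and the S6 enhancement is stubbed with a simple element-wise refinement standing in for the full module of claim 6:

```python
import numpy as np

np.random.seed(0)

def conv1x1(x, w):
    # 1x1 convolution = per-pixel linear map over channels; w: (C_out, C_in)
    return np.einsum('oc,chw->ohw', w, x)

def up(x, H, W):
    # nearest-neighbour upsampling (assumes integer scale factors)
    return x.repeat(H // x.shape[1], axis=1).repeat(W // x.shape[2], axis=2)

# S2: stand-in multi-scale encoder features (channel counts and strides are
# illustrative, not taken from the patent)
feats = [np.random.randn(c, 32 // s, 32 // s)
         for c, s in [(32, 4), (64, 8), (160, 16), (256, 32)]]
C, K = 64, 5                                   # unified channels, number of classes
H, W = feats[0].shape[1:]                      # target spatial scale

mapped  = [conv1x1(f, 0.1 * np.random.randn(C, f.shape[0])) for f in feats]   # S3
aligned = [up(m, H, W) for m in mapped]                                       # S4
f_fuse  = conv1x1(np.concatenate(aligned, 0), 0.1 * np.random.randn(C, 4*C))  # S5
f_enh   = f_fuse + np.tanh(f_fuse)             # S6: stub for the enhancement module
logits  = conv1x1(f_enh, 0.1 * np.random.randn(K, C))                         # S7
```

The output is a (K, H, W) logits map, one score per class per pixel, as required by S7; an argmax over the class axis would yield the final segmentation mask.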
Description
Post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images
Technical Field
The invention belongs to the technical field of remote sensing image intelligent interpretation and computer vision semantic segmentation, and particularly relates to a post-fusion feature enhancement decoding method for remote sensing image semantic segmentation.
Background
The semantic segmentation task aims at classifying each pixel in an image, and in remote sensing scenarios can be used for land use/cover mapping, urban element extraction, change monitoring, and the like. Because high-resolution remote sensing images generally exhibit large differences in target scale, densely distributed small targets, complex ground-object boundaries, and strong background texture interference, existing segmentation methods are prone to boundary blurring, fracture of slender structures, and local false segmentation in areas such as roads, building edges, and water boundaries. Among existing remote sensing image semantic segmentation methods, the technical route based on a hierarchical encoder and a lightweight decoder has already been applied. Taking SegFormer as an example, SegFormer and other lightweight decoders generally adopt All-MLP style linear projection with upsampling-splicing fusion: each scale feature is projected to a unified channel number through a 1×1 projection, upsampled to the same scale, spliced, fused through a 1×1 convolution, and then directly classified at the pixel level. In the decoding stage, the channels of the different-scale features are unified, the features are then upsampled to the same spatial resolution, spliced and fused, and the segmentation result is directly output based on the fusion result.
This approach achieves multi-scale feature aggregation, but the fused features generally enter the classification layer directly, with no further enhancement of the fusion result, so the exploitation of local spatial neighborhood relations, boundary detail information, and cross-channel semantic interaction remains limited. Further, since channel unification and splicing fusion mainly accomplish scale alignment and information aggregation, it is difficult to fully coordinate high-level semantic information and low-level spatial detail information at the same time, and insufficient feature expression easily occurs in texture-complex regions, small-target regions, and boundary regions. In view of the above problems, directly adding more complex convolutional stacks or enhancement modules at the decoding end would increase the number of parameters and the amount of computation, conflicting with the application requirements of a lightweight decoding structure. Therefore, how to further enhance the features after multi-scale fusion without excessively increasing the computational complexity, so as to improve local detail expression and semantic interaction capability, is a technical problem to be solved in the prior art.
Disclosure of Invention
Aiming at the defects of the existing methods described in the background art, the invention provides a post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images. A post-fusion enhancement module is introduced between the fusion convolution layer and the classification layer, so that the decoder has the capability of local refinement after fusion, the segmentation quality of boundaries and small targets can be improved, and the expression capability of the multi-scale fused features for local spatial details (especially boundaries and small targets) is enhanced.
To solve the above technical problem, the invention adopts the following technical scheme: a post-fusion feature enhancement decoding method for semantic segmentation of remote sensing images, comprising the following steps: S1, inputting a remote sensing image; S2, extracting multi-scale features through a Transformer encoder; S3, performing channel unified mapping on each scale feature through a decoder so as to unify the channel dimensions; S4, performing an upsampling operation on the features after the channel unified mapping of S3, aligning them to the same spatial scale; S5, splicing the outputs of S4 along the channel dimension, and obtaining fused features through a fusion convolution; S6, obtaining enhanced features through local modeling, channel recalibration, and residual fusion in a constructed fusion feature enhancement module; and S7, inputting the enhanced features into a classification layer for classification prediction, and outputting segmentation logits. Preferably, in S1, the specific implementation is: the remote sensing image is a satellite remote sensing image or an aerial remote sensing image, the remote