CN-121999216-A - Remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward
Abstract
The invention discloses a remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward. A remote sensing image segmentation network with a multi-scale encoder-decoder architecture is constructed, and a light visual scanning step and a frequency-selective feedforward step, executed in sequence, are embedded in the encoding path and the decoding path. The light visual scanning step models long-range dependencies in the input features using geometric transformations and selective scanning, and fuses the input with the scanning result through a residual connection. The frequency-selective feedforward step then enhances the resulting features through multi-scale convolution, local block partitioning, frequency-domain screening and recombination. These steps process the multi-scale features stage by stage, and a pixel-level segmentation result is finally generated from the features output by the decoding path. The method improves long-range structure modeling and detail preservation in remote sensing images while keeping the computation lightweight, and is suitable for segmenting high-resolution remote sensing images of complex scenes.
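The stage order described in the abstract (and spelled out in claims 5, 6 and 9) can be traced with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: every stage is a hypothetical placeholder that only records its name, the branch count `n` is arbitrary, and downsampling, upsampling and any skip connections, which the text does not specify, are left out.

```python
from typing import List


def enhance(trace: List[str]) -> List[str]:
    """Hypothetical stand-in for the feature enhancement mechanism:
    light visual scanning first, then frequency-selective feedforward."""
    return trace + ["light_visual_scan", "freq_selective_ffn"]


def forward_order(n: int) -> List[str]:
    """Record the order of stages for n extraction and n reconstruction branches."""
    trace = enhance(["shallow_conv"])  # shallow features, then one enhancement (claim 5)
    for k in range(1, n + 1):
        trace = enhance(trace + [f"extract_{k}"])  # encoder branch + enhancement
    for k in range(1, n + 1):
        trace = enhance(trace + [f"reconstruct_{k}"])  # decoder branch + enhancement (claim 6)
    return trace + ["classification_head"]  # pixel-level probability map (claim 9)


print(forward_order(2))
```

With `n` branches per path, the enhancement mechanism runs `2 * n + 1` times: once after the shallow convolution and once after every extraction and reconstruction branch.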
Inventors
- LI GUANQUN
Assignees
- 耕宇牧星(北京)空间科技有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260120
Claims (10)
- 1. A remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward, characterized by comprising the following steps: acquiring a remote sensing image to be segmented and inputting it into a remote sensing image segmentation network, wherein the network adopts a multi-scale encoder-decoder architecture and embeds a feature enhancement processing mechanism in at least one feature extraction branch of the multi-scale feature encoding path and in at least one feature reconstruction branch of the multi-scale feature decoding path; the feature enhancement processing mechanism comprises a light visual scanning step and a frequency-selective feedforward step executed in sequence, wherein: the light visual scanning step comprises performing long-range dependency modeling on input features based on a geometric transformation strategy and a selective scanning mechanism, and fusing the selective scanning result with the input features through a residual connection to obtain first enhanced features, the input features being features extracted from the remote sensing image or second enhanced features output by the preceding feature extraction branch or feature reconstruction branch; and the frequency-selective feedforward step comprises applying multi-scale convolution to the first enhanced features, dividing them into local feature blocks with a sliding window, performing frequency-domain screening and an inverse transform on the blocks, and recombining and fusing the inverse-transformed local feature blocks to obtain the second enhanced features, which serve either as the input features of the next feature extraction branch or feature reconstruction branch, or as the final features from which a pixel-level remote sensing image segmentation result is generated.
- 2. The remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward according to claim 1, wherein the light visual scanning step specifically comprises: S21, normalizing and geometrically transforming the input features; S22, linearly projecting the transformed features and modeling them through a sequence of operations comprising depthwise separable convolution and nonlinear activation; S23, scanning the modeled features with the selective scanning mechanism to generate a first weight map; and S24, weighting the linearly projected features from S22 with the first weight map, and combining the weighted result with the input features through a residual connection to output the first enhanced features.
- 3. The remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward according to claim 2, characterized in that in step S22, after linear projection, the transformed features pass sequentially through a first depthwise separable convolution layer, a GELU activation function, a linear layer and a second depthwise separable convolution layer, wherein the first depthwise separable convolution layer and the GELU activation function extract local texture and cross-region context information, and the linear layer and the second depthwise separable convolution layer integrate global features to obtain the modeled features.
- 4. The remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward according to claim 1, wherein the frequency-selective feedforward step specifically comprises: S31, sequentially applying normalization, a first dilated convolution and a third depthwise separable convolution to the first enhanced features to obtain multi-scale perception features; S32, applying a nonlinear activation to the multi-scale perception features to obtain a nonlinearly enhanced second weight map, and weighting the multi-scale perception features with the second weight map; S33, applying a second dilated convolution to the weighted features and dividing the result into a plurality of local feature blocks; S34, applying a fast Fourier transform to each local feature block in turn, multiplying the result by a learnable parameter matrix to perform frequency-domain screening, and applying an inverse fast Fourier transform to the screened result to obtain frequency-domain-enhanced local feature blocks; and S35, recombining and concatenating all frequency-domain-enhanced local feature blocks to output the second enhanced features.
- 5. The remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward according to claim 1, wherein in the multi-scale feature encoding path, shallow features are first extracted from the remote sensing image by a convolution layer and the feature enhancement processing mechanism is executed on them; the output second enhanced features then pass sequentially through N feature extraction branches, and the feature enhancement processing mechanism is executed once after each feature extraction branch.
- 6. The remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward according to claim 1, wherein in the multi-scale feature decoding path, the output of the multi-scale feature encoding path passes sequentially through N feature reconstruction branches, and the feature enhancement processing mechanism is executed after each feature reconstruction branch.
- 7. The method according to claim 1, wherein the geometric transformation strategy in the light visual scanning step comprises at least one of transposition, horizontal flipping or vertical flipping of the feature map.
- 8. The remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward according to claim 1, wherein the selective scanning mechanism in the light visual scanning step is an SSM-S6 mechanism.
- 9. The remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward according to claim 1, wherein the final features are input to a classification head to generate a pixel-level classification probability map as the segmentation result.
- 10. The method for remote sensing image segmentation based on light vision scanning and frequency domain discrimination feedforward according to claim 1, further comprising a training step that employs a composite loss function defined as a weighted sum of cross-entropy loss and Dice loss.
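A toy, single-channel sketch of the light visual scanning idea (claims 1, 7 and 8) follows. It assumes four scan orders built from the transposition and flips of claim 7, a scalar-state recurrence in the spirit of the S6 selective scan (step size Δ and projections B and C computed from the current input), and averaging to merge the four directions. The names `light_scan`, `selective_scan`, `SCAN_ORDERS` and the parameters `a`, `w_delta`, `w_b`, `w_c` are hypothetical; the normalization, projections, depthwise convolutions and weight-map gating of S21 to S24 are omitted, and the merged scan output is simply added back to the input as the residual.

```python
import math
from typing import Callable, List, Tuple

Grid = List[List[float]]  # single-channel H x W feature map (illustrative assumption)


def identity(x: Grid) -> Grid:
    return [list(row) for row in x]


def transpose(x: Grid) -> Grid:
    return [list(col) for col in zip(*x)]


def hflip(x: Grid) -> Grid:
    return [row[::-1] for row in x]


def vflip(x: Grid) -> Grid:
    return [list(row) for row in x[::-1]]


# Each entry pairs a forward transform with its inverse; all four are self-inverse.
SCAN_ORDERS: List[Tuple[Callable[[Grid], Grid], Callable[[Grid], Grid]]] = [
    (identity, identity),    # rows left to right, top row first
    (transpose, transpose),  # columns top to bottom, left column first
    (hflip, hflip),          # rows right to left, top row first
    (vflip, vflip),          # rows left to right, bottom row first
]


def softplus(v: float) -> float:
    return math.log1p(math.exp(v))


def selective_scan(seq: List[float], a: float = -1.0, w_delta: float = 1.0,
                   w_b: float = 1.0, w_c: float = 1.0) -> List[float]:
    """Scalar-state S6-style recurrence: delta, B and C depend on the current input."""
    h, out = 0.0, []
    for x in seq:
        delta = softplus(w_delta * x)  # input-dependent step size
        b, c = w_b * x, w_c * x        # input-dependent input/output projections
        h = math.exp(delta * a) * h + delta * b * x  # discretized state update
        out.append(c * h)
    return out


def light_scan(x: Grid) -> Grid:
    h, w = len(x), len(x[0])
    merged = [[0.0] * w for _ in range(h)]
    for fwd, inv in SCAN_ORDERS:
        t = fwd(x)
        tw = len(t[0])
        y = selective_scan([v for row in t for v in row])  # raster-flatten, then scan
        y_grid = [y[r * tw:(r + 1) * tw] for r in range(len(t))]
        restored = inv(y_grid)  # map scan outputs back to the original H x W layout
        for i in range(h):
            for j in range(w):
                merged[i][j] += restored[i][j] / len(SCAN_ORDERS)
    # Residual connection: input plus the averaged multi-direction scan response.
    return [[x[i][j] + merged[i][j] for j in range(w)] for i in range(h)]
```

Because each transform is undone before merging, every output pixel stays aligned with its input pixel, while the four scan directions let information from all sides of the map reach it at linear cost in the number of pixels.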
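The block-wise frequency screening of S33 to S35 can be sketched in plain Python as below. Assumptions: windows are non-overlapping `p x p` tiles, the learnable parameter matrix is stood in for by a fixed, hypothetical real-valued `weight` applied element-wise to each spectrum, the real part of the inverse transform is kept, and a naive DFT replaces the FFT (same result, slower). The preceding normalization, dilated and depthwise convolutions and gating are not modeled.

```python
import cmath
from typing import List

Grid = List[List[float]]


def dft2(block: List[List[complex]], inverse: bool = False) -> List[List[complex]]:
    """Naive 2-D DFT; gives the same result as an FFT in O(n^2 m^2) time."""
    n, m = len(block), len(block[0])
    sign = 1.0 if inverse else -1.0
    out = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            s = 0j
            for i in range(n):
                for j in range(m):
                    s += block[i][j] * cmath.exp(sign * 2j * cmath.pi * (u * i / n + v * j / m))
            out[u][v] = s / (n * m) if inverse else s
    return out


def frequency_screen(x: Grid, weight: Grid, p: int) -> Grid:
    """Split x into p x p tiles, filter each spectrum with `weight`, invert, reassemble."""
    h, w = len(x), len(x[0])
    if h % p or w % p:
        raise ValueError("feature map size must be a multiple of the window size")
    out = [[0.0] * w for _ in range(h)]
    for r0 in range(0, h, p):
        for c0 in range(0, w, p):
            block = [row[c0:c0 + p] for row in x[r0:r0 + p]]  # local feature block
            spec = dft2(block)                                 # to the frequency domain
            kept = [[spec[u][v] * weight[u][v] for v in range(p)] for u in range(p)]  # screening
            rec = dft2(kept, inverse=True)                     # back to the spatial domain
            for i in range(p):
                for j in range(p):
                    out[r0 + i][c0 + j] = rec[i][j].real       # recombine the tiles
    return out
```

With `weight` set to all ones, each tile is reconstructed unchanged; keeping only the `[0][0]` (DC) entry replaces each tile by its mean, which illustrates how the matrix decides which frequencies pass.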
Description
Technical Field
The invention relates to the technical field of remote sensing image processing, and in particular to a remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward.
Background
Remote sensing images are an important source of land-surface information and are widely used in urban planning, land-use monitoring, environmental protection, disaster assessment, military reconnaissance and other fields. Remote sensing image segmentation is one of the fundamental tasks; it aims to finely delineate different ground objects such as buildings, roads, farmland, water bodies and woodland. However, because remote sensing data are complex and diverse, existing image segmentation methods still face many challenges in practice. First, remote sensing images are often high-resolution, span a wide range of scales and contain complex categories. Different sensors and imaging conditions introduce spectral differences and noise, so the same type of ground object can look markedly different across scenes. For example, buildings are arranged in regular rectangles in urban centers but may be scattered in suburban areas, and the spectral reflectance of farmland varies significantly across seasons. Under these conditions, traditional segmentation methods based on convolutional neural networks (CNNs) are unstable across scales and scenes and are prone to blurred boundaries and missed fine-grained targets. Second, remote sensing images contain long-range dependencies and complex topological structures.
For example, road networks crisscross the scene, rivers and mountain ranges extend over large areas, and buildings often appear in contiguous clusters. Conventional convolution operators are limited by their local receptive fields and therefore struggle to capture cross-region semantic relations. Self-attention can model long-range dependencies, but its complexity is quadratic in the number of pixels (tokens), which incurs heavy computation and GPU memory costs on large-format remote sensing images and hinders practical deployment. Third, remote sensing image segmentation also faces small-target recognition and class imbalance. In many scenes, targets such as roads, vehicles and isolated buildings occupy only a small fraction of the pixels, while large-area ground objects such as farmland and woodland dominate. During training, conventional loss functions tend to be dominated by these majority classes and neglect the boundaries and semantic details of small objects, so the segmentation results lack completeness and robustness. Therefore, providing a remote sensing image segmentation method that improves small-target recognition accuracy and boundary characterization while reducing computational cost is a technical problem to be solved by those skilled in the art.
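Claim 10 addresses the class-imbalance problem described above with a weighted sum of cross-entropy and Dice loss. A minimal sketch over per-pixel class probabilities follows, assuming soft Dice averaged over classes, a small smoothing term `eps`, and hypothetical weights `w_ce` and `w_dice` (the text does not state their values).

```python
import math
from typing import List


def cross_entropy(probs: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-likelihood of the true class over all pixels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def dice_loss(probs: List[List[float]], labels: List[int], eps: float = 1e-6) -> float:
    """1 minus the soft Dice coefficient, averaged with equal weight over classes."""
    num_classes = len(probs[0])
    scores = []
    for c in range(num_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)  # soft overlap
        pred_sum = sum(p[c] for p in probs)                         # predicted mass
        true_sum = sum(1.0 for y in labels if y == c)               # ground-truth pixels
        scores.append((2.0 * inter + eps) / (pred_sum + true_sum + eps))
    return 1.0 - sum(scores) / num_classes


def composite_loss(probs: List[List[float]], labels: List[int],
                   w_ce: float = 1.0, w_dice: float = 1.0) -> float:
    """Weighted sum of cross-entropy and Dice loss."""
    return w_ce * cross_entropy(probs, labels) + w_dice * dice_loss(probs, labels)
```

Because each class's Dice score is normalized by that class's own pixel counts and the scores are averaged with equal weight, a small class such as roads contributes as much to `dice_loss` as a dominant one, whereas cross-entropy averages over pixels and is driven mainly by the majority classes.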
Disclosure of Invention
In view of the above problems, the invention provides a remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward that overcomes them or at least partially solves them. By introducing a light visual scanning unit in the spatial domain and a frequency-selective feedforward unit in the frequency domain, the method establishes a spatial-frequency cooperative enhancement mechanism. This addresses the insufficient long-range dependency modeling of traditional CNN models and the excessive computational overhead of attention mechanisms, offers clear advantages in small-target recognition and boundary characterization, and provides a new efficient, fine-grained and robust solution for remote sensing image segmentation. To achieve the above purpose, the invention adopts the following technical scheme: an embodiment of the invention provides a remote sensing image segmentation method based on light vision scanning and frequency domain discrimination feedforward, comprising the following steps: acquiring a remote sensing image to be segmented and inputting it into a remote sensing image segmentation network, wherein the network adopts a multi-scale encoder-decoder architecture and embeds a feature enhancement processing mechanism in at least one feature extraction branch of the multi-scale feature encoding path and in at least one feature reconstruction branch of the multi-scale feature decoding path; The