CN-121982515-A - Remote sensing image water body extraction method and system based on RGB+X data visual angle


Abstract

The invention discloses a remote sensing image water body extraction method and system based on the RGB+X data perspective, relating to the technical fields of computer vision and remote sensing. The method mainly comprises the following steps: extracting an X-mode feature image from the red, green, blue and near-infrared bands of a remote sensing image; extracting multi-modal, multi-scale and global features with a dual-complexity backbone network and a hybrid fusion module; obtaining a water body segmentation result with a multi-scale MLP decoder; and up-sampling that result to obtain a water body extraction result at the original image resolution. The method and system can improve the accuracy and generalization capability of water body extraction.
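The four steps summarized above (S1 to S4) can be traced as a shape-level data flow. The following is a minimal NumPy sketch under explicit assumptions, not the inventors' network: every Swin Transformer, CNN and fusion module is replaced by a hypothetical stand-in (average pooling plus a random 1x1 projection, and a plain average for fusion); the X-mode image is assumed to be NIR, NDWI and NDVI; the multi-scale MLP decoder is assumed to follow a SegFormer-style project, upsample, concatenate, fuse and classify design; and nearest-neighbour upsampling is used throughout. All function names are illustrative, and the input is assumed square with a side divisible by 32.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, out_ch):
    """Hypothetical stand-in for a learned layer: random 1x1 projection of (H, W, C)."""
    w = rng.standard_normal((x.shape[-1], out_ch)) / np.sqrt(x.shape[-1])
    return x @ w


def avg_pool(x, k):
    """Downsample an (H, W, C) map by k with non-overlapping average pooling."""
    h, w, c = x.shape
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))


def upsample(x, k):
    """Nearest-neighbour upsampling of an (H, W, C) map by an integer factor k."""
    return x.repeat(k, axis=0).repeat(k, axis=1)


def stage(x, k, out_ch):
    """Stand-in for one Swin Transformer or CNN stage: downsample, project, ReLU."""
    return np.maximum(conv1x1(avg_pool(x, k), out_ch), 0.0)


def fuse(rgb_feat, x_feat):
    """Placeholder for a per-stage feature fusion module (plain average)."""
    return 0.5 * (rgb_feat + x_feat)


def x_mode_image(rgbn, eps=1e-6):
    """S1 with assumed channels [NIR, NDWI, NDVI] from an (H, W, 4) R, G, B, NIR array."""
    r, g, nir = rgbn[..., 0], rgbn[..., 1], rgbn[..., 3]
    ndwi = (g - nir) / (g + nir + eps)
    ndvi = (nir - r) / (nir + r + eps)
    return np.stack([nir, ndwi, ndvi], axis=-1)


def backbone(rgb, x, chans=(32, 64, 128, 256)):
    """S2: four-stage dual-branch wiring as worded in claim 2."""
    swin_in, cnn_in, fused_feats = rgb, x, []
    for i, c in enumerate(chans):
        k = 4 if i == 0 else 2            # strides give 1/4, 1/8, 1/16, 1/32 scales
        swin_out = stage(swin_in, k, c)    # RGB branch (Swin Transformer stand-in)
        cnn_out = stage(cnn_in, k, c)      # X branch (CNN stand-in)
        fused = fuse(swin_out, cnn_out)
        fused_feats.append(fused)
        # Next Swin stage reads the Swin output; next CNN stage reads fused features.
        swin_in, cnn_in = swin_out, fused
    return fused_feats


def mlp_decoder(feats, dim=64, n_classes=2):
    """S3, assumed SegFormer-style: project, upsample to 1/4, concatenate, fuse, classify."""
    size = feats[0].shape[0]
    ups = [upsample(conv1x1(f, dim), size // f.shape[0]) for f in feats]
    mixed = np.maximum(conv1x1(np.concatenate(ups, axis=-1), dim), 0.0)
    return conv1x1(mixed, n_classes)       # logits at 1/4 of the input resolution


def extract_water(rgbn):
    """S1-S4 end to end on a square (H, W, 4) image with H divisible by 32."""
    rgbn = np.asarray(rgbn, dtype=np.float64)
    feats = backbone(rgbn[..., :3], x_mode_image(rgbn))
    logits = mlp_decoder(feats)
    logits = upsample(logits, rgbn.shape[0] // logits.shape[0])  # S4
    return logits.argmax(axis=-1)          # 1 = water, 0 = non-water (by convention here)


image = np.random.default_rng(1).random((64, 64, 4))  # synthetic R, G, B, NIR
mask = extract_water(image)
```

With random weights the resulting mask carries no meaning; the sketch only shows tensor shapes and wiring. The one detail taken directly from the claims is the cross-branch wiring in `backbone`: each Swin stage reads the previous Swin output, while each CNN stage reads the previous fused features.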

Inventors

ZHANG WANGLE
YE YAN
YU TING
Shen Laiyin
SHEN YIMING
Yang Zhechen
MENG YAN

Assignees

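Hubei Provincial Water Resources and Hydropower Research Institute (湖北省水利水电科学研究院)
Hubei Provincial Water Resources Economic Management Office (湖北省水利经济管理办公室)
Hubei University (湖北大学)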

Dates

Publication Date: 20260505
Application Date: 20251215

Claims (10)

1. A remote sensing image water body extraction method and system based on an RGB+X data perspective, characterized by comprising the following steps: S1, extracting an X-mode feature image from the red, green, blue and near-infrared bands of a remote sensing image; S2, extracting multi-modal, multi-scale and global features from the X-mode feature image and the RGB-mode image by using a dual-complexity backbone network and a hybrid fusion module; S3, obtaining a water body segmentation result from the multi-modal, multi-scale and global features by using a multi-scale MLP decoder; and S4, up-sampling the water body segmentation result to obtain a water body extraction result at the original image resolution.
2. The remote sensing image water body extraction method and system based on the RGB+X data perspective according to claim 1, wherein the dual-complexity backbone network is configured to: perform feature extraction on the RGB-mode image with a first-layer Swin Transformer module and on the X-mode feature image with a first-layer convolutional neural network module, and fuse the outputs of these two modules with a first-layer feature fusion module to obtain first-layer fused multi-modal features; perform feature extraction on the output of the first-layer Swin Transformer module with a second-layer Swin Transformer module and on the first-layer fused multi-modal features with a second-layer convolutional neural network module, and fuse the outputs of these two modules with a second-layer feature fusion module to obtain second-layer fused multi-modal features; perform feature extraction on the output of the second-layer Swin Transformer module with a third-layer Swin Transformer module and on the second-layer fused multi-modal features with a third-layer convolutional neural network module, and fuse the outputs of these two modules with a third-layer feature fusion module to obtain third-layer fused multi-modal features; and perform feature extraction on the output of the third-layer Swin Transformer module with a fourth-layer Swin Transformer module and on the third-layer fused multi-modal features with a fourth-layer convolutional neural network module, and fuse the outputs of these two modules with a fourth-layer feature fusion module to obtain fourth-layer fused multi-modal features.
3. The remote sensing image water body extraction method and system based on the RGB+X data perspective according to claim 2, wherein the feature fusion module is a hybrid fusion module combining CNN fusion with cross-attention fusion, and is specifically configured to: fuse the RGB-mode features and the X-mode features by convolution with a CNN fusion module to obtain a feature F; cross-fuse the RGB-mode features and the X-mode features with an attention-based cross-attention fusion module to obtain cross-fused RGB-mode features and cross-fused X-mode features; concatenate the feature F with the cross-fused RGB-mode features and apply a convolution to obtain new RGB-mode features; and concatenate the feature F with the cross-fused X-mode features and apply a convolution to obtain new X-mode features.
4. The remote sensing image water body extraction method and system based on the RGB+X data perspective according to claim 3, wherein the CNN fusion module is configured to: concatenate the RGB-mode features and the X-mode features, and apply to the concatenated features a convolution, ReLU activation, a second convolution and Sigmoid activation to obtain a weight matrix W1 for the RGB branch and a weight matrix W2 for the X branch; multiply the RGB-mode features by W1 and the X-mode features by W2 pixel by pixel to obtain two new matrices; and add the two new matrices pixel by pixel to obtain the feature F.
5. The remote sensing image water body extraction method and system based on the RGB+X data perspective according to claim 3, wherein the cross-attention fusion module is configured to: derive a query Q_RGB, a key K_RGB and a value V_RGB from the RGB-mode features, and a query Q_X, a key K_X and a value V_X from the X-mode features; obtain the cross-fused RGB-mode features from the query Q_RGB, the key K_X and the value V_X; and obtain the cross-fused X-mode features from the query Q_X, the key K_RGB and the value V_RGB.
6. A remote sensing image water body extraction system based on an RGB+X data perspective, characterized by comprising the following subsystems: an X-mode feature subsystem configured to extract an X-mode feature image from the red, green, blue and near-infrared bands of a remote sensing image; a multi-modal feature extraction subsystem configured to extract multi-modal, multi-scale and global features from the X-mode feature image and the RGB-mode image by using a dual-complexity backbone network and a hybrid fusion module; a water body segmentation subsystem configured to obtain a water body segmentation result from the multi-modal, multi-scale and global features by using a multi-scale MLP decoder; and a water body extraction up-sampling subsystem configured to up-sample the water body segmentation result to obtain a water body extraction result at the original image resolution.
7. The remote sensing image water body extraction system based on the RGB+X data perspective according to claim 6, wherein the dual-complexity backbone network is configured to: perform feature extraction on the RGB-mode image with a first-layer Swin Transformer module and on the X-mode feature image with a first-layer convolutional neural network module, and fuse the outputs of these two modules with a first-layer feature fusion module to obtain first-layer fused multi-modal features; perform feature extraction on the output of the first-layer Swin Transformer module with a second-layer Swin Transformer module and on the first-layer fused multi-modal features with a second-layer convolutional neural network module, and fuse the outputs of these two modules with a second-layer feature fusion module to obtain second-layer fused multi-modal features; perform feature extraction on the output of the second-layer Swin Transformer module with a third-layer Swin Transformer module and on the second-layer fused multi-modal features with a third-layer convolutional neural network module, and fuse the outputs of these two modules with a third-layer feature fusion module to obtain third-layer fused multi-modal features; and perform feature extraction on the output of the third-layer Swin Transformer module with a fourth-layer Swin Transformer module and on the third-layer fused multi-modal features with a fourth-layer convolutional neural network module, and fuse the outputs of these two modules with a fourth-layer feature fusion module to obtain fourth-layer fused multi-modal features.
8. The remote sensing image water body extraction system based on the RGB+X data perspective according to claim 7, wherein the feature fusion module is a hybrid fusion module combining CNN fusion with cross-attention fusion, and is specifically configured to: fuse the RGB-mode features and the X-mode features by convolution with a CNN fusion module to obtain a feature F; cross-fuse the RGB-mode features and the X-mode features with an attention-based cross-attention fusion module to obtain cross-fused RGB-mode features and cross-fused X-mode features; concatenate the feature F with the cross-fused RGB-mode features and apply a convolution to obtain new RGB-mode features; and concatenate the feature F with the cross-fused X-mode features and apply a convolution to obtain new X-mode features.
9. The remote sensing image water body extraction system based on the RGB+X data perspective according to claim 8, wherein the CNN fusion module is configured to: concatenate the RGB-mode features and the X-mode features, and apply to the concatenated features a convolution, ReLU activation, a second convolution and Sigmoid activation to obtain a weight matrix W1 for the RGB branch and a weight matrix W2 for the X branch; multiply the RGB-mode features by W1 and the X-mode features by W2 pixel by pixel to obtain two new matrices; and add the two new matrices pixel by pixel to obtain the feature F.
10. The remote sensing image water body extraction system based on the RGB+X data perspective according to claim 8, wherein the cross-attention fusion module is configured to: derive a query Q_RGB, a key K_RGB and a value V_RGB from the RGB-mode features, and a query Q_X, a key K_X and a value V_X from the X-mode features; obtain the cross-fused RGB-mode features from the query Q_RGB, the key K_X and the value V_X; and obtain the cross-fused X-mode features from the query Q_X, the key K_RGB and the value V_RGB.
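Claims 3 to 5 (mirrored by claims 8 to 10) specify the hybrid fusion module at the level of data flow. The NumPy sketch below follows that flow on pixel tokens of shape (N, C), where N = H x W. Beyond what the claims state, it assumes: random matrices in place of learned weights; 1x1 convolutions modelled as channel-wise matrix products; a second convolution in the CNN fusion branch that outputs 2C channels, split into per-channel gates W1 and W2; single-head scaled dot-product attention over all pixels; and no normalization or residual connections. `HybridFusion` and its method names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(c_in, c_out):
    """Random stand-in for a learned 1x1 convolution (a channel-wise matrix)."""
    return rng.standard_normal((c_in, c_out)) / np.sqrt(c_in)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class HybridFusion:
    """Hypothetical sketch of the hybrid fusion module on (N, C) pixel tokens."""

    def __init__(self, c):
        self.c = c
        # CNN fusion (claim 4): conv -> ReLU -> conv -> Sigmoid; the 2C outputs
        # are split into W1 and W2 (assumed layout).
        self.conv_a, self.conv_b = linear(2 * c, c), linear(c, 2 * c)
        # Cross-attention (claim 5): Q/K/V projections for each modality.
        self.q_rgb, self.k_rgb, self.v_rgb = (linear(c, c) for _ in range(3))
        self.q_x, self.k_x, self.v_x = (linear(c, c) for _ in range(3))
        # Output convs (claim 3) applied after concatenating F with each branch.
        self.out_rgb, self.out_x = linear(2 * c, c), linear(2 * c, c)

    def cnn_fusion(self, rgb, x):
        """F = W1 * RGB + W2 * X, with gates computed from concat(RGB, X)."""
        hidden = np.maximum(np.concatenate([rgb, x], axis=-1) @ self.conv_a, 0.0)
        gates = sigmoid(hidden @ self.conv_b)
        w1, w2 = gates[:, : self.c], gates[:, self.c :]
        return w1 * rgb + w2 * x  # pixel-wise products, then pixel-wise sum

    def cross_attend(self, q_src, kv_src, wq, wk, wv):
        """Single-head scaled dot-product attention across modalities."""
        q, k, v = q_src @ wq, kv_src @ wk, kv_src @ wv
        return softmax(q @ k.T / np.sqrt(self.c)) @ v

    def __call__(self, rgb, x):
        f = self.cnn_fusion(rgb, x)
        # Q_RGB attends to K_X / V_X; Q_X attends to K_RGB / V_RGB.
        rgb_cross = self.cross_attend(rgb, x, self.q_rgb, self.k_x, self.v_x)
        x_cross = self.cross_attend(x, rgb, self.q_x, self.k_rgb, self.v_rgb)
        new_rgb = np.concatenate([f, rgb_cross], axis=-1) @ self.out_rgb
        new_x = np.concatenate([f, x_cross], axis=-1) @ self.out_x
        return new_rgb, new_x


# Usage: a 4x4 feature map with 8 channels, flattened to 16 tokens per modality.
data = np.random.default_rng(1)
rgb_feat, x_feat = data.standard_normal((16, 8)), data.standard_normal((16, 8))
new_rgb, new_x = HybridFusion(c=8)(rgb_feat, x_feat)
```

Attention over all N pixels costs O(N^2) memory, so a practical implementation would probably restrict or downsample the attention; the claims do not say how this is handled.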

Description

Remote sensing image water body extraction method and system based on RGB+X data visual angle

Technical Field

The invention relates to the technical fields of computer vision and remote sensing, and in particular to a remote sensing image water body extraction method and system based on an RGB+X data perspective.

Background

Water body extraction from remote sensing images is important for water resource surveys, land use planning, ecological protection, flood detection and other applications. Its goal is to classify each pixel of an image as water or non-water. Various data sources can be used for water extraction, such as aerial RGB images, SAR images, multispectral satellite images and hyperspectral images. The present invention focuses on common multispectral satellite images with four bands (red, green, blue and near-infrared), because ordinary satellite imagery typically provides a near-infrared band in addition to the RGB bands. Compared with RGB images, four-band images offer not only the near-infrared band itself but also water indices, such as the normalized difference water index (NDWI), computed from these bands. Many methods extract water bodies by binarizing the NDWI. However, because of differences in water depth, water composition, imaging conditions and image acquisition equipment, different water bodies, and even different parts of the same water body, may require different binarization thresholds. Binarizing a water index with a fixed threshold therefore often yields limited extraction accuracy.
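The fixed-threshold baseline discussed above can be made concrete, assuming McFeeters' green/NIR form of the NDWI, (Green - NIR) / (Green + NIR). The reflectance values below are made up for illustration only: a clear-water pixel, a turbid or shallow-water pixel, and a bright building pixel. Function names are hypothetical.

```python
import numpy as np


def ndwi(green, nir, eps=1e-6):
    """NDWI in McFeeters' form: (Green - NIR) / (Green + NIR)."""
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    return (green - nir) / (green + nir + eps)  # eps avoids division by zero


def water_mask_fixed(green, nir, threshold=0.0):
    """Fixed-threshold baseline: one global NDWI threshold for the whole scene."""
    return ndwi(green, nir) > threshold


# Hypothetical reflectances: clear water, turbid/shallow water, bright building.
green = np.array([0.08, 0.12, 0.20])
nir = np.array([0.02, 0.10, 0.19])
print(np.round(ndwi(green, nir), 3))      # NDWI per pixel
print(water_mask_fixed(green, nir, 0.3))  # misses the turbid water
print(water_mask_fixed(green, nir, 0.0))  # also flags the building as water
```

With these values no single threshold separates all three pixels correctly: 0.3 rejects the turbid water, while 0.0 also accepts the building.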
Otsu's algorithm (Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, pp. 62-66.), a common adaptive thresholding approach, has been shown in other fields to tend toward thresholds that make the proportions of the negative class and the positive class (water) after binarization similar. In practice, however, the proportion of water in a remote sensing image can vary greatly. In addition, other ground objects that often appear in NDWI images (e.g., buildings and shadows) have spectral characteristics highly similar to those of water, which further limits the effectiveness of water-index-based methods. Deep learning has made significant progress in remote sensing image segmentation in recent years, and extracting water regions from remote sensing images is essentially a binary semantic segmentation task. Deep-learning-based semantic segmentation has therefore gradually become the mainstream approach to water extraction from remote sensing images. Such algorithms make better use of the contextual information in an image and can therefore better distinguish water from other ground objects with similar spectra. However, existing algorithms mainly target RGB remote sensing images. Even when processing images with more bands, such as four-band red, green, blue and near-infrared images, they either simply add the extra bands to the network input and process it exactly as an RGB image, or select the three most suitable of the four bands and then process them as an RGB image. Neither approach makes effective, full use of the information in bands other than red, green and blue.
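Otsu's method, cited above, selects the histogram threshold that maximizes the between-class variance. A minimal NumPy sketch is given below for illustration (scikit-image ships a maintained version as `skimage.filters.threshold_otsu`); the function name `otsu_threshold` and the synthetic NDWI-like data are assumptions of this example.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the histogram threshold that maximises between-class variance (Otsu, 1979)."""
    values = np.asarray(values, dtype=np.float64).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()                # bin probabilities
    w0 = np.cumsum(p)                    # class-0 weight for a split after bin k
    w1 = 1.0 - w0                        # class-1 weight
    mu_k = np.cumsum(p * centers)        # cumulative first moment up to bin k
    mu_t = mu_k[-1]                      # global mean
    # sigma_B^2(k) = (mu_t * w0 - mu_k)^2 / (w0 * w1); skip splits with an empty class.
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    sigma_b = np.zeros_like(p)
    sigma_b[valid] = (mu_t * w0[valid] - mu_k[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(sigma_b))
    return edges[k + 1]                  # upper edge of the last class-0 bin


# Synthetic NDWI-like values: 900 non-water pixels and 100 water pixels.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(-0.4, 0.05, 900), rng.normal(0.5, 0.05, 100)])
t = otsu_threshold(values)
water = values >= t                      # class 1 (water) lies at or above the threshold
```

This synthetic example has well-separated modes, so the threshold lands in the gap between them; it does not attempt to reproduce the class-proportion tendency that the text attributes to Otsu's method.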
The processing of RGB+X data, in particular RGB-D (D for depth) and RGB-Thermal (Thermal for the thermal infrared band) images, has developed considerably. The near-infrared band, and bands or indices computed from the red, green, blue and near-infrared bands (such as water indices, vegetation indices and shadow indices), closely resemble the X part of RGB+X data. However, existing research contains little work on high-precision water extraction through feature fusion from the RGB+X perspective. As a result, existing water extraction algorithms do not fully exploit the information in satellite images, which limits their generalization performance. How to design a neural network for four-band red, green, blue and near-infrared satellite images from the RGB+X data processing perspective, so as to improve the generalization performance of water extraction algorithms, is therefore a technical problem to be solved.

Disclosure of Invention

The invention aims to provide a remote sensing image water bod