CN-121999386-A - Small sample hyperspectral change detection method based on ground object category prompt

CN121999386ACN 121999386 ACN121999386 ACN 121999386ACN-121999386-A

Abstract

The invention discloses a small sample hyperspectral change detection method based on a ground object type prompt, which relates to the technical field of remote sensing image processing and comprises the following steps of S1, constructing the ground object type prompt and preprocessing hyperspectral images; S2, extracting semantic modal features corresponding to the feature class prompt and visual modal features corresponding to the preprocessed hyperspectral image, S3, fusing the semantic modal features and the visual modal features to obtain modified deep visual features, S4, obtaining multi-scale change features with enhanced attention by the multi-modal features, and S5, decoding the multi-scale change features with enhanced attention by the multi-modal features by a decoder of a hyperspectral change detection network to obtain a binary change graph. The invention can ensure that the network can still obtain reliable change discrimination capability when only a small amount of labeling samples are used, thereby reducing the dependence on large-scale hyperspectral manual labeling data or additional source domain data.

Inventors

LI CHAO
JING WEIPENG
Geng Linkang
YUAN YE
ZOU WEITAO
CHEN GUANGSHENG

Assignees

东北林业大学

Dates

Publication Date: 20260508
Application Date: 20260127

Claims (9)

1. The small sample hyperspectral change detection method based on the ground object category prompt is characterized by comprising the following steps of: s1, constructing a ground object category prompt, and preprocessing a hyperspectral image; S2, extracting semantic modal features corresponding to the ground object category prompts and visual modal features corresponding to the preprocessed hyperspectral images by utilizing a multi-modal feature extractor of the hyperspectral change detection network; S3, fusing semantic guidance feature and visual mode features by using a semantic guidance feature fusion module of the hyperspectral change detection network to obtain modified deep visual features; S4, based on the reconstructed deep visual features, obtaining multi-scale change features subjected to multi-modal feature attention enhancement by utilizing a change perception feature enhancement module of a hyperspectral change detection network; S5, decoding the multi-scale change characteristics with the multi-mode characteristic attention enhanced by using a decoder of the hyperspectral change detection network to obtain a binary change graph.
2. The method for detecting hyperspectral variation of small samples based on feature class cues according to claim 1, wherein in the step S1, the hyperspectral images are subjected to maximum and minimum normalization along the channel dimension, and patch is divided by taking each pixel as a center to obtain paired double-time-phase patch pairs, so that preprocessing is completed.
3. The method for detecting hyperspectral changes in small samples based on the prompt of the category of ground objects according to claim 1, wherein S2 comprises the following substeps: s21, converting the surface feature type prompt into tokens, and extracting semantic modal characteristics by using a text encoder; s22, extracting visual mode characteristics of the preprocessed hyperspectral image by using a visual encoder.
4. The method for detecting hyperspectral changes in small samples based on feature class cues according to claim 3, wherein in S22, the expression of the visual modality features is: ; ; Wherein, the Is shown in Time of day (time) The fused visual features in the individual dimensions, Represents a set of real numbers, The number of channels that represent a patch is, Indicating that the channel of the patch is high, Indicating the channel width of the patch, Representing the characteristics of the visual modality, Represent the first The fused visual features of the individual scales, Represent the first Original unfused visual features on the individual scales, Represent the first Original unfused visual features on the individual scales, The time of day is indicated as such, Indicating the moment before the change occurs, Indicating the moment after the occurrence of the change, A bi-linear interpolation is represented and, Representing the convolution kernel as Is used in the convolution operation of (1), Representing the convolution kernel as Is performed by the convolution operation of (a).
5. The method for detecting hyperspectral changes in small samples based on the prompt of the category of ground objects according to claim 1, wherein S3 comprises the following substeps: s31, calculating the similarity of the scaling dot product according to the semantic modal characteristics and the visual modal characteristics; S32, calculating KL divergence of the image patch on each semantic prompt; S33, calculating cross-modal context characteristics; S34, normalizing the zoom dot product similarity and the KL divergence of each semantic prompt by the image patch, and obtaining multi-modal characteristics according to the cross-modal context characteristics; s35, modifying the deepest visual features by utilizing the multi-mode features to obtain modified deep visual features.
6. The method for detecting hyperspectral changes in small samples based on feature class cues as recited in claim 5, characterized in that in S31, dot product similarity is scaled The expression of (2) is: ; Wherein, the The scale factor is represented as such, Representing the characteristics of the visual modality, Representing text modality features; In S32, the pair of image pads KL divergence of individual semantic cues The expression of (2) is: ; ; ; Wherein, the Represents the probability distribution of the visual features calculated by the Log Softmax, Representing the probability distribution of text features calculated by Softmax, Represents a set of real numbers, Representing the number of high similarity semantic cues selected, Representing the characteristics of the text modality, Representing the feature encoding dimension of the text encoder, The number of channels that represent a patch is, Representing a Softmax operation, Representing a Log Softmax operation; in the S33, cross-modal context feature The expression of (2) is: ; Wherein, the Representing the deepest visual features; In the S34, multi-modal features The expression of (2) is: ; ; Wherein, the The gate control coefficient is represented by a number of bits, Weights representing text modality features; in S35, the expression for modifying the deepest visual feature is: ; Wherein, the Representing a stitching operation.
7. The method for detecting hyperspectral changes in small samples based on feature class cues as claimed in claim 1, wherein said S4 comprises the substeps of: S41, extracting the spatial variation characteristics of the hyperspectral image after pretreatment; S42, performing discrete Fourier transform on the preprocessed hyperspectral image based on the reconstructed deep visual features to obtain multi-scale features; S43, filtering the multi-scale features to obtain filtered high-frequency features and low-frequency features, and determining corresponding high-frequency change features and low-frequency change features; S44, fusing the high-frequency change characteristic and the low-frequency change characteristic, and performing inverse discrete Fourier transform to obtain a frequency domain change characteristic; s45, calculating context change characteristics; S46, calculating semantic change characteristics; S47, splicing the spatial variation characteristics, the frequency domain variation characteristics, the context variation characteristics and the semantic variation characteristics to obtain comprehensive variation characteristics; s48, obtaining a guiding feature diagram containing semantic prior according to the comprehensive change features; s49, obtaining the multi-scale change feature with the multi-mode feature attention enhancement according to the guidance feature map containing the semantic prior.
8. The method for detecting hyperspectral changes in small samples based on feature class cues as recited in claim 7, characterized in that in S41, the feature of the spatial change The expression of (2) is: ; Wherein, the A non-linear mapping is represented and, Representation of The visual characteristics of the moment in time, Representation of The visual characteristics of the moment in time, Representing a feature scale; In the S43, the filtered low frequency characteristic The expression of (2) is: ; Wherein, the Representing the multiplication by element, A gaussian low-pass filter is represented and, Representing a multi-scale feature; In the S43, the filtered high frequency characteristic The expression of (2) is: ; In the S43, the high frequency variation characteristic The expression of (2) is: ; Wherein, the Representation of The high-frequency characteristics of the time of day, Representation of High frequency characteristics of time; in the S43, the low frequency variation characteristic The expression of (2) is: ; Wherein, the Representation of The low-frequency characteristic of the moment in time, Representation of High frequency characteristics of time; In S44, the expression for fusing the high-frequency variation feature and the low-frequency variation feature is: ; Wherein, the Representing the characteristics of the variation in the fused frequency domain, A set of parameters that can be learned is represented, The Sigmoid function is represented as a function, Representing a gating factor; In the S45, a context change feature The expression of (2) is: ; Wherein, the Representing the convolution kernel as Is used in the convolution operation of (1), Representing the convolution kernel as Is used in the convolution operation of (1), Represent the first The two-time phase visual characteristics of each scale are spliced according to the dimensions to obtain the characteristics, Represent the first The two-time phase visual characteristics of each scale are spliced according to the dimensions to obtain the characteristics, Representing bilinear interpolation; in the S46, semantically changing features The expression of (2) is: ; Wherein, the Indicating that the maximum value is taken; in the S48, a guidance feature map containing semantic priors is included The expression of (2) is: ; Wherein, the The number of channels that represent a patch is, Representing a Softmax operation, Representing linear projection of multi-modal features of dual phase to A result obtained in the dimension space; In the S49, the multi-scale variation feature with multi-modal feature attention enhancement The expression of (2) is: ; Wherein, the Representing the function of the ReLU activation, A linearly varying parameter representing the first full connection, Representing a linearly changing parameter of the second full connection.
9. The method for detecting hyperspectral change of small samples based on feature class cues according to claim 1, wherein in the step S5, multi-scale change features with enhanced attention of multi-mode features are spliced along the channel dimension to obtain change features, and the feature map is reformed by using a stacking block to obtain a binary change map.

Description

Small sample hyperspectral change detection method based on ground object category prompt Technical Field The invention relates to the technical field of remote sensing image processing, in particular to a small sample hyperspectral change detection method based on ground object category prompt. Background With the development of hyperspectral remote sensing technology, a hyperspectral change detection (HSI-CD) task can rapidly detect the change condition of different time-phase ground object types, and compared with RGB and multispectral images, the hyperspectral image has 150-250 wave bands, can effectively capture the abundant spectral response characteristics of ground objects, and is particularly good at capturing the ground object types with clear spectral characteristics such as vegetation, water body and the like. The traditional HSI-CD method uses algebraic operation and threshold segmentation means to detect a changed region and a non-changed region, wherein the method firstly carries out pixel-by-pixel computation on a double-phase hyperspectral image to obtain a segmentation threshold value by using an OTSU self-adaptive threshold algorithm, and a region with the spectral angle smaller than the threshold value is regarded as a non-changed region and a region larger than the threshold value is regarded as a changed region. In addition, there is an algorithm using clustering, which first performs band dimension reduction using PCA principal component analysis and calculates pixel-by-pixel interpolation, and then uses a K-Means clustering method to divide each pixel into two types of variation and non-variation. The deep learning-based method comprises the steps of firstly carrying out spectrum unmixing to obtain sub-pixel information, then constructing a mixed matrix, extracting features by using a Convolutional Neural Network (CNN), and detecting. With the development of deep learning, extraction and enhancement of changing features using more complex feature extractors or using a attentional mechanism has become mainstream, but the above method still faces the following problems: 1. Traditional threshold-dependent or clustered algorithms are extremely sensitive to the hyper-parameters of their algorithms, and it is difficult to migrate efficiently to different data, and simple CNN-based methods typically require spectral unmixing or pseudo-label generation, which can introduce unnecessary errors, resulting in error accumulation. 2. Methods based on complex feature extractors or attention mechanisms require a large number of reliable training samples, while hyperspectral data acquisition and labeling costs are high, and stable training of complex neural networks is difficult to support in small sample scenes. The small sample method requires extra ultra-high resolution images as a source domain for training, and the data cost is not really reduced. Disclosure of Invention The invention provides a small sample hyperspectral change detection method based on a ground object category prompt, which aims to solve the problems of high dependence of the existing deep learning method on hyperspectral annotation data or other additional data and error accumulation of the traditional multi-stage method. The technical scheme of the invention is that the small sample hyperspectral change detection method based on the ground object category prompt comprises the following steps: s1, constructing a ground object category prompt, and preprocessing a hyperspectral image; S2, extracting semantic modal features corresponding to the ground object category prompts and visual modal features corresponding to the preprocessed hyperspectral images by utilizing a multi-modal feature extractor of the hyperspectral change detection network; S3, fusing semantic guidance feature and visual mode features by using a semantic guidance feature fusion module of the hyperspectral change detection network to obtain modified deep visual features; S4, based on the reconstructed deep visual features, obtaining multi-scale change features subjected to multi-modal feature attention enhancement by utilizing a change perception feature enhancement module of a hyperspectral change detection network; S5, decoding the multi-scale change characteristics with the multi-mode characteristic attention enhanced by using a decoder of the hyperspectral change detection network to obtain a binary change graph. Further, in S1, the hyperspectral image is normalized to the maximum and minimum along the channel dimension, and the patch is divided with each pixel as the center, so as to obtain a paired dual-time-phase patch pair, and the preprocessing is completed. Further, S2 comprises the following sub-steps: s21, converting the surface feature type prompt into tokens, and extracting semantic modal characteristics by using a text encoder; s22, extracting visual mode characteristics of the preprocessed hyperspectral image by using a visual encoder. Further