CN-121837940-B - Cross-domain image classification based on self-supervised spectral-spatial modeling

CN121837940B

Abstract

The invention relates to the field of image processing, and in particular to cross-domain image classification based on self-supervised spectral-spatial modeling, comprising the following steps. S1: acquire a source-domain hyperspectral image and preprocess it. S2: pre-train a spatial-spectral Transformer model on the preprocessed source-domain hyperspectral image to obtain a trained spatial-spectral Transformer model. S3: using the trained spatial-spectral Transformer model as a teacher feature network, construct a student model with an identical structure and updatable parameters, introduce a diffusion-alignment fine-tuning distillation mechanism, and train the student model to obtain a trained student model. S4: preprocess the target-domain HSI image blocks and input them into the trained student model for classification; the HSI image blocks are classified based on the feature representations corresponding to the class tokens in the model output sequence, and semantic drift is mitigated by diffusion-alignment distillation.
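As a concrete illustration of the preprocessing in step S1 (PCA dimensionality reduction followed by patch serialization), a minimal numpy sketch follows. The input shape, the 30 retained components and the 9x9 patch size are illustrative assumptions, not values from the patent, and PCA is implemented via an SVD to keep the sketch self-contained:

```python
# Sketch of step S1: PCA dimensionality reduction and patch serialization of
# a hyperspectral cube. ASSUMPTIONS: input shape, 30 components and 9x9
# patches are illustrative; PCA is done via SVD to stay numpy-only.
import numpy as np

def pca_reduce(x2d, k):
    """Project row vectors onto the top-k principal axes (via SVD)."""
    xc = x2d - x2d.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T

def serialize_hsi(cube, n_components=30, patch=9):
    """Reduce the spectral dimension, then flatten one patch per pixel
    into a token, giving a (H*W, patch*patch*n_components) sequence."""
    H, W, B = cube.shape
    reduced = pca_reduce(cube.reshape(-1, B), n_components).reshape(H, W, n_components)
    pad = patch // 2
    padded = np.pad(reduced, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    return np.stack([
        padded[i:i + patch, j:j + patch].reshape(-1)
        for i in range(H) for j in range(W)
    ])

tokens = serialize_hsi(np.random.rand(16, 16, 103))
print(tokens.shape)  # (256, 2430): one 9x9x30 token per pixel
```

Each pixel thus yields one flattened patch token, matching the "image blocks rearranged into a token sequence" described in the claims.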

Inventors

  • Shen Huifang
  • Lv Tianhua
  • Chao Jianshu
  • Lai Jiahua
  • Yuan Jianya

Assignees

  • 泉州装备制造研究所 (Quanzhou Institute of Equipment Manufacturing)
  • 中国科学院福建物质结构研究所 (Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences)

Dates

Publication Date
2026-05-12
Application Date
2026-03-13

Claims (5)

  1. A cross-domain image classification method based on self-supervised spectral-spatial modeling, characterized by comprising the following steps: S1, data preprocessing stage: acquire a source-domain hyperspectral image, perform dimensionality reduction and serialization on it by principal component analysis, construct spatial mask features and spectral features respectively, and add Markov-chain-based noise perturbation to the spatial mask features; S2, pre-training stage: pre-train a spatial-spectral Transformer model on the preprocessed source-domain hyperspectral image to obtain a trained spatial-spectral Transformer model, wherein the spatial-spectral Transformer model adopts a four-layer conditional Transformer encoder and a four-layer Transformer decoder; the first two encoder layers use parallel spatial-spectral dual branches that respectively capture local spatial context and spectral distribution characteristics, the middle encoder layers use a bidirectional cross-attention mechanism to achieve deep cross-modal complementation and obtain fused features, the last two encoder layers extract features from the fused features to obtain modeling features, and the Transformer decoder performs denoising reconstruction on the noisy modeling features; S3, fine-tuning stage: use the trained spatial-spectral Transformer model as a teacher feature network, construct a student model with the same structure and updatable parameters, introduce a diffusion-alignment fine-tuning distillation mechanism, and train the student model to obtain a trained student model; S4, testing stage: preprocess the target-domain HSI image blocks, input them into the trained student model for classification, and classify the HSI image blocks based on the feature representations corresponding to the class tokens in the model output sequence.
  2. The cross-domain image classification based on self-supervised spectral-spatial modeling as defined in claim 1, wherein in step S1 the principal component analysis yields a dimensionality-reduced image X ∈ R^{H×W×C}; X is divided into image blocks of size p×p, which are rearranged into a token sequence S ∈ R^{B×N×(p²·C)} for mini-batch training, wherein H, W and C respectively denote the height, width and spectral dimension of X, B denotes the batch size, N denotes the number of tokens, each token corresponds to one p×p image block, and ᵀ denotes the matrix transpose.
  3. The cross-domain image classification based on self-supervised spectral-spatial modeling as set forth in claim 2, wherein in step S2 the spatial feature sequence of the spatial-spectral Transformer model is constructed by masking each token in the token sequence according to a predetermined mask ratio, dividing the token sequence into a visible part and a masked part; a diffusion time encoding and a spatial embedding matrix are added to the visible part to obtain a spatial feature sequence Z_spa ∈ R^{N_v×D}, wherein N_v and D respectively denote the number of visible tokens and the feature dimension; the spatial feature sequence Z_spa undergoes hierarchical feature transformation through two conditional Transformer encoders to obtain the spatial tokens F_spa; for spectral feature sequence construction, the token sequence is processed along the spectral dimension by a one-dimensional convolution to extract spectral features, the spectral features of each token are mapped to the target feature space by linear projection to obtain projected spectral features, and the diffusion time encoding is added to the projected spectral features to obtain a spectral feature sequence Z_spe, which after transformation by two conditional Transformer encoders yields the spectral tokens F_spe; a bidirectional cross-attention mechanism then performs dual cross-modal fusion, the spectrum-to-space channel operating as follows: compute the mean value F̄_spe of the spectral tokens; form the cross-attention inputs Q = F_spa·W_Q, K = F̄_spe·W_K, V = F̄_spe·W_V, wherein W_Q, W_K and W_V denote projection matrices and d is the attention-head dimension; compute the spectrum-guided spatial attention A_spa = softmax(Q·Kᵀ/√d)·V, wherein softmax normalizes each row of the matrix; and express the fusion residual of the spatial attention A_spa and the spatial tokens F_spa as F'_spa = F_spa + A_spa; the space-to-spectrum channel operates symmetrically: compute the mean value F̄_spa of the spatial tokens, build the spectral attention input from F̄_spa with its own projection matrices, obtain the space-guided spectral attention A_spe, and express the fusion residual of the spectral tokens F_spe and this space-guided attention as F'_spe = F_spe + A_spe; the spatial features F'_spa and spectral features F'_spe are concatenated and passed through a linear transformation to obtain the final fused feature representation F_fuse = Linear(Concat(F'_spa, F'_spe)), wherein Linear converts the input features to target features by linear mapping and Concat splices multiple features along a designated dimension into a new feature representation; the fused features F_fuse are then refined through two conditional Transformer encoders, a layer normalization LN and a Linear layer, and the decoding features are output.
  4. The cross-domain image classification based on self-supervised spectral-spatial modeling as defined in claim 3, wherein in step S2 the mask reconstruction stage of the spatial-spectral Transformer model comprises performing mask reconstruction on the decoded features with a dual-layer Transformer decoder and applying a frequency-domain constraint module that performs a real fast Fourier transform (rFFT) on the mask reconstruction result, with the following specific operations: the decoder in the mask reconstruction process is realized by placing the decoded features at the visible positions and a learnable token at the masked positions to initialize the token sequence, adding a shared positional embedding to all tokens, and processing the sequence with the dual-layer Transformer decoder to generate the final output, from which the reconstructed features at the mask positions are extracted; the real fast Fourier transform is then applied to the true values of the masked part and to the reconstructed features, wherein, over the set of mask tokens, the true value and the reconstructed value at each mask position are transformed by the rFFT operation along the channel dimension to obtain a complex spectrum; a band mask is used to highlight the high-frequency part, wherein a cutoff parameter determines the starting value of the high-frequency band over the frequency-channel index, so that the high-frequency part is selected for the loss calculation and the low-frequency channels are excluded.
  5. The cross-domain image classification based on self-supervised spectral-spatial modeling as recited in claim 4, wherein the fine-tuning stage of step S3 is performed as follows: the input samples of the diffusion-alignment fine-tuning module are divided into a clean label set and a label set with diffusion noise, perturbed by the forward process x_t = √(ᾱ_t)·x_0 + √(1−ᾱ_t)·ε, wherein ᾱ_t denotes the cumulative retention factor, ε denotes standard Gaussian noise, and the random time step t ∈ {1, …, T}, with T the total number of diffusion steps; at each time step t the teacher model and the student model output their respective class tokens for that time step; in the pre-training stage the encoder is optimized by two self-supervised objectives, a signal-guided classification loss and a diffusion loss, computed as follows: the signal-guided classification loss corresponds to the frequency-domain constraint (FDC) module and consists of a spatial reconstruction term and a frequency-domain constraint term that guide the encoder's reconstruction of the spatial structure and the spectral characteristics; the spatial reconstruction term employs a pixel-wise L1 loss; the frequency-domain constraint adopts a high-frequency loss computed from the spectral amplitudes with element-wise multiplication by the band mask; the reconstruction loss combines the two terms, with a hyperparameter controlling the weight of the frequency-domain supervision; for the diffusion loss, a four-layer diffusion decoder with skip connections to the encoder outputs denoised visible features, which are compared with the clean target by a mean-square-error loss; the pre-training loss is the sum of these objectives; the goal of the fine-tuning stage is pursued through a signal-to-noise-ratio-enhanced classification loss and a diffusion trajectory aggregation loss: in the SNR-enhanced classification loss, the class token of the student model and the class token obtained from the diffusion-perturbed input are both compared against the ground-truth target label, and the signal-to-noise ratio measures the noise level of each sample and weights the loss; the diffusion trajectory aggregation loss defines, for each time step, a feature-consistency loss between the student-model and teacher-model features at that step using cosine similarity, with an optional linear projection applied only when their feature dimensions are inconsistent, and an aggregation loss forces globally consistent alignment of the feature trajectories over the entire time sequence; the fine-tuning loss combines these terms, with a weight balancing alignment and classification.
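The forward perturbation and trajectory-aggregation distillation of claim 5 can be sketched in a few lines of numpy. The linear noise schedule, the 64-dimensional features, the time horizon T = 10 and the synthetic teacher/student trajectories are all illustrative assumptions, not values from the patent:

```python
# Illustrative sketch of the diffusion-alignment distillation (claim 5).
# ASSUMPTIONS: linear beta schedule, 64-dim features, T = 10 steps, and
# synthetic teacher/student features; none of these come from the patent.
import numpy as np

def diffuse(x0, t, alpha_bar, rng):
    """Forward Markov-chain perturbation: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def trajectory_alignment_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) between student and teacher features,
    aggregated over the whole diffusion time sequence."""
    return float(np.mean([1.0 - cosine(s, t) for s, t in zip(student_feats, teacher_feats)]))

rng = np.random.default_rng(0)
T = 10
betas = np.linspace(1e-4, 0.02, T)      # per-step noise schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative retention factor
x0 = rng.standard_normal(64)            # a clean feature vector
teacher = [diffuse(x0, t, alpha_bar, rng) for t in range(T)]
student = [f + 0.01 * rng.standard_normal(64) for f in teacher]
loss = trajectory_alignment_loss(student, teacher)
print(loss < 0.1)  # True: near-identical trajectories give near-zero loss
```

Minimizing this aggregated loss pulls the student's per-timestep features toward the teacher's along the entire diffusion trajectory, which is the alignment the claim describes.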

Description

Cross-domain image classification based on self-supervised spectral-spatial modeling

Technical Field

The invention relates to the field of image processing, and in particular to cross-domain image classification based on self-supervised spectral-spatial modeling.

Background

Cross-domain image classification aims to let a classification model trained on a source domain (e.g. one region or one sensor's data) be applied directly and effectively to a target domain (e.g. a new region or a new sensor's data), avoiding the expensive and time-consuming re-labeling and model reconstruction of target-domain data when classifying data acquired under different scenes, times and equipment. The core challenge is "domain shift": the source-domain model suffers a sharp performance drop on the target domain because the same features differ significantly in spectral characteristics under different imaging conditions, illumination, seasons, geography or sensor physics. The prior art has the following drawbacks. Domain-adaptation methods generally rely on complex joint training over source-domain and target-domain data, which is inapplicable when target-domain data cannot be acquired in advance or must be processed in real time, and has low data efficiency. The mainstream approach learns domain-invariant features to align distributions; however, naive global alignment blurs the boundaries between classes in feature space and reduces inter-class separability, and since the target domain in real scenes often contains unknown classes, global alignment can cause "negative transfer" and degrade performance. Most methods also fail to exploit spectral and spatial information jointly and synergistically, so the learned features lack discriminability.
Disclosure of the Invention

The invention aims to provide cross-domain image classification based on self-supervised spectral-spatial modeling, which improves the accuracy of cross-domain image classification. To achieve the above purpose, the invention adopts the following technical scheme: a cross-domain image classification based on self-supervised spectral-spatial modeling, comprising the following steps in order.

S1, data preprocessing stage: acquire a source-domain hyperspectral image, perform dimensionality reduction and serialization on it by principal component analysis, construct spatial mask features and spectral features respectively, and add Markov-chain-based noise perturbation to the spatial mask features.

S2, pre-training stage: pre-train a spatial-spectral Transformer model on the preprocessed source-domain hyperspectral image to obtain a trained spatial-spectral Transformer model. The spatial-spectral Transformer model adopts a four-layer conditional Transformer encoder and a four-layer Transformer decoder; the first two encoder layers use parallel spatial-spectral dual branches that respectively capture local spatial context and spectral distribution characteristics; the middle encoder layers use a bidirectional cross-attention mechanism to achieve deep cross-modal complementation and obtain fused features; the last two encoder layers extract features from the fused features to obtain modeling features; and the Transformer decoder performs denoising reconstruction on the noisy modeling features.

S3, fine-tuning stage: use the trained spatial-spectral Transformer model as a teacher feature network, construct a student model with the same structure and updatable parameters, introduce a diffusion-alignment fine-tuning distillation mechanism, and train the student model to obtain a trained student model.

S4, testing stage: preprocess the target-domain HSI image blocks, input them into the trained student model for classification, and classify the HSI image blocks based on the feature representations corresponding to the class tokens in the model output sequence.

Preferably, in step S1, the principal component analysis is used for dimensionality reduction and serialization, yielding an image X ∈ R^{H×W×C}; X is divided into image blocks of size p×p, which are rearranged into a token sequence S ∈ R^{B×N×(p²·C)} for mini-batch training, wherein H, W and C respectively denote the height, width and spectral dimension of X, B denotes the batch size, N denotes the number of tokens, each token corresponds to one p×p image block, and ᵀ denotes the matrix transpose. Preferably, in step S2, the spatial feature sequence of the spatial-spectral Transformer model is constructed by masking each token in the
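A minimal sketch of the frequency-domain constraint module used in the mask-reconstruction stage (detailed in claim 4), assuming a 0.5 cutoff fraction and an L1 comparison of rFFT amplitudes; both are illustrative choices rather than values from the patent:

```python
# Sketch of the frequency-domain constraint (FDC) high-frequency loss of
# claim 4. ASSUMPTIONS: the 0.5 cutoff fraction and the L1 comparison of
# rFFT amplitudes are illustrative choices, not values from the patent.
import numpy as np

def high_freq_loss(x_true, x_rec, cutoff=0.5):
    """Mean absolute difference of rFFT amplitudes over high-frequency channels.

    x_true, x_rec: (num_mask_tokens, channels) arrays; the rFFT is taken
    along the channel dimension and low-frequency bins are masked out."""
    spec_true = np.abs(np.fft.rfft(x_true, axis=-1))
    spec_rec = np.abs(np.fft.rfft(x_rec, axis=-1))
    f = spec_true.shape[-1]
    band = np.zeros(f)
    band[int(cutoff * f):] = 1.0     # band mask: keep only the high band
    return float(np.mean(band * np.abs(spec_true - spec_rec)))

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 32))
print(high_freq_loss(x, x))  # 0.0 for a perfect reconstruction
```

Because the band mask zeroes the low-frequency bins, this loss only penalizes reconstruction errors in the high-frequency part of the spectrum, matching the claim's "excluding low-frequency channels".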