CN-121982576-A - Remote sensing image semi-supervised semantic segmentation method, device, equipment and medium
Abstract
The application discloses a method, device, equipment and medium for semi-supervised semantic segmentation of remote sensing images, relating to the technical field of remote sensing image processing. The method comprises: constructing an initial semi-supervised semantic segmentation model and inputting labeled and unlabeled data into it for feature extraction and prediction; calculating and updating labeled and unlabeled class prototypes; counting the class distribution to obtain class weights; and constructing a bidirectional prototype loss constraint. The model is trained by combining the supervised loss, the unsupervised consistency loss and the bidirectional prototype loss constraint. This achieves efficient segmentation of remote sensing images, effectively alleviates the class imbalance problem, improves the model's ability to recognize minority classes, enhances the performance and generalization capability of semi-supervised learning, and improves the precision and efficiency of semantic segmentation of remote sensing images.
Inventors
- ZHANG LI
- ZHU JIANGHAN
- LI YAOYU
- WANG JIANJIANG
Assignees
- National University of Defense Technology, Chinese People's Liberation Army (中国人民解放军国防科技大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20260409
Claims (10)
- 1. A semi-supervised semantic segmentation method for remote sensing images, characterized by comprising the following steps: constructing an initial semi-supervised semantic segmentation model, acquiring labeled data and unlabeled data, inputting the labeled data and the unlabeled data respectively into the initial semi-supervised semantic segmentation model, and performing feature extraction and prediction on the labeled data and the unlabeled data to obtain a labeled data prediction result, a strongly augmented unlabeled data prediction result and a weakly augmented unlabeled data prediction result; calculating and updating, based on the features of the labeled data and the features of the unlabeled data respectively, a labeled class prototype and an unlabeled class prototype; counting the class distribution of the labeled data and the unlabeled data, and calculating and updating class weights; constructing a bidirectional prototype loss constraint based on the labeled class prototype, the unlabeled class prototype and the class weights; calculating a supervised loss from the labeled data prediction result and the real labels using a cross-entropy loss function; calculating an unsupervised consistency loss from the strongly augmented unlabeled data prediction result and pseudo labels generated from the weakly augmented unlabeled data using a cross-entropy loss function; obtaining a total loss value based on the supervised loss, the unsupervised consistency loss and the bidirectional prototype loss constraint, and training the initial semi-supervised semantic segmentation model based on the total loss value to obtain a semi-supervised semantic segmentation model; and acquiring remote sensing image data and inputting the remote sensing image data into the semi-supervised semantic segmentation model to obtain a segmentation result, wherein the semi-supervised semantic segmentation model comprises an encoder and a decoder.
- 2. The method of claim 1, wherein the steps of acquiring labeled data and unlabeled data, inputting the labeled data and the unlabeled data respectively into the initial semi-supervised semantic segmentation model, and performing feature extraction and prediction on the labeled data and the unlabeled data to obtain a labeled data prediction result, a strongly augmented unlabeled data prediction result and a weakly augmented unlabeled data prediction result comprise: acquiring a labeled data set and an unlabeled data set, and applying strong augmentation and weak augmentation respectively to the unlabeled data in the unlabeled data set to obtain strongly augmented unlabeled data and weakly augmented unlabeled data; and inputting the labeled data, the strongly augmented unlabeled data and the weakly augmented unlabeled data respectively into the encoder and the decoder of the initial semi-supervised semantic segmentation model, and performing feature extraction and classification prediction in sequence to obtain the labeled data prediction result, the strongly augmented unlabeled data prediction result and the weakly augmented unlabeled data prediction result.
- 3. The method of claim 2, wherein the step of inputting the labeled data, the strongly augmented unlabeled data and the weakly augmented unlabeled data respectively into the encoder and the decoder of the initial semi-supervised semantic segmentation model, and performing feature extraction and classification prediction in sequence to obtain the labeled data prediction result, the strongly augmented unlabeled data prediction result and the weakly augmented unlabeled data prediction result, comprises: inputting the labeled data into the encoder of the initial semi-supervised semantic segmentation model, and extracting deep semantic features through convolution and pooling operations to obtain labeled data features; inputting the labeled data features into the decoder of the initial semi-supervised semantic segmentation model, performing feature reconstruction through upsampling and feature fusion operations, and obtaining the labeled data prediction result through a classification layer; inputting the strongly augmented unlabeled data into the encoder of the initial semi-supervised semantic segmentation model, and extracting deep semantic features through convolution and pooling operations to obtain strongly augmented unlabeled data features; inputting the strongly augmented unlabeled data features into the decoder of the initial semi-supervised semantic segmentation model, performing feature reconstruction through upsampling and feature fusion operations, and obtaining the strongly augmented unlabeled data prediction result through the classification layer; inputting the weakly augmented unlabeled data into the encoder of the initial semi-supervised semantic segmentation model, and extracting deep semantic features through convolution and pooling operations to obtain weakly augmented unlabeled data features; and inputting the weakly augmented unlabeled data features into the decoder of the initial semi-supervised semantic segmentation model, performing feature reconstruction through upsampling and feature fusion operations, and obtaining the weakly augmented unlabeled data prediction result through the classification layer.
- 4. The method of claim 1, wherein the step of calculating and updating, based on the features of the labeled data and the features of the unlabeled data respectively, the labeled class prototype and the unlabeled class prototype comprises: initializing a prototype statistics storage unit at the start of each training period, and resetting a feature accumulation buffer and a pixel count buffer to zero, wherein the feature accumulation buffer stores the per-class feature sum and the pixel count buffer stores the per-class number of valid pixels; traversing the features of the labeled data pixel by pixel and, in combination with the real labels of the labeled data, obtaining first pixel features; adding each first pixel feature to the feature accumulation buffer of its class while incrementing the pixel count buffer of that class by one, to obtain a per-class labeled feature accumulation result and labeled pixel count result; traversing the features of the unlabeled data pixel by pixel, extracting the pseudo label and prediction probability of each pixel, and selecting second pixel features whose prediction probability is higher than a preset confidence threshold; adding each second pixel feature to the feature accumulation buffer of its class while incrementing the pixel count buffer of that class by one, to obtain a per-class unlabeled feature accumulation result and unlabeled pixel count result; normalizing the per-class labeled feature accumulation result by dividing the feature sum by the pixel count of the corresponding class, to obtain a labeled initial prototype vector for each class; normalizing the per-class unlabeled feature accumulation result by dividing the feature sum by the pixel count of the corresponding class, to obtain an unlabeled initial prototype vector for each class; and standardizing the labeled initial prototype vectors and the unlabeled initial prototype vectors respectively using an L2 normalization algorithm to obtain the final labeled class prototype and unlabeled class prototype.
- 5. The method of claim 1, wherein the step of counting the class distribution of the labeled data and the unlabeled data and calculating and updating the class weights comprises: acquiring a real label set of the labeled data and a high-confidence pseudo label set of the unlabeled data, wherein the high-confidence pseudo label set consists of pseudo labels whose prediction probability is higher than a preset confidence threshold; counting the pixels of the real label set by class to obtain the per-class pixel number of the labeled data, and calculating the per-class proportion of the labeled data in combination with a preset smoothing factor, wherein the per-class proportion of the labeled data is the ratio of the pixel number of the corresponding class to the sum of the total pixel number of the labeled data and the preset smoothing factor; counting the pixels of the high-confidence pseudo label set by class to obtain the per-class pixel number of the unlabeled data; calculating the per-class proportion of the unlabeled data based on the per-class pixel number of the unlabeled data and the preset smoothing factor, wherein the per-class proportion of the unlabeled data is the ratio of the pixel number of the corresponding class to the sum of the total pixel number of the unlabeled data and the preset smoothing factor; fusing the per-class proportion of the labeled data and the per-class proportion of the unlabeled data with fusion weights using a weighted fusion strategy to obtain a fused per-class distribution proportion, wherein the fusion weights are set according to the labeling reliability of the labeled data; calculating basic weights based on the fused per-class distribution proportion, wherein each basic weight is the ratio of a preset reference value to the fused proportion of the corresponding class; normalizing the basic weights of all classes to obtain normalized basic weights; dividing all classes into minority classes and majority classes by a statistical test method; applying an optimization strategy to the normalized basic weights of the minority classes and of the majority classes to obtain adjusted weights, wherein the optimization strategy multiplies the normalized basic weights of the minority classes by a preset enhancement coefficient and keeps the normalized basic weights of the majority classes unchanged; and taking the adjusted weights as the class weights.
- 6. The method of claim 1, wherein the step of constructing a bidirectional prototype loss constraint based on the labeled class prototype, the unlabeled class prototype and the class weights comprises: acquiring a feature matrix of the labeled data and the corresponding real label vector, and performing dimension matching between the feature matrix of the labeled data and the unlabeled class prototype to obtain a labeled feature matrix and an unlabeled class prototype matrix with unified dimensions; calculating the cosine similarity between each pixel feature in the dimension-unified labeled feature matrix and the corresponding class prototype in the unlabeled class prototype matrix to obtain a first similarity matrix; weighting the first similarity matrix with the class weights by multiplying the similarity value of each class by the class weight of that class, to obtain a weighted first similarity matrix; acquiring a high-confidence feature matrix of the unlabeled data and the corresponding pseudo label vector, and performing dimension matching between the high-confidence feature matrix of the unlabeled data and the labeled class prototype to obtain an unlabeled feature matrix and a labeled class prototype matrix with unified dimensions; calculating the cosine similarity between each pixel feature in the dimension-unified unlabeled feature matrix and the corresponding class prototype in the labeled class prototype matrix to obtain a second similarity matrix, and weighting the second similarity matrix with the class weights to obtain a weighted second similarity matrix; calculating loss values for the weighted first similarity matrix and the weighted second similarity matrix using a cross-entropy loss function to obtain a first loss value and a second loss value respectively; and computing a weighted sum of the first loss value and the second loss value according to a preset proportion to obtain the bidirectional prototype loss constraint.
- 7. The method of claim 1, wherein the steps of obtaining a total loss value based on the supervised loss, the unsupervised consistency loss and the bidirectional prototype loss constraint, and training the initial semi-supervised semantic segmentation model based on the total loss value, comprise: standardizing the supervised loss, the unsupervised consistency loss and the bidirectional prototype loss constraint respectively using a normalization algorithm to obtain a normalized supervised loss, a normalized unsupervised consistency loss and a normalized bidirectional prototype loss; setting a first weight, a second weight and a third weight in the initial training stage, wherein the first weight, the second weight and the third weight correspond to the normalized supervised loss, the normalized unsupervised consistency loss and the normalized bidirectional prototype loss respectively, and wherein, as the number of training iterations increases, the first weight is kept unchanged, the third weight is increased and the second weight is reduced; computing a weighted sum of the normalized supervised loss, the normalized unsupervised consistency loss and the normalized bidirectional prototype loss according to the adjusted weights to obtain the total loss value; calculating the gradient of the total loss value with respect to all trainable parameters in the initial semi-supervised semantic segmentation model, and limiting the gradient using a gradient clipping algorithm to obtain a clipped gradient; and updating the model parameters with the clipped gradient through an optimizer until a preset maximum number of iterations is reached or the total loss value does not exceed a preset loss threshold, to obtain the trained semi-supervised semantic segmentation model.
- 8. A remote sensing image semi-supervised semantic segmentation device, the device comprising: an initialization module, configured to construct an initial semi-supervised semantic segmentation model, acquire labeled data and unlabeled data, input the labeled data and the unlabeled data respectively into the initial semi-supervised semantic segmentation model, and perform feature extraction and prediction on the labeled data and the unlabeled data to obtain a labeled data prediction result, a strongly augmented unlabeled data prediction result and a weakly augmented unlabeled data prediction result; an updating module, configured to calculate and update, based on the features of the labeled data and the features of the unlabeled data respectively, a labeled class prototype and an unlabeled class prototype; a loss calculation module, configured to construct a bidirectional prototype loss constraint based on the labeled class prototype, the unlabeled class prototype and the class weights, and further configured to calculate a supervised loss from the labeled data prediction result and the real labels and an unsupervised consistency loss from the strongly augmented unlabeled data prediction result and pseudo labels generated from the weakly augmented unlabeled data; a model training module, configured to obtain a total loss value based on the supervised loss, the unsupervised consistency loss and the bidirectional prototype loss constraint, and to train the initial semi-supervised semantic segmentation model based on the total loss value to obtain a semi-supervised semantic segmentation model; and a result module, configured to acquire remote sensing image data and input the remote sensing image data into the semi-supervised semantic segmentation model to obtain a segmentation result, wherein the semi-supervised semantic segmentation model comprises an encoder and a decoder.
- 9. Remote sensing image semi-supervised semantic segmentation equipment, characterized in that the equipment comprises a memory, a processor and a remote sensing image semi-supervised semantic segmentation program stored in the memory and executable on the processor, the remote sensing image semi-supervised semantic segmentation program being configured to implement the steps of the remote sensing image semi-supervised semantic segmentation method as set forth in any of claims 1-7.
- 10. A medium, wherein a remote sensing image semi-supervised semantic segmentation program is stored on the medium, and the remote sensing image semi-supervised semantic segmentation program, when executed by a processor, implements the steps of the remote sensing image semi-supervised semantic segmentation method as set forth in any of claims 1-7.
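By way of illustration only, the class prototype computation recited in claim 4 might be sketched as follows. This is a minimal sketch and not the claimed implementation: the function name `class_prototypes`, the flattened list-of-vectors input, the zero-vector fallback for a class with no valid pixels, and the sample values are all assumptions introduced here. The buffers are allocated fresh on each call, which stands in for the per-period reset.

```python
import math

def class_prototypes(features, labels, num_classes, probs=None, threshold=0.0):
    """Per-class mean feature followed by L2 normalization.

    features: per-pixel feature vectors (lists of floats)
    labels:   per-pixel class index (real labels or pseudo labels)
    probs:    optional per-pixel prediction probability; when given, only
              pixels with probability above `threshold` are accumulated
    """
    dim = len(features[0])
    feat_sum = [[0.0] * dim for _ in range(num_classes)]  # feature accumulation buffer
    count = [0] * num_classes                             # pixel count buffer
    for i, (f, c) in enumerate(zip(features, labels)):
        if probs is not None and probs[i] <= threshold:
            continue  # low-confidence pseudo label: skip this pixel
        for d in range(dim):
            feat_sum[c][d] += f[d]
        count[c] += 1
    prototypes = []
    for c in range(num_classes):
        if count[c] == 0:
            prototypes.append([0.0] * dim)  # assumption: empty class gets a zero vector
            continue
        mean = [s / count[c] for s in feat_sum[c]]        # initial prototype vector
        norm = math.sqrt(sum(v * v for v in mean))
        prototypes.append([v / norm for v in mean] if norm > 0 else mean)
    return prototypes

# Labeled branch: real labels, every pixel counts.
labeled_protos = class_prototypes(
    [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], [0, 0, 1], num_classes=2)

# Unlabeled branch: pseudo labels, filtered by a hypothetical 0.95 threshold.
unlabeled_protos = class_prototypes(
    [[0.0, 4.0], [5.0, 5.0], [2.0, 0.0]], [1, 0, 0], num_classes=2,
    probs=[0.99, 0.60, 0.97], threshold=0.95)
```

In this toy input the second unlabeled pixel falls below the threshold and is excluded, so it does not pull the class 0 prototype toward the diagonal.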
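The class weight calculation recited in claim 5 might be sketched as below, again as an assumption-laden illustration rather than the claimed implementation. The claim does not specify the statistical test used to separate minority from majority classes, so this sketch substitutes a simple rule (fused proportion below the mean proportion); the function name `class_weights`, the parameter names, the default values and the small floor that avoids division by zero are likewise hypothetical.

```python
def class_weights(labeled_counts, pseudo_counts, smooth=1.0, alpha=0.7,
                  reference=1.0, boost=1.5):
    """Class weights from per-class pixel counts.

    labeled_counts: per-class pixel counts from the real labels
    pseudo_counts:  per-class pixel counts from high-confidence pseudo labels
    smooth:         preset smoothing factor added to the total pixel count
    alpha:          fusion weight of the labeled proportion (assumed value)
    reference:      preset reference value for the basic weight
    boost:          preset enhancement coefficient for minority classes
    """
    num_classes = len(labeled_counts)
    total_l = sum(labeled_counts)
    total_u = sum(pseudo_counts)
    p_labeled = [n / (total_l + smooth) for n in labeled_counts]
    p_unlabeled = [n / (total_u + smooth) for n in pseudo_counts]
    # Weighted fusion of the two per-class proportions.
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(p_labeled, p_unlabeled)]
    # Basic weight: reference value divided by the fused proportion.
    base = [reference / max(p, 1e-12) for p in fused]  # floor is an assumption
    z = sum(base)
    normalized = [b / z for b in base]
    # Assumption: a class is "minority" when its fused proportion is below the mean.
    mean_p = sum(fused) / num_classes
    return [w * boost if p < mean_p else w for w, p in zip(normalized, fused)]

weights = class_weights([900, 90, 10], [800, 150, 50])
```

With these toy counts the rarest class receives the largest weight, and only the two classes below the mean proportion are multiplied by the enhancement coefficient.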
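One possible reading of the bidirectional prototype loss constraint of claim 6 is sketched below. It assumes that each pixel feature is compared with every class prototype, that the class-weighted cosine similarities are treated as logits, and that the cross-entropy is averaged over pixels; none of these details is fixed by the claim. The function names and the `ratio` parameter, which stands in for the preset proportion, are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def prototype_ce(features, labels, prototypes, weights):
    """Mean cross-entropy over class-weighted cosine similarities."""
    total = 0.0
    for f, y in zip(features, labels):
        # One row of the weighted similarity matrix.
        logits = [w * cosine(f, p) for p, w in zip(prototypes, weights)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_z - logits[y]
    return total / len(features)

def bidirectional_prototype_loss(lab_feats, lab_labels, unl_feats, unl_pseudo,
                                 lab_protos, unl_protos, weights, ratio=0.5):
    # Direction 1: labeled features against unlabeled class prototypes.
    first = prototype_ce(lab_feats, lab_labels, unl_protos, weights)
    # Direction 2: high-confidence unlabeled features against labeled class prototypes.
    second = prototype_ce(unl_feats, unl_pseudo, lab_protos, weights)
    return ratio * first + (1.0 - ratio) * second

protos = [[1.0, 0.0], [0.0, 1.0]]
aligned = bidirectional_prototype_loss(
    [[2.0, 0.0]], [0], [[0.0, 3.0]], [1], protos, protos, [1.0, 1.0])
swapped = bidirectional_prototype_loss(
    [[2.0, 0.0]], [1], [[0.0, 3.0]], [0], protos, protos, [1.0, 1.0])
```

Features that agree with the prototype of their own class in the other branch give a lower loss than features whose labels are swapped, which is the alignment pressure the constraint is meant to exert.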
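The loss weighting and gradient clipping of claim 7 could look roughly like the following sketch. The claim fixes neither the schedule shape, the weight values, the normalization algorithm nor the clipping algorithm, so the linear ramp, the default weights and the global-norm clipping used here are assumptions. The loss inputs are taken to be already normalized, and the names `loss_weights`, `total_loss` and `clip_by_global_norm` are hypothetical.

```python
import math

def loss_weights(step, max_steps, w_sup=1.0, w_cons=(1.0, 0.5), w_proto=(0.1, 1.0)):
    """Return (first, second, third) weights at a given training step.

    The first weight is constant; the second decays and the third grows
    linearly from its start value to its end value (assumed schedule).
    """
    t = min(max(step / max_steps, 0.0), 1.0)
    second = w_cons[0] + t * (w_cons[1] - w_cons[0])
    third = w_proto[0] + t * (w_proto[1] - w_proto[0])
    return w_sup, second, third

def total_loss(sup, cons, proto, step, max_steps):
    """Weighted sum of the three (already normalized) loss values."""
    w1, w2, w3 = loss_weights(step, max_steps)
    return w1 * sup + w2 * cons + w3 * proto

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat gradient list so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

start = loss_weights(0, 100)
end = loss_weights(100, 100)
clipped = clip_by_global_norm([3.0, 4.0], 1.0)
```

Keeping the supervised weight fixed while the prototype weight ramps up reflects the ordering stated in the claim; the particular end points are placeholders to be tuned.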
Description
Remote sensing image semi-supervised semantic segmentation method, device, equipment and medium
Technical Field
The invention relates to the technical field of remote sensing image processing, in particular to a semi-supervised semantic segmentation method, device, equipment and medium for remote sensing images.
Background
At present, semi-supervised semantic segmentation mainly alleviates the shortage of annotated data through data augmentation, loss function design, training strategy optimization and similar means. For example, data augmentation generates more diverse training samples through operations such as rotation, scaling and cropping; loss function design introduces, for instance, a consistency regularization loss that requires the model to output consistent predictions for differently augmented versions of the same unlabeled data; and training strategy optimization includes methods such as adaptive threshold adjustment and pseudo label generation to improve the utilization efficiency of the unlabeled data. In addition, some studies have attempted to improve the recognition of minority classes by improving the model structure, for example by introducing an attention mechanism or decoupling the network structure.
Although existing methods alleviate the shortage of annotated data to some extent, they still have shortcomings. In practical applications, unlabeled data far outnumber labeled data, and the implicit assumption that the two follow a consistent class distribution is often difficult to satisfy. The class distribution of remote sensing images is naturally imbalanced, and some minority classes can be extremely rare in the labeled data, making it difficult for the model to learn effective features of these classes.
In addition, the unlabeled data may contain abundant minority-class samples, but for lack of an explicit supervisory signal these samples cannot generate reliable pseudo labels and cannot be used effectively. This distribution inconsistency not only weakens the effectiveness of semi-supervised learning but also further amplifies the class imbalance inherent in remote sensing semantic segmentation. Therefore, how to effectively alleviate the pseudo label bias caused by class imbalance, fully mine the minority-class information in the unlabeled data, and strengthen the model's ability to recognize minority classes under limited annotation has become a problem to be solved urgently.
Disclosure of Invention
The application mainly aims to provide a semi-supervised semantic segmentation method, device, equipment and medium for remote sensing images, so as to solve the technical problem of how to alleviate pseudo label bias and improve the recognition of minority classes under limited annotation.
In order to achieve the above purpose, the present application provides a semi-supervised semantic segmentation method for remote sensing images, comprising: constructing an initial semi-supervised semantic segmentation model, acquiring labeled data and unlabeled data, inputting the labeled data and the unlabeled data respectively into the initial semi-supervised semantic segmentation model, and performing feature extraction and prediction on the labeled data and the unlabeled data to obtain a labeled data prediction result, a strongly augmented unlabeled data prediction result and a weakly augmented unlabeled data prediction result; calculating and updating, based on the features of the labeled data and the features of the unlabeled data respectively, a labeled class prototype and an unlabeled class prototype; counting the class distribution of the labeled data and the unlabeled data, and calculating and updating class weights; constructing a bidirectional prototype loss constraint based on the labeled class prototype, the unlabeled class prototype and the class weights; calculating a supervised loss from the labeled data prediction result and the real labels using a cross-entropy loss function; calculating an unsupervised consistency loss from the strongly augmented unlabeled data prediction result and pseudo labels generated from the weakly augmented unlabeled data using a cross-entropy loss function; obtaining a total loss value based on the supervised loss, the unsupervised consistency loss and the bidirectional prototype loss constraint, and training the initial semi-supervised semantic segmentation model based on the total loss value to obtain a semi-supervised semantic segmentation model; and acquiring remote sensing image data and inputting the remote sensing image data into the semi-supervised semantic segmentation model to obtain a segmentation result, wherein the semi-supervised semantic segmentation model comprises an encoder and a decoder.
In an embodiment, the steps of obtaining tagged data and untagged data and inputting the tagged d