CN-121837644-B - Semi-supervised semantic segmentation method and system based on dual-view consistency perception


Abstract

The invention discloses a semi-supervised semantic segmentation method and system based on dual-view consistency perception, belonging to the technical field of computer vision and semantic segmentation. The hardness value of each unlabeled sample is obtained by calculating the consistency difference between the pseudo label and the student model's prediction on the weakly enhanced image, and the unlabeled loss is weighted according to this hardness value. Finally, the student model is optimized by combining the supervised loss on the labeled data with the weighted unlabeled loss, and the teacher model is updated by an exponential moving average. Semantic segmentation is then completed based on the updated student model and teacher model. Through the combined effect of these technical means, the invention achieves more efficient, more stable and more accurate semi-supervised semantic segmentation when only a small number of labeled samples are available.

Inventors

SHI MIN
CHEN RUI
LUO AIWEN

Assignees

Jinan University (暨南大学)

Dates

Publication Date: 20260508
Application Date: 20260316

Claims (8)

1. A semi-supervised semantic segmentation method based on dual-view consistency perception, characterized by comprising the following steps:
acquiring a training data set containing labeled images and unlabeled images;
constructing a teacher model and a student model with the same network structure, and initializing model parameters;
applying weak data enhancement to the labeled images and the unlabeled images of the training data set to obtain corresponding weakly enhanced labeled images and weakly enhanced unlabeled images;
applying strong data enhancement to the weakly enhanced unlabeled image to obtain a strongly enhanced unlabeled image;
inputting the weakly enhanced unlabeled image into the teacher model to obtain a corresponding pseudo label;
inputting the strongly enhanced unlabeled image and the weakly enhanced unlabeled image into the student model to obtain a first prediction result and a second prediction result, respectively;
calculating the hardness value of each unlabeled image sample based on the pseudo label and the second prediction result;
calculating an unlabeled image loss according to the pseudo label and the prediction result of the student model on the strongly enhanced unlabeled image;
weighting the unlabeled image loss by the hardness value to obtain a weighted unlabeled image loss;
calculating a labeled image loss according to the real label of the labeled image and the prediction result for the weakly enhanced labeled image;
constructing a total loss from the labeled image loss and the weighted unlabeled image loss, updating the parameters of the student model using the total loss, and updating the parameters of the teacher model by an exponential moving average;
completing semantic segmentation based on the updated student model and the updated teacher model;
wherein the strong data enhancement comprises random intensity enhancement and sparse dual-view saliency mixed enhancement;
the random intensity enhancement comprises independently applying random intensity enhancement twice to the same unlabeled image to generate a first enhanced view image and a second enhanced view image, and sending the first enhanced view images and the second enhanced view images of the unlabeled images in the same batch to the sparse dual-view saliency mixed enhancement for parallel processing;
the sparse dual-view saliency mixed enhancement comprises: performing saliency analysis, through a first view enhancement branch, on some of the sequentially input first enhanced view images to obtain a first salient region; performing saliency analysis, through a second view enhancement branch, on some of the sequentially input second enhanced view images to obtain a second salient region; and mixing the first enhanced view image and the second enhanced view image according to the first salient region and the second salient region to generate a final strongly enhanced image;
the mixing of the first enhanced view image and the second enhanced view image according to the first salient region and the second salient region specifically comprises: performing saliency analysis on the odd-order images among the sequentially input first enhanced view images to obtain the first salient region; performing saliency analysis on the even-order images among the sequentially input second enhanced view images to obtain the second salient region; cropping the image block corresponding to the first salient region from the first enhanced view image in the current order, and overlaying it at the position corresponding to the first salient region in the second enhanced view image in the current order to generate a first strongly enhanced image; cropping the image block corresponding to the second salient region from the second enhanced view image in the current order, and overlaying it at the position corresponding to the second salient region in the second enhanced view image in the next order to generate a second strongly enhanced image; the first strongly enhanced image and the second strongly enhanced image constitute the final strongly enhanced image.
2. The semi-supervised semantic segmentation method based on dual-view consistency perception of claim 1, wherein calculating the hardness value of each unlabeled image sample comprises: counting, in the prediction result of the teacher model, the proportion of pixels whose maximum prediction probability exceeds a preset confidence threshold relative to the total image pixels, as a first high-confidence proportion; counting, in the prediction result of the student model, the proportion of pixels whose maximum prediction probability exceeds the preset confidence threshold relative to the total image pixels, as a second high-confidence proportion; calculating a class-weighted intersection over union (IoU) between the teacher model prediction result and the student model prediction result, wherein the class-weighted IoU is obtained by taking the pixel proportion of each semantic class as a weight and computing the weighted sum of the per-class IoUs; and calculating the hardness value through a symmetric evaluation function based on the first high-confidence proportion, the second high-confidence proportion and the class-weighted IoU.
3. The semi-supervised semantic segmentation method based on dual-view consistency perception of claim 1, wherein weighting the unlabeled image loss by the hardness value specifically comprises: obtaining a corresponding weight coefficient from the calculated hardness value of each unlabeled image sample; and scaling the unlabeled image loss by the weight coefficient to obtain a weighted loss value.
4. The semi-supervised semantic segmentation method based on dual-view consistency perception of claim 1, wherein the unlabeled image loss and the labeled image loss are both calculated using a cross-entropy loss function.
5. The semi-supervised semantic segmentation method based on dual-view consistency perception of claim 1, wherein the total loss is a weighted sum of the labeled image loss and the weighted unlabeled image loss, and the weight of the weighted unlabeled image loss in the total loss is adjusted by a hyperparameter.
6. The semi-supervised semantic segmentation method based on dual-view consistency perception of claim 1, wherein updating the parameters of the teacher model by an exponential moving average specifically comprises: calculating an update amount for the teacher model parameters based on the latest parameters of the student model, the current parameters of the teacher model and a preset momentum coefficient; and adjusting the parameters of the teacher model by the update amount to obtain the updated teacher model.
7. A semi-supervised semantic segmentation system based on dual-view consistency perception, characterized by being used for implementing the semi-supervised semantic segmentation method based on dual-view consistency perception of any one of claims 1-6, comprising: a data acquisition module for acquiring a training data set containing labeled images and unlabeled images; a model construction and initialization module for constructing a teacher model and a student model with the same network structure and initializing model parameters; a data enhancement module for applying weak data enhancement to the labeled images and the unlabeled images of the training data set to obtain corresponding weakly enhanced labeled images and weakly enhanced unlabeled images; a pseudo label generation module for inputting the weakly enhanced unlabeled image into the teacher model to generate a pseudo label; a model prediction module for inputting the strongly enhanced unlabeled image and the weakly enhanced unlabeled image into the student model to obtain a first prediction result and a second prediction result, respectively; a hardness evaluation module for calculating the hardness value of each unlabeled image sample based on the pseudo label and the second prediction result; a loss calculation and weighting module for calculating the labeled image loss and the unlabeled image loss and weighting the unlabeled image loss by the hardness value; a model updating module for updating the student model parameters according to the total loss and updating the teacher model parameters by an exponential moving average; and a semantic segmentation module for completing semantic segmentation based on the updated student model and the updated teacher model.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the semi-supervised semantic segmentation method based on dual-view consistency perception of any one of claims 1-6 when executing the computer program.
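
As an illustration of the quantities named in claims 2, 3, 5 and 6, the following Python sketch computes the two high-confidence proportions, a class-weighted IoU, a per-sample loss weight, the total loss and the exponential moving average update. The claims do not specify the symmetric evaluation function, the mapping from hardness value to weight coefficient, or which prediction supplies the class pixel proportions. The forms used here (`hardness_value`, `loss_weight`, teacher-side class proportions) and all function names are hypothetical choices made only for this sketch and should not be read as the claimed formulas.

```python
from typing import List

Probs = List[List[float]]  # one row of class probabilities per pixel


def high_conf_ratio(probs: Probs, tau: float) -> float:
    """Proportion of pixels whose maximum class probability exceeds tau."""
    return sum(max(p) > tau for p in probs) / len(probs)


def argmax_labels(probs: Probs) -> List[int]:
    """Per-pixel predicted class index."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def class_weighted_iou(teacher: List[int], student: List[int]) -> float:
    """Sum of per-class IoU weighted by each class's pixel proportion.

    ASSUMPTION: the proportions are measured on the teacher labels.
    """
    n = len(teacher)
    pairs = list(zip(teacher, student))
    total = 0.0
    for c in set(teacher) | set(student):
        inter = sum(t == c and s == c for t, s in pairs)
        union = sum(t == c or s == c for t, s in pairs)
        total += (teacher.count(c) / n) * inter / union
    return total


def hardness_value(r_teacher: float, r_student: float, wiou: float) -> float:
    """HYPOTHETICAL symmetric form: swapping the two ratios changes nothing."""
    return 1.0 - wiou * (r_teacher + r_student) / 2.0


def loss_weight(hardness: float) -> float:
    """HYPOTHETICAL mapping: less consistent samples receive less weight."""
    return 1.0 - hardness


def total_loss(l_sup: float, l_unsup: List[float],
               weights: List[float], lam: float) -> float:
    """Supervised loss plus lam times the mean weighted unlabeled loss."""
    weighted = [w * l for w, l in zip(weights, l_unsup)]
    return l_sup + lam * sum(weighted) / len(weighted)


def ema_update(teacher: List[float], student: List[float],
               momentum: float) -> List[float]:
    """Add the update amount (1 - momentum) * (student - teacher) to the teacher."""
    return [t + (1.0 - momentum) * (s - t) for t, s in zip(teacher, student)]


# Toy example: four pixels, two classes, confidence threshold 0.75.
teacher_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.05, 0.95]]
student_probs = [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.4, 0.6]]
r_t = high_conf_ratio(teacher_probs, 0.75)
r_s = high_conf_ratio(student_probs, 0.75)
wiou = class_weighted_iou(argmax_labels(teacher_probs),
                          argmax_labels(student_probs))
weight = loss_weight(hardness_value(r_t, r_s, wiou))
```

In a real training loop the probabilities would come from the teacher and student networks and the parameters would be tensors; plain lists are used here only to keep the sketch dependency-free.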

Description

Semi-supervised semantic segmentation method and system based on dual-view consistency perception

Technical Field

The invention belongs to the technical field of computer vision and semantic segmentation, and particularly relates to a semi-supervised semantic segmentation method and system based on dual-view consistency perception.

Background

Semantic segmentation is a key technology in the field of computer vision. It aims to assign a semantic class label to each pixel in an image so as to achieve pixel-level understanding of a scene. The technology is widely applied in autonomous driving, medical image analysis, remote sensing image interpretation, industrial vision inspection and other fields, and is the basis on which many intelligent systems perform environment perception and decision-making.

At present, semantic segmentation models with excellent performance usually rely on large-scale, high-quality pixel-level annotated data for fully supervised training. However, obtaining such annotated data requires significant labor, time and economic cost. For example, on the public urban scene dataset Cityscapes, fine annotation of a single image takes more than 1.5 hours on average. In professional fields such as medical imaging and remote sensing, data annotation depends even more on the knowledge and experience of domain experts, so the annotation threshold is extremely high and the cost is greater still, which severely restricts the application and development of semantic segmentation technology in data-scarce scenarios.

To alleviate the dependence on annotated data, semi-supervised learning methods have been introduced into the semantic segmentation task. The core idea of such methods is to train the model with both a small amount of labeled data and a large amount of readily available unlabeled data.
The mainstream methods generally adopt a consistency regularization strategy: different perturbations or enhancements are applied to the unlabeled data, and the model's predictions for the different perturbed versions are constrained to be consistent, so that the model learns useful feature representations from the unlabeled data.

Nevertheless, existing semi-supervised semantic segmentation methods still have significant limitations. First, in terms of data enhancement strategies, existing methods often directly employ strong enhancement schemes designed for supervised learning. Such enhancements may include fixed combinations of operations or random local perturbations. Under semi-supervised settings these can easily distort the image excessively and break its semantic consistency, producing unreliable supervisory signals that limit the effective use of unlabeled data and can even cause confirmation bias in the model. Second, in terms of loss function design, existing methods generally apply an equally weighted consistency constraint to all unlabeled samples and ignore the inherent differences in learning difficulty among samples. This uniform treatment leaves model training easily disturbed by difficult samples with unstable predictions and high noise, causes the training process to fluctuate, and ultimately degrades the generalization capability and segmentation accuracy of the model.

In view of the above problems in the prior art, there is a need for a semi-supervised semantic segmentation method and system based on dual-view consistency perception.

Disclosure of Invention

In order to solve the above technical problems, the invention provides a semi-supervised semantic segmentation method and system based on dual-view consistency perception.
Meanwhile, the invention adopts a teacher-student model architecture, and guides and constrains the learning process on unlabeled samples by calculating the weighted intersection over union (IoU) between the teacher model's prediction and the prediction of the student model's weakly enhanced branch, thereby dynamically modeling sample reliability. These technical means effectively mitigate the adverse effects of noisy pseudo labels and improve the stability and generalization capability of model training, so that a higher-precision semantic segmentation result is achieved.

The invention provides a semi-supervised semantic segmentation method based on dual-view consistency perception, which comprises the following steps: acquiring a training data set containing labeled images and unlabeled images; constructing a teacher model and a student model with the same network structure, and initializing model parameters; applying weak data enhancement to the labeled images and the unlabeled images of the training data set to obtain corresponding weakly enhanced labeled images and weakly enhanced unlabeled images; applying strong data enhancement to the weakly enhanced unlabeled image to obtain a strong enh