CN-121982510-A - Remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency
Abstract
The invention discloses a remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency. The method comprises: step 1, augmenting a source domain data set and a target domain data set; step 2, pairing pictures from the two domains and inputting each pair into a feature extraction network, which extracts features and is itself optimized; step 3, sending the extracted features into the self-supervised learning branch for self-supervised learning and into an object detection network for object detection; step 4, updating the weights of a teacher model from the weights of the object detection network; step 5, generating predictions for the target domain with the trained teacher model and producing high-quality pseudo-labels through temporal consistency filtering; and step 6, training a student model on the pseudo-labels so that it learns to detect objects in the target domain. Through the pseudo-label temporal consistency filtering module and the self-supervised learning branch, the method improves pseudo-label quality and the network's cross-domain feature extraction capability, and reduces the performance degradation caused by crossing domains.
Inventors
- ZHUANG SHUO
- HOU YONGXING
- QI MEIBIN
- LI XIAOHONG
- LIU YIMIN
Assignees
- Hefei University of Technology (合肥工业大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-06-25
Claims (9)
- 1. A remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency, characterized by comprising the following steps: constructing a source domain for training and a target domain for cross-domain learning, augmenting the source domain to obtain a data set A, and augmenting the target domain to obtain a data set B; combining pictures in data set A and pictures in data set B into pairs, inputting the pairs into a feature extraction network for feature extraction, and at the same time performing domain adversarial learning on the feature extraction network according to the data set each picture belongs to, thereby updating the feature extraction network; constructing a self-supervised learning branch and an object detection branch, wherein the self-supervised learning branch receives the features extracted by the feature extraction network, randomly masks them, predicts the masked features, and calculates the difference between the predicted values and the true values; optimizing the self-supervised learning branch and the object detection branch through multiple rounds of iterative training, and updating the weights of a teacher model by exponential moving average (EMA) from the network weights of the optimized model to obtain a trained teacher model; generating predictions for the target domain with the trained teacher model, and applying temporal consistency filtering between the predictions of the current round and the results of the previous round held in memory to generate high-quality pseudo-labels; and training a student model with the pseudo-labels to obtain a trained student model, and performing object detection on target remote sensing images with the trained student model.
- 2. The remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency according to claim 1, characterized in that the feature extraction network is updated as follows: the feature extraction network extracts features from a picture group to obtain a plurality of feature maps at different scales; for each feature map, a convolution layer and a LeakyReLU activation function are used to judge which domain the picture comes from; the loss is calculated with a cross-entropy loss function; and the weights of the feature extraction network are adjusted accordingly.
- 3. The remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency according to claim 2, characterized in that the self-supervised learning branch randomly generates a mask of the same size as the picture, sends the masked picture into the encoder of the self-supervised learning branch, uses a lightweight decoder to predict the masked region from the encoding result, and calculates the difference between the prediction and the true value, thereby completing self-supervised learning of the source domain and target domain features.
- 4. The remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency according to claim 3, characterized in that, for source domain pictures, the object detection branch obtains the multi-level feature maps output by the feature extraction network, uses the encoder and decoder in the object detection branch to predict object class and object position respectively, and calculates the loss, thereby completing learning of the source domain feature representation.
- 5. The remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency according to claim 4, characterized in that updating the teacher model by EMA comprises: obtaining the latest network weight parameters produced by training the current student model, and fusing the student model's network weight parameters with the teacher model's existing network weight parameters as a weighted sum governed by a preset smoothing coefficient, thereby updating the teacher model.
- 6. The remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency, characterized in that the teacher model performs temporal consistency filtering to generate high-quality pseudo-labels as follows: the teacher model receives a target domain picture and predicts results; in the first round, the predictions are used directly as pseudo-labels; in any later round, the predictions of the current round are concatenated with the stored predictions of the previous round, and non-maximum suppression is applied to the combined set, so that more reliable candidate boxes are selected.
- 7. The remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency according to claim 6, characterized in that, to prevent an erroneous prediction generated in one round from continuing to affect the whole network during the teacher model's temporal consistency filtering, the results of the previous round are multiplied by an attenuation factor when they are retrieved for use.
- 8. The remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency, characterized in that training the student model with the pseudo-labels comprises: encoding the target domain image into deep features with the pre-trained feature extraction network to obtain a discriminative feature representation; inputting the features of the target domain image into the student model to obtain a probability distribution over object classes and coordinate regression values for the object bounding boxes; and updating the trainable parameters of the student model by calculating the loss and back-propagating it.
- 9. The remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency, characterized by comprising the following specific steps: measuring the difference between the class predictions and the pseudo-labels with a cross-entropy loss function; evaluating localization accuracy with a smooth L1 loss; forming a multi-task loss function as the weighted sum of the two losses; and back-propagating with an Adam optimizer to complete the update of the student model.
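The temporal consistency filtering described in claims 6 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Box` type, the function names, the class-aware greedy non-maximum suppression, the default `decay` and `iou_thr` values, and the choice to apply the attenuation factor to confidence scores are all assumptions made for illustration; the claims only state that the previous round's results are multiplied by an attenuation factor and that the combined set is passed through non-maximum suppression.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical detection record: corner coordinates, score, class id."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: int


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thr: float) -> List[Box]:
    """Greedy NMS; boxes only suppress others of the same class (assumption)."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b.score, reverse=True):
        if all(box.label != k.label or iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept


def temporal_filter(current: List[Box],
                    memory: Optional[List[Box]],
                    decay: float = 0.9,
                    iou_thr: float = 0.5) -> List[Box]:
    """Merge this round's teacher predictions with the previous round's.

    First round (memory is None): predictions are used directly.
    Later rounds: previous scores are attenuated by `decay` (assumed to act
    on the confidence score), concatenated with the current predictions,
    and filtered jointly by NMS.
    """
    if memory is None:
        return list(current)
    decayed = [Box(b.x1, b.y1, b.x2, b.y2, b.score * decay, b.label)
               for b in memory]
    return nms(list(current) + decayed, iou_thr)
```

In use, the list returned for one round would be stored and passed as `memory` in the next round. With `decay` below 1, an old box that overlaps a fresh prediction gradually loses to it, which matches the stated aim of keeping one round's erroneous prediction from persisting indefinitely.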
Description
Remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency.
Background
Object detection based on deep learning has broad application prospects in many important fields, particularly environmental monitoring, urban planning and management, and resource exploration. Deep learning is a data-driven method that relies on large, high-quality, accurately labeled data sets to support model training and optimization. In the field of remote sensing images, however, acquiring high-quality, accurately hand-labeled image data sets remains very challenging. The scarcity of such data severely constrains the further development of deep learning in remote sensing object detection. In addition, researchers have found that the performance of the same deep learning model can drop significantly from one remote sensing data set to another because of differences in platform and sensor type, imaging time, viewing angle, illumination conditions, and other aspects of the acquisition environment. This phenomenon highlights the importance and urgency of studying cross-domain learning.
Disclosure of Invention
The application aims to provide a remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency.
The application discloses a remote sensing image cross-domain object detection method based on a self-supervised learning branch and pseudo-label temporal consistency, which comprises the following specific steps:
- constructing a source domain for training and a target domain for cross-domain learning, augmenting the source domain to obtain a data set A, and augmenting the target domain to obtain a data set B;
- combining pictures in data set A and pictures in data set B into pairs, inputting the pairs into a feature extraction network for feature extraction, and at the same time performing domain adversarial learning on the feature extraction network according to the data set each picture belongs to, thereby updating the feature extraction network;
- constructing a self-supervised learning branch and an object detection branch, wherein the self-supervised learning branch receives the features extracted by the feature extraction network, randomly masks them, predicts the masked features, and calculates the difference between the predicted values and the true values;
- optimizing the self-supervised learning branch and the object detection branch through multiple rounds of iterative training, and updating the weights of a teacher model by exponential moving average (EMA) from the network weights of the optimized model to obtain a trained teacher model;
- generating predictions for the target domain with the trained teacher model, and applying temporal consistency filtering between the predictions of the current round and the results of the previous round held in memory to generate high-quality pseudo-labels;
- training a student model with the pseudo-labels to obtain a trained student model, and performing object detection on target remote sensing images with the trained student model.
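The EMA update of the teacher model in the steps above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the method's actual code: the source says only that the teacher weights are updated in an EMA manner from the optimized network weights. The rule teacher = alpha * teacher + (1 - alpha) * student, the name `ema_update`, the dictionary-of-lists weight layout, and the default `alpha` of 0.999 are assumptions.

```python
from typing import Dict, List

# Hypothetical weight container: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def ema_update(teacher: Weights, student: Weights, alpha: float = 0.999) -> Weights:
    """Return new teacher weights as an exponential moving average.

    Each teacher value becomes alpha * teacher + (1 - alpha) * student,
    where alpha plays the role of a preset smoothing coefficient (assumed
    here to weight the teacher side). The inputs are not modified.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if teacher.keys() != student.keys():
        raise KeyError("teacher and student must share parameter names")
    updated: Weights = {}
    for name, t_values in teacher.items():
        s_values = student[name]
        if len(t_values) != len(s_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        updated[name] = [alpha * t + (1.0 - alpha) * s
                         for t, s in zip(t_values, s_values)]
    return updated
```

A value of `alpha` close to 1 makes the teacher change slowly, so its predictions on the target domain fluctuate less between rounds than those of the model being trained.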
Preferably, the feature extraction network is updated as follows: the feature extraction network extracts features from a picture group to obtain a plurality of feature maps at different scales; for each feature map, a convolution layer and a LeakyReLU activation function are used to judge which domain the picture comes from; the loss is calculated with a cross-entropy loss function; and the weights of the feature extraction network are adjusted accordingly.
Preferably, the self-supervised learning branch randomly generates a mask of the same size as the picture, sends the masked picture into the encoder of the self-supervised learning branch, uses a lightweight decoder to predict the masked region from the encoding result, and calculates the difference between the prediction and the true value, thereby completing self-supervised learning of the source domain and target domain features.
Preferably, for source domain pictures, the object detection branch obtains the multi-level feature maps output by the feature extraction network, uses the encoder and decoder in the object detection branch to predict object class and object position respectively, and calculates the loss, thereby completing learning of the source domain feature representation.
Preferably, the step of updating the teacher model in the EMA mode comprises the steps of obtaining t