CN-122023772-A - Weak supervision camouflage target detection method and system based on graffiti labeling and depth information assistance
Abstract
The invention discloses a method and a system for weakly-supervised camouflaged target detection based on graffiti (scribble) labeling and depth information assistance. Using only extremely low-cost scribble annotations, a collaborative training and feature fusion mechanism is designed that combines the complementary information of RGB and depth images. Through a multi-modal information fusion and pseudo-label optimization mechanism, the invention achieves high-precision camouflaged target detection from sparse scribbles alone, markedly reducing the data annotation burden while attaining scene understanding and detail preservation not inferior to fully-supervised models, thereby providing a practical solution for low-cost, high-precision camouflaged target detection.
Inventors
- He Doudou
- Yan Biao
Assignees
- 扬州大学 (Yangzhou University)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-29
Claims (10)
- 1. A weakly-supervised camouflaged target detection method based on graffiti labeling and depth information assistance, characterized by comprising the following steps: Step 1, acquiring an RGB image and a depth image containing a camouflaged target, and applying graffiti labeling to the RGB image to obtain a graffiti annotation image; Step 2, processing the RGB image and the depth image respectively with a super-pixel segmentation algorithm, and generating an initial RGB pseudo tag and an initial depth pseudo tag by combining the graffiti annotation image; Step 3, using two original PVTv networks as single-mode prediction modules to process the RGB image and the depth image respectively, generating a predicted RGB pseudo tag and a predicted depth pseudo tag; Step 4, constructing a prediction model comprising an encoder for extracting image features, an aggregation module for fusing the features, and a decoder for outputting a prediction map, designing a composite loss function for the prediction model, and training the prediction model end-to-end on a public data set; the composite loss function comprises a graffiti supervision loss based on the graffiti annotation image, a single-mode prediction loss based on the initial RGB pseudo tag and the initial depth pseudo tag, and an aggregation supervision loss based on the aggregation pseudo tag; and Step 5, inputting the RGB image and the depth image to be detected into the trained prediction model, and outputting a prediction map of the camouflaged target.
- 2. The weakly-supervised camouflaged target detection method of claim 1, wherein processing the RGB image and the depth image respectively using a super-pixel segmentation algorithm comprises: applying the super-pixel segmentation algorithm to the RGB image to obtain a plurality of super-pixel regions of the RGB image, and taking the intersection of the graffiti annotation image with the super-pixel regions of the RGB image to obtain the initial RGB pseudo tag; and applying the super-pixel segmentation algorithm to the depth image to obtain a plurality of super-pixel regions of the depth image, and taking the intersection of the graffiti annotation image with the super-pixel regions of the depth image to obtain the initial depth pseudo tag.
- 3. The weakly-supervised camouflaged target detection method of claim 2, wherein when the single-mode prediction module processes the RGB image, it takes the RGB image as input, extracts a plurality of RGB image features layer by layer from low to high, uses the initial RGB pseudo tag obtained in step 2 as supervision, and decodes the extracted highest-layer RGB image features to generate the predicted RGB pseudo tag; and when the single-mode prediction module processes the depth image, it takes the depth image as input, extracts a plurality of depth image features layer by layer from low to high, uses the initial depth pseudo tag obtained in step 2 as supervision, and decodes the extracted highest-layer depth image features to generate the predicted depth pseudo tag.
- 4. The weakly-supervised camouflaged target detection method of claim 3, wherein an aggregation pseudo tag is generated by a pixel-adaptive refinement algorithm comprising: taking the predicted RGB pseudo tag and the predicted depth pseudo tag as input, using the color and spatial affinity of the RGB image as guidance, performing iterative optimization and weighted averaging, and outputting the aggregation pseudo tag.
- 5. The weakly-supervised camouflaged target detection method of claim 4, wherein the encoder of the prediction model is a PVTv network; several layers of RGB image features are extracted from the RGB image input to the prediction model, several layers of depth image features are extracted from the depth image input to the prediction model, and the number of layers of RGB image features equals the number of layers of depth image features.
- 6. The weakly-supervised camouflaged target detection method of claim 5, wherein the aggregation module of the prediction model comprises a depthwise-separable pyramid pooling module and a plurality of attention fusion modules; the depthwise-separable pyramid pooling module performs context enhancement on the highest-layer features extracted by the encoder, each attention fusion module integrates the RGB image features and depth image features of the same layer into a cross-modal fusion feature, and the decoder of the prediction model aggregates the fusion features generated by the aggregation module from top to bottom to output a prediction map.
- 7. The weakly-supervised camouflaged target detection method of claim 6, wherein the graffiti supervision loss comprises: supervising the prediction map of the prediction model with the graffiti annotation image to obtain a partial cross entropy loss, a partial coherence loss, and a structural consistency loss, which are summed to obtain the graffiti supervision loss.
- 8. The weakly-supervised camouflaged target detection method of claim 7, wherein the single-mode prediction loss comprises: supervising the predicted RGB pseudo tag output by the single-mode prediction module with the initial RGB pseudo tag to obtain one cross entropy loss, supervising the predicted depth pseudo tag output by the single-mode prediction module with the initial depth pseudo tag to obtain another cross entropy loss, and summing the two cross entropy losses to obtain the single-mode prediction loss.
- 9. The weakly-supervised camouflaged target detection method of claim 8, wherein the aggregation supervision loss comprises: supervising the prediction map of the prediction model with the aggregation pseudo tag obtained in claim 4 to obtain a weighted cross entropy loss and a weighted IoU loss, which are summed to obtain the aggregation supervision loss.
- 10. A weakly-supervised camouflaged target detection system based on graffiti labeling and depth information assistance, characterized by comprising: a data preparation module for acquiring an RGB image and a depth image containing a camouflaged target and applying graffiti annotation to the RGB image to obtain a graffiti annotation image; an initial pseudo tag module for processing the RGB image and the depth image respectively with a super-pixel segmentation algorithm and generating an initial RGB pseudo tag and an initial depth pseudo tag by combining the graffiti annotation image; an aggregation pseudo tag module for processing the RGB image and the depth image respectively with two original PVTv networks serving as single-mode prediction modules to generate a predicted RGB pseudo tag and a predicted depth pseudo tag; a prediction model comprising an encoder for extracting image features, an aggregation module for fusing the features, and a decoder for outputting a prediction map, wherein a composite loss function is designed for the prediction model and the prediction model is trained end-to-end on a public data set, the composite loss function comprising a graffiti supervision loss based on the graffiti annotation image, a single-mode prediction loss based on the initial RGB pseudo tag and the initial depth pseudo tag, and an aggregation supervision loss based on the aggregation pseudo tag; and an output module for processing the RGB image and depth image to be detected through the prediction model and outputting a prediction map of the camouflaged target.
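To make the superpixel-intersection step of claims 1 and 2 concrete, here is a minimal illustrative sketch, not the patent's actual implementation: each superpixel region that intersects a foreground or background scribble inherits that scribble's class, and untouched regions remain unlabeled. The function name and the `-1` "unlabeled" convention are assumptions for illustration.

```python
def superpixel_pseudo_label(superpixels, scribble):
    """Expand sparse scribbles into a dense pseudo label via superpixels.

    superpixels: 2D list of region ids (one per pixel).
    scribble:    2D list with 1 = foreground scribble, 0 = background
                 scribble, -1 = unlabeled pixel.
    Returns a 2D pseudo-label map: a region that intersects a scribble
    takes the scribble's class everywhere; untouched regions stay -1.
    """
    H, W = len(superpixels), len(superpixels[0])
    region_label = {}
    for i in range(H):
        for j in range(W):
            if scribble[i][j] != -1:
                # the whole superpixel inherits the scribbled class
                region_label[superpixels[i][j]] = scribble[i][j]
    return [[region_label.get(superpixels[i][j], -1) for j in range(W)]
            for i in range(H)]
```

In practice the superpixels would come from an algorithm such as SLIC; here the label map is passed in precomputed to keep the sketch dependency-free.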
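The pixel-adaptive refinement of claim 4 can be sketched as iterative affinity-weighted smoothing: each pixel's foreground probability is replaced by an average over its neighborhood, weighted by color and spatial affinity in the RGB image, and the two refined single-modality maps are then combined by a weighted average. The 3x3 neighborhood, the Gaussian affinity form, and the sigma values below are illustrative assumptions, not the patent's specification.

```python
import math

def refine(prob, rgb, iters=10, sigma_c=0.1, sigma_s=1.0):
    """Iteratively smooth `prob` guided by color/spatial affinity of `rgb`.

    prob: 2D list of foreground probabilities in [0, 1].
    rgb:  2D list of (r, g, b) tuples in [0, 1].
    """
    H, W = len(prob), len(prob[0])
    for _ in range(iters):
        new = [[0.0] * W for _ in range(H)]
        for i in range(H):
            for j in range(W):
                num, den = 0.0, 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < H and 0 <= nj < W:
                            # affinity falls off with color and spatial distance
                            dc = sum((a - b) ** 2
                                     for a, b in zip(rgb[i][j], rgb[ni][nj]))
                            ds = di * di + dj * dj
                            w = math.exp(-dc / (2 * sigma_c ** 2)
                                         - ds / (2 * sigma_s ** 2))
                            num += w * prob[ni][nj]
                            den += w
                new[i][j] = num / den
        prob = new
    return prob

def aggregate(p_rgb, p_depth, w_rgb=0.5):
    """Weighted average of the refined RGB and depth pseudo-label maps."""
    return [[w_rgb * a + (1 - w_rgb) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(p_rgb, p_depth)]
```

Within a region of uniform color the affinity reduces to a spatial kernel, so isolated noisy predictions are pulled toward their neighbors, while strong color edges suppress smoothing across object boundaries.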
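Two of the loss terms in claims 7 and 9 can also be sketched in simplified form: a partial cross entropy that is evaluated only on scribbled pixels, and a soft IoU loss against a (pseudo) label map. The weighted variants of claim 9 would add per-pixel weights that are omitted here, and the coherence and structural consistency terms of claim 7 are not shown; this is an illustrative reduction, not the patent's exact formulation.

```python
import math

def partial_cross_entropy(pred, scribble):
    """Binary cross entropy averaged over scribbled pixels only.

    pred:     2D list of foreground probabilities.
    scribble: 2D list with 1 = foreground, 0 = background, -1 = ignored.
    """
    total, n = 0.0, 0
    for prow, srow in zip(pred, scribble):
        for p, s in zip(prow, srow):
            if s == -1:
                continue  # unlabeled pixels carry no supervision
            p = min(max(p, 1e-7), 1 - 1e-7)  # clamp for numerical safety
            total += -(s * math.log(p) + (1 - s) * math.log(1 - p))
            n += 1
    return total / max(n, 1)

def soft_iou_loss(pred, label):
    """1 - soft IoU between a predicted map and a (pseudo) label map."""
    inter = sum(p * l for prow, lrow in zip(pred, label)
                for p, l in zip(prow, lrow))
    union = sum(p + l - p * l for prow, lrow in zip(pred, label)
                for p, l in zip(prow, lrow))
    return 1.0 - inter / union
```

A composite loss in the spirit of claim 1 would sum terms like these with the single-mode cross entropies of claim 8.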
Description
Technical Field
The invention belongs to the technical field of computer vision and image processing, and particularly relates to a camouflaged target detection method.
Background
Camouflaged target detection aims at identifying objects that are highly blended with their background and has important application value in many fields. On one hand, current mainstream methods rely on pixel-level fully-supervised learning and require a large amount of finely annotated data; the annotation cost is extremely high and depends heavily on the expertise of the annotator, which limits the popularization and application of the technology. For example, Chinese patent CN202511209344.8 discloses a camouflaged target detection method combining attention and pyramid decoding, but it relies on fine pixel-level annotation data. On the other hand, many existing studies use only RGB images, and in extreme cases where the target matches the background perfectly in color and texture, detection performance hits a bottleneck. To reduce annotation cost, weakly-supervised learning has become a research direction. However, in camouflage scenes existing weakly-supervised methods face sparse supervision signals, low-quality generated pseudo labels, blurred boundaries, and similar problems, so detection accuracy is insufficient. In addition, how to compensate for the shortcomings of RGB information by effectively exploiting multi-modal information such as depth, and how to fuse that information effectively under a weakly-supervised framework, remain open technical problems.
Therefore, a new method that fully exploits multi-modal information to achieve high-precision camouflaged target detection at extremely low annotation cost is urgently needed.
Disclosure of Invention
Purpose of the invention: to overcome the defects of the prior art, the invention provides a weakly-supervised camouflaged target detection method and system based on graffiti labeling and depth information assistance. Through extremely low-cost graffiti annotation, and by combining the complementary information of RGB and depth images, a collaborative training and feature fusion mechanism is designed, achieving high-precision camouflaged target detection while markedly reducing the data annotation burden.
Technical scheme: the invention provides a weakly-supervised camouflaged target detection method based on graffiti labeling and depth information assistance, comprising the following steps: Step 1, acquiring an RGB image and a depth image containing a camouflaged target, and applying graffiti labeling to the RGB image to obtain a graffiti annotation image; Step 2, processing the RGB image and the depth image respectively with a super-pixel segmentation algorithm, and generating an initial RGB pseudo tag and an initial depth pseudo tag by combining the graffiti annotation image; Step 3, using two original PVTv networks as single-mode prediction modules to process the RGB image and the depth image respectively, generating a predicted RGB pseudo tag and a predicted depth pseudo tag; Step 4, constructing a prediction model comprising an encoder for extracting image features, an aggregation module for fusing the features, and a decoder for outputting a prediction map, designing a composite loss function for the prediction model, and training the prediction model end-to-end on a public data set;
the composite loss function comprises a graffiti supervision loss based on the graffiti annotation image, a single-mode prediction loss based on the initial RGB pseudo tag and the initial depth pseudo tag, and an aggregation supervision loss based on the aggregation pseudo tag; and Step 5, inputting the RGB image and the depth image to be detected into the trained prediction model, and outputting a prediction map of the camouflaged target. Further, processing the RGB image and the depth image using a super-pixel segmentation algorithm comprises: applying the super-pixel segmentation algorithm to the RGB image to obtain a plurality of super-pixel regions of the RGB image, and taking the intersection of the graffiti annotation image with the super-pixel regions of the RGB image to obtain the initial RGB pseudo tag; and applying the super-pixel segmentation algorithm to the depth image to obtain a plurality of super-pixel regions of the depth image, and taking the intersection of the graffiti annotation image with the super-pixel regions of the depth image to obtain the initial depth pseudo tag. Furthermore, when the single-mode prediction module