CN-121982286-A - Camouflage target detection method and device, computer equipment and storage medium
Abstract
The application relates to a camouflage target detection method and device, computer equipment, and a storage medium. The method comprises: obtaining a target image to be detected and extracting feature maps at different feature scales; sequentially selecting and combining two feature maps at adjacent feature scales to generate a plurality of feature map pairs, and generating a plurality of corresponding masks; explicitly separating foreground and background for the two feature maps in each pair according to the corresponding mask, to obtain a plurality of foreground features and a plurality of background features; performing feature enhancement and fusion to obtain a plurality of separation learning feature maps; performing selective channel fusion on the separation learning feature maps sharing the same mask to obtain a plurality of fusion feature maps corresponding to the masks; performing local feature refinement to obtain a plurality of refined feature maps; and determining the refined feature map at the highest feature scale as the camouflage target detection result image. The method detects camouflaged targets under the guidance of multi-scale masks and improves detection accuracy.
Inventors
- LIU WEIBIN
- FU SHAOBIN
- XING WEIWEI
Assignees
- 北京交通大学 (Beijing Jiaotong University)
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-31
Claims (10)
- 1. A camouflage target detection method, comprising: acquiring a target image to be detected, and performing multi-scale feature extraction on the target image to obtain feature maps at a plurality of different feature scales; sequentially selecting and combining two feature maps at adjacent feature scales to generate a plurality of feature map pairs, and generating a plurality of masks spatially corresponding to the plurality of feature map pairs; performing explicit separation of foreground and background on the two feature maps in each feature map pair according to the corresponding mask to obtain a plurality of foreground features and a plurality of background features, and performing feature enhancement and fusion on the foreground features and the background features to obtain a plurality of separation learning feature maps; performing, based on channel correlation, selective channel fusion on the separation learning feature maps sharing the same mask to obtain a plurality of fusion feature maps corresponding to the masks; and sequentially performing, according to feature scale, local feature refinement on the fusion feature maps from the lowest feature scale to the highest feature scale to obtain a plurality of refined feature maps, and determining the refined feature map at the highest feature scale as the camouflage target detection result image of the target image.
- 2. The camouflage target detection method according to claim 1, wherein acquiring a target image to be detected and performing multi-scale feature extraction on the target image to obtain feature maps at a plurality of different feature scales comprises: performing size normalization and pixel value normalization preprocessing on the target image to obtain a standard target image; inputting the standard target image into a preset feature extraction network, wherein the feature extraction network executes a first feature extraction stage, a second feature extraction stage, and at least one subsequent feature extraction stage; in the first feature extraction stage, performing a convolution operation on the standard target image to obtain a first feature map at a first feature scale; in the second feature extraction stage, performing a downsampling operation on the first feature map to obtain a downsampled feature map, and performing a convolution operation on the downsampled feature map to obtain a second feature map with a smaller spatial size and a larger channel number than the first feature map; in each subsequent feature extraction stage, performing the downsampling operation on the feature map output by the previous stage to obtain a target downsampled feature map, and performing the convolution operation on the target downsampled feature map to obtain a target feature map with a smaller spatial size and a larger channel number than the feature map output by the previous stage; and determining the first feature map, the second feature map, and the at least one target feature map output by the subsequent feature extraction stages as the feature maps at the plurality of different feature scales.
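The staged extraction in claim 2 (convolve, then repeatedly downsample and convolve with growing channel counts) can be sketched as follows. This is a minimal NumPy illustration, not the patent's actual network: the "convolution" is a random 1x1 channel projection and the stage count, base channel width, and pooling choice are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_stage(x, out_ch):
    """Stand-in for a convolutional stage: a random 1x1 projection
    that changes the channel count (weights are illustrative only)."""
    c, h, w = x.shape
    weight = rng.standard_normal((out_ch, c)) * 0.1
    return np.einsum('oc,chw->ohw', weight, x)

def downsample(x):
    """2x2 average pooling: halves both spatial dimensions."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def extract_multiscale(image, num_stages=4, base_ch=16):
    """Stage 1: convolution only; each later stage downsamples the
    previous output, then convolves with a doubled channel count."""
    feats = [conv_stage(image, base_ch)]
    ch = base_ch
    for _ in range(num_stages - 1):
        ch *= 2
        feats.append(conv_stage(downsample(feats[-1]), ch))
    return feats

image = rng.standard_normal((3, 64, 64))   # C x H x W input
feature_maps = extract_multiscale(image)
```

Each successive map has half the spatial size and twice the channels of its predecessor, matching the "smaller spatial size, larger channel number" relation the claim describes.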
- 3. The camouflage target detection method according to claim 1, wherein sequentially selecting and combining two feature maps at adjacent feature scales to generate a plurality of feature map pairs and generating a plurality of masks spatially corresponding to the feature map pairs comprises: sorting the feature maps at the different feature scales by feature scale to obtain an ordered feature map sequence; sequentially selecting two feature maps adjacent in feature scale from the ordered sequence to generate a feature map pair, determining the feature map with the higher feature scale in the pair as a high-scale feature map, and determining the feature map with the lower feature scale as a low-scale feature map; aligning the dimensions of the low-scale feature map and the high-scale feature map, and concatenating the aligned maps along the channel dimension to obtain a combined feature map; performing a convolution operation on the combined feature map through a preset first convolution branch to obtain a first branch feature map, and through a preset second convolution branch to obtain a second branch feature map; fusing the first and second branch feature maps to obtain an intermediate mask map, and applying a nonlinear activation operation to the intermediate mask map so that its output values are bounded within a preset value range, thereby generating a mask map; and determining the mask map as the mask spatially corresponding to the feature map pair.
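The mask generation in claim 3 (align, concatenate, run two convolution branches, fuse, and bound the output with a nonlinear activation) might look like the following NumPy sketch. The nearest-neighbour alignment, 1x1 branches, and sigmoid activation are illustrative assumptions; the patent does not fix these choices here.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling in both spatial dimensions."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def make_mask(low_feat, high_feat):
    """Align the lower-resolution map to the higher-resolution spatial
    size, concatenate along channels, run two (here 1x1) convolution
    branches, fuse them by addition, and squash with a sigmoid so the
    mask values are bounded in (0, 1)."""
    factor = high_feat.shape[1] // low_feat.shape[1]
    aligned = upsample_nearest(low_feat, factor)
    combined = np.concatenate([aligned, high_feat], axis=0)
    c = combined.shape[0]
    w1 = rng.standard_normal((1, c)) * 0.1   # first conv branch (illustrative weights)
    w2 = rng.standard_normal((1, c)) * 0.1   # second conv branch
    branch1 = np.einsum('oc,chw->ohw', w1, combined)
    branch2 = np.einsum('oc,chw->ohw', w2, combined)
    return sigmoid(branch1 + branch2)

low = rng.standard_normal((32, 16, 16))   # lower-resolution member of the pair
high = rng.standard_normal((16, 32, 32))  # higher-resolution member of the pair
mask = make_mask(low, high)
```

Because each pair produces its own mask at its own resolution, no single mask has to be repeatedly resized across all scales, which is the distortion problem the background section identifies.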
- 4. The camouflage target detection method according to claim 3, wherein performing explicit separation of foreground and background on the two feature maps in each feature map pair according to the corresponding mask to obtain a plurality of foreground features and a plurality of background features comprises: resizing the mask based on the spatial dimensions of the combined feature map to generate an alignment mask spatially consistent with both the high-scale feature map and the low-scale feature map; expanding the channel dimension of the alignment mask according to the channel number of the high-scale feature map to obtain a first expansion mask, and performing element-wise multiplication of the first expansion mask and the high-scale feature map to obtain a high-scale foreground feature map indicating at least the camouflage target region in the high-scale feature map; expanding the channel dimension of the alignment mask according to the channel number of the low-scale feature map to obtain a second expansion mask, and performing element-wise multiplication of the second expansion mask and the low-scale feature map to obtain a low-scale foreground feature map indicating at least the camouflage target region in the low-scale feature map; determining a first background weight map from the complementary region of the first expansion mask, wherein each element of the first background weight map is the difference between the constant 1 and the corresponding element of the first expansion mask, and performing element-wise multiplication of the first background weight map and the high-scale feature map to obtain a high-scale background feature map indicating at least the background region in the high-scale feature map; determining a second background weight map from the complementary region of the second expansion mask, wherein each element of the second background weight map is the difference between the constant 1 and the corresponding element of the second expansion mask, and performing element-wise multiplication of the second background weight map and the low-scale feature map to obtain a low-scale background feature map indicating at least the background region in the low-scale feature map; and determining the high-scale and low-scale foreground feature maps as the foreground features of the feature map pair, and the high-scale and low-scale background feature maps as the background features of the feature map pair.
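The explicit separation in claim 4 reduces to two element-wise products per feature map: the mask (broadcast across channels) selects the foreground, and its complement 1 − mask selects the background. A minimal NumPy sketch, with shapes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def separate(feat, mask):
    """Explicit foreground/background split: broadcast a single-channel
    mask across all channels (the 'expansion mask'), multiply
    element-wise for the foreground, and use (1 - mask) as the
    complementary background weight."""
    expanded = np.broadcast_to(mask, feat.shape)   # channel-dimension expansion
    foreground = expanded * feat
    background = (1.0 - expanded) * feat
    return foreground, background

feat = rng.standard_normal((16, 32, 32))   # one member of a feature map pair
mask = rng.uniform(0.0, 1.0, (1, 32, 32))  # alignment mask, values in [0, 1]
fg, bg = separate(feat, mask)
```

A useful sanity check on this construction: because the weights are exact complements, the foreground and background features sum back to the original feature map, so no information is discarded by the split.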
- 5. The camouflage target detection method according to claim 1, wherein performing feature enhancement and fusion on the foreground features and the background features to obtain a plurality of separation learning feature maps comprises: for each feature map in the feature map pair, determining the corresponding foreground feature map from the foreground features of the pair, and performing a global pooling operation on the foreground feature map over the spatial dimensions to generate a foreground channel description vector; performing at least one fully connected operation and a nonlinear activation operation on the foreground channel description vector to obtain a foreground channel weight vector, and applying weighted scaling to the foreground feature map along the channel dimension based on the foreground channel weight vector to obtain an enhanced foreground feature map; determining the corresponding background feature map from the background features of the pair, and performing a global pooling operation on the background feature map over the spatial dimensions to generate a background channel description vector; performing at least one fully connected operation and a nonlinear activation operation on the background channel description vector to obtain a background channel weight vector, and applying weighted scaling to the background feature map along the channel dimension based on the background channel weight vector to obtain an enhanced background feature map; and fusing the enhanced foreground feature map and the enhanced background feature map to obtain the separation learning feature map corresponding to that feature map.
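The enhancement step in claim 5 follows a squeeze-and-excitation-style pattern: global pooling to a channel descriptor, fully connected layers with a nonlinearity to produce channel weights, then per-channel rescaling. A hedged NumPy sketch; the reduction ratio, two-layer bottleneck, ReLU/sigmoid choices, and additive fusion are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_enhance(feat, reduction=4):
    """Global average pooling over the spatial dimensions gives a channel
    description vector; two fully connected layers with a nonlinearity
    give a channel weight vector; the map is rescaled per channel.
    Weights are random and illustrative only."""
    c = feat.shape[0]
    desc = feat.mean(axis=(1, 2))                    # channel description vector
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ desc, 0.0)              # ReLU
    weights = sigmoid(w2 @ hidden)                   # channel weight vector in (0, 1)
    return feat * weights[:, None, None]             # weighted scaling per channel

fg = rng.standard_normal((16, 32, 32))   # foreground feature map of one member
bg = rng.standard_normal((16, 32, 32))   # background feature map of one member
# Simple additive fusion of the two enhanced branches (an assumption;
# the claim only requires "feature fusion").
separation_learning = channel_enhance(fg) + channel_enhance(bg)
```

Enhancing foreground and background through separate weight vectors lets the two branches emphasize different channels before fusion, which is the point of separating them explicitly first.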
- 6. The method of claim 1, wherein performing, based on channel correlation, selective channel fusion on the separation learning feature maps sharing the same mask to obtain a plurality of fusion feature maps corresponding to the masks comprises: in response to obtaining the separation learning feature maps corresponding to a mask, sequentially performing size alignment and concatenation on them to obtain an initial fusion feature map; performing a spatial dimension transformation on the initial fusion feature map so that the features at multiple spatial positions are aggregated by channel, obtaining a dimension-transformed feature map; performing linear transformations on the dimension-transformed feature map through at least two linear mapping branches with different mapping parameters to obtain at least a first intermediate feature vector and a second intermediate feature vector; determining a channel correlation matrix based on the similarity between the first and second intermediate feature vectors, and performing sparse selection on the channel correlation matrix to generate a channel selection matrix; weighting the features of the corresponding channels in the dimension-transformed feature map based on the channel selection matrix, and superimposing the result on the dimension-transformed feature map to generate a channel selection feature map; and performing an inverse spatial dimension transformation on the channel selection feature map to restore it to the form of the initial fusion feature map, and applying a feature transformation operation to the inverse-transformed result to generate the fusion feature map corresponding to the mask.
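Claim 6 amounts to channel-wise self-attention with a sparsity constraint: flatten spatial positions per channel, build a channel correlation matrix from two linear mappings, keep only the strongest correlations, and add the reweighted channels back residually. The NumPy sketch below is an assumption-laden illustration (top-k sparsification, scaling by sqrt of the token length, and L1 row normalization are not specified by the claim):

```python
import numpy as np

rng = np.random.default_rng(4)

def selective_channel_fusion(feat, keep=8):
    """Spatial dimension transformation: each channel becomes one token
    of length H*W. Two linear branches with different parameters give
    the intermediate vectors; their similarity gives the channel
    correlation matrix; sparse selection keeps the top `keep` entries
    per row; the selected weights reweight channels, added residually."""
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w)                  # aggregate spatial positions by channel
    w_q = rng.standard_normal((h * w, h * w)) * 0.01 # first linear mapping branch
    w_k = rng.standard_normal((h * w, h * w)) * 0.01 # second linear mapping branch
    q = tokens @ w_q                                 # first intermediate vectors
    k = tokens @ w_k                                 # second intermediate vectors
    corr = (q @ k.T) / np.sqrt(h * w)                # channel correlation matrix (C x C)
    # Sparse selection: zero all but the `keep` largest entries per row.
    thresh = np.sort(corr, axis=1)[:, -keep][:, None]
    selected = np.where(corr >= thresh, corr, 0.0)
    selected = selected / (np.abs(selected).sum(axis=1, keepdims=True) + 1e-8)
    fused = tokens + selected @ tokens               # channel weighting + residual
    return fused.reshape(c, h, w)                    # inverse spatial transformation

feat = rng.standard_normal((16, 16, 16))  # initial fusion feature map (illustrative size)
fused = selective_channel_fusion(feat)
```

The sparsification is what makes the fusion "selective": each channel draws only from its most correlated peers rather than from every channel.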
- 7. The camouflage target detection method according to claim 1, wherein sequentially performing, according to feature scale, local feature refinement on the fusion feature maps from the lowest feature scale to the highest feature scale to obtain a plurality of refined feature maps comprises: determining the feature map extracted from the target image at the lowest feature scale as an auxiliary feature map, and mapping it into a high-level auxiliary feature map; sorting the fusion feature maps from low to high feature scale to obtain a fusion feature sequence, and placing the high-level auxiliary feature map at the front of the sequence; selecting the fusion feature map immediately after the front of the sequence as the current refinement target and the frontmost element as its refinement auxiliary input, performing local refinement on the current refinement target based on that auxiliary input to generate the refined feature map of the current refinement target, and using this refined feature map as the refinement auxiliary input of the next fusion feature map in the sequence, and so on, until the refined feature map of the last fusion feature map in the sequence is obtained; wherein the local refinement at least comprises: performing interpolation upsampling on the current refinement auxiliary input in the spatial dimensions so that its spatial size matches the current refinement target, and sequentially concatenating the upsampled auxiliary input with the current refinement target along the channel dimension and compressing the channels to obtain a local fusion feature map; dividing the local fusion feature map into a plurality of feature groups along the channel dimension, convolving the groups with dilated (atrous) convolution kernels of different dilation rates, and establishing ring-shaped residual connections among the groups so that the convolution output of at least one group serves as an additional input to another group, generating a plurality of updated feature groups; concatenating the updated feature groups along the channel dimension and applying a feature transformation operation to generate an inter-group fusion feature map; and adding the inter-group fusion feature map to the current refinement target and applying a feature transformation operation to the sum to obtain the refined feature map of the current refinement target.
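The grouped dilated-convolution step inside the local refinement can be sketched in NumPy as below. This is a simplified stand-in: the convolutions are depthwise with a single shared 3x3 kernel per group, the residuals are chained group-to-group rather than a full ring, and the group count and dilation rates are assumptions, not values taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(5)

def dilated_conv3x3(x, kernel, dilation):
    """Minimal 3x3 dilated (atrous) convolution with zero padding,
    applied depthwise to a (C, H, W) tensor."""
    c, h, w = x.shape
    d = dilation
    padded = np.pad(x, ((0, 0), (d, d), (d, d)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[:, i * d:i * d + h, j * d:j * d + w]
    return out

def group_refine(feat, dilations=(1, 2, 3, 4)):
    """Split channels into groups, convolve each group with a different
    dilation rate, and chain residuals between groups so that each
    group's output serves as an additional input to the next group
    (a simplified stand-in for the ring-shaped residual connection).
    Concatenate the updated groups and add the input residually."""
    groups = np.split(feat, len(dilations), axis=0)
    kernels = [rng.standard_normal((3, 3)) * 0.1 for _ in dilations]
    updated, carry = [], 0.0
    for g, k, d in zip(groups, kernels, dilations):
        out = dilated_conv3x3(g + carry, k, d)
        updated.append(out)
        carry = out                                  # residual passed to the next group
    return np.concatenate(updated, axis=0) + feat    # final residual addition

feat = rng.standard_normal((16, 32, 32))  # local fusion feature map (illustrative size)
refined = group_refine(feat)
```

Different dilation rates give the groups different receptive fields, so the refinement mixes fine local detail with wider context at the same spatial resolution, which suits the low-contrast boundaries of camouflaged targets.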
- 8. A camouflage target detection device, comprising: a feature extraction module configured to acquire a target image to be detected and perform multi-scale feature extraction on the target image to obtain feature maps at a plurality of different feature scales; a mask generation module configured to sequentially select and combine two feature maps at adjacent feature scales to generate a plurality of feature map pairs, and to generate a plurality of masks spatially corresponding to the feature map pairs; a separation learning module configured to perform explicit separation of foreground and background on the two feature maps in each feature map pair according to the corresponding mask to obtain a plurality of foreground features and a plurality of background features, and to perform feature enhancement and fusion on the foreground and background features to obtain a plurality of separation learning feature maps; a channel fusion module configured to perform, based on channel correlation, selective channel fusion on the separation learning feature maps sharing the same mask to obtain a plurality of fusion feature maps corresponding to the masks; and a local refinement module configured to sequentially perform, according to feature scale, local feature refinement on the fusion feature maps from the lowest feature scale to the highest feature scale to obtain a plurality of refined feature maps, and to determine the refined feature map at the highest feature scale as the camouflage target detection result image of the target image.
- 9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
- 10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Description
Camouflage target detection method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of camouflage target detection technology, and in particular to a camouflage target detection method, device, computer equipment, and storage medium.
Background
With the growing demand for perceiving hidden targets in application scenarios such as remote sensing imaging, military reconnaissance, and intelligent monitoring, camouflage target detection has gradually become an important research direction in the field of computer vision. Camouflaged targets are generally similar to the background in color and texture; their boundaries in images are indistinct and their contrast is extremely low. Traditional detection methods based on saliency or single-scale features struggle to balance semantic discrimination with spatial detail expression, and find it difficult to achieve stable, fine target segmentation against complex backgrounds. Existing research has therefore gradually introduced multi-scale feature fusion and mask guidance mechanisms, which constrain the learning of foreground and background through masks and thereby improve the detection accuracy of camouflaged targets.
However, existing mask-guided camouflage target detection methods typically generate a single mask from single-scale or few intermediate-scale features and match it across scales through repeated upsampling or downsampling. This frequent resizing easily introduces interpolation errors and information distortion, progressively blurring the mask boundary and reducing the accuracy of explicit foreground/background separation. Moreover, existing methods do not fully exploit the natural complementarity between adjacent feature scales in semantic information and spatial detail, and lack a multi-scale mask collaborative guidance mechanism, making it difficult to impose stable and consistent spatial constraints on features at different levels. This further limits the overall accuracy and robustness of camouflage target detection.
Disclosure of Invention
In view of the above, it is necessary to provide a camouflage target detection method, device, computer equipment, and storage medium that construct multi-scale masks pairwise from adjacent feature scales and perform explicit foreground/background separation, selective fusion, and progressive refinement learning in a multi-scale feature space, so as to at least solve the technical problems in the related art that repeated cross-scale transformation of a single mask introduces interpolation distortion, that foreground/background separation accuracy is insufficient, and that the collaborative modeling of multi-scale semantic and detail features is weak.
In one aspect, a camouflage target detection method is provided, the method comprising: acquiring a target image to be detected, and performing multi-scale feature extraction on the target image to obtain feature maps at a plurality of different feature scales; sequentially selecting and combining two feature maps at adjacent feature scales to generate a plurality of feature map pairs, and generating a plurality of masks spatially corresponding to the feature map pairs; performing explicit separation of foreground and background on the two feature maps in each feature map pair according to the corresponding mask to obtain a plurality of foreground features and a plurality of background features, and performing feature enhancement and fusion on the foreground and background features to obtain a plurality of separation learning feature maps; performing, based on channel correlation, selective channel fusion on the separation learning feature maps sharing the same mask to obtain a plurality of fusion feature maps corresponding to the masks; and sequentially performing, according to feature scale, local feature refinement on the fusion feature maps from the lowest feature scale to the highest feature scale to obtain a plurality of refined feature maps, and determining the refined feature map at the highest feature scale as the camouflage target detection result image of the target image.
In one embodiment, obtaining a target image to be detected and performing multi-scale feature extraction on the target image to obtain feature maps at a plurality of different feature scales includes: performing size normalization and pixel value normalization preprocessing on the target image to obtain a standard target image; inputting the standard target image into a preset feature extraction network, wherein the execution process of the feature extraction net