CN-121999204-A - Camouflage target detection method, camouflage target detection system and camouflage target detection medium under complex background
Abstract
The invention discloses a camouflage target detection method, system and medium under a complex background, belonging to the technical field of computer vision and target detection. The method comprises: acquiring a camouflage target image, inputting the camouflage target image into a pre-trained detection model, and outputting target category and position information. The model is an improvement based on YOLOv8n: the original backbone network is replaced with a lightweight MobileViT network in which convolution layers, MV2 modules and MobileViT modules are alternately connected; the SPPF module at the end of the backbone is replaced with a Focal Modulation module, which modulates features through a hierarchical aggregation and gating mechanism; a CPMS attention module is introduced after each of two C2f modules of the original neck network to perform multi-scale channel and spatial attention weighting; and a 160×160-scale detection head is added to the detection head part to detect context-aware features fused with shallow details from the Neck part. The invention thereby realizes high-precision, lightweight, real-time detection of camouflage targets in complex scenes.
Inventors
- Lei Songze
- Dong Lei
- Ma Chaofan
- Wang Qiwen
- Zhang Ziyi
- Zhang Chong
Assignees
- Xi'an Technological University (西安工业大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260129
Claims (7)
- 1. A method for detecting a camouflage target in a complex background, the method comprising: acquiring a camouflage target image under a complex background; inputting the camouflage target image into a pre-trained camouflage target detection model, and outputting the category and position information of the target in the camouflage target image; wherein the camouflage target detection model is an improved model based on YOLOv8n, in which the backbone part of the original YOLOv8n model is replaced by a lightweight feature extraction network based on the MobileViT architecture, in which a convolution layer, a plurality of MV2 modules and a plurality of MobileViT modules are sequentially and alternately connected; the original SPPF module is replaced by a Focal Modulation module; and CPMS attention mechanism modules are respectively added after two C2f modules in the Neck part; the lightweight feature extraction network extracts and fuses hierarchical features of the camouflage target image to obtain a multi-scale feature representation; the Focal Modulation module modulates the multi-scale feature representation through a hierarchical context aggregation and spatial gating mechanism to obtain a feature representation focused on key target areas; the CPMS module performs parallel multi-scale channel attention and spatial attention weighting on features passed through the Neck part to obtain context-aware features; and a detection head detects the enhanced features, which fuse shallow details and deep semantics, to output the category and position information of targets in the camouflage target image.
- 2. The method for detecting a camouflage target in a complex background according to claim 1, further comprising adding a 160×160-scale detection head to the head part of the original YOLOv8n model, specifically comprising: the 160×160-scale detection head is configured to detect context-aware features fused with shallow details from the Neck part; the newly added 160×160-scale detection head cooperates with the original 80×80, 40×40 and 20×20 scale detection heads to form an enhanced multi-scale detection system, and outputs the category and position information of the target in the camouflage target image.
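The scale relationship in claim 2 follows from standard YOLO downsampling strides. A minimal sketch, assuming the conventional 640×640 input resolution (the claim does not state the input size): the original 80×80, 40×40 and 20×20 heads correspond to strides 8, 16 and 32, so the added 160×160 head corresponds to stride 4 and sees the shallowest, highest-resolution features.

```python
# Feature-map sizes produced by detection heads at common YOLO strides.
# The 640x640 input resolution is an assumption for illustration; the
# claim only specifies the resulting head scales.
def head_scale(input_size: int, stride: int) -> int:
    """Spatial size of the detection head's feature map for a given stride."""
    return input_size // stride

# Strides 8/16/32 give the original 80x80, 40x40, 20x20 heads;
# the added stride-4 head yields the 160x160 map for small targets.
scales = [head_scale(640, s) for s in (4, 8, 16, 32)]
print(scales)  # [160, 80, 40, 20]
```

The stride-4 head trades extra computation for finer localization, which is why the claims pair it with shallow Neck features rather than deep semantic ones.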
- 3. The method for detecting a camouflage target in a complex background according to claim 1, wherein the lightweight feature extraction network is based on the MobileViT architecture and is formed by sequentially and alternately connecting a convolution layer, a plurality of MV2 modules and a plurality of MobileViT modules, with the following specific procedure: the MV2 module extracts features using depthwise separable convolutions with an inverted residual design, uses ReLU as its nonlinear activation function, and performs the downsampling function in the camouflage target detection model; the MobileViT module receives an input feature map, completes local feature modeling through a first convolution layer, adjusts the number of feature channels through a second convolution layer, completes global feature modeling through an Unfold operation, a Transformer module and a Fold operation executed in sequence, restores the number of channels of the global features through a third convolution layer, splices and fuses the restored feature map with the input feature map along the channel dimension through a shortcut branch, and performs convolution on the fused feature map to obtain the output of the MobileViT module; the MobileViT module thereby encodes both local and global information in the camouflage target detection model.
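The MobileViT dataflow in claim 3 (local conv → channel adjust → Unfold → Transformer → Fold → channel restore → shortcut concat → fusion conv) can be sketched as follows. This is a minimal NumPy illustration of the tensor shapes and operation order only: random matrices stand in for learned convolution weights, 1×1 pointwise convolutions stand in for the claim's convolution layers, and a single softmax attention stands in for the full Transformer module.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in) -> pointwise convolution
    return np.einsum('oc,chw->ohw', w, x)

def unfold(x, p):
    # (C, H, W) -> (C, p*p, N): group pixels into non-overlapping
    # p x p patches; N = (H//p)*(W//p) patch positions.
    C, H, W = x.shape
    x = x.reshape(C, H // p, p, W // p, p)
    return x.transpose(0, 2, 4, 1, 3).reshape(C, p * p, (H // p) * (W // p))

def fold(x, p, H, W):
    # Inverse of unfold: scatter patch tokens back to the (C, H, W) grid.
    C = x.shape[0]
    x = x.reshape(C, p, p, H // p, W // p)
    return x.transpose(0, 3, 1, 4, 2).reshape(C, H, W)

def toy_attention(tokens):
    # tokens: (C, P, N); mix the N patch positions with softmax attention,
    # a stand-in for the claim's Transformer module (global modeling).
    scores = np.einsum('cpn,cpm->pnm', tokens, tokens) / np.sqrt(tokens.shape[0])
    a = np.exp(scores - scores.max(-1, keepdims=True))
    a = a / a.sum(-1, keepdims=True)
    return np.einsum('pnm,cpm->cpn', a, tokens)

def mobilevit_block(x, d, p=2):
    C, H, W = x.shape
    local = conv1x1(x, rng.standard_normal((C, C)) * 0.1)   # first conv: local modeling
    t = conv1x1(local, rng.standard_normal((d, C)) * 0.1)   # second conv: adjust channels to d
    t = fold(toy_attention(unfold(t, p)), p, H, W)          # Unfold -> Transformer -> Fold
    t = conv1x1(t, rng.standard_normal((C, d)) * 0.1)       # third conv: restore channels
    fused = np.concatenate([x, t], axis=0)                  # shortcut: concat along channels
    return conv1x1(fused, rng.standard_normal((C, 2 * C)) * 0.1)  # fusion convolution

x = rng.standard_normal((8, 16, 16))
y = mobilevit_block(x, d=12)
print(y.shape)  # (8, 16, 16): same shape as the input, as the shortcut fusion requires
```

The concat-then-convolve shortcut (rather than an additive residual) is what lets the fusion layer weigh local detail against globally mixed features.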
- 4. The method for detecting a camouflage target in a complex background according to claim 1, wherein the Focal Modulation module operates as follows: query features are extracted from the input feature map through a query projection function; the input feature map is mapped to a new feature space through a linear projection layer; the mapped features undergo hierarchical context processing using a plurality of sequentially cascaded depthwise separable convolution layers, each followed by a GeLU activation function; a global average pooling operation is performed after the last depthwise separable convolution layer, thereby generating context feature maps at a plurality of different receptive-field levels; spatially and level-aware gating weights are generated from the input feature map through another linear layer; the gating weights and the context feature maps are fused by weighting to obtain modulator features; and the modulator features are multiplied element-by-element with the query features to output the modulated features.
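Claim 4's pipeline (query projection, cascaded depthwise convolutions with GeLU, a final global-pooling level, gated fusion, element-wise modulation) can be sketched in NumPy. This is a shape-level sketch under stated assumptions: a per-channel 3×3 box filter stands in for each learned depthwise separable convolution, random matrices stand in for the learned projections, and `levels=2` is an illustrative choice not given in the claim.

```python
import numpy as np

rng = np.random.default_rng(0)

def depthwise3x3_mean(x):
    # Stand-in for a 3x3 depthwise conv: per-channel box filter, zero-padded.
    C, H, W = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += p[:, dy:dy + H, dx:dx + W]
    return out / 9.0

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def focal_modulation(x, levels=2):
    C, H, W = x.shape
    q = np.einsum('oc,chw->ohw', rng.standard_normal((C, C)) * 0.1, x)  # query projection
    z = np.einsum('oc,chw->ohw', rng.standard_normal((C, C)) * 0.1, x)  # linear projection
    # Gating: one spatial map per focal level plus one for the global level.
    gates = np.einsum('oc,chw->ohw', rng.standard_normal((levels + 1, C)) * 0.1, x)
    ctx = 0.0
    for l in range(levels):                      # hierarchical context aggregation:
        z = gelu(depthwise3x3_mean(z))           # each level widens the receptive field
        ctx = ctx + gates[l] * z                 # gated accumulation into the modulator
    g = z.mean(axis=(1, 2), keepdims=True)       # global average pooling level
    ctx = ctx + gates[levels] * g
    return q * ctx                               # element-wise modulation of the query

x = rng.standard_normal((8, 16, 16))
y = focal_modulation(x)
print(y.shape)  # (8, 16, 16)
```

Because each level reuses the previous level's output, the receptive field grows with depth while the gates decide, per spatial location, which level's context dominates the modulator.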
- 5. The method for detecting a camouflage target in a complex background according to claim 1, wherein the CPMS attention mechanism module operates as follows: the input features are processed through a first multi-scale channel attention branch to obtain first weighted features, and through a second multi-scale channel attention branch to obtain second weighted features; the first and second weighted features are added to obtain channel-attention-enhanced features; within each multi-scale channel attention branch, the input features are convolved with a group of convolution kernels of different sizes to obtain a group of multi-scale feature maps, the multi-scale feature maps are spliced and fused to obtain fused features, global average pooling is applied to the fused features, channel attention weights are generated through a fully connected layer, and the channel attention weights are multiplied element-by-element with the input features to obtain the weighted features; a spatial attention map is then generated from the channel-attention-enhanced features through a multi-scale depthwise separable convolution operation, and the spatial attention map is multiplied element-by-element with the channel-attention-enhanced features to output the final weighted features.
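The CPMS steps above can be sketched as a NumPy dataflow. Everything learned is mocked: box filters stand in for the multi-scale and depthwise convolutions, a random matrix stands in for the fully connected layer, and the kernel-size groups (3, 5) and (7, 9) are illustrative assumptions, since the claim does not specify which sizes each branch uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def box_filter(x, k):
    # Per-channel k x k box filter (zero-padded), standing in for a
    # k x k (depthwise) convolution.
    C, H, W = x.shape
    r = k // 2
    p = np.pad(x, ((0, 0), (r, r), (r, r)))
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += p[:, dy:dy + H, dx:dx + W]
    return out / (k * k)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ms_channel_attention(x, kernels):
    # One multi-scale channel attention branch: convolve at several kernel
    # sizes, splice along channels, pool globally, produce per-channel weights.
    C = x.shape[0]
    ms = np.concatenate([box_filter(x, k) for k in kernels], axis=0)
    pooled = ms.mean(axis=(1, 2))                            # global average pooling
    w_fc = rng.standard_normal((C, C * len(kernels))) * 0.1  # mock fully connected layer
    weights = sigmoid(w_fc @ pooled)[:, None, None]
    return weights * x                                       # channel-wise reweighting

def cpms(x):
    # Two parallel multi-scale channel attention branches, summed ...
    chan = ms_channel_attention(x, kernels=(3, 5)) + ms_channel_attention(x, kernels=(7, 9))
    # ... then a spatial attention map from multi-scale depthwise filtering.
    smap = sigmoid(sum(box_filter(chan, k) for k in (3, 5, 7)).mean(axis=0, keepdims=True))
    return smap * chan

x = rng.standard_normal((4, 8, 8))
y = cpms(x)
print(y.shape)  # (4, 8, 8)
```

Running channel and spatial attention in sequence, with the channel stage itself split into two parallel multi-scale branches, is what gives the module its "parallel multi-scale channel and spatial attention weighting" character in claim 1.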
- 6. A computer system comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for detecting a camouflage target in a complex background of any one of claims 1 to 5.
- 7. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, which computer program, when executed by a processor, performs the method for detecting a camouflage target in a complex background as defined in any one of claims 1 to 5.
Description
Camouflage target detection method, camouflage target detection system and camouflage target detection medium under complex background

Technical Field

The invention relates to the technical field of computer vision and target detection, and in particular to a camouflage target detection method, a camouflage target detection system and a camouflage target detection medium under a complex background.

Background

Camouflage target detection is a key technical challenge in modern military reconnaissance and security monitoring. Because targets blend closely with their surroundings in texture, color and form, detection scenes are characterized by feature confusion, minute target scales and dynamic scene changes, and traditional visual methods struggle to achieve reliable identification while maintaining high real-time performance. Currently, deep-learning-based target detection models perform well in general scenes but still face numerous limitations in camouflage tasks. For example, lightweight YOLO-series models have insufficient feature discrimination under complex backgrounds and are prone to missed and false detections; common attention mechanisms such as CBAM lack an efficient multi-scale feature interaction mechanism and have difficulty adapting to target scale changes and complex background interference; and, in addition, most methods do not design dedicated detection mechanisms for small targets and occlusion, so detection performance drops significantly in real battlefield environments.
To meet the requirements of edge-device deployment and real-time processing, model lightweighting has become an important research direction, but existing lightweight schemes tend to sacrifice detection precision while improving speed; in particular, in scenes where targets resemble the background and are small and densely distributed, it is difficult to balance speed and precision effectively.

Disclosure of Invention

The invention aims to provide a camouflage target detection method under a complex background that realizes high-precision, lightweight, real-time detection of camouflage targets in complex scenes. To solve the above technical problems, an embodiment of the invention provides a camouflage target detection method under a complex background, comprising the following steps: acquiring a camouflage target image under a complex background; inputting the camouflage target image into a pre-trained camouflage target detection model, and outputting the category and position information of the target in the camouflage target image; wherein the camouflage target detection model is an improved model based on YOLOv8n, in which the backbone part of the original YOLOv8n model is replaced by a lightweight feature extraction network based on the MobileViT architecture, in which a convolution layer, a plurality of MV2 modules and a plurality of MobileViT modules are sequentially and alternately connected; the original SPPF module is replaced by a Focal Modulation module; and CPMS attention mechanism modules are respectively added after two C2f modules in the Neck part. The lightweight feature extraction network extracts and fuses hierarchical features of the camouflage target image to obtain a multi-scale feature representation; the Focal Modulation module modulates the multi-scale feature representation through a hierarchical context aggregation and spatial gating mechanism to obtain a feature representation focused on key target areas; the CPMS module performs parallel multi-scale channel attention and spatial attention weighting on features passed through the Neck part to obtain context-aware features; and a detection head detects the enhanced features, which fuse shallow details and deep semantics, to output the category and position information of targets in the camouflage target image. In some alternative embodiments, the method further comprises adding a 160×160-scale detection head to the head part of the original YOLOv8n model, specifically comprising: the 160×160-scale detection head is configured to detect context-aware features fused with shallow details from the Neck part; the newly added 160×160-scale detection head cooperates with the original 80×80, 40×40 and 20×20 scale detection heads to form an enhanced multi-scale detection system, and outputs the category and position information of the target in the camouflage target image. In some optional embodiments, the lightweight feature extraction network is based on the MobileViT architecture and is formed by sequentially and alternately connecting a convolution layer, a plurality of MV2 modules and a plurality of MobileViT modules, with the following specific flow: the MV2 module extracts features using depthwise separable convolutions with an inverted residual design, and ReLU is used as a