CN-121982287-A - Infrared weak and small target-oriented detection and identification network model and building method thereof


Abstract

The invention discloses a detection and identification network model for infrared dim targets and a construction method thereof, and relates to the technical field of infrared dim target detection. The design introduces an asymmetric windmill-shaped convolution (Pinwheel Convolution, PConv) that is more sensitive to the central region of an infrared dim target and can effectively focus on the target's salient features. The windmill convolution adopts an asymmetric padding strategy, so that the convolution kernels act on different regions of the image along the horizontal and vertical directions respectively, achieving direction-sensitive extraction of target features.
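
The effect of the asymmetric padding on feature-map size can be traced with a little arithmetic. The sketch below is illustrative only: the padding amounts, the 1×k and k×1 kernel shapes, the c_out // 4 channels per branch and the unpadded 2×2 second-layer kernel are assumptions borrowed from the commonly described pinwheel convolution layout, not values stated in this text, and `pconv_shapes` and `conv_out` are hypothetical helper names.

```python
def conv_out(size, pad_before, pad_after, kernel, stride):
    """Output length of a convolution along one axis (no dilation)."""
    return (size + pad_before + pad_after - kernel) // stride + 1


def pconv_shapes(h, w, c_out, k=3, stride=1):
    """Trace feature-map sizes through a two-layer pinwheel convolution.

    Assumed layout (not specified in the text): padding tuples are
    (left, right, top, bottom); two branches use a 1 x k kernel and
    two use a k x 1 kernel, each producing c_out // 4 channels.
    """
    branches = [
        # (padding, (kernel_h, kernel_w))
        ((k, 0, 1, 0), (1, k)),
        ((0, k, 0, 1), (1, k)),
        ((0, 1, k, 0), (k, 1)),
        ((1, 0, 0, k), (k, 1)),
    ]
    first = []
    for (left, right, top, bottom), (kh, kw) in branches:
        out_h = conv_out(h, top, bottom, kh, stride)
        out_w = conv_out(w, left, right, kw, stride)
        first.append((out_h, out_w, c_out // 4))

    # All four branches must agree spatially before concatenation.
    assert len({(bh, bw) for bh, bw, _ in first}) == 1
    cat_h, cat_w = first[0][0], first[0][1]
    cat_c = sum(c for _, _, c in first)

    # Second layer: unpadded 2 x 2 convolution with stride 1 (assumed).
    final = (conv_out(cat_h, 0, 0, 2, 1), conv_out(cat_w, 0, 0, 2, 1), cat_c)
    return first, (cat_h, cat_w, cat_c), final


first, concatenated, final = pconv_shapes(h=256, w=256, c_out=32, stride=1)
print(first[0], concatenated, final)
```

Under these assumed settings, each branch of a 256×256 input produces a 257×257 map, and the second layer brings the concatenated result back to 256×256, so the four directional branches line up without symmetric padding.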

Inventors

LI LINGXIAO
WANG XIN
TANG YU
WANG SEN
WANG XUTAO

Assignees

Chongqing University of Technology (重庆理工大学)

Dates

Publication Date: 20260505
Application Date: 20260106

Claims (8)

1. A method for building a detection and identification network model for infrared weak and small targets, characterized by comprising at least the following steps:
S1, constructing a windmill-type convolution module, namely a PConv module, which acquires information from different directions of a weak and small target by means of asymmetric padding and grouped convolution, achieving preliminary directional extraction of potential target features while expanding the target receptive field;
S2, designing an ACE module, which selectively extracts target features that conform to human visual priors and fuses context features of different scales, thereby further enhancing infrared weak and small target features and reducing the loss of important features during downsampling in the model;
S3, designing a lightweight module based on depthwise separable convolution, namely a DS-Module, which expands the receptive field through large-kernel convolution and introduces residual connections into the lightweight depthwise separable convolution module;
S4, constructing an encoding stage: constructing four residual layers, namely Reslayers, whose numbers of output channels are 32×γ, 64×γ, 128×γ and 256×γ respectively, where γ is the parameter coefficient controlling the width of the network; each residual layer consists of a different number of ResNeSt blocks, and PConv modules are embedded in each Reslayer to achieve multi-scale feature extraction and directional feature capture;
S5, constructing an enhancement stage: combining the ACE module and the DS-Module, with a DS-C3k module adopted as the low-order feature extraction unit in the ACE module, to achieve multi-scale feature enhancement and lightweight processing;
S6, constructing a decoding stage: constructing three up-sampling fusion layers, configuring the parameters of the 1×1 dimension-reduction convolutions, and adopting the DySample lightweight dynamic up-sampling operator; constructing a prediction head configured sequentially with a 3×3 convolution, a BN layer, a ReLU activation function, a 1×1 convolution and a sigmoid activation function, with the probability threshold set to 0.5;
S7, network assembly and training parameter configuration: sequentially connecting the encoding stage, the enhancement stage and the decoding stage to obtain an ACEPNet network model, the ACEPNet network model being a complete U-shaped network used for detection and identification of infrared weak and small targets.
2. The method for building the infrared weak and small target-oriented detection and identification network model according to claim 1, wherein the PConv module adopts an asymmetric padding strategy, so that the convolution kernels act on different regions of the image along the horizontal and vertical directions respectively, achieving direction-sensitive feature extraction.
3. The method for building the infrared weak and small target-oriented detection and identification network model according to claim 2, wherein application of the PConv module comprises at least the following steps:
the input image is recorded as a tensor whose dimensions represent the height, width and number of channels of the image, respectively; to improve the stability of the training process and the convergence speed of the network, the PConv module applies batch normalization (BN) and a SiLU activation function after each convolution operation, the SiLU activation function combining properties of the Sigmoid and ReLU activation functions;
in the first-layer operation, the PConv module achieves directionally complementary feature extraction through four groups of parallel convolution operations, in which different asymmetric padding modes correspond to convolution kernels extended in the horizontal and vertical directions respectively; the input image is padded with pixels in the left, right, upper and lower directions and convolved with the kernel of the corresponding size, after which BN and the SiLU activation function are applied to output a feature map, see formula (1); in formula (1), the operator denotes a convolution operation, the weight term denotes the weights of convolution kernel k with superscripts giving the height, width and number of channels of the kernel, the input term denotes the input image with superscripts giving its height, width and number of channels, and the subscript gives the number of padding pixels in the left, right, upper and lower directions respectively;
the results of the four groups of parallel first-layer convolutions are concatenated for subsequent processing, see formula (2), which also defines the number of channels of the final output feature map of the PConv module; the relation between the height, width and number of channels of the feature map output by the first-layer convolution and those of the input feature map is given by formula (3), where s denotes the stride of the convolution operation;
in the second-layer operation, a convolution with the kernel size given in formula (4) is applied first, followed by BN and the SiLU activation function, to output the final feature map, see formula (4); the relation between the height and width of the feature map output by the second-layer convolution and those of the input feature map is given by formula (5);
the PConv module introduces a grouped convolution strategy, so that parameter overhead is significantly reduced while receptive field expansion is ensured, achieving a good balance between high performance and high computational efficiency.
4. The method for building the infrared weak and small target-oriented detection and identification network model according to claim 2, wherein the ACE module is a feature enhancement module based on an adaptive context branch attention mechanism, which achieves efficient visual correlation modeling and feature enhancement from multi-scale feature inputs.
5. The method for building the infrared weak and small target-oriented detection and identification network model according to claim 3, wherein application of the ACE module comprises at least the following steps:
multi-scale feature generation: the output feature map is recorded as a first feature map, and a lightweight average pooling operation is applied to it once and twice respectively to obtain a second and a third feature map, the three feature maps having different sizes; the three feature maps carry different levels of semantic information: the first preserves local texture, edge and detail information, the second aggregates mid-range semantic information, and the third aggregates global context semantic information, so that the three features provide hierarchically progressive context information; the ACE module takes the three feature maps as input;
feature alignment and concatenation: the first and third feature maps are brought, by downsampling and upsampling respectively, to the same spatial size as the second feature map, and the three are concatenated along the channel dimension;
feature fusion and grouping: the concatenated features are fused by a 1×1 convolution layer to generate a fused feature, which is then split along the channel dimension into three feature groups: one for the attention branch used for higher-order correlation modeling, namely the CE module; one for the local low-order correlation modeling branch, which adopts the DS-C3k module; and one for the shortcut branch, whose connection retains the original information;
multi-branch processing: in the attention branch of the CE module, the corresponding feature group is input into 2 parallel modules, and features are extracted by the attention mechanism of the convolution stack, see formula (6); the 2 enhanced features are concatenated along the channel dimension to form the output of the higher-order correlation modeling branch, see formula (7); in the local low-order correlation modeling branch, the DS-C3k module is adopted to capture fine-grained local information, see formula (8); the connection of the shortcut branch directly preserves the original visual information;
output fusion: the outputs of the three branches, namely the attention branch of the CE module, the local low-order correlation modeling branch and the shortcut branch, are concatenated along the channel dimension and fused by a 1×1 convolution layer, see formula (9), to obtain the final output of the ACE module.
6. The method for building the infrared weak and small target-oriented detection and identification network model according to claim 5, wherein the DS-Module comprises a large-kernel depthwise separable convolution module, a DS-Bottleneck module and a DS-C3k module, which together form a progressive lightweight feature extraction unit.
7. The method for building the infrared weak and small target-oriented detection and identification network model according to claim 6, wherein application of the DS-Module comprises at least the following steps:
an input feature map is assumed for each module at this stage;
the large-kernel depthwise separable convolution module, namely the DSConv module, serves as the basic unit of a series of lightweight feature extraction modules and significantly reduces the number of parameters and the computational complexity without sacrificing model performance; the DSConv module first extracts features through a standard depthwise separable convolution layer, the kernel size of the DSConv module being the kernel size of its depth-wise convolution, and then generates its output by applying batch normalization and a ReLU activation function, see formula (10);
in the DS-Bottleneck module, two DSConv modules are cascaded, the first being a DSConv convolution block with a fixed kernel size of 3×3 and the second a DSConv convolution block with a fixed kernel size of 5×5, see formula (11); when the input and the output have the same number of channels, a residual skip connection is added to retain low-frequency information;
in the DS-C3k module, the input feature map is sent to two parallel branches: in the first branch, the features pass through a DSConv 1×1 convolution layer for dimension reduction and are then processed by the DS-Bottleneck module; in the second branch, the features pass through a lateral DSConv 1×1 convolution branch; finally, the features of the two branches are concatenated along the channel dimension, and the concatenated features are fused by a DSConv 1×1 convolution layer that restores the number of feature channels.
8. A detection and identification network model for infrared weak and small targets, characterized by being constructed using the method for building the infrared weak and small target-oriented detection and identification network model according to any one of claims 1-7.
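
The multi-scale feature generation and alignment steps of claim 5 can be sketched on a single-channel map in plain Python. This is only an illustration under stated assumptions: the 2×2 pooling window, strided subsampling as the downsampling operator and nearest-neighbour upsampling are not specified in the claims, and all function names are hypothetical. The 1×1 fusion convolution and the three-way channel split are omitted.

```python
def avg_pool_2x2(fmap):
    """2 x 2 average pooling with stride 2 on a 2-D list (even sizes)."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def subsample_2x(fmap):
    """Stand-in downsampling (assumed): keep every second row and column."""
    return [row[::2] for row in fmap[::2]]


def upsample_nearest_2x(fmap):
    """Nearest-neighbour upsampling by a factor of 2 (assumed operator)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def ace_inputs(fmap):
    """Build three scales and align them to the middle scale.

    Returns three equally sized maps that would be concatenated along
    the channel dimension before the 1 x 1 fusion convolution.
    """
    fine = fmap                  # local texture, edges and detail
    mid = avg_pool_2x2(fine)     # pooled once: mid-range context
    coarse = avg_pool_2x2(mid)   # pooled twice: global context
    return [subsample_2x(fine), mid, upsample_nearest_2x(coarse)]


feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
aligned = ace_inputs(feature)
print([(len(m), len(m[0])) for m in aligned])
```

All three aligned maps share the spatial size of the once-pooled map, which is the precondition for concatenating them along the channel dimension.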

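The parameter reduction that claim 7 attributes to the DSConv module follows from the standard weight counts of a depthwise separable convolution. The short calculation below is a sketch with hypothetical channel counts (64 in, 64 out) and ignores biases and batch-normalization parameters; the 3×3 and 5×5 kernel sizes are those named for the DS-Bottleneck module.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical channel counts, chosen only for illustration.
for k in (3, 5):
    std = standard_conv_params(64, 64, k)
    ds = depthwise_separable_params(64, 64, k)
    print(f"k={k}: standard={std}, separable={ds}, ratio={std / ds:.1f}")
```

Because the depthwise term grows with k² but is not multiplied by the output channel count, enlarging the kernel from 3×3 to 5×5 adds comparatively few weights, which is why a large-kernel design stays lightweight.
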
Description

Infrared weak and small target-oriented detection and identification network model and building method thereof

Technical Field

The invention relates to the technical field of infrared dim target detection, in particular to a detection and identification network model for infrared dim targets and a construction method thereof.

Background

Infrared dim targets are targets that are very small in size and weak in signal and that often lack distinct shape and edge features in infrared images; they are typically characterized by a low signal-to-noise ratio (SNR), weak contrast, and easy confusion with complex backgrounds. Such targets play an important role in key fields such as aerospace early warning, maritime rescue, border surveillance and intelligent transportation. Particularly in extreme environments, infrared imaging does not depend on visible light and can provide all-weather, round-the-clock target monitoring, so infrared small target detection (IRSTD) has become one of the important directions of computer vision and target recognition research.

Traditional infrared small target detection methods rely mainly on model-driven image processing and background suppression algorithms, such as strategies based on local contrast, filtering and multi-scale analysis. These methods perform well against simple backgrounds but are prone to failure under severe clutter, very weak target signals, or complex background motion. In recent years, with the development of deep learning, particularly convolutional neural network (CNN) and Transformer architectures, data-driven methods have shown strong feature extraction and generalization capabilities in infrared small target detection.
For example, DATransNet (Dynamic Attention Transformer Network) employs a dynamic attention Transformer (DATrans) that simulates central difference convolutions (CDCs) to extract gradient features, together with a global feature extraction module that provides a comprehensive view, with the aim of extracting and retaining the detailed information critical to infrared small targets. SCTransNet (Spatial-Channel Cross Transformer Network) is a spatial-channel cross Transformer network that uses spatial-channel cross-attention modules (SCTBs) on long-range skip connections (SKs) to address the problem of valid global information being ignored when the target is highly similar to the background. The IR-TransDet (Infrared Transformer Detection Network) network combines the advantages of convolutional neural networks (CNNs) and Transformers and can effectively extract the global semantic information and features of small targets.

Although deep learning methods have made remarkable progress in accuracy and robustness, large model size and high computing resource consumption still limit their use in practical deployment. For this reason, lightweight detection models are receiving a great deal of attention. Researchers try to preserve detection accuracy while greatly reducing parameter and computation costs through techniques such as model compression, structural re-parameterization, depthwise separable convolution and knowledge distillation.

However, existing methods still face several unsolved problems: 1) most algorithms are insufficiently robust under real complex backgrounds or occlusion; 2) training relies on large-scale labeled datasets, while infrared small target datasets are limited; 3) lightweight approaches still trade accuracy against speed. There is therefore a need for a new solution to the above problems.
Disclosure of Invention

The invention aims to provide a detection and identification network model for infrared weak and small targets and a building method thereof, so as to solve the technical problems described in the background.

To achieve this purpose, the invention provides the following technical scheme: a method for building a detection and identification network model for infrared weak and small targets, comprising at least the following steps:
S1, constructing a windmill-type convolution module, namely a PConv module, which acquires information from different directions of a weak and small target by means of asymmetric padding and grouped convolution, achieving preliminary directional extraction of potential target features while expanding the target receptive field;
S2, designing an Attentive Correlation Enhancement module, namely an ACE module, which selectively extracts target features that conform to human visual priors and fuses context features of different scales, thereby further enhancing infrared weak and small target features and reducing the loss of important features during downsampling in the model;
S3, design