CN-121982389-A - Combat target identification method based on multi-scale space-frequency domain feature enhancement


Abstract

The invention belongs to the technical field of image classification and recognition, and relates to a combat target identification method based on multi-scale space-frequency domain feature enhancement. The method establishes a multi-scale space-frequency domain feature fusion network, MSSFFF-Net, consisting of a dual-path spatial-domain feature extractor (DP-SFE), a multi-scale frequency domain feature enhancement module (MS-FDFEM) and a space-frequency domain feature fusion module (SFFFM), which extract space-frequency domain features, strengthen fine-grained expression and fuse the features effectively. Through a lightweight design, the invention combines the spatial detail information in video frames with inter-frame frequency domain discriminative features, enhances the robustness and discriminability of the feature representation when labeled samples are limited, and adapts to the scene constraints of complex backgrounds, dynamic pose changes and noise interference in video information. The invention can substantially compensate for the insufficient feature abstraction capacity of shallow backbone networks, and also brings some performance improvement on deep backbone networks.

Inventors

ZHANG YOUMU
REN BINGHUA
HOU XIAOQI
LI YAOWEI
ZHAO RUOYUAN
ZHAO PENGFEI
REN JIE
WANG YIMENG

Assignees

North Automatic Control Technology Institute (北方自动控制技术研究所)

Dates

Publication Date: 20260505
Application Date: 20260119

Claims (8)

1. A combat target identification method based on multi-scale space-frequency domain feature enhancement, characterized by comprising the following steps: establishing a multi-scale space-frequency domain feature fusion network MSSFFF-Net consisting of a dual-path spatial-domain feature extractor DP-SFE, a multi-scale frequency domain feature enhancement module MS-FDFEM and a space-frequency domain feature fusion module SFFFM, for extracting space-frequency domain features, strengthening fine-grained expression and fusing the features effectively; the basic constituent unit of the MSSFFF-Net is a space-frequency domain fusion block SFFB, which performs spatial domain processing, frequency domain enhancement and feature fusion on single-scale deep convolutional features in a pluggable and lightweight manner; the SFFB is applied to the deep convolutional features of different stages, so that adaptive space-frequency feature enhancement and fusion are carried out on the backbone network features at a plurality of scales, finally forming a multi-scale fusion architecture; the backbone structure of the MSSFFF-Net is composed of 4 SFFBs connected in series, and each SFFB comprises a dual-path spatial-domain feature extractor DP-SFE, a multi-scale frequency domain feature enhancement module MS-FDFEM and a space-frequency domain feature fusion module SFFFM arranged in sequence.
2. The combat target identification method based on multi-scale space-frequency domain feature enhancement according to claim 1, wherein the dual-path spatial-domain feature extractor DP-SFE extracts local features through a basic branch while strengthening key features with a lightweight grouped attention module, and the two sets of features are fused in parallel to preserve rich spatial details.
3. The combat target identification method based on multi-scale space-frequency domain feature enhancement according to claim 2, wherein the dual-path spatial-domain feature extractor DP-SFE is composed of a basic feature branch and an enhanced feature branch in parallel; the basic branch keeps a convolution configuration consistent with the backbone network for extracting low-level local details, and its output is denoted […]; the enhanced branch adopts a lightweight grouped convolution structure divided into two sub-paths: one sub-path applies a 1×1 convolution for dimension reduction followed by a 3×3 grouped convolution, extracting local features with fewer parameters; the other sub-path stacks two layers of 3×3 grouped convolution, enlarging the receptive field while maintaining channel integrity and enhancing the context representation capability; the outputs of the two sub-paths, denoted […] and […] respectively, are concatenated and fused to form the enhanced feature representation: […] wherein […] denotes concatenation along the channel dimension, and […] is a channel attention mechanism for adaptively highlighting key channel features; finally, the features of the basic branch and the enhanced branch are fused by element-wise addition: […] wherein […] is a residual mapping.
4. The combat target identification method based on multi-scale space-frequency domain feature enhancement according to claim 1, wherein the multi-scale frequency domain feature enhancement module MS-FDFEM transforms the spatial features into the frequency domain, divides the spectrum proportionally into three bands (low frequency, medium frequency and high frequency), realizes band-adaptive enhancement through complex linear transformation and amplitude attention, and inversely transforms the enhanced frequency domain features back into the spatial domain.
5. The combat target identification method based on multi-scale space-frequency domain feature enhancement according to claim 4, wherein the multi-scale frequency domain feature enhancement module MS-FDFEM first transforms the spatial features […] into a frequency domain representation […] by a two-dimensional fast Fourier transform; to reconstruct multi-band spectral features with global structural completeness and strong discriminability, the spectrum […] is divided along its height direction into low-frequency, medium-frequency and high-frequency bands, each band being: […] wherein […] and […] are respectively the start index and the end index of the i-th band in the H dimension of the spectrum; each band is jointly transformed by a complex linear transformation to ensure cooperative processing of the real-part and imaginary-part information: […] wherein […] is the complex linear transformation; on this basis, adaptive frequency enhancement is realized by an amplitude-aware attention mechanism; after the per-band enhancement is completed, all enhanced bands are concatenated along the height dimension to restore the complete frequency domain feature; the enhanced frequency domain features are mapped back to the spatial domain by a two-dimensional inverse Fourier transform and then fused with the original spatial domain features, so as to fully exploit the complementary advantages of global and local information.
6. The combat target identification method based on multi-scale space-frequency domain feature enhancement according to claim 5, wherein the amplitude-aware attention mechanism is implemented in three steps: first, the magnitude spectrum of the complex features is calculated to measure the energy intensity of each frequency component: […]; then, the amplitude weights are learned through a […] convolutional channel attention mechanism: […]; finally, the attention weights are multiplied element by element with the complex features, thereby enhancing the key frequency components, which can be expressed as: […] wherein […] and […] are respectively the real part and the imaginary part of the complex frequency domain feature, and […] is used for numerical stability; […], […] and […] are respectively the sigmoid activation function, a 1×1 convolution and a linear activation function; mag(i) is the magnitude spectrum of the complex feature of the i-th band; permute is the dimension rearrangement of a tensor; and […] denotes element-wise multiplication.
7. The combat target identification method based on multi-scale space-frequency domain feature enhancement according to claim 1, wherein the space-frequency domain feature fusion module SFFFM achieves a dynamic balance between the two types of features through a dual-attention mechanism, so as to obtain a more discriminative representation.
8. The combat target identification method based on multi-scale space-frequency domain feature enhancement according to claim 7, wherein the space-frequency domain feature fusion module SFFFM first concatenates the spatial domain feature […] and the frequency domain feature after the inverse Fourier transform […] along the channel dimension to obtain a joint feature: […]; the channel attention branch obtains global context information through global average pooling and establishes dependencies among channels by means of a fully connected layer; the spatial attention branch captures the spatial relevance of local regions through a convolution operation so as to highlight discriminative semantic regions; the two attention weights are multiplied to obtain a comprehensive attention map […], which is used to adaptively calibrate the concatenated feature: […]; the weighted feature is then reduced in dimension by a 1×1 convolution, compressing the number of channels back to the original dimension, and its nonlinear expression capacity is enhanced by batch normalization and a ReLU activation function: […]; finally, the fusion result is added to the original input feature through a residual connection, improving the model's ability to perceive fine differences: […] wherein Concat is the feature concatenation operation; B, C, H and W are respectively the batch size, number of channels, height and width of the feature map; ReLU is the rectified linear unit activation function; BatchNorm is the batch normalization operation; […] is a 1×1 pointwise convolution used for channel transformation/dimension reduction and residual projection; and […] is the input feature of the fusion module.
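
The frequency-domain processing recited in claims 4 to 6 (two-dimensional FFT, proportional band split along the height of the spectrum, complex linear transformation, amplitude-aware attention, concatenation and inverse FFT) might be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the claimed implementation: the band proportions (0.25, 0.5, 0.25), the names `band_edges`, `conv1x1`, `amplitude_attention` and `ms_fdfem`, the weight shapes, the use of ReLU as the inner activation, and taking the bands in row order of the unshifted spectrum are all hypothetical choices that the claims do not fix.

```python
import numpy as np


def band_edges(height, ratios=(0.25, 0.5, 0.25)):
    """Return (start, end) row indices that split [0, height) by proportion.

    The ratios are an assumed example; the claims only say "proportionally".
    """
    cuts = np.round(np.cumsum(ratios) * height).astype(int)
    cuts[-1] = height  # make sure the last band ends at the final row
    starts = np.concatenate(([0], cuts[:-1]))
    return list(zip(starts.tolist(), cuts.tolist()))


def conv1x1(x, w):
    """1x1 convolution over channels: x is (C_in, h, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def amplitude_attention(band, w_in, w_out, eps=1e-8):
    """Reweight one complex band (C, h, W) by attention learned from its magnitude."""
    # Step 1: magnitude spectrum, with eps for numerical stability.
    mag = np.sqrt(band.real ** 2 + band.imag ** 2 + eps)
    # Step 2: 1x1 conv -> ReLU (assumed) -> 1x1 conv -> sigmoid gives the weights.
    hidden = np.maximum(conv1x1(mag, w_in), 0.0)
    weight = 1.0 / (1.0 + np.exp(-conv1x1(hidden, w_out)))
    # Step 3: element-wise multiplication with the complex feature.
    return band * weight


def ms_fdfem(x, params, ratios=(0.25, 0.5, 0.25)):
    """Enhance a real feature map x of shape (C, H, W) band by band in the frequency domain.

    params holds one dict per band with hypothetical keys:
    "w_complex" (C, C) complex, "w_in" (r, C) real, "w_out" (C, r) real.
    """
    spec = np.fft.fft2(x, axes=(-2, -1))
    enhanced = []
    for (start, end), p in zip(band_edges(x.shape[-2], ratios), params):
        # Complex linear transformation mixes real and imaginary parts jointly.
        band = conv1x1(spec[:, start:end, :], p["w_complex"])
        enhanced.append(amplitude_attention(band, p["w_in"], p["w_out"]))
    full = np.concatenate(enhanced, axis=-2)  # restore the full spectrum height
    # Back to the spatial domain; the real part is kept in this sketch.
    return np.fft.ifft2(full, axes=(-2, -1)).real
```

As a sanity check on the sketch, setting every `w_out` to zeros makes each attention weight equal to 0.5, so with identity `w_complex` the output is exactly half the input feature map.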

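The dual-attention fusion recited in claims 7 and 8 (channel concatenation, channel attention from global average pooling and fully connected layers, spatial attention from a convolution, 1×1 reduction with batch normalization and ReLU, and a residual connection) might be sketched in NumPy as below. This is a rough illustration under assumptions, not the claimed implementation: the names `sfffm`, `conv3x3` and the parameter keys are hypothetical, the spatial attention is assumed to be a 3×3 convolution over the channel-mean map, batch normalization uses batch statistics without learned scale and shift, and the residual input is assumed to be the spatial domain feature passed through a 1×1 projection.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1x1(x, w):
    """Pointwise convolution: x is (B, C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,bchw->bohw", w, x)


def conv3x3(m, kernel):
    """Zero-padded 3x3 cross-correlation of a (B, H, W) map with a (3, 3) kernel."""
    height, width = m.shape[1:]
    padded = np.pad(m, ((0, 0), (1, 1), (1, 1)))
    return sum(kernel[i, j] * padded[:, i:i + height, j:j + width]
               for i in range(3) for j in range(3))


def sfffm(f_spatial, f_freq, p, eps=1e-5):
    """Fuse spatial and frequency features, each of shape (B, C, H, W).

    p is a dict of hypothetical weights: "fc1" (r, 2C), "fc2" (2C, r),
    "k" (3, 3), "reduce" (C, 2C), "proj" (C, C).
    """
    joint = np.concatenate([f_spatial, f_freq], axis=1)        # (B, 2C, H, W)

    # Channel attention: global average pooling, then two fully connected layers.
    pooled = joint.mean(axis=(2, 3))                           # (B, 2C)
    hidden = np.maximum(pooled @ p["fc1"].T, 0.0)              # (B, r)
    channel_w = sigmoid(hidden @ p["fc2"].T)                   # (B, 2C)

    # Spatial attention: convolution over the channel-mean map (assumed form).
    spatial_w = sigmoid(conv3x3(joint.mean(axis=1), p["k"]))   # (B, H, W)

    # Comprehensive attention map is the product of the two weights.
    attn = channel_w[:, :, None, None] * spatial_w[:, None, :, :]

    # 1x1 reduction back to C channels, batch normalization, ReLU.
    z = conv1x1(joint * attn, p["reduce"])                     # (B, C, H, W)
    mean = z.mean(axis=(0, 2, 3), keepdims=True)
    var = z.var(axis=(0, 2, 3), keepdims=True)
    z = np.maximum((z - mean) / np.sqrt(var + eps), 0.0)

    # Residual connection with a 1x1 projection of the input feature.
    return z + conv1x1(f_spatial, p["proj"])
```

In this sketch, a zero `reduce` matrix with an identity `proj` returns the spatial feature unchanged, which shows that the residual path carries the input through even when the fused branch contributes nothing.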
Description

Combat target identification method based on multi-scale space-frequency domain feature enhancement

Technical Field

The invention belongs to the technical field of image classification and recognition, and particularly relates to a combat target identification method based on multi-scale space-frequency domain feature enhancement.

Background

Image classification is one of the core problems in machine vision, and fine-grained object recognition (FGOR), a key extension of it, has irreplaceable application value in video information data processing. The core of fine-grained target recognition is to accurately distinguish similar sub-categories of the same broad class within the complex scenes of video information sequences, for example equipment of different models, or vehicles or individual targets bearing specific identifiers. The challenges come not only from the high similarity between classes and the high diversity within classes, but also from the scene specificity of video information data: targets in video frames are easily affected by dynamic pose changes, background interference, inter-frame noise and partial occlusion, which further raises the demands on the model's ability to extract fine discriminative features (such as specific textures and structural details). With the development of deep learning, methods based on deep neural networks have made significant progress in capturing fine target differences in static images, but the dynamic nature and complexity of video information data place higher requirements on feature extraction: not only must the local discriminative features in a single frame be captured, but the consistency and dynamic association of targets between frames must also be considered.
However, in actual video intelligence processing scenarios, fine-grained object recognition often faces a serious data scarcity problem, namely few-shot fine-grained object recognition (FSFGOR). In such scenarios, only a small number of effectively labeled video frames of a specific target (such as a rare equipment model or a specific monitored object) are usually available, and the model must learn, more efficiently and from limited data, a feature representation that offers both fine discrimination and inter-frame adaptability, so as to discriminate targets accurately in complex video sequences. In the prior art, techniques such as transfer learning, generative models and metric learning alleviate the difficulty of feature learning in few-shot settings to some extent, so that a model can initially capture target differences from limited samples. However, given the specificity of video information data, existing methods still have obvious limitations: on the one hand, the temporal association between video frames is not fully exploited; on the other hand, the extraction of fine target features under complex backgrounds is not robust enough. As a result, the discrimination accuracy and generalization capability of FSFGOR in video information processing still cannot meet the growing demand for fine-grained target identification. There is therefore a need for a fine-grained object recognition method that enhances the robustness and discriminability of the feature representation and improves the accuracy of fine-grained classification.

Disclosure of Invention

The specific technical scheme of the invention is a combat target identification method based on multi-scale space-frequency domain feature enhancement, comprising the following steps: establishing a multi-scale space-frequency domain feature fusion network MSSFFF-Net consisting of a dual-path spatial-domain feature extractor DP-SFE, a multi-scale frequency domain feature enhancement module MS-FDFEM and a space-frequency domain feature fusion module SFFFM, for extracting space-frequency domain features, strengthening fine-grained expression and fusing the features effectively. The basic constituent unit of the MSSFFF-Net is a space-frequency domain fusion block SFFB, which performs spatial domain processing, frequency domain enhancement and feature fusion on single-scale deep convolutional features in a pluggable and lightweight manner. After the SFFB is applied to the deep convolutional features of different stages, adaptive space-frequency feature enhancement and fusion are carried out on the backbone network features at multiple scales, finally forming a multi-scale fusion architecture. The backbone structure of the multi-scale spatial-frequency domain feature fusion network MSSFFF