CN-121982484-A - Multi-mode remote sensing small target identification method based on co-differential mode collaborative interaction fusion
Abstract
The invention discloses a multi-mode remote sensing small target recognition method based on co-differential mode collaborative interaction fusion, in the technical field of image recognition. An infrared image and a visible light image are taken as input, and infrared enhanced fusion features and visible light enhanced fusion features are obtained through feature extraction and interaction fusion in a dual-stream backbone feature extraction network of a multi-mode remote sensing small target recognition model. The infrared enhanced fusion features and visible light enhanced fusion features output by the corresponding convolutional downsampling modules and output layers of the two backbone channels are concatenated and then passed through neck feature fusion, which outputs a predicted image carrying the target recognition result. By fully exploiting the infrared and visible light modality features and performing collaborative modeling and interactive fusion of their common features and difference features, the method effectively improves the representation of small targets and makes remote sensing small target recognition more robust.
Inventors
- ZHOU GUOQIANG
- LIU YATING
Assignees
- 南京邮电大学 (Nanjing University of Posts and Telecommunications)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-07
Claims (10)
- 1. A multi-mode remote sensing small target identification method based on co-differential mode collaborative interaction fusion, characterized by comprising the following steps: acquiring an infrared image and a visible light image; taking the infrared image and the visible light image as input, and outputting a predicted image with a target recognition result based on a multi-mode remote sensing small target recognition model. The model comprises a backbone and a neck connected in sequence. The backbone is a dual-stream backbone feature extraction network comprising two structurally identical backbone feature extraction channels; each channel comprises several convolutional downsampling modules and an output layer connected in sequence, which perform multi-level feature extraction on the infrared image and the visible light image respectively to obtain multi-level infrared modality features and multi-level visible light modality features. The outputs of the corresponding convolutional downsampling modules in the two channels undergo bidirectional modality-feature interaction fusion in a co-differential mode collaborative interaction fusion module, yielding a two-modality bidirectional interaction fusion feature. This fusion feature is added element-wise to the output of each convolutional downsampling module to obtain an infrared enhanced fusion feature and a visible light enhanced fusion feature, which serve as the input of the next convolutional downsampling module or of the output layer in the corresponding channel. Finally, the infrared enhanced fusion features and visible light enhanced fusion features output by the corresponding convolutional downsampling modules and output layers of the two channels are concatenated and then passed through neck feature fusion to output the predicted image with the target recognition result.
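As a non-authoritative illustration of the enhancement step in claim 1, the following minimal NumPy sketch treats the interaction fusion module as a simple sigmoid-gated blend; the gate, shapes, and function names are assumptions for illustration, not the patent's actual module design:

```python
import numpy as np

rng = np.random.default_rng(0)

def interaction_fuse(f_ir, f_vis):
    """Hypothetical stand-in for the co-differential collaborative
    interaction fusion module: a sigmoid-gated blend of the two modalities."""
    gate = 1.0 / (1.0 + np.exp(-(f_ir - f_vis)))   # sigmoid of the difference
    return gate * f_ir + (1.0 - gate) * f_vis

# Toy per-stage features from the two backbone channels, shape (C, H, W).
f_ir = rng.standard_normal((8, 16, 16))
f_vis = rng.standard_normal((8, 16, 16))

# Bidirectional interaction fusion feature shared by both branches.
fused = interaction_fuse(f_ir, f_vis)

# Element-wise residual addition gives the enhanced fusion features that
# feed the next convolutional downsampling module of each channel.
f_ir_enh = f_ir + fused
f_vis_enh = f_vis + fused
print(f_ir_enh.shape, f_vis_enh.shape)
```

Note that with this residual scheme both branches receive the same fused increment, which is exactly the element-wise-addition step claim 1 describes.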
- 2. The multi-mode remote sensing small target identification method based on co-differential mode collaborative interaction fusion according to claim 1, wherein the convolutional downsampling modules are divided into a low-layer convolutional downsampling module, a middle-layer convolutional downsampling module and a high-layer convolutional downsampling module. The low-layer convolutional downsampling module comprises a convolution-batch normalization-SiLU activation module, a cross-stage partial connection module and a convolution-batch normalization-SiLU activation module connected in sequence. The middle-layer convolutional downsampling module comprises a cross-stage partial connection module and a convolution-batch normalization-SiLU activation module connected in sequence. The high-layer convolutional downsampling module comprises a cross-stage partial connection module, a convolution-batch normalization-SiLU activation module and a spatial pyramid pooling module connected in sequence. The output layer adopts a cross-stage partial connection module.
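A minimal NumPy sketch of the convolution-batch normalization-SiLU block named in claim 2; the "convolution" is reduced to a 1×1 channel mixing (a matmul) purely for illustration, and the inference-style batch normalization omits learned scale/shift, both being assumptions rather than the patent's design:

```python
import numpy as np

rng = np.random.default_rng(1)

def silu(x):
    # SiLU (swish): x * sigmoid(x), the activation named in claim 2
    return x / (1.0 + np.exp(-x))

def batch_norm(x, eps=1e-5):
    # Per-channel normalization over the spatial dims (inference-style sketch).
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def cbs(x, weight):
    """Conv-BN-SiLU block sketch; 'weight' has shape (C_out, C_in) and
    acts as a 1x1 convolution mixing the input channels."""
    y = np.einsum('oc,chw->ohw', weight, x)
    return silu(batch_norm(y))

x = rng.standard_normal((4, 8, 8))   # (C_in, H, W)
w = rng.standard_normal((6, 4))      # (C_out, C_in)
y = cbs(x, w)
print(y.shape)
```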
- 3. The multi-mode remote sensing small target recognition method based on co-differential mode collaborative interaction fusion according to claim 2, wherein the low-layer convolutional downsampling module takes an infrared image or a visible light image as input, while the middle-layer convolutional downsampling module, the high-layer convolutional downsampling module and the output layer take the infrared enhanced fusion feature or the visible light enhanced fusion feature as input; after feature extraction, channel alignment and scale alignment yield the corresponding multi-level infrared modality features and multi-level visible light modality features.
- 4. The multi-mode remote sensing small target identification method based on co-differential mode collaborative interaction fusion according to claim 1, wherein the co-differential mode collaborative interaction fusion module comprises differential feature modulation modules and a shared feature integration module. The multi-level infrared modality feature and the multi-level visible light modality feature are both fed into two differential feature modulation modules, which perform a difference operation: one module computes the difference-increment weight that the infrared modality contributes to the visible light branch, and the other computes the difference-increment weight that the visible light modality contributes to the infrared branch. The multi-level infrared modality feature and the multi-level visible light modality feature are added element-wise and then multiplied element-wise by the two weights respectively, yielding the differentially modulated joint shared feature of the infrared branch and the differentially modulated joint shared feature of the visible light branch. Each differentially modulated joint shared feature is residual-added to the corresponding multi-level modality feature, giving the differentially modulated infrared modality feature and the differentially modulated visible light modality feature. These two features are fed into the shared feature integration module for concatenation and fusion, which outputs the two-modality bidirectional interaction fusion feature.
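The differential feature modulation of claim 4 can be sketched in NumPy as follows; the sigmoid stands in for the patent's unspecified weight-generating layers, so the exact weight computation is an assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy modality features at one backbone level, shape (C, H, W).
f_ir = rng.standard_normal((4, 8, 8))
f_vis = rng.standard_normal((4, 8, 8))

# Difference operation in each differential feature modulation module;
# sigmoid is an assumed stand-in for the learned weight layers.
w_vis = sigmoid(f_ir - f_vis)  # increment weight IR contributes to the visible branch
w_ir = sigmoid(f_vis - f_ir)   # increment weight visible contributes to the IR branch

# Element-wise sum of the two modality features, then element-wise
# multiplication by each weight: the modulated joint shared features.
shared = f_ir + f_vis
joint_ir = shared * w_ir
joint_vis = shared * w_vis

# Residual addition with the original modality features (claim 4).
f_ir_mod = f_ir + joint_ir
f_vis_mod = f_vis + joint_vis
print(f_ir_mod.shape, f_vis_mod.shape)
```

With this sigmoid choice the two weights are complementary (they sum to one at every position), which is one plausible way to make the branches share the joint feature without double-counting it.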
- 5. The method for identifying the multi-mode remote sensing small target based on co-differential mode collaborative interaction fusion according to claim 4, wherein the data processing of the shared feature integration module comprises the following steps: the differentially modulated infrared modality feature and the differentially modulated visible light modality feature are each refined by a corresponding depthwise separable convolution layer, and the outputs of the two depthwise separable convolution layers are added element-wise to obtain a preliminary fusion feature; the preliminary fusion feature is fed into a channel average pooling module and a channel max pooling module, and the resulting spatial descriptions from the two viewpoints are fused; the fused spatial description passes through a 1×1 convolution layer and an activation function to obtain a spatial attention weight map; the spatial attention weight map is multiplied element-wise with the preliminary fusion feature to obtain the two-modality bidirectional interaction fusion feature.
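A minimal NumPy sketch of the shared feature integration steps in claim 5; the depthwise separable refinement convolutions are skipped (treated as identity) and the learned 1×1 fusion convolution is replaced by a fixed average of the two pooled maps, both being simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Differentially modulated modality features, shape (C, H, W).
f_ir_mod = rng.standard_normal((4, 8, 8))
f_vis_mod = rng.standard_normal((4, 8, 8))

# Element-wise addition of the (here identity-)refined outputs
# gives the preliminary fusion feature.
prelim = f_ir_mod + f_vis_mod

# Channel average pooling and channel max pooling give two spatial
# descriptions; a fixed average stands in for the learned 1x1 conv.
avg_map = prelim.mean(axis=0)           # (H, W)
max_map = prelim.max(axis=0)            # (H, W)
fused_map = 0.5 * (avg_map + max_map)   # assumed fusion of the two maps

# Activation -> spatial attention weight map; broadcast-multiply
# with the preliminary fusion feature (claim 5).
attn = sigmoid(fused_map)
out = prelim * attn[None, :, :]
print(out.shape)
```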
- 6. The multi-mode remote sensing small target recognition method based on co-differential mode collaborative interaction fusion according to claim 1, wherein, in the training stage, the multi-mode remote sensing small target recognition model further comprises a super-resolution image reconstruction branch. The branch comprises a multi-scale feature integration encoder and a high-resolution reconstruction decoder connected in sequence. The infrared enhanced fusion features and visible light enhanced fusion features output by the low-layer convolutional downsampling modules of the two backbone channels are concatenated to obtain a low-layer fusion concatenation feature, and those output by the high-layer convolutional downsampling modules are concatenated to obtain a high-layer fusion concatenation feature. Both serve as inputs to the multi-scale feature integration encoder: the low-layer feature passes through a 3×3 convolution layer and a 1×1 convolution layer in sequence to obtain a low-layer reshaped feature, which is added element-wise to the low-layer fusion concatenation feature to obtain a low-layer fusion feature. The low-layer fusion feature is fed into a three-branch multi-scale convolution block, where the first branch is a 5×5 convolution layer followed by a 3×3 convolution layer in series, the second branch is a 3×3 convolution layer, and the third branch is a 1×1 convolution layer. The branch outputs are concatenated and integrated through a 1×1 convolution layer to incorporate the scale-expanded information, giving a multi-scale integration feature, which is used to perform saliency enhancement on the low-layer fusion feature and obtain a saliency-enhanced low-layer fusion feature. The high-layer fusion concatenation feature passes through an upsampling layer and a 1×1 convolution layer in sequence to obtain a high-layer aligned feature, which performs high-semantic guidance enhancement on the low-layer fusion feature, giving a low-layer fusion feature with enhanced multi-receptive-field fusion weights. The two enhanced low-layer fusion features are added element-wise and passed through a 3×3 convolution layer and a ReLU activation function to produce the encoder output. The decoder adopts an enhanced deep super-resolution network structure, through which the encoder output is mapped to high-quality features.
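The three-branch multi-scale convolution block of claim 6 can be sketched as follows; in this assumption-laden toy version each learned k×k convolution is replaced by a same-padded k×k mean filter, the 1×1 branch by the identity, and the final 1×1 integration by a plain average over branches:

```python
import numpy as np

rng = np.random.default_rng(4)

def box_filter(x, k):
    """Naive 'same'-padded k x k mean filter over (H, W); a stand-in
    for a learned k x k convolution in this sketch."""
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), mode='edge')
    out = np.zeros_like(x)
    H, W = x.shape[1], x.shape[2]
    for i in range(k):
        for j in range(k):
            out += xp[:, i:i + H, j:j + W]
    return out / (k * k)

low = rng.standard_normal((4, 16, 16))   # low-layer fusion feature (C, H, W)

# Three-branch multi-scale convolution (claim 6):
b1 = box_filter(box_filter(low, 5), 3)   # 5x5 then 3x3 in series
b2 = box_filter(low, 3)                  # single 3x3
b3 = low.copy()                          # 1x1 conv approximated as identity

# Concatenate the branch outputs, then integrate with a '1x1 conv'
# (here: mean over branches) -> multi-scale integration feature.
multi_scale = np.stack([b1, b2, b3]).mean(axis=0)
print(multi_scale.shape)
```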
- 7. The method for identifying the multi-mode remote sensing small target based on co-differential mode collaborative interaction fusion according to claim 6, wherein, in the training stage, the multi-mode remote sensing small target recognition model further comprises a detection head connected to the neck output, which performs target prediction, positioning regression and classification prediction on the concatenated infrared enhanced fusion features and visible light enhanced fusion features, yielding target confidence, target bounding boxes and class probabilities.
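As an illustration of the three outputs named in claim 7, the following sketch decodes one raw anchor prediction into confidence, box parameters, and class probabilities; the YOLO-style layout of the raw vector is an assumption, not something the claim specifies:

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# One raw anchor prediction, assumed layout: [tx, ty, tw, th, obj, class logits...]
n_classes = 3
raw = rng.standard_normal(5 + n_classes)

bbox = raw[:4]                # positioning regression (box parameters)
obj_conf = sigmoid(raw[4])    # target confidence
cls_prob = softmax(raw[5:])   # class probabilities
print(obj_conf, cls_prob)
```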
- 8. The method for identifying the multi-mode remote sensing small target based on co-differential mode collaborative interaction fusion according to claim 7, further comprising pre-training the multi-mode remote sensing small target recognition model, the pre-training method comprising: acquiring a training data set consisting of a plurality of paired infrared images and visible light images; preprocessing the training data set to obtain a preprocessed training data set; and training the model with the preprocessed training data set as input, computing a loss function during training and adjusting the model parameters according to it until the maximum number of training rounds is reached, thereby obtaining the pre-trained multi-mode remote sensing small target recognition model.
- 9. The method for identifying the multi-mode remote sensing small target based on co-differential mode collaborative interaction fusion according to claim 8, wherein the preprocessing comprises the following steps: labeling the infrared image and the visible light image so that they match the model input format, and normalizing the center-point coordinates and the width and height of each target bounding box in the infrared and visible light images; establishing a mapping of target categories and converting the target category names in the target bounding boxes into integer class indices; and adjusting the infrared and visible light images to a uniform resolution to ensure consistent input dimensions, thereby obtaining the preprocessed infrared and visible light images.
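The label preprocessing of claim 9 amounts to scaling a pixel-space box by the image size and mapping the class name to an integer index. A small sketch, with a hypothetical category map (the class names and helper name are illustrative assumptions):

```python
# Assumed category map; the patent does not specify the class names.
CLASS_INDEX = {"vehicle": 0, "ship": 1, "aircraft": 2}

def preprocess_label(name, cx, cy, w, h, img_w, img_h):
    """Normalize a (cx, cy, w, h) pixel box by image size and map the
    class name to its integer index (claim 9)."""
    return (CLASS_INDEX[name],
            cx / img_w, cy / img_h, w / img_w, h / img_h)

label = preprocess_label("ship", 320.0, 240.0, 16.0, 8.0, 640, 480)
print(label)
```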
- 10. The multi-mode remote sensing small target identification method based on co-differential mode collaborative interaction fusion according to claim 8, wherein the loss function is expressed as: L_total = λ₁·L_det + λ₂·L_SR; L_det = Σ_{i=1}^{3} (λ_obj·L_obj^(i) + λ_box·L_box^(i) + λ_cls·L_cls^(i)); L_SR = ‖I_SR − I_vis‖₁; wherein L_total denotes the loss function; λ₁, λ₂, λ_obj, λ_box and λ_cls all denote weight coefficients; L_det denotes the detection loss of the backbone; L_SR denotes the reconstruction loss of the super-resolution image reconstruction branch; L_obj^(i) denotes the target confidence loss, L_box^(i) the target bounding box loss, and L_cls^(i) the class probability loss; i indexes the detection head position, with i = 1, 2, 3 corresponding to three detection heads of different scales: a low-resolution detection head for small-scale features, a medium-resolution detection head for medium-scale features, and a high-resolution detection head for large-scale features; λ_obj denotes the error weight adjusting the target confidence, λ_box the error weight adjusting the target bounding box, and λ_cls the error weight adjusting the class probability; I_SR denotes the output of the super-resolution image reconstruction branch, I_vis denotes the visible light image, and ‖·‖₁ denotes the L1 norm.
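The loss structure described in claim 10 can be computed as a plain weighted sum. In this sketch the weight values and per-head loss numbers are made up for illustration, and the L1 term is mean-reduced (a common convention the patent does not fix):

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed weight coefficients; the patent leaves their values open.
lam1, lam2 = 1.0, 0.1
lam_obj, lam_box, lam_cls = 1.0, 0.05, 0.5

# Illustrative per-head losses for the three detection scales i = 1, 2, 3.
l_obj = np.array([0.4, 0.3, 0.2])
l_box = np.array([1.1, 0.9, 0.8])
l_cls = np.array([0.6, 0.5, 0.4])

# Detection loss: weighted sum over the three detection heads.
l_det = np.sum(lam_obj * l_obj + lam_box * l_box + lam_cls * l_cls)

# Reconstruction loss: L1 distance between the super-resolution branch
# output and the visible light image (mean-reduced here by assumption).
i_sr = rng.standard_normal((3, 8, 8))
i_vis = rng.standard_normal((3, 8, 8))
l_sr = np.abs(i_sr - i_vis).mean()

l_total = lam1 * l_det + lam2 * l_sr
print(l_total)
```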
Description
Multi-mode remote sensing small target identification method based on co-differential mode collaborative interaction fusion
Technical Field
The invention relates to a multi-mode remote sensing small target identification method based on co-differential mode collaborative interaction fusion, and belongs to the technical field of image identification.
Background
In recent years, with continuous breakthroughs in computer vision technology, remote sensing small target identification has broad application prospects in fields such as military reconnaissance, disaster monitoring and urban management. Small targets in remote sensing images are usually characterized by small size, blurred edges and weakened textures, and are subject to complex background interference. Although deep learning technologies such as convolutional neural networks (CNNs) and Transformers have made some progress in small target recognition accuracy, existing methods still suffer from bottlenecks such as a high miss rate and a high false recognition rate when feature expression is insufficient. Therefore, how to effectively improve the robustness and accuracy of remote sensing small target recognition remains a key technical difficulty that needs to be overcome.
Most existing small target recognition methods rely on a single vision sensor for detection and recognition, improving a model's feature extraction capability for small targets through techniques such as feature pyramids, attention mechanisms and multiple detection heads. Although recognition performance keeps improving, detection that relies on only one kind of visual feature can hardly break through the bottleneck of limited remote sensing small target information and the insufficient accuracy caused by complex environments, so improving the stability and robustness of small target recognition by combining multi-modal information has become one of the important directions of current target recognition research. Among multi-modal features, deep fusion of the infrared and visible light modalities has become the most common and efficient technical route. By jointly modeling the thermal radiation information carried by the infrared image and the rich texture, edge and color details of the visible light image, target detection and recognition performance can be effectively improved in complex scenes such as low illumination, occlusion and background interference, compensating for the natural limitations of a single modality in perceptibility and robustness. However, existing fusion methods for the infrared and visible light modalities still have many limitations in practical design.
Early cross-modal joint feature fusion generally adopted feature concatenation, weighted summation and similar schemes to encode the fused modality features, generating a unified fusion representation that was then used directly by the subsequent detection heads for recognition and classification. This achieved efficient cross-modal information utilization to a certain extent, but because the features are extracted independently within each modality, the fusion representation remains insufficient. In recent years, interactive feature fusion strategies based on bidirectional modality-feature flow have achieved deeper structural complementation and semantic alignment of modality features. One line of work focuses on the interactive fusion of modality-shared features, unifying semantic consistency across modalities by computing fusion weights, but it ignores the fine-grained complementary structural differences between modalities, so the fused expression lacks discrimination. Another line focuses on the interactive fusion of inter-modality difference features, effectively exploiting the complementary advantages between modalities through differencing and differential weighting, but its joint modeling of what the modalities share is insufficient. More recently, research has turned to exploiting consistency features and difference features simultaneously; results show that this improves feature fusion capability and recognition performance to a certain extent, but the lack of coordinated modeling of, and constraints on, the internal relationship between the shared and difference information flows easily causes semantic overlap and information redundancy in the two-modality feature expression.
Therefore, how to account for both the shared information and the difference information between modalities during fusion, and to realize effective collaborative modeling of the two, remains a major challenge in multi-modal feature fusion research. The prior art h