CN-122023991-A - Deep neural network model design method for automatic target recognition through infrared and visible light image fusion in complex environment

CN122023991A

Abstract

The invention relates to the technical field of computer vision and discloses a deep neural network model design method for automatic target recognition by infrared and visible light image fusion in a complex environment. The method comprises the following steps: S1, a data enhancement fusion operation; S2, construction of an attention convolution module; S3, construction of an adaptive feature extraction network; S4, construction of a multi-scale fusion prediction network; S5, construction of a multi-modal fusion module; and S6, design of a multi-task loss function and a dynamic non-maximum suppression operation. The method addresses the poor recognition performance that precision-guided weapons commonly suffer when performing multi-modal image fusion automatic target recognition tasks in complex battlefield environments, caused by insufficient multi-modal feature extraction and fusion strategies and by the strong influence of complex low-illumination scenes, dense scenes, and model size, and thereby improves the automatic target recognition performance of existing precision-guided weapons on infrared and visible light multi-modal images.

Inventors

  • SONG SHENMIN
  • LIU JINGANG
  • YANG YAHU
  • ZHANG YANSONG
  • LI JIAPENG

Assignees

  • Harbin Institute of Technology (哈尔滨工业大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-02

Claims (8)

  1. A deep neural network model design method for automatic target recognition by infrared and visible light image fusion in a complex environment, characterized by comprising the following steps: step S1, a data enhancement fusion operation, in which a multi-modal target image captured by a sensor in a complex battlefield environment is processed by data enhancement and fusion; step S2, constructing an attention convolution module, which comprises an attention mechanism and a depthwise separable convolution and is used to optimize conventional convolution; step S3, constructing an adaptive feature extraction network for feature extraction from the image; step S4, constructing a multi-scale fusion prediction network for fusing the extracted image features; step S5, constructing a multi-modal fusion module for complementarily fusing visible light and infrared image information; and step S6, designing a multi-task loss function and a dynamic non-maximum suppression operation.
  2. The method for designing a deep neural network model for automatic target recognition by fusion of infrared and visible light images in a complex environment according to claim 1, wherein the data enhancement fusion operation of step S1 comprises a Mosaic data enhancement module, a Mixup data enhancement module, a geometric distortion module and a self-calibration illumination learning module. The Mosaic data enhancement module randomly crops four images and splices them into one image to serve as training data; the Mixup data enhancement module mixes two random samples in proportion and distributes the classification labels in the same proportion; the geometric distortion module performs operations such as scaling, cropping, flipping and rotation on the images; and the self-calibration illumination learning module is introduced to brighten images in complex low-light scenes quickly, flexibly and robustly while retaining the advantages of the existing data enhancement methods.
  3. The method for designing a deep neural network model for automatic target recognition by infrared and visible light image fusion in a complex environment according to claim 2, wherein the depthwise separable convolution of step S2 decomposes conventional convolution into a depthwise convolution and a pointwise convolution; the depthwise convolution allocates a separate convolution kernel to each channel of the input image features, each kernel being responsible only for the convolution operation on the image features of its own channel, and a 1×1 pointwise convolution is then used to compress and combine the per-channel outputs of the depthwise convolution.
  4. The method for designing a deep neural network model for automatic target recognition by infrared and visible light image fusion in a complex environment according to claim 3, wherein the attention convolution module of step S2 introduces a triplet attention mechanism into the convolution module.
  5. The method for designing a deep neural network model for automatic target recognition by infrared and visible light image fusion in a complex environment according to claim 4, wherein step S4 comprises the following specific process: first, the multi-level feature maps from the convolution modules are up-sampled by a multi-step interpolation method to obtain smoother high-resolution feature maps; then, the up-sampled feature maps of each level are fused with weighting; finally, target category identification and position regression are performed directly on the fused high-resolution feature map. The multi-step interpolation and weighted fusion are expressed as P_i = Conv_1x1(w_a·U(P_(i+1)) + w_b·C_i), where U(·) denotes the up-sampling function, Conv_1x1(·) denotes a 1×1 convolution, P_2, P_3, P_4 and P_5 denote the up-sampled feature maps, C_2, C_3, C_4 and C_5 denote the feature maps output by the different convolution modules, and the six weight coefficients take the values 0.7, 0.3, 0.6, 0.4, 0.4 and 0.6, respectively.
  6. The method for designing a deep neural network model for automatic target recognition by fusion of infrared and visible light images in a complex environment according to claim 5, wherein in step S5 the two modalities of information are complementarily fused based on the spatial information and channel information of the visible light and infrared image features.
  7. The method for designing a deep neural network model for automatic target recognition by fusion of infrared and visible light images in a complex environment according to claim 6, wherein the total loss function in step S6 is designed as L = L_cls + L_reg, where L_cls denotes the classification loss, used to evaluate the performance of the model on the classification task, and L_reg denotes the regression loss, used to evaluate the performance of the model on the regression task. The specific calculation expression of L_reg is L_reg = 1 − IoU + ρ²(b, b_gt)/c² + αv, where b and b_gt denote the centre coordinates of the predicted bounding box and the real bounding box respectively, ρ(b, b_gt) denotes the Euclidean distance between the centre coordinates of the real bounding box and the predicted bounding box, c denotes the diagonal distance of the minimum closure area of the real bounding box and the predicted bounding box, v is used to measure the similarity of the aspect ratios, and α is a positive weight coefficient. The specific calculation expression of α is α = v / ((1 − IoU) + v), and α serves as an adjustable factor balancing the aspect-ratio term.
  8. The method for designing a deep neural network model for automatic target recognition by infrared and visible light image fusion in a complex environment according to claim 7, wherein the dynamic non-maximum suppression operation of step S6 uses a threshold list as the action space A, dynamic non-maximum suppression as the agent, and the prediction results of the model as the state space S.
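The top-down weighted fusion of claim 5 can be sketched numerically. This is a minimal illustration, not the patented network: the function names (`upsample2x`, `fuse_pyramid`), nearest-neighbour up-sampling standing in for the undisclosed multi-step interpolation, and the pairing of the six published weights into three fusion steps are all assumptions.

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x up-sampling; a stand-in for the patent's
    multi-step interpolation U(.), whose exact form is not disclosed."""
    return f.repeat(2, axis=-2).repeat(2, axis=-1)

def fuse_pyramid(c2, c3, c4, c5,
                 weights=((0.7, 0.3), (0.6, 0.4), (0.4, 0.6))):
    """Top-down weighted fusion of four feature maps C2..C5 (channels
    first), using the weight values 0.7/0.3, 0.6/0.4, 0.4/0.6 listed in
    claim 5; their assignment to levels here is illustrative."""
    p5 = c5
    p4 = weights[0][0] * upsample2x(p5) + weights[0][1] * c4
    p3 = weights[1][0] * upsample2x(p4) + weights[1][1] * c3
    p2 = weights[2][0] * upsample2x(p3) + weights[2][1] * c2
    return p2  # fused high-resolution map used for recognition/regression
```

Because each weight pair sums to 1, feeding in constant feature maps returns the same constant, which is a quick sanity check on the fusion weights.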
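The regression loss of claim 7 names the same terms as the widely used CIoU loss (IoU, centre-distance over enclosing-box diagonal, aspect-ratio term v with positive weight α), so a CIoU-style sketch is given below. It is an interpretation of the claim, not the patent's verified formula; the box format (x1, y1, x2, y2) and the standard form of v are assumptions.

```python
import numpy as np

def ciou_loss(box_p, box_g):
    """CIoU-style regression loss: 1 - IoU + rho^2/c^2 + alpha*v.
    box_p, box_g: predicted and ground-truth boxes as (x1, y1, x2, y2)."""
    # Intersection-over-union.
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter + 1e-9)
    # Squared Euclidean distance between box centres (rho^2).
    cpx, cpy = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cgx, cgy = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    # Squared diagonal of the minimum enclosing box (c^2).
    ex1, ey1 = min(box_p[0], box_g[0]), min(box_p[1], box_g[1])
    ex2, ey2 = max(box_p[2], box_g[2]), max(box_p[3], box_g[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    # Aspect-ratio similarity v and its positive trade-off weight alpha.
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / np.pi ** 2) * (np.arctan(wg / hg) - np.arctan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes the loss is essentially zero, and it grows as the predicted box drifts from the ground truth, which matches the role of a regression loss in the multi-task objective.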
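Claim 8 frames NMS as a decision problem: the action space A is a list of IoU thresholds and the state is the model's predictions. The sketch below shows standard greedy NMS plus a threshold chosen from such a list; the policy is a stub (the patent does not disclose the learned agent), and the function names are illustrative.

```python
import numpy as np

def nms(boxes, scores, iou_thresh):
    """Greedy non-maximum suppression at a single IoU threshold.
    boxes: (N, 4) array of (x1, y1, x2, y2); returns kept indices."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # IoU of the top-scoring box against all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou <= iou_thresh]  # suppress heavy overlaps
    return keep

def dynamic_nms(boxes, scores, action_space, policy):
    """'Dynamic' NMS in the sense of claim 8: policy maps the current
    predictions (the state) to an index into the threshold list A.
    The real agent is learned; here policy is supplied by the caller."""
    thresh = action_space[policy(boxes, scores)]
    return nms(boxes, scores, thresh)
```

A lower threshold suppresses more aggressively (useful in sparse scenes), while a higher one preserves overlapping detections in dense scenes, which is the motivation for making the threshold state-dependent.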

Description

Deep neural network model design method for automatic target recognition through infrared and visible light image fusion in complex environment

Technical Field

The invention relates to the technical field of computer vision, in particular to a deep neural network model design method for automatic target recognition by infrared and visible light image fusion in a complex environment.

Background

Since the concept of automatic target recognition was formally proposed in the 1970s, automatic target recognition systems and technology have made great progress and achievements over more than 50 years of development, and the role and position of automatic target recognition platforms in modern high-technology warfare have continuously grown. Automatic target recognition based on multi-modal image fusion is one of the key technologies that enable such platforms to adapt to complex and changeable battlefield environments and to strike various targets precisely under intense adversarial conditions. In a complex battlefield environment, deep-learning-based automatic target recognition with multi-modal image fusion is the core technology for realizing image-guided automatic target recognition both at present and in the future. It underpins battlefield requirements such as fire-and-forget operation and long-range precision strike from outside defended areas, offers high precision, strong resistance to electronic interference, and the ability to assess damage effects, and is a research hotspot in the current automatic target recognition field.
In general, research at the present stage on deep neural network models for image-guided automatic target recognition in complex battlefield environments commonly suffers from insufficient feature extraction and fusion strategies, low detection precision for small targets, difficulty in locating target areas in dense scenes, and large model size, which severely restrict the application effect and application range of such models.

Disclosure of Invention

The invention aims to provide a deep neural network model design method for automatic target recognition by fusion of infrared and visible light images in a complex environment, so as to solve the problems described in the background above. To this end, the invention provides the following technical scheme. The method comprises the following steps: step S1, a data enhancement fusion operation, in which a multi-modal target image captured by a sensor in a complex battlefield environment is processed by data enhancement and fusion; step S2, constructing an attention convolution module, which comprises an attention mechanism and a depthwise separable convolution and is used to optimize conventional convolution; step S3, constructing an adaptive feature extraction network for feature extraction from the image; step S4, constructing a multi-scale fusion prediction network for fusing the extracted image features; step S5, constructing a multi-modal fusion module for complementarily fusing visible light and infrared image information; and step S6, designing a multi-task loss function and a dynamic non-maximum suppression operation.

Preferably, the data enhancement fusion operation of step S1 comprises a Mosaic data enhancement module, a Mixup data enhancement module, a geometric distortion module and a self-calibration illumination learning module. The Mosaic data enhancement module randomly crops four images and splices them into one image to serve as training data; the Mixup data enhancement module mixes two random samples in proportion and distributes the classification labels in the same proportion; the geometric distortion module performs operations such as scaling, cropping, flipping and rotation on the images; and the self-calibration illumination learning module is introduced to brighten images in complex low-light scenes quickly, flexibly and robustly while retaining the advantages of the existing data enhancement methods. Preferably, the depthwise separable convolution of step S2 decomposes conventional convolution into a depthwise convolution and a pointwise convolution; the depthwise convolution allocates a separate convolution kernel to each channel of the input image feature, and each conv
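The Mixup step described above (mix two random samples in proportion and distribute the labels in the same proportion) can be sketched as follows. This is a generic Mixup illustration, not the patent's implementation; the function name and the Beta-distributed mixing ratio are conventional assumptions.

```python
import numpy as np

def mixup(img_a, img_b, label_a, label_b, alpha=0.2, rng=None):
    """Mix two samples with ratio lam and distribute the (one-hot)
    labels with the same ratio, as in the S1 Mixup module."""
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))       # mixing proportion in [0, 1]
    img = lam * img_a + (1.0 - lam) * img_b   # pixel-wise blend
    label = lam * label_a + (1.0 - lam) * label_b
    return img, label, lam
```

Because the labels are mixed with the same proportion as the pixels, a mixed one-hot pair still sums to 1, so it remains a valid soft classification target.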
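The depthwise separable convolution of step S2 can be sketched in plain numpy: one kernel per input channel (depthwise), then a 1×1 pointwise combination across channels. This is a minimal, unpadded, stride-1 illustration for clarity, not the patent's optimized module; the function name and tensor layout are assumptions.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """x: (C, H, W) input features; dw_kernels: (C, k, k), one kernel
    per channel; pw_weights: (C_out, C), the 1x1 pointwise convolution
    that compresses/combines the per-channel depthwise outputs."""
    C, H, W = x.shape
    k = dw_kernels.shape[-1]
    Ho, Wo = H - k + 1, W - k + 1          # 'valid' output size
    # Depthwise: each kernel convolves only its own channel.
    dw = np.zeros((C, Ho, Wo))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])
    # Pointwise: 1x1 convolution mixes channels at every spatial position.
    return np.einsum('oc,chw->ohw', pw_weights, dw)
```

Compared with a standard convolution, this factorization reduces the parameter count from C_out·C·k² to C·k² + C_out·C, which is why it is used to shrink the model.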