CN-115496971-B - Infrared target detection method and device, electronic equipment and storage medium

Abstract

The invention discloses an infrared target detection method and device, electronic equipment and a storage medium, and relates to the field of infrared target detection. The method comprises inputting a currently acquired infrared target image into a trained infrared target detection model for category detection. The trained infrared target detection model is determined based on an attention mechanism neural network and a training data set. The attention mechanism neural network comprises a feature extraction network, an attention module connected to the output of the feature extraction network, a feature enhancement module connected to the output of the attention module, and a target classification and detection network connected to the three outputs of the feature enhancement module. The attention module calculates fusion coefficients of the original feature maps output by the feature extraction network and determines a multi-scale feature map. The invention can detect infrared target images at low cost, with high efficiency and accuracy.

Inventors

Ding Meng
Yu Kuaikuai
Liu Hao
Chang Yao
Xu Yiming

Assignees

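Nanjing University of Aeronautics and Astronautics
The 53rd Research Institute of China Electronics Technology Group Corporation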

Dates

Publication Date: 20260505
Application Date: 20221025

Claims (8)

1. An infrared target detection method, comprising: acquiring a currently acquired infrared target image; inputting the currently acquired infrared target image into a trained infrared target detection model to perform category detection; wherein the trained infrared target detection model is determined based on an attention mechanism neural network and a training data set; the training data set comprises a plurality of infrared target images and label data corresponding to each infrared target image, the label data being the category of the infrared target image; the attention mechanism neural network comprises a feature extraction network, an attention module connected to the output of the feature extraction network, a feature enhancement module connected to the output of the attention module, and a target classification and detection network connected to the three outputs of the feature enhancement module; the attention module is used for calculating fusion coefficients of the original feature maps output by the feature extraction network and determining a multi-scale feature map; the feature enhancement module specifically operates on three feature layers taken from different positions of the attention module, namely a middle feature layer, a middle-lower feature layer and a bottom feature layer, wherein the bottom feature layer undergoes one 1×1 convolution to adjust its channels, giving P5; P5 is up-sampled, combined with the middle-lower feature layer, and passed through a CSPLayer for feature extraction, giving P5_upsample; P5_upsample undergoes one 1×1 convolution to adjust its channels, giving P4; P4 is up-sampled, combined with the middle feature layer, and passed through a CSPLayer for feature extraction, giving P4_upsample; P4_upsample undergoes one 3×3 convolution for down-sampling, is stacked with P4, and is passed through a CSPLayer for feature extraction, giving P4_downsample; and P4_downsample undergoes one 3×3 convolution, is stacked with P5, and is passed through a CSPLayer for feature extraction, giving P5_downsample; the target classification and detection network specifically re-encodes the input fused feature map through one layer of 1×1 convolution, a BN layer and a SiLU activation function, and obtains the category and the prediction frame of the infrared target image through two groups, each consisting of two layers of 3×3 convolution, a BN layer and a SiLU activation function, wherein the output of the first group passes through one layer of 1×1 convolution and sigmoid activation to determine the infrared target category contained in each feature point, the output of the second group passes through one layer of 1×1 convolution with sigmoid activation and through a 1×1 convolution, respectively, to determine the regression parameters of each feature point, and the prediction frame and whether each feature point contains an infrared target are obtained after the regression parameters are adjusted.
2. The infrared target detection method according to claim 1, wherein the trained infrared target detection model is determined as follows: constructing a training data set; constructing an attention mechanism neural network; and training the attention mechanism neural network on the training data set to obtain the trained infrared target detection model.
3. The infrared target detection method according to claim 2, wherein constructing the attention mechanism neural network specifically comprises: constructing a feature extraction network, wherein the feature extraction network is used for extracting features from the infrared target image to obtain an original feature map; constructing an attention module; constructing a feature enhancement module, wherein the feature enhancement module is used for fusing the multi-scale feature maps to obtain a fused feature map; and constructing a target classification and detection network, wherein the target classification and detection network is used for performing category detection according to the fused feature map and the target frames, determining the category of the infrared target image and obtaining a prediction frame.
4. The infrared target detection method according to claim 1 or 3, wherein the feature extraction network is a Darknet network with a depth of 53, and the construction process of the feature extraction network comprises: setting up an initial network; and training the weights of the initial network on the ImageNet data set to obtain the feature extraction network.
5. The infrared target detection method according to claim 1 or 3, wherein the building process of the attention module is: constructing a channel attention module; the channel attention module is used for: applying to the original feature map a local cross-channel interaction strategy without dimension reduction, together with a method for adaptively selecting the one-dimensional convolution kernel size, and performing global average pooling on the original feature map over the spatial dimensions to obtain an average-pooled feature vector; after the average-pooled feature vector, aggregated by global average pooling (GAP) without dimension reduction, is obtained, adaptively determining a kernel size k, performing a one-dimensional convolution, and applying a sigmoid function to learn the channel attention; and inputting the channel attention to the feature enhancement module.
6. An infrared target detection device, comprising: a data acquisition device for acquiring a currently acquired infrared target image; and a category detector for inputting the currently acquired infrared target image into a trained infrared target detection model to perform category detection; wherein the trained infrared target detection model is determined based on an attention mechanism neural network and a training data set; the training data set comprises a plurality of infrared target images and label data corresponding to each infrared target image, the label data being the category of the infrared target image; the attention mechanism neural network comprises a feature extraction network, an attention module connected to the output of the feature extraction network, a feature enhancement module connected to the output of the attention module, and a target classification and detection network connected to the three outputs of the feature enhancement module; the attention module is used for calculating fusion coefficients of the original feature maps output by the feature extraction network and determining a multi-scale feature map; the feature enhancement module specifically operates on three feature layers taken from different positions of the attention module, namely a middle feature layer, a middle-lower feature layer and a bottom feature layer, wherein the bottom feature layer undergoes one 1×1 convolution to adjust its channels, giving P5; P5 is up-sampled, combined with the middle-lower feature layer, and passed through a CSPLayer for feature extraction, giving P5_upsample; P5_upsample undergoes one 1×1 convolution to adjust its channels, giving P4; P4 is up-sampled, combined with the middle feature layer, and passed through a CSPLayer for feature extraction, giving P4_upsample; P4_upsample undergoes one 3×3 convolution for down-sampling, is stacked with P4, and is passed through a CSPLayer for feature extraction, giving P4_downsample; and P4_downsample undergoes one 3×3 convolution, is stacked with P5, and is passed through a CSPLayer for feature extraction, giving P5_downsample; the target classification and detection network specifically re-encodes the input fused feature map through one layer of 1×1 convolution, a BN layer and a SiLU activation function, and obtains the category and the prediction frame of the infrared target image through two groups, each consisting of two layers of 3×3 convolution, a BN layer and a SiLU activation function, wherein the output of the first group passes through one layer of 1×1 convolution and sigmoid activation to determine the infrared target category contained in each feature point, the output of the second group passes through one layer of 1×1 convolution with sigmoid activation and through a 1×1 convolution, respectively, to determine the regression parameters of each feature point, and the prediction frame and whether each feature point contains an infrared target are obtained after the regression parameters are adjusted.
7. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the infrared target detection method according to any one of claims 1 to 5.
8. A computer readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the infrared target detection method according to any one of claims 1 to 5.
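
The channel attention recited in claim 5 (global average pooling over the spatial dimensions, a one-dimensional convolution across channels without dimension reduction, then a sigmoid) can be illustrated with a minimal pure-Python sketch. All names here are hypothetical, as is the list-of-lists layout of the feature map. The kernel-size rule is an assumption borrowed from the published ECA-Net formulation (gamma = 2, b = 1), since the claim only states that k is determined adaptively. Rescaling each channel by its weight is likewise an assumption about how the attention is applied.

```python
import math


def adaptive_kernel_size(channels, gamma=2, b=1):
    """Odd 1-D kernel size derived from the channel count.

    Assumption: ECA-Net style rule; the claim only says k is adaptive.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def global_average_pool(feature_map):
    """Average each channel (an H x W list of lists) to a single value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def conv1d_same(vector, kernel):
    """Zero-padded 1-D convolution across channels; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(vector) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(vector))
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, kernel):
    """Return per-channel weights and the feature map rescaled by them.

    The kernel would be learned in training; here it is passed in by the caller.
    """
    pooled = global_average_pool(feature_map)
    weights = [sigmoid(v) for v in conv1d_same(pooled, kernel)]
    scaled = [
        [[w * x for x in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return weights, scaled
```

With an identity kernel `[0.0, 1.0, 0.0]`, each weight is simply the sigmoid of that channel's spatial mean, which makes the sketch easy to check by hand.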

Description

Infrared target detection method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of infrared target detection, and in particular to an infrared target detection method and device, electronic equipment, and a storage medium.

Background

In recent years, image acquisition and processing technologies based on infrared image sensors have developed rapidly in fields such as traffic, security, and environmental protection. Color visible-light images (hereinafter, color images) yield good target detection results under ideal illumination, and deep learning makes it convenient to locate targets of interest in a color image and identify their categories. However, at night or without sufficient illumination, target detection based on color images often falls short of the desired effect and is prone to missed detections, false detections, or complete failure. Thermal infrared cameras are well suited for imaging under such conditions because they sense the thermal radiation emitted by the target and are not limited by lighting conditions. However, compared with color images, thermal infrared images suffer from low contrast and insufficient texture and edge features, which makes infrared target detection more difficult than target detection in color images.

At present, infrared target detection methods mainly comprise filtering-based methods, human-visual-system-based methods, low-rank sparse recovery-based methods, and deep-learning-based methods. Among these four, filtering-based methods suit scenes with a single, uniform, continuous background and small targets. Human-visual-system-based methods mainly suit scenes in which the target is relatively bright and differs clearly from the surrounding background. Low-rank sparse recovery-based methods suit relatively complex, rapidly changing backgrounds, but have high computational complexity and struggle to meet real-time requirements. Deep-learning-based methods have mainly concentrated on two-stage algorithms such as the R-CNN series, which must first generate proposals (pre-selected frames that may contain an object to be detected) and then detect the infrared target category; such algorithms run the detection and classification processes multiple times, and are complex and relatively slow.

Disclosure of Invention

The invention aims to provide an infrared target detection method and device, electronic equipment and a storage medium that can detect infrared target images at low cost, with high efficiency and accuracy.
In order to achieve the above object, the present invention provides the following solutions:

In a first aspect, the present invention provides an infrared target detection method, including: acquiring a currently acquired infrared target image; inputting the currently acquired infrared target image into a trained infrared target detection model to perform category detection; wherein the trained infrared target detection model is determined based on an attention mechanism neural network and a training data set; the training data set comprises a plurality of infrared target images and label data corresponding to each infrared target image, the label data being the category of the infrared target image; the attention mechanism neural network comprises a feature extraction network, an attention module connected to the output of the feature extraction network, a feature enhancement module connected to the output of the attention module, and a target classification and detection network connected to the three outputs of the feature enhancement module; and the attention module is used for calculating fusion coefficients of the original feature maps output by the feature extraction network and determining a multi-scale feature map.

Optionally, the trained infrared target detection model is determined as follows: constructing a training data set; constructing an attention mechanism neural network; and training the attention mechanism neural network on the training data set to obtain the trained infrared target detection model.
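
The fusion order of the feature enhancement module, as recited in claim 1, can be traced by following tensor shapes alone. The sketch below is illustrative only: every function name is hypothetical, shapes are (channels, height, width) tuples, and the example input sizes (256×80×80, 512×40×40, 1024×20×20) are assumed typical values for a Darknet-53 backbone on a 640×640 image, not figures from this document. It further assumes 2× up-sampling, stride 2 for both 3×3 convolutions (the claim states down-sampling only for the first), channel-wise concatenation for "combined" and "stacked", and that P4_upsample, P4_downsample and P5_downsample are the three outputs.

```python
def conv1x1(shape, out_channels):
    """1x1 convolution: changes the channel count, keeps the spatial size."""
    _, h, w = shape
    return (out_channels, h, w)


def upsample2x(shape):
    """Assumed 2x up-sampling: doubles height and width."""
    c, h, w = shape
    return (c, h * 2, w * 2)


def conv3x3_stride2(shape):
    """Assumed stride-2 3x3 convolution: halves height and width."""
    c, h, w = shape
    return (c, h // 2, w // 2)


def concat(a, b):
    """Channel-wise concatenation; spatial sizes must match."""
    if a[1:] != b[1:]:
        raise ValueError(f"spatial mismatch: {a} vs {b}")
    return (a[0] + b[0], a[1], a[2])


def csp_layer(shape, out_channels):
    """CSPLayer treated as a black box that sets the output channel count."""
    return (out_channels, shape[1], shape[2])


def feature_enhancement(middle, middle_lower, bottom):
    """Follow the top-down then bottom-up fusion order of claim 1."""
    p5 = conv1x1(bottom, middle_lower[0])
    p5_upsample = csp_layer(concat(upsample2x(p5), middle_lower), middle_lower[0])
    p4 = conv1x1(p5_upsample, middle[0])
    p4_upsample = csp_layer(concat(upsample2x(p4), middle), middle[0])
    p4_downsample = csp_layer(
        concat(conv3x3_stride2(p4_upsample), p4), middle_lower[0]
    )
    p5_downsample = csp_layer(
        concat(conv3x3_stride2(p4_downsample), p5), bottom[0]
    )
    return p4_upsample, p4_downsample, p5_downsample


outputs = feature_enhancement((256, 80, 80), (512, 40, 40), (1024, 20, 20))
print(outputs)
```

Under these assumptions each output keeps the resolution of the corresponding input layer, which is consistent with a detection network attached to three outputs at different scales.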
Optionally, constructing the attention mechanism neural network specifically includes: constructing a feature extraction network, wherein the feature extraction network is used for extracting features from the infrared target image to obtain an original feature map; constructing an attention module; constructing a feature enhancement module, wherein the feature enhancement module is used for fusing the multi-scale feature maps to obtain a fused feature map; and constructing a target classification and detection network, wherein the target class