CN-122023890-A - Image recognition method and system based on lightweight dual-attention network


Abstract

The invention provides an image recognition method and system based on a lightweight dual-attention network, relating to the technical field of image recognition. The method comprises the following steps: S1, acquiring training images and constructing a training recognition model, wherein the model comprises a dual-attention module and a classification module; S2, extracting features from the training images by using the dual-attention module to obtain depth features; S3, classifying the depth features by using the classification module to obtain corresponding classification results and confidences, adjusting the model according to the classification results, and repeating steps S2-S3 until the model converges to obtain an optimized recognition model; S4, loading the optimized recognition model on a mobile terminal and performing model quantization on it to obtain an image recognition model; and S5, acquiring a local image and recognizing it by using the image recognition model to obtain the corresponding image type. The invention addresses the problem that prior-art image recognition models cannot be both lightweight and highly accurate.
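Step S4 names model quantization without stating a scheme. As a minimal sketch only, the following assumes symmetric per-tensor int8 quantization of a flat list of float weights; the scheme and the function names `quantize_int8` and `dequantize` are illustrative assumptions, not taken from the patent.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization (assumed scheme, not from the source).

    Maps float weights to integer codes in [-127, 127] with a single scale.
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0, 1.0]
    codes, scale = quantize_int8(w)
    print(codes, scale, dequantize(codes, scale))
```

Under this assumed scheme, each weight is stored in one byte plus one shared scale, and the rounding error per weight is at most half the scale.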

Inventors

ZHANG NING
LV JIANFENG
LI XUAN
ZHANG ENXU
YANG JICHENG
LIU ZICHAO
JIANG HAILI

Assignees

陕西交控通宇交通研究有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-21

Claims (10)

1. An image recognition method based on a lightweight dual-attention network, characterized by comprising the following steps: S1, acquiring a training image and constructing a training recognition model, wherein the model comprises a dual-attention module and a classification module; S2, performing feature extraction on the training image by using the dual-attention module to obtain depth features; S3, classifying the depth features by using the classification module to obtain a corresponding classification result and confidence, adjusting the model according to the classification result, and repeating steps S2-S3 until the model converges to obtain an optimized recognition model; S4, loading the optimized recognition model on a mobile terminal and performing model quantization on the optimized recognition model to obtain an image recognition model; and S5, acquiring a local image and recognizing the local image by using the image recognition model to obtain a corresponding image type.
2. The image recognition method based on a lightweight dual-attention network as recited in claim 1, wherein the dual-attention module comprises a D module, a DN module and a DAC module, and performing feature extraction on the training image by using the dual-attention module to obtain depth features comprises: A1, extracting shallow features of the training image, performing ordinary downsampling on the shallow features by using the D module, and then extracting first attention features of the shallow features by using the DAC module; A2, extracting middle-layer features of the first attention features, performing ordinary downsampling on the shallow features by using the D module, and then extracting second attention features of the middle-layer features by using the DAC module; A3, extracting deep features of the second attention features, performing deep downsampling on the shallow features by using the DN module, and then extracting the depth features of the deep features by using the DAC module.
3. The image recognition method based on a lightweight dual-attention network as recited in claim 2, wherein after the shallow features, middle-layer features or deep features are obtained, the feature entropy of the respective features is calculated, and the number of DAC modules in steps A1, A2 and A3 is determined according to the feature entropy.
4. The image recognition method based on a lightweight dual-attention network as recited in claim 2, wherein extracting the first attention features of the shallow features by using the DAC module comprises: performing pointwise convolution on the shallow features and applying ReLU6 activation to obtain first convolution features; extracting channel attention features from the first convolution features to obtain fusion features; processing the fusion features sequentially with pointwise convolution and depthwise separable convolution to obtain second convolution features; and performing group normalization on the second convolution features to obtain normalized features, and then extracting coordinate spatial attention features from the normalized features to obtain the first attention features.
5. The image recognition method based on a lightweight dual-attention network as recited in claim 4, wherein, when pointwise convolution is performed on the shallow features, the number of channels of the shallow features is expanded to an integer multiple of the original number of channels, and when pointwise convolution is performed on the fusion features, the number of channels of the fusion features is reduced back to the number of channels of the shallow features.
6. The image recognition method based on a lightweight dual-attention network as recited in claim 4, wherein extracting channel attention features from the first convolution features to obtain the fusion features comprises: performing adaptive local pooling on the first convolution features to obtain pooled feature vectors, and then performing group normalization on the pooled feature vectors to obtain a plurality of groups of normalized features; calculating the variance of each group of normalized features, converting all variances into channel weights by using a Sigmoid function, and then combining all channel weights into a first weight vector; processing the pooled feature vectors with one-dimensional adaptive convolution to obtain adaptive convolution features, and activating the adaptive convolution features with a Sigmoid function to generate a corresponding second weight vector; and setting a fusion coefficient, performing weighted fusion of the first weight vector and the second weight vector to obtain channel attention weights, and then performing dynamic scaling fusion of the first convolution features and the channel attention weights to obtain the fusion features.
7. The image recognition method based on a lightweight dual-attention network as recited in claim 4, wherein extracting coordinate spatial attention features from the normalized features to obtain the first attention features comprises: performing horizontal differential pooling and vertical differential pooling on the normalized features to obtain horizontal feature vectors and vertical feature vectors respectively, and then concatenating the horizontal feature vectors and the vertical feature vectors to obtain a differential feature vector; processing the differential feature vector with dilated convolution at two different scales, and performing weighted fusion of the two convolution results to obtain enhanced features; performing group normalization on the enhanced features, applying ReLU6 activation, and then performing channel compression with a 1x1 convolution to obtain compressed features; activating the compressed features with Sigmoid to generate initial spatial attention weights, and calibrating the initial spatial attention weights with local response normalization to obtain standard spatial attention weights; and performing dynamic scaling fusion of the normalized features and the standard spatial attention weights to obtain the first attention features.
8. The image recognition method based on a lightweight dual-attention network as recited in claim 7, wherein the dilated convolution uses two scales, 3x3 and 5x5, to process the differential feature vector.
9. The image recognition method based on a lightweight dual-attention network as recited in claim 7, wherein, during channel compression, the number of channels of the enhanced features is compressed to 1.
10. An image recognition system based on a lightweight dual-attention network, characterized in that the system uses the image recognition method based on a lightweight dual-attention network as claimed in any one of claims 1 to 9 and comprises: an image acquisition module for acquiring training images and local images; a model construction module for constructing a training recognition model and training it with the training images to obtain an optimized recognition model; a mobile quantization module for loading the optimized recognition model on a mobile terminal and performing model quantization on the optimized recognition model to obtain an image recognition model; and an image recognition module for recognizing the local image by using the image recognition model to obtain a corresponding image type.
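The channel attention steps of claim 6 can be sketched roughly as follows. This is one possible reading under stated assumptions, not the patented implementation: adaptive local pooling is taken to be 2x2 average pooling per channel, the variance is computed per channel after group normalization, the one-dimensional convolution uses a fixed 3-tap kernel in place of an adaptive one, and "dynamic scaling fusion" is taken to be per-channel multiplication. All function names, the kernel and the default values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def local_pool(channel, k=2):
    """Average-pool one H x W channel to k*k values (H and W divisible by k)."""
    bh, bw = len(channel) // k, len(channel[0]) // k
    out = []
    for i in range(k):
        for j in range(k):
            block = [channel[r][c]
                     for r in range(i * bh, (i + 1) * bh)
                     for c in range(j * bw, (j + 1) * bw)]
            out.append(sum(block) / len(block))
    return out


def group_normalize(pooled, groups, eps=1e-5):
    """Normalize pooled vectors jointly within each group of channels."""
    size = len(pooled) // groups
    normed = []
    for g in range(groups):
        members = pooled[g * size:(g + 1) * size]
        flat = [v for vec in members for v in vec]
        mean = sum(flat) / len(flat)
        std = math.sqrt(variance(flat) + eps)
        normed.extend([[(v - mean) / std for v in vec] for vec in members])
    return normed


def channel_attention(features, groups=2, alpha=0.5, kernel=(0.25, 0.5, 0.25)):
    """features: list of C channels, each an H x W list of floats."""
    pooled = [local_pool(ch) for ch in features]
    # Branch 1 (assumed reading): per-channel variance of the
    # group-normalized pooled values, squashed by Sigmoid.
    w1 = [sigmoid(variance(vec)) for vec in group_normalize(pooled, groups)]
    # Branch 2: 1-D convolution along the channel axis over per-channel
    # means, zero-padded, then Sigmoid. The fixed kernel is a placeholder.
    desc = [sum(vec) / len(vec) for vec in pooled]
    pad = len(kernel) // 2
    padded = [0.0] * pad + desc + [0.0] * pad
    w2 = [sigmoid(sum(kw * padded[c + i] for i, kw in enumerate(kernel)))
          for c in range(len(desc))]
    # Weighted fusion of both weight vectors with fusion coefficient alpha.
    weights = [alpha * a + (1.0 - alpha) * b for a, b in zip(w1, w2)]
    # Scale every channel by its attention weight.
    fused = [[[v * wt for v in row] for row in ch]
             for ch, wt in zip(features, weights)]
    return weights, fused
```

The fusion coefficient `alpha` here balances the variance-based weights against the convolution-based weights; the claim states only that such a coefficient is set, not its value.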

Description

Image recognition method and system based on lightweight dual-attention network

Technical Field

The invention relates to the technical field of image recognition, in particular to an image recognition method and system based on a lightweight dual-attention network.

Background

With the rapid development of intelligent terminals and edge computing technologies, rapid and accurate recognition of image targets has become a key link in intelligent decision making and automatic processing. Traditional methods that rely on manual interpretation or conventional image processing are inefficient and highly subjective, struggle to cover complex scenes, and cannot meet the requirements of modern large-scale, automated applications. Image recognition technology based on machine vision and deep learning performs classification and detection by automatically extracting and analyzing key features in images, and has become one of the core directions driving the intelligent upgrading of various industries. The convolutional neural network (CNN), a representative deep learning model, shows strong feature extraction and recognition capability in image classification and target detection tasks and is widely applied in visual recognition scenes.

However, in actual deployment environments, the recognition devices are mostly smartphones, mobile terminals or embedded edge computing nodes. Such devices generally have limited computing resources, small storage capacity and strict power consumption constraints. Although traditional large CNN models achieve high recognition accuracy, their huge parameter counts and high computational complexity make efficient, real-time deployment on such resource-limited devices difficult.
On the other hand, although lightweight models designed specifically for resource-limited environments exist, their recognition accuracy often fails to meet practical demands. In particular, accuracy drops markedly in the face of challenges common in images, such as uneven distribution of target features, subtle inter-class differences, and complex, changeable backgrounds.

To improve the recognition accuracy of lightweight models in complex scenes, attention mechanisms have been widely introduced into network design. The core idea is to compute and assign different weights to different regions or channels of the feature map, so that the model autonomously focuses on the features most critical to the recognition task and suppresses irrelevant or interfering information. Current attention mechanisms mainly comprise: the channel attention mechanism, which focuses on the importance of different feature channels and enhances the expression of effective features by adaptively adjusting channel weights; the spatial attention mechanism, which focuses on the importance of different spatial positions in an image and aims to precisely locate a target or key region; and the mixed attention mechanism, which attempts collaborative optimization by combining information from both the channel and spatial dimensions.

However, existing attention mechanisms remain limited when applied to lightweight networks. Firstly, most attention modules carry a considerable number of parameters and computations; embedding them into a lightweight network often significantly increases the overall burden of the model, which runs contrary to the original intent of lightweight design.
Secondly, most existing attention mechanisms are modules with a relatively simple structure, or simply stack attention from different dimensions; they therefore cannot deeply fuse multi-dimensional attention with the computing layers, and struggle to capture multi-level discriminative features in an image at the same time. In addition, the module stacking pattern of existing networks is usually fixed and cannot flexibly meet the feature extraction and fusion needs of different levels, such as shallow detail features and deep semantic features.

Disclosure of Invention

In view of the defects of the prior art, the invention provides an image recognition method and system based on a lightweight dual-attention network, which solve the problem that prior-art image recognition models cannot be both lightweight and highly accurate.

According to an embodiment of the present invention, an image recognition method based on a lightweight dual-attention network includes: S1, acquiring a training image and constructing a training recognition model, wherein the model comprises a dual-attention module and a classification module; S2, car