CN-121982397-A - Parameter-free attention image classification network and method
Abstract
The invention belongs to the technical field of computer vision and deep learning, and discloses a parameter-free attention image classification network and method. The network comprises a local feature enhancement module (LFEM), which extracts channel and spatial local enhancement features from adjacent local areas in an adaptive pooling manner, and a local-global feature interaction module (L-GFIM), which fuses the global and local features extracted in parallel to realize mutual compensation of local and global features, so that the model can better understand and utilize feature information at different scales. A residual structure is used to retain original image information, thereby avoiding the information loss caused by network deepening and ensuring that the model accurately captures detailed information in images.
Inventors
- Luan Hao
- Qi Xuanhao
- Bao Yimin
- Zhao Mengying
Assignees
- 西安无倦边缘计算网络技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260122
Claims (8)
- 1. A parameter-free attention image classification network, comprising: a local feature enhancement module, a global feature enhancement module, and a local-global feature interaction module; the local feature enhancement module is connected with the global feature enhancement module and the local-global feature interaction module, and is used for extracting channel and spatial local enhancement features from adjacent local areas in an adaptive pooling manner; the global feature enhancement module is connected with the local feature enhancement module and the local-global feature interaction module, and is used for acquiring cross-channel and spatial global features in a single pass by means of an energy function; the local-global feature interaction module is connected with the local feature enhancement module and the global feature enhancement module, and is used for fusing the global and local features extracted in parallel and mutually compensating the local and global features, so that the model can better understand and utilize feature information at different scales, and for acquiring original image information by means of a residual structure.
- 2. The parameter-free attention image classification network of claim 1, wherein in the local feature enhancement module: 1) channel attention: a global max pooling operation is performed along the spatial dimensions (i.e., height and width) of the input feature map X, compressing the feature map from R^(C×H×W) to R^(C×1×1) and thereby obtaining a one-dimensional vector as the channel attention map; this vector is essentially a local representation of the most significant feature of each channel over the entire spatial range and serves as the weight coefficient of each channel, the calculation process being M_c(X) = MaxPool_(H,W)(X) ∈ R^(C×1×1); channel attention selects the most representative feature value in each channel through max pooling and computes the channel weight from this value, so as to emphasize key channels and suppress non-key channels; 2) spatial attention: a global max pooling operation is performed along the channel dimension of the input feature map, compressing the feature map from R^(C×H×W) to R^(1×H×W) and generating a spatial attention weight map, so as to focus on the most important feature parts of the spatial feature map; the weight coefficient of each spatial position is calculated as M_s(X) = MaxPool_C(X) ∈ R^(1×H×W); spatial attention captures the local spatial information of the whole feature map by computing, for each spatial element, the maximum value over all channels and using it as the generated spatial attention weight.
- 3. The parameter-free attention image classification network of claim 1, wherein in the global feature enhancement module: first, an energy function is defined for each neuron: e_t(w_t, b_t, y, x_i) = (1/(M−1)) Σ_{i=1}^{M−1} (y_o − (w_t x_i + b_t))^2 + (y_t − (w_t t + b_t))^2, wherein w_t t + b_t and w_t x_i + b_t represent the outputs of the target neuron t and of the other neurons x_i respectively, y_t and y_o represent the true values of the target neuron and of the other neurons respectively, M is the number of neurons per channel, i is the index of the neuron element, and w_t and b_t are the neuron weight and bias, respectively; second, the linear separability between the target neuron and the other neurons in the same channel is computed by minimizing the energy function; using y_t = 1 and y_o = −1 and adding a regularization term yields the final energy function: e_t(w_t, b_t, y, x_i) = (1/(M−1)) Σ_{i=1}^{M−1} (−1 − (w_t x_i + b_t))^2 + (1 − (w_t t + b_t))^2 + λ w_t^2; then, the mean μ and variance σ^2 of all neurons on a single channel are used to compute the closed-form model parameters, giving the minimum energy at each location: e_t* = 4(σ^2 + λ) / ((t − μ)^2 + 2σ^2 + 2λ), wherein the smaller the value of e_t*, the more separable the target neuron is from the other neurons of the current feature map and the more significant its contribution, proving that the neuron is more important; finally, each neuron of the feature map is weighted by 1/e_t* to obtain the final output feature map X~: X~ = sigmoid(1/E) ⊙ X, wherein E is the set of all e_t* values over the feature map, and the sigmoid function is added to restrict overly large values of E.
- 4. The parameter-free attention image classification network of claim 1, wherein the local-global feature interaction module L-GFIM mainly comprises a channel attention module M_c, a spatial attention module M_s, and a 3D attention module M_3D; first, channel attention and spatial attention are converted into local features by a max pooling operation, and the obtained local information is used as feature weights; then, the 3D attention determines the importance of each neuron by constructing an optimized energy function to compute the linear separability among neurons, and calculates uniform global weight values over the feature map; finally, the channel attention and the spatial attention are expanded along the reduced dimensions and recombined with the 3D attention, the representation capacity of the recombined attention map is enhanced by a sigmoid gating mechanism, and the enhanced attention map is multiplied with the original input feature map to form the final output feature map; the whole process is: X' = σ(M_c(X) ⊕ M_s(X) ⊕ M_3D(X)) ⊗ X, wherein ⊕ denotes the expansion and recombination of the attention maps, ⊗ denotes element-wise multiplication, σ is the sigmoid function, and X' is the output of the PAAM module.
- 5. A method of implementing the parameter-free attention image classification network according to any one of claims 1-4, the method comprising: step 1, extracting channel and spatial local enhancement features from adjacent local areas by the local feature enhancement module in an adaptive pooling manner; step 2, acquiring cross-channel and spatial global features in a single pass by the global feature enhancement module using an energy function; and step 3, fusing the global and local features extracted in parallel by the local-global feature interaction module and mutually compensating the local and global features, so that the model can better understand and utilize feature information at different scales, and acquiring original image information by means of a residual structure.
- 6. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the parameter-free attention image classification method of claim 5.
- 7. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the parameter-free attention image classification method of claim 5.
- 8. An information data processing terminal, characterized in that the information data processing terminal is configured to implement the parameter-free attention image classification network according to any one of claims 1-4.
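The following is a minimal, illustrative PyTorch sketch of the parameter-free attention aggregation described in claims 2-4; it is not the patented implementation. The class and parameter names (PAAM, lambda_reg) are assumptions, the 3D attention follows the closed-form minimum-energy weighting of claim 3, and the recombination of the three attention maps is assumed here to be element-wise addition after broadcasting, with the residual connection mentioned in claim 1 placed at the module output.

```python
# Sketch of a parameter-free attention aggregation module (names are illustrative).
import torch
import torch.nn as nn


class PAAM(nn.Module):
    """Channel and spatial max-pool attention fused with a closed-form
    minimum-energy (3D) attention; contains no learnable parameters."""

    def __init__(self, lambda_reg: float = 1e-4):
        super().__init__()
        self.lambda_reg = lambda_reg  # regularization term of the energy function

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape

        # 1) Channel attention (claim 2): global max pooling over H and W
        #    compresses C x H x W to C x 1 x 1.
        m_c = x.amax(dim=(2, 3), keepdim=True)            # (B, C, 1, 1)

        # 2) Spatial attention (claim 2): max pooling over the channel dimension
        #    compresses C x H x W to 1 x H x W.
        m_s = x.amax(dim=1, keepdim=True)                 # (B, 1, H, W)

        # 3) 3D attention (claim 3): per-neuron importance 1/e* from the
        #    closed-form minimum energy, using per-channel mean and variance.
        n = h * w - 1
        mu = x.mean(dim=(2, 3), keepdim=True)
        var = ((x - mu) ** 2).sum(dim=(2, 3), keepdim=True) / n
        m_3d = ((x - mu) ** 2) / (4 * (var + self.lambda_reg)) + 0.5  # (B, C, H, W)

        # 4) Interaction (claim 4): broadcast, recombine (assumed: addition),
        #    gate with sigmoid, and re-weight the input; a residual connection
        #    preserves the original feature information (claim 1).
        fused = torch.sigmoid(m_c + m_s + m_3d)
        return x * fused + x


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    print(PAAM()(feat).shape)  # torch.Size([2, 64, 32, 32])
```

Because the module introduces no learnable parameters, it can in principle be inserted after any convolutional stage of a backbone without changing the parameter count, which is consistent with the zero-parameter design goal stated in the background section.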
Description
Parameter-free attention image classification network and method

Technical Field

The invention belongs to the technical field of computer vision and deep learning, and particularly relates to a parameter-free attention image classification network and method.

Background

With the rapid development of deep learning technology, Convolutional Neural Networks (CNNs) have become the core method for image recognition and computer vision tasks. From AlexNet and VGGNet to ResNet and DenseNet, researchers have significantly improved the feature extraction capability and generalization performance of networks by continuously deepening the number of layers and introducing residual connections, dense connections, and other mechanisms. However, as the depth of the network increases, the number of parameters and the computational complexity of the model also increase sharply, which poses a great challenge in practical application scenarios with limited resources. In order to balance computing resources with model performance, attention mechanisms have evolved. Their key idea is to imitate the selective perception mechanism of the human visual system: by dynamically adjusting feature weights, the network can focus on key areas in the image and suppress redundant background information. Existing attention mechanisms are largely divided into two categories: channel attention mechanisms (e.g., SE-Net, ECA-Net) and mixed spatial-channel attention mechanisms (e.g., CBAM, BAM). These "plug and play" modules are widely integrated into various types of backbone networks due to their flexibility. However, while existing attention modules have improved model performance to some extent, they face the following major technical bottlenecks and limitations. First, there is a conflict between performance improvement and parameter increase. Most existing attention methods (e.g., SE-Net, CBAM) focus on designing complex sub-networks to generate attention weights. For example, SE-Net learns inter-channel correlations through squeeze-and-excitation operations implemented with fully connected (FC) layers, while CBAM introduces additional convolution layers to extract spatial features. These additional structures, while effective, inevitably increase the number of parameters and floating point operations (FLOPs) of the model, contrary to the original purpose of lightweight design, and limit deployment in extremely resource-constrained environments. Second, feature interaction along a single dimension limits context modeling capability. Most existing attention module designs extract features independently along the channel or spatial dimension. For example, channel attention tends to ignore spatial structure information, and spatial attention may lose the semantic association between channels. While some hybrid attention models (e.g., BAM, CBAM) attempt to connect these two dimensions in series or in parallel, they typically employ a single acquisition approach that fails to establish a deep association between spatial and channel information. This fragmented processing easily causes the model to lose cross-channel and cross-spatial local key information, thereby affecting overall discrimination performance. Finally, there is a lack of efficient interaction and complementation between local and global features.
Most existing attention modules acquire local or global features singly and in a serial manner. For example, relying solely on local convolution operations makes it difficult to capture long-range dependencies, while relying solely on global pooling tends to ignore fine texture details. Existing architectures struggle to realize mutual compensation between local and global features, so that when complex scenes (such as target occlusion and large scale variation) are processed, the model is prone to producing an incomplete overall target structure or losing target contour details. In summary, how to simultaneously achieve joint modeling of the channel-spatial dimensions and efficient interaction of local-global features, on the premise of zero parameter increase, has become a key scientific problem to be solved in current lightweight attention mechanism research. Based on this background, the invention provides a parameter-free attention aggregation model (PAAM), which resolves the ternary contradiction among efficiency, accuracy, and interaction through a mathematical energy function and an adaptive pooling strategy. Through the above analysis, the problems and defects of the prior art are as follows: existing attention mechanisms increase the parameter count, feature interaction across dimensions is insufficient, and local and global information are split apart.

Disclosure of Invention

The invention provides a parameter-free attention image classification network aiming at the problems exis