CN-122023940-A - Image classification processing method and electronic equipment
Abstract
The application provides an image classification processing method and an electronic device. The method comprises: obtaining an original Swin Transformer model and training it based on a preset QCFS activation function, obtaining a target Swin Transformer model after training is finished; replacing the activation function in the target Swin Transformer model according to a preset integrate-and-fire neuron to obtain a target image classification model; and inputting image data to be classified into the target image classification model, which classifies the image data to obtain a classification result. The application combines the high performance of the Transformer with the low-power-consumption advantage of the SNN, thereby markedly reducing energy consumption while improving classification accuracy on static and neuromorphic datasets, and expanding the application scenarios of SNNs in low-power devices.
Inventors
- ZHANG QIANG
- WANG ZHEN
- LI YUANZHUO
- CUI WEI
- HAN GANG
- GUO YIJIE
- ZHAO WEN
- SUN JINGKAI
- SU ZERAN
- SUN PIHAI
- SHI SHUAI
- WANG RENPENG
- MENG XIANG
- YONG ZHE
Assignees
- Beijing Humanoid Robot Innovation Center Co., Ltd. (北京人形机器人创新中心有限公司)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-03-23
Claims (10)
- 1. An image classification processing method, characterized by comprising: obtaining an original Swin Transformer model; training the original Swin Transformer model based on a preset QCFS activation function, and obtaining a target Swin Transformer model after the training is finished, wherein the QCFS activation function is different from the activation function of the original Swin Transformer model, and the weight parameters of the target Swin Transformer model are different from those of the original Swin Transformer model; replacing an activation function in the target Swin Transformer model according to a preset integrate-and-fire neuron to obtain a target image classification model, wherein the weight parameters of the target image classification model are the same as those of the target Swin Transformer model; and inputting image data to be classified into the target image classification model, and classifying the image data to be classified by the target image classification model to obtain a classification result of the image data to be classified.
- 2. The image classification processing method according to claim 1, wherein the training the original Swin Transformer model based on the preset QCFS activation function and obtaining the target Swin Transformer model after the training is finished comprises: replacing the activation function of the original Swin Transformer model with the QCFS activation function, and replacing the normalization layer of the original Swin Transformer model with a batch normalization layer, to obtain an intermediate Swin Transformer model; and training the intermediate Swin Transformer model to obtain the target Swin Transformer model.
- 3. The image classification processing method according to claim 1, wherein the replacing the activation function in the target Swin Transformer model according to the preset integrate-and-fire neuron to obtain the target image classification model comprises: replacing the QCFS activation functions in the target Swin Transformer model with preset integrate-and-fire neurons; and adding a preset random mask pulse processing unit after at least one integrate-and-fire neuron in the target Swin Transformer model to obtain the target image classification model.
- 4. The image classification processing method according to claim 3, wherein the target image classification model comprises an input processing module, a plurality of feature processing modules and an output module which are sequentially connected, each feature processing module comprises a plurality of mask pulse operation modules which are sequentially connected, and each mask pulse operation module comprises a plurality of integrate-and-fire neurons and a plurality of random mask pulse processing units; and the classifying the image data to be classified by the target image classification model to obtain the classification result of the image data to be classified comprises: processing the image data to be classified by the input processing module to generate an input feature vector sequence; performing feature transformation on the input feature vector sequence by each mask pulse operation module in each feature processing module based on each integrate-and-fire neuron and each random mask pulse processing unit to obtain output pulses; and obtaining, by the output module, the classification result of the image data to be classified based on the output pulse corresponding to each mask pulse operation module.
- 5. The image classification processing method according to claim 4, wherein each mask pulse operation module comprises a first normalization layer, a pulse self-attention module, a second normalization layer and a pulse perceptron module which are sequentially connected, and the pulse self-attention module comprises the integrate-and-fire neuron and the random mask pulse processing unit; and the performing, by each mask pulse operation module in each feature processing module, feature transformation on the input feature vector sequence based on each integrate-and-fire neuron and each random mask pulse processing unit to obtain an output pulse comprises: performing distribution adjustment on an input pulse by the first normalization layer to obtain a normalized input pulse; performing pulse self-attention operation and random mask processing on the normalized input pulse by the integrate-and-fire neuron and the random mask pulse processing unit in the pulse self-attention module to obtain a self-attention output pulse; performing residual connection on the normalized input pulse and the self-attention output pulse to obtain an intermediate pulse; performing distribution adjustment on the intermediate pulse by the second normalization layer to obtain a normalized intermediate pulse; performing pulse feature transformation on the normalized intermediate pulse by the pulse perceptron module to obtain a perception output pulse; and performing residual connection on the self-attention output pulse and the perception output pulse to obtain the output pulse.
- 6. The image classification processing method according to claim 5, wherein the pulse self-attention module comprises a first linear layer, a first integrate-and-fire neuron, a first random mask pulse processing unit, an attention calculating unit, a second integrate-and-fire neuron, a second random mask pulse processing unit and a third integrate-and-fire neuron which are sequentially connected; and the performing pulse self-attention operation and random mask processing on the normalized input pulse by the integrate-and-fire neuron and the random mask pulse processing unit in the pulse self-attention module to obtain the self-attention output pulse comprises: projecting the normalized input pulse into a query space, a key space and a value space by the first linear layer to obtain a query matrix, a key matrix and a value matrix; converting the query matrix, the key matrix and the value matrix into a query pulse matrix, a key pulse matrix and a value pulse matrix by the first integrate-and-fire neuron; performing random mask processing on the query pulse matrix and the key pulse matrix by the first random mask pulse processing unit to obtain a pruned query pulse matrix and a pruned key pulse matrix; calculating attention scores by the attention calculating unit according to the pruned query pulse matrix and the pruned key pulse matrix; converting the attention scores into an attention pulse matrix by the second integrate-and-fire neuron; performing random mask processing on the attention pulse matrix by the second random mask pulse processing unit to obtain a pruned attention pulse matrix; calculating the product of the value pulse matrix and the pruned attention pulse matrix to obtain an attention weighting result; and converting the attention weighting result into the self-attention output pulse by the third integrate-and-fire neuron.
- 7. The image classification processing method according to claim 5, wherein the pulse perceptron module comprises a third linear layer, a third normalization layer, a fourth integrate-and-fire neuron, a fourth random mask pulse processing unit, a fourth linear layer, a fourth normalization layer, a fifth integrate-and-fire neuron and a fifth random mask pulse processing unit which are sequentially connected; and the performing pulse feature transformation on the normalized intermediate pulse by the pulse perceptron module to obtain the perception output pulse comprises: performing dimension-increasing processing on the normalized intermediate pulse by the third linear layer to obtain a dimension-increasing matrix; normalizing the dimension-increasing matrix by the third normalization layer to obtain a normalized dimension-increasing matrix; converting the normalized dimension-increasing matrix into a dimension-increasing pulse matrix by the fourth integrate-and-fire neuron; performing random mask processing on the dimension-increasing pulse matrix by the fourth random mask pulse processing unit to obtain a pruned dimension-increasing pulse matrix; performing dimension-reducing processing on the pruned dimension-increasing pulse matrix by the fourth linear layer to obtain a dimension-reducing matrix; normalizing the dimension-reducing matrix by the fourth normalization layer to obtain a normalized dimension-reducing matrix; converting the normalized dimension-reducing matrix into a dimension-reducing pulse matrix by the fifth integrate-and-fire neuron; and performing random mask processing on the dimension-reducing pulse matrix by the fifth random mask pulse processing unit to obtain a pruned dimension-reducing pulse matrix as the perception output pulse.
- 8. The image classification processing method according to claim 4, wherein the processing the image data to be classified by the input processing module to generate the input feature vector sequence comprises: processing the image data to be classified by the input processing module according to the data source type of the image data to be classified to generate the input feature vector sequence.
- 9. The image classification processing method according to claim 8, wherein the input processing module comprises a reduction layer, an image block division layer and a linear embedding layer which are sequentially connected; and the processing the image data to be classified by the input processing module according to the data source type of the image data to be classified to generate the input feature vector sequence comprises: if the data source type of the image data to be classified is an event camera, generating a target tensor by the reduction layer based on the image data to be classified; dividing the target tensor by the image block division layer to obtain an image block sequence; and performing feature projection on the image block sequence by the linear embedding layer to obtain the input feature vector sequence.
- 10. An electronic device comprising a processor and a memory, the memory storing machine-readable instructions executable by the processor, the processor executing the machine-readable instructions to perform the steps of the image classification processing method of any of claims 1 to 9 when the electronic device is operating.
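The random mask pulse processing unit recited in claims 3 through 7 randomly prunes emitted spikes, which reduces downstream synaptic operations and thus energy. A minimal sketch, assuming a spike matrix stored as nested lists and an illustrative drop probability (neither representation nor value is specified by the patent):

```python
import random

def random_mask_pulses(spikes, drop_prob=0.2, rng=None):
    """Randomly zeroes ('prunes') each emitted spike with probability
    `drop_prob`; positions that carry no spike are left unchanged."""
    rng = rng or random.Random()
    return [[0.0 if (s > 0 and rng.random() < drop_prob) else s for s in row]
            for row in spikes]

# With drop_prob=0 the spike matrix passes through unchanged;
# with drop_prob=1 every spike is pruned.
pulses = [[1.0, 0.0], [1.0, 1.0]]
kept = random_mask_pulses(pulses, drop_prob=0.0)
pruned = random_mask_pulses(pulses, drop_prob=1.0)
```

In the claimed architecture such a unit would sit after an integrate-and-fire neuron, so its input is already a sparse binary matrix and masking only ever removes work, never adds it.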
Description
Image classification processing method and electronic equipment

Technical Field
The application relates to the technical field of image processing, in particular to an image classification processing method and an electronic device.

Background
The spiking neural network (SNN) offers a highly promising solution for edge computing by virtue of its event-driven, low-energy-consumption characteristics and biological plausibility, but its performance has long been limited on complex tasks. At the same time, the Transformer has achieved unprecedented performance in fields such as computer vision by virtue of its strong self-attention mechanism. Therefore, how to combine the high performance of the Transformer with the low-power-consumption advantage of the SNN to construct a novel neural network architecture with both advantages has become a research hotspot at the intersection of brain-inspired computing and artificial intelligence.

Disclosure of Invention
The application aims to overcome the defects in the prior art, and provides an image classification processing method and an electronic device, so as to solve the problem that the prior art cannot combine the high performance of the Transformer with the low-power-consumption advantage of the SNN in the field of image processing.
In order to achieve the above purpose, the technical scheme adopted by the embodiments of the application is as follows. In a first aspect, an embodiment of the present application provides an image classification processing method, including: obtaining an original Swin Transformer model; training the original Swin Transformer model based on a preset QCFS activation function, and obtaining a target Swin Transformer model after the training is finished, wherein the QCFS activation function is different from the activation function of the original Swin Transformer model, and the weight parameters of the target Swin Transformer model are different from those of the original Swin Transformer model; replacing an activation function in the target Swin Transformer model according to a preset integrate-and-fire neuron to obtain a target image classification model, wherein the weight parameters of the target image classification model are the same as those of the target Swin Transformer model; and inputting image data to be classified into the target image classification model, and classifying the image data to be classified by the target image classification model to obtain a classification result of the image data to be classified.
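In the ANN-to-SNN conversion literature, QCFS denotes a quantization-clip-floor-shift activation. The following sketch assumes that standard form; the threshold `lam` and quantization level `levels` are illustrative values, not parameters stated in the patent:

```python
import math

def qcfs(x: float, lam: float = 1.0, levels: int = 4) -> float:
    """Quantization-clip-floor-shift (QCFS) activation: maps the input onto
    `levels` discrete steps in [0, lam], approximating the average firing
    rate of an integrate-and-fire neuron observed over `levels` time steps."""
    q = math.floor(x * levels / lam + 0.5)  # quantize with a 0.5 shift
    q = max(0, min(levels, q))              # clip to [0, levels]
    return lam * q / levels                 # rescale to [0, lam]
```

Training with this activation in place of the original one lets the network adapt to quantized, clipped responses, so that swapping in a spiking neuron afterwards changes the computation as little as possible.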
In a second aspect, another embodiment of the present application provides an image classification processing apparatus, including: an acquisition module for acquiring an original Swin Transformer model; a training module for training the original Swin Transformer model based on a preset QCFS activation function and obtaining a target Swin Transformer model after the training is finished, wherein the QCFS activation function is different from the activation function of the original Swin Transformer model, and the weight parameters of the target Swin Transformer model are different from those of the original Swin Transformer model; a replacement module for replacing the activation function in the target Swin Transformer model according to a preset integrate-and-fire neuron to obtain a target image classification model, wherein the weight parameters of the target image classification model are the same as those of the target Swin Transformer model; and an inference module for inputting image data to be classified into the target image classification model and classifying the image data by the target image classification model to obtain a classification result. In a third aspect, another embodiment of the application provides an electronic device comprising a processor and a storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via a bus when the electronic device is in operation, the processor executing the machine-readable instructions to perform the steps of the method according to any of the first aspects. In a fourth aspect, another embodiment of the application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of the first aspects described above.
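The replacement module described above swaps activation layers while leaving every weight parameter untouched. A minimal, framework-agnostic sketch of that "module surgery" step; the layer classes here are placeholders for illustration, not the patent's actual modules:

```python
class Linear:
    """Placeholder for a weight-bearing layer (weights must be preserved)."""

class QCFSActivation:
    """Placeholder for the trained QCFS activation layer."""

class IFNeuron:
    """Placeholder for the integrate-and-fire neuron that replaces it."""

def replace_activations(layers, old_type, make_new):
    """Returns a copy of a layer list in which every layer of `old_type`
    is replaced by a freshly constructed substitute, while all other
    layers (and hence all weight parameters) are kept as-is."""
    return [make_new() if isinstance(layer, old_type) else layer
            for layer in layers]

model = [Linear(), QCFSActivation(), Linear(), QCFSActivation()]
snn = replace_activations(model, QCFSActivation, IFNeuron)
```

Because only parameter-free activation layers are exchanged, the resulting spiking model inherits the trained weights exactly, as required by claim 1.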
The method has the advantages that the original Swin Transformer model is trained through a preset QCFS activation function, and the target Swin Transformer model obtained after training has learned in advance to express features in a pulse-compatible form; the activation function in the target Swin Transformer model is then replaced according to a preset integrate-and-fire neuron to obtain the target image classification model.
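To illustrate why this replacement preserves accuracy: an integrate-and-fire neuron with soft reset, initialized at half its threshold, reproduces the QCFS output as an average firing rate over the time steps. This is a sketch of the standard conversion argument; the threshold and time-step count are illustrative, not values from the patent:

```python
class IFNeuron:
    """Integrate-and-fire neuron with soft reset: accumulates input current
    into a membrane potential and fires a spike of height `threshold`
    whenever the potential reaches the threshold."""

    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold
        self.v = threshold / 2.0  # half-threshold init, mirroring QCFS's 0.5 shift

    def step(self, x: float) -> float:
        self.v += x
        if self.v >= self.threshold:
            self.v -= self.threshold  # soft reset keeps the residual potential
            return self.threshold
        return 0.0

# Driving the neuron with a constant input of 0.3 for four time steps
# yields one spike, i.e. a firing rate of 0.25 -- the same value a QCFS
# activation with threshold 1.0 and four quantization levels produces.
neuron = IFNeuron(threshold=1.0)
rate = sum(neuron.step(0.3) for _ in range(4)) / 4
```

This rate equivalence is what allows the trained weights to be reused unchanged when the activation is swapped for the spiking neuron at inference time.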