CN-120495110-B - Pulse neural network-based multi-degradation scene binocular image self-adaptive enhancement method
Abstract
The invention relates to a pulse neural network-based adaptive enhancement method for binocular images of multi-degradation scenes, and belongs to the field of image processing. The method constructs a pulse neural network model for enhancing multi-degradation scene binocular images, the model comprising a first branch and a second branch that both adopt encoder-decoder structures, with feature interaction between the encoder blocks and decoder blocks of the two branches performed by a pulse stereo cross attention module. A data set is prepared to train the constructed network model; the trained model is then evaluated to judge whether its image restoration effect meets the performance requirement, and it is retrained if the requirement is not met. Finally, the trained and evaluated model is used to restore multi-degradation scene binocular images. The method greatly reduces computational energy consumption, improves the computational efficiency of the pulse network, removes rain streaks and raindrops from images effectively, and has good generality.
Inventors
- XIE JIN
- XU RONGHUA
- ZHAO MINGZHU
- WU YULONG
- NIE JING
Assignees
- Chongqing University (重庆大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-05-21
Claims (4)
- 1. A pulse neural network-based multi-degradation scene binocular image adaptive enhancement method, characterized by comprising the following steps: firstly, constructing a pulse neural network model for multi-degradation scene binocular image enhancement, wherein the network model comprises a first branch and a second branch, both of which adopt an encoder-decoder structure; between the encoder blocks and the decoder blocks of the two branches, the outputs of the encoder blocks of the first branch and the second branch are combined through a pulse stereo cross attention module and then respectively input into the decoder blocks of the first branch and the second branch; the first branch and the second branch have the same encoder-decoder structure; the encoder block comprises first to fourth encoders connected in sequence, wherein the first encoder comprises a convolution module and a feature extraction block, and the third and fourth encoders each comprise a down-sampling module and a feature extraction block; the decoder block comprises first to fourth decoders connected in sequence, wherein the first to third decoders each comprise an up-sampling module and a feature extraction block; the feature extraction block comprises two cascaded pulse residual blocks, and the output of the latter pulse residual block is added to the input of the feature extraction block to form the output of the feature extraction block; the pulse stereo cross attention module comprises first to fourth pulse separable convolution modules and first and second pulse convolution units, wherein the first and second pulse separable convolution modules receive the output of the first-branch encoder block, and the third and fourth pulse separable convolution modules receive the output of the second-branch encoder block; the output of the third pulse separable convolution module is transposed and multiplied with the output of the second pulse separable convolution module to obtain an attention feature; the transpose of the attention feature is multiplied with the output of the first pulse separable convolution module, the product is fed to the second pulse convolution unit, the output of the second pulse convolution unit is added to the output of the second-branch encoder block, and the sum is reshaped in matrix dimension to obtain the feature that serves as the input of the second-branch decoder block; the attention feature is multiplied with the output of the fourth pulse separable convolution module, the product is fed to the first pulse convolution unit, the output of the first pulse convolution unit is added to the output of the first-branch encoder block, and the sum is reshaped in matrix dimension to obtain the feature that serves as the input of the first-branch decoder block; secondly, preparing a data set containing rain-streak images or a data set containing raindrop images, performing data enhancement, then training the constructed pulse neural network model, and optimizing the model parameters by computing a loss value between the output image of the model and the ground-truth image; then, evaluating the trained pulse neural network model, judging whether its image restoration effect meets the performance requirement, and retraining it if the requirement is not met; and finally, restoring multi-degradation scene binocular images with the pulse neural network model that has been trained and evaluated.
- 2. The method of claim 1, wherein the first to fourth pulse separable convolution modules have the same structure, each comprising a pulse convolution unit, a pulse neuron, a 3×3 depth-wise convolution layer, a 1×1 convolution layer, and a batch normalization layer connected in sequence.
- 3. The method of claim 1, wherein the data enhancement comprises random cropping, in which a sub-region of a given size is randomly selected from any original image in the data set as a new image, and horizontal flipping, in which the image is flipped along its vertical central axis to generate a new image.
- 4. The method of claim 1, wherein evaluating the trained pulse neural network model comprises evaluating the image restoration effect of the model using peak signal-to-noise ratio and the structural similarity index as evaluation indicators, and calculating the energy consumption of the model from its number of synaptic operations to evaluate its energy-consumption advantage.
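The following is a minimal PyTorch sketch of one plausible reading of the pulse stereo cross attention module of claim 1 and the pulse separable convolution module of claim 2. The internal composition of the pulse convolution unit, the use of a simple hard-threshold spike in place of a full LIF neuron with surrogate gradients, the single-time-step formulation, and the exact ordering of the matrix products are illustrative assumptions, not details confirmed by the patent.

```python
# Minimal sketch in PyTorch; hypothetical module names, single time step,
# hard-threshold spikes standing in for the LIF neurons of the patent.
import torch
import torch.nn as nn


class SpikeNeuron(nn.Module):
    """Emit a binary spike wherever the input exceeds a threshold (IF-style stand-in)."""

    def __init__(self, threshold: float = 1.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x >= self.threshold).float()


class PulseConvUnit(nn.Module):
    """Assumed composition of the 'pulse convolution unit': 1x1 conv + BN + spiking neuron."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            SpikeNeuron(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


class PulseSeparableConv(nn.Module):
    """Claim 2: pulse convolution unit -> pulse neuron -> 3x3 depth-wise conv
    -> 1x1 conv -> batch normalization, connected in sequence."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            PulseConvUnit(channels),
            SpikeNeuron(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


class PulseStereoCrossAttention(nn.Module):
    """One plausible reading of the claim-1 cross attention between the two branches."""

    def __init__(self, channels: int):
        super().__init__()
        self.psc1 = PulseSeparableConv(channels)  # first branch, "value"-like projection
        self.psc2 = PulseSeparableConv(channels)  # first branch, "query"-like projection
        self.psc3 = PulseSeparableConv(channels)  # second branch, "key"-like projection
        self.psc4 = PulseSeparableConv(channels)  # second branch, "value"-like projection
        self.unit1 = PulseConvUnit(channels)      # first pulse convolution unit
        self.unit2 = PulseConvUnit(channels)      # second pulse convolution unit

    def forward(self, f_first: torch.Tensor, f_second: torch.Tensor):
        # reshape B,C,H,W -> B,H,W,C so matrix products run along the width (epipolar) axis
        q = self.psc2(f_first).permute(0, 2, 3, 1)
        k = self.psc3(f_second).permute(0, 2, 3, 1)
        v1 = self.psc1(f_first).permute(0, 2, 3, 1)
        v2 = self.psc4(f_second).permute(0, 2, 3, 1)

        # claim 1: transpose the third module's output, multiply with the second module's output
        attn = torch.matmul(q, k.transpose(-2, -1))  # B,H,W,W; claim 1 mentions no softmax/scaling

        to_first = torch.matmul(attn, v2)                     # attention x fourth-module output
        to_second = torch.matmul(attn.transpose(-2, -1), v1)  # transposed attention x first-module output

        # refine with the pulse convolution units, add the encoder outputs, restore B,C,H,W
        out_first = self.unit1(to_first.permute(0, 3, 1, 2)) + f_first     # to first-branch decoder
        out_second = self.unit2(to_second.permute(0, 3, 1, 2)) + f_second  # to second-branch decoder
        return out_first, out_second


if __name__ == "__main__":
    scam = PulseStereoCrossAttention(channels=32)
    left, right = torch.rand(1, 32, 16, 16), torch.rand(1, 32, 16, 16)
    d_first, d_second = scam(left, right)
    print(d_first.shape, d_second.shape)  # torch.Size([1, 32, 16, 16]) twice
```

In the full model such a module would sit between the corresponding encoder and decoder stages of the two branches and be evaluated over multiple time steps of the spiking simulation.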
Description
Pulse neural network-based multi-degradation scene binocular image self-adaptive enhancement method
Technical Field
The invention belongs to the field of image processing, and relates to an adaptive enhancement method for multi-degradation scene binocular images based on a pulse neural network.
Background
Images captured by real devices suffer from various degradation factors, including low light, low resolution, rain, and blur; restoring and enhancing such images is of great significance for everyday photography, autonomous driving, outdoor surveillance, and similar applications. Single-task methods can handle a specific degradation type but have significant limitations: images in real scenes often suffer from multiple degradations at the same time, single-task models struggle to cope with such complex situations, and building an independent model for each degradation leads to a large number of models, heavy resource consumption, and poor generality. Rain is a common degradation factor whose density, streak length, type, and falling angle make its shape and distribution extremely complex; moreover, rain rarely occurs in isolation and is often accompanied by other degradations such as fog and noise. Taking rain removal as the entry point, a general model that can handle multiple degradations jointly is expected to break through the limitations of single-task approaches.
Traditional monocular image deraining methods predict a rain map and restore a clear image by means of image priors or deep convolutional networks, but because raindrops and rain streaks occlude the background scene, the details of the occluded regions are difficult to restore completely, leading to missing information and blurring. Binocular image deraining exploits the complementary information of the left and right views, mines the correlation between the two images, reduces the uncertainty of the information, and provides a richer basis for recovering a clear image; its core is to realize effective information interaction between the left and right views. However, conventional ANN-based binocular raindrop and rain-streak removal methods face challenges such as insufficient cross-view information mining, high model complexity, and low computational efficiency when processing complex scenes.
The pulse neural network, also known as the spiking neural network (Spiking Neural Network, SNN), is a new generation of neural network with unique computational advantages and low energy consumption. Unlike an artificial neural network (Artificial Neural Network, ANN), information in an SNN is transmitted as binary spike sequences, and computation takes place only when some neurons are activated; this spike-driven computation mode greatly reduces the energy consumption of an SNN at run time. When processing image tasks, ANNs typically require a large number of multiply-accumulate operations (Multiply Accumulate, MAC), which is computationally expensive, whereas spiking neurons in an SNN emit spikes only when the membrane potential reaches a threshold, so low-power computation can be achieved through sparse synaptic accumulate operations.
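The contrast between ANN multiply-accumulate costs and sparse SNN accumulate operations is commonly quantified with a back-of-the-envelope estimate like the sketch below. The per-operation energies (45 nm CMOS figures often cited in the SNN literature), the firing rate, and the time-step count are illustrative assumptions, not values from the patent.

```python
# Hypothetical energy comparison between an ANN layer and its SNN counterpart.
E_MAC = 4.6e-12  # J per multiply-accumulate (ANN), assumed 45 nm estimate
E_AC = 0.9e-12   # J per accumulate (one SNN synaptic operation), assumed estimate


def ann_energy(macs: float) -> float:
    """Energy of an ANN layer given its number of MAC operations."""
    return macs * E_MAC


def snn_energy(macs: float, firing_rate: float, timesteps: int) -> float:
    """Energy of the equivalent SNN layer: synaptic operations (SOPs) are the
    MAC count scaled by spike sparsity and the number of simulation time steps."""
    sops = macs * firing_rate * timesteps
    return sops * E_AC


macs = 1e9        # e.g. one convolution layer with 10^9 MACs
rate, T = 0.2, 4  # assumed average firing rate and number of time steps
print(f"ANN: {ann_energy(macs) * 1e3:.2f} mJ, SNN: {snn_energy(macs, rate, T) * 1e3:.2f} mJ")
# -> ANN: 4.60 mJ, SNN: 0.72 mJ under these assumptions
```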
To solve the problems of traditional binocular rain removal methods, researchers have proposed various innovative approaches. Shi et al. [1] propose a binocular raindrop removal method based on a line-expansion attention module (RDA), which enlarges the attention receptive field to achieve efficient information propagation between the left and right images, introduces an attention-consistency loss to enhance the left-right consistency of stereo images, and performs excellently in both quantitative and qualitative evaluation. CDINet [2] uses a global context interaction module and a local detail interaction module to explore cross-view information, effectively improving binocular rain-streak and raindrop removal performance. Both methods emphasize the information interaction between the left and right images and achieve good results in binocular rain removal through innovative module designs and loss functions. NAFSSR [3] targets the stereo image super-resolution task but is of important reference value for binocular rain removal. Its SCAM computes bidirectional cross attention between the left-view and right-view features based on the scaled dot-product attention mechanism and fuses cross-view and single-view features, so the complementary information of binocular images can be exploited effectively. Introducing the SCAM module into the binocular rain removal task makes it possible to better capture the differences between raindrops and rain streaks in the left and right views and to improve the rain removal effect. In the development of SNN,
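For reference, the bidirectional scaled dot-product stereo cross attention attributed to SCAM above can be written roughly as follows; this is a sketch assuming the standard formulation, where Q_L, K_R, V_L, V_R denote learned projections of the left- and right-view features and c is the channel dimension (the symbols are illustrative, not taken from the cited work):

\[
M = \frac{Q_L K_R^{\top}}{\sqrt{c}}, \qquad
F_{R \to L} = \operatorname{softmax}(M)\, V_R, \qquad
F_{L \to R} = \operatorname{softmax}(M^{\top})\, V_L
\]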