CN-121998883-A - Transformer low-light image enhancement method based on dynamic four priors
Abstract
The invention discloses a Transformer low-light image enhancement method based on dynamic four priors. The method comprises: acquiring a low-light image dataset and dividing it into a training sample set, a verification sample set and a test sample set; constructing a dynamic four-prior-based Transformer low-light image enhancement model, DQPGT; inputting the training sample set into the DQPGT model for training and validating with the verification sample set to complete training; and inputting the test sample set into the trained model to perform image enhancement and output the result. Dynamic four-prior estimation and a frequency-domain channel attention mechanism jointly optimize the spatial-domain and frequency-domain feature representations, which addresses the uneven exposure, noise amplification and color distortion that existing Retinex-based methods exhibit under complex real-world illumination.
Inventors
- LIU JIN
- LIU RUISEN
- ZHU MINGZHE
- WANG YAWEN
- HU PEILIN
Assignees
- 西安电子科技大学昆山创新研究院
- 西安电子科技大学
- 昆山行动者科技有限责任公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260130
Claims (9)
- 1. A Transformer low-light image enhancement method based on dynamic four priors, characterized by comprising the following steps: acquiring a low-light image dataset and dividing it into a training sample set, a verification sample set and a test sample set; constructing a model, namely constructing a dynamic four-prior-based Transformer low-light image enhancement model DQPGT; training the model, namely inputting the training sample set into the DQPGT model for training, and validating with the verification sample set to complete training of the DQPGT model; and outputting a result, namely inputting the test sample set into the trained model for image enhancement and outputting the enhanced images.
- 2. The Transformer low-light image enhancement method based on dynamic four priors according to claim 1, wherein constructing the model DQPGT specifically comprises: first generating the dynamic four priors with a dynamic four-prior estimator, and then constructing a Transformer-based loss healer guided by the dynamic four priors.
- 3. The Transformer low-light image enhancement method based on dynamic four priors according to claim 2, wherein the dynamic four-prior features are generated as follows: inputting an image, the input image being split into its three RGB channels; measuring the spectral quantities of the input image, namely using a Gaussian color model to compute three spectral quantities as a linear transformation of the RGB pixel values, per formula (1), in which the matrix parameters are obtained through training, the spatial variables are the abscissa and ordinate in the input image, the spectral variable is the wavelength, and the remaining two spectral quantities are the first-order and second-order partial derivatives of the first with respect to wavelength; passing the obtained spectral quantities, together with the scale parameter of the input image, to a Gaussian filter, applying scale-adjustable Gaussian filtering to each of the three spectral components to generate the basic features, and taking directional partial derivatives of them to obtain the derivative features; using the above spectral quantities and features to obtain four prior feature maps; and obtaining the dynamic four-prior features from the four prior feature maps.
- 4. The Transformer low-light image enhancement method based on dynamic four priors according to claim 3, wherein, in generating the dynamic four-prior features, the scale parameter of the input image is obtained by passing the input image through three convolution layers interleaved with two SiLU activation functions, which yields the scale parameter of the Gaussian filter; the features are obtained by formulas (2) to (5), in which the intermediate features denote the features of each stage of the generation step, the convolution operation is parameterized by its kernel size, number of output channels and stride, the activation function is SiLU, and the Gaussian filter operation uses the obtained scale.
- 5. The Transformer low-light image enhancement method based on dynamic four priors according to claim 4, wherein obtaining the four prior feature maps specifically comprises: substituting the obtained spectral features into three prior calculation formulas to obtain three prior features, per formulas (6) to (9), which use the arctangent function, the logarithmic function and the tangent function, together with the directional components of the features; defining a fourth prior per formula (10), whose terms represent the relative order of the channels in the RGB color space, with the numerical range normalized; concatenating the four prior feature maps along the channel dimension to obtain a combined feature, per formula (11); inputting the input image into a dynamic weight generation module, which adaptively predicts the weights of the four prior features from the features of the input image, per formulas (12) to (14), in which the intermediate features denote the features of each stage of weight generation and the activation function is Softmax; and multiplying the obtained weights element by element with the corresponding prior features, then passing the result through a convolution that adjusts the number of channels to 3 to obtain the dynamic four priors, per formulas (15) and (16), in which the product is the Hadamard product of the feature maps.
- 6. The Transformer low-light image enhancement method based on dynamic four priors according to claim 2, wherein the Transformer-based loss healer comprises an encoder, an intermediate layer and a decoder, and the encoder and the decoder each integrate a multi-head self-attention module and a convolution layer.
- 7. The Transformer low-light image enhancement method based on dynamic four priors according to claim 6, wherein the encoder adopts a hierarchical feature extraction architecture, specifically: first raising the channel count of the input image to complete the initial feature mapping; then passing the initial features through one module that fuses local context with global dependencies; after fusion, raising the channel count a second time and applying spatial downsampling; then passing the fused features sequentially through two cascaded modules that model deep semantic associations and structural information; and finally raising the channel count a third time to complete the fusion and compression of the high-level features, and outputting the final multi-channel feature map to the intermediate layer; the intermediate layer comprises two cascaded modules, of which the first receives the multi-channel feature map output by the encoder, performs a first round of feature purification and enhancement, and outputs the result to the second, which performs a second round of deep interaction and context fusion; the decoder adopts a symmetrical hierarchical upsampling structure, specifically: reducing the channel count of the intermediate-layer output features through an initial upsampling; concatenating the channel-reduced features with the intermediate features of the corresponding encoder level to fuse deep semantic information with shallow spatial details; passing the fused features sequentially through two cascaded modules for feature adjustment and context association; reducing the channel count of the resulting features a second time and concatenating them with the early intermediate features of the encoder to inject low-level details; and finally, after this second fusion, optimizing the features through one module, and mapping and fine-tuning the optimized features through the convolution layer to output an enhanced image of high visual quality.
- 8. The Transformer low-light image enhancement method based on dynamic four priors according to claim 5 or 6, wherein the module is a core computing unit integrating multi-head self-attention and a feed-forward network: the input features first pass through a normalization layer for stabilizing preprocessing and then enter the dynamic four-prior-guided multi-head self-attention module, in which the dynamic four priors guide the distribution of attention weights; the output of the attention module passes through another normalization layer and is then fed into the feed-forward network, which fuses and enhances the features through a nonlinear transformation, per formulas (17) to (21), whose terms are the features of each stage within the module, the input to the module, the normalization layer, the matrix addition operation, the dynamic four-prior-guided multi-head self-attention module, the feed-forward network layer, the frequency-domain channel attention module, and the output of the module; the multi-head self-attention is computed per formulas (22) to (33), whose terms are: the dynamic four-prior features and the input features of the module, both of the same size, with a channel count that depends on the stage at which the module operates; the channel splitting operation; the input features and the dynamic four-prior features after multi-head splitting; the number of heads; the input features and the dynamic four-prior features of a single head; the reshaping of the feature dimensions; the per-head input features and dynamic four-prior features after reshaping; the query, key and value of each head, all of the same size; the fully connected layers of each head; the attention result of each head; the matrix transposition operation; a trainable scaling parameter for each head; a second reshaping of the feature dimensions; the GELU activation function; the position-encoding information of each head; the feature output of a single head; and the output features of the module.
- 9. The Transformer low-light image enhancement method based on dynamic four priors according to claim 1, wherein training the model specifically comprises: setting the initial iteration parameter; randomly selecting, without replacement, a batch of training samples from the training sample set and inputting them into the model DQPGT as original images to generate enhanced result images; computing the overall loss between the enhanced results and the real images as a weighted sum of the mean absolute error loss, the perceptual loss and the structural similarity loss, and updating the weights of the model; judging whether the training sample set has been fully traversed, and if so, proceeding to validation, otherwise continuing to input samples into the model DQPGT for weight updates; inputting the verification set into the model at its current training stage for forward inference and computing the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) on the verification set, and, if neither index improves over several consecutive epochs, judging that the model has converged or overfitted and stopping training early, otherwise proceeding to the next judgment; and, if the model has neither converged nor overfitted, judging whether the inequality between the iteration parameter and the maximum iteration parameter holds, and if so, obtaining the trained dynamic four-prior-based Transformer low-light image enhancement model DQPGT, otherwise updating the iteration parameter and continuing the weight updates.
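The weighted loss and the early-stopping check described in claim 9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the loss weights, the patience value and the names `psnr`, `total_loss` and `EarlyStopper` are hypothetical, and the three loss terms are passed in as precomputed numbers rather than computed from a network.

```python
import numpy as np


def psnr(enhanced, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    diff = np.asarray(enhanced, dtype=float) - np.asarray(reference, dtype=float)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


def total_loss(l1, perceptual, ssim_loss, weights=(1.0, 0.1, 0.5)):
    """Weighted sum of the three loss terms; the weights are assumed values."""
    w_l1, w_perc, w_ssim = weights
    return w_l1 * l1 + w_perc * perceptual + w_ssim * ssim_loss


class EarlyStopper:
    """Signals a stop when neither PSNR nor SSIM improves for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_psnr = float("-inf")
        self.best_ssim = float("-inf")
        self.stale_epochs = 0

    def update(self, psnr_val, ssim_val):
        # An epoch counts as an improvement if either metric beats its best value.
        improved = psnr_val > self.best_psnr or ssim_val > self.best_ssim
        self.best_psnr = max(self.best_psnr, psnr_val)
        self.best_ssim = max(self.best_ssim, ssim_val)
        self.stale_epochs = 0 if improved else self.stale_epochs + 1
        return self.stale_epochs >= self.patience
```

In a training loop, `update` would be called once per epoch after evaluating the verification set, and training would end when it returns `True` or when the maximum iteration count is reached.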
Description
Transformer low-light image enhancement method based on dynamic four priors
Technical Field
The invention relates to the technical field of low-light image enhancement, and in particular to a Transformer low-light image enhancement method based on dynamic four priors.
Background
Low-light image enhancement is an important research direction in computer vision and image processing. It aims to improve the quality of images captured under insufficient illumination, raising their brightness, contrast and detail visibility to meet the demanding requirements on visual quality and information integrity in practical applications such as autonomous driving, security monitoring, medical imaging and unmanned aerial vehicle navigation. In low-light environments, images often suffer severe degradation such as uneven exposure, noise interference and color distortion, which not only impairs human visual perception but also hinders the performance of high-level vision tasks. Image enhancement faces many challenges in complex low-light scenes. Traditional methods mainly comprise pixel-adjustment techniques such as histogram equalization and gamma correction; although these can directly raise image brightness, they ignore the illumination distribution and readily introduce artifacts under complex illumination.
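To make the limitation of pixel-adjustment methods concrete, the sketch below implements gamma correction and global histogram equalization with NumPy. It is illustrative only: the function names, the default gamma value and the assumption that pixel values are floats in [0, 1] are choices made here, not details taken from the patent. Both functions apply one global mapping to every pixel, regardless of the local illumination.

```python
import numpy as np


def gamma_correct(img, gamma=0.5):
    """Brighten an image in [0, 1] with a power law; gamma < 1 lifts dark pixels."""
    return np.clip(img, 0.0, 1.0) ** gamma


def equalize_hist(gray, bins=256):
    """Global histogram equalization of a grayscale image in [0, 1]."""
    gray = np.clip(gray, 0.0, 1.0)
    hist, _ = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    # The normalized cumulative histogram serves as the intensity mapping.
    cdf = hist.cumsum() / gray.size
    # Map each pixel to the bin it falls in, then look up its new value.
    idx = np.minimum((gray * bins).astype(int), bins - 1)
    return cdf[idx]
```

Because the mapping depends only on a pixel's own value, a dark pixel in a shadow and an equally dark pixel in a well-lit region are brightened identically, which is the behavior the paragraph above identifies as a source of artifacts.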
Methods based on Retinex theory achieve physically interpretable enhancement by decomposing an image into a reflectance component and an illumination component, but their smooth-illumination assumption adapts poorly to the complex and variable illumination of the real world, so the decomposed reflectance component is distorted, which in turn causes noise amplification, color shift and structural distortion in the enhanced image. In recent years, deep learning has markedly advanced low-light image enhancement. Early methods based on convolutional neural networks (CNNs) include end-to-end models and multi-stage Retinex decomposition frameworks; the former often produce visually unnatural results, and the latter are complex to train. Unsupervised methods reduce the dependence on paired data through adversarial learning and curve estimation strategies, but remain limited in suppressing noise and color shift. With the wide adoption of the Transformer architecture in vision tasks, global modeling capability has been brought to low-light enhancement, and works such as Uformer, SNR-Net and Retinexformer use self-attention to improve illumination recovery and detail reconstruction. However, most existing Transformer methods rely on feature maps obtained by Retinex decomposition as prior guidance, which still shows clear deficiencies in color fidelity and structure preservation and limits further gains in model performance. The prior art is as follows: CN202411939652 discloses an image enhancement method that combines Retinex theory with the wavelet transform and performs training and inference under a Transformer framework.
The method introduces perturbation terms into the Retinex model to simulate image degradation in low-light environments more accurately, and uses the wavelet transform to decompose the image into high-frequency and low-frequency components, which are enhanced and denoised separately to preserve detail. In addition, a gated fusion module adaptively fuses the upsampled and downsampled features in the encoder and decoder to balance noise suppression against detail preservation, and the Transformer's self-attention mechanism models global dependencies to improve the enhancement effect. CN202411391478 discloses a method that combines convolution, Transformer and diffusion models, enhancing low-luminance images through low-light feature priors to improve extremely dark areas and reduce noise artifacts. The method recovers local details with a convolutional network, guides the Transformer to repair extremely dark areas with a feature map, fuses the features to obtain a coarse-grained result, and then performs fine-grained optimization with a block-decomposition diffusion model. Its drawbacks are that the low-light feature extraction converts to grayscale and discards multi-channel information, no dynamic weights are available, feature fusion uses fixed interpolation, and weights are assigned from a single-channel map, so the needs of different regions cannot be dynamically accommodated and the feature characterization is incomplet