CN-121981901-A - Image defogging method based on multi-scale depth fusion
Abstract
The invention discloses an image defogging method based on multi-scale depth fusion, belonging to the technical field of computer vision. Based on an atmospheric scattering model, known depth-of-field information is used to randomly generate image pairs with different fog concentrations and thereby construct a training data set. The network adopts an encoder-decoder architecture: the encoder extracts multi-scale fog-related features through multi-branch downsampling modules, while the decoder introduces a full-scale skip connection mechanism that fuses the feature information of all encoding layers and preceding decoding layers to enhance detail reconstruction. The feature map output by the decoder is processed by a progressive multi-scale image restoration network to generate a high-quality defogged result. Training jointly optimizes an L1 norm loss, a perceptual loss and a multi-scale structural similarity loss, effectively balancing pixel-level accuracy and visual perceptual quality. The invention can adaptively handle fog of different concentrations and markedly improves the edge sharpness, texture detail and color fidelity of defogged images.
Inventors
- ZHENG HAOFENG
- QIAO WEI
- LIN SHUQING
- ZHANG HANBIN
- GENG XIAOHUI
- QIU SHUMAO
- DONG JIANSONG
- CHEN MINGYOU
- ZHANG JIANZHONG
- WANG ZHEN
- LIN DAMING
- WANG ENCHENG
Assignees
- 福州机场复线高速公路有限公司
- 宁德宁古高速公路有限责任公司
- 交通运输部公路科学研究所
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-05
Claims (9)
- 1. An image defogging method based on multi-scale depth fusion, characterized by comprising the following specific steps: S1, based on an atmospheric scattering model, randomly selecting a global atmospheric light value and an atmospheric scattering coefficient value and, using known depth information, generating a foggy image from a fog-free image, thereby establishing a training data set; S2, constructing a defogging network based on an encoder and a decoder, wherein the encoder comprises a first-stage encoder to a fifth-stage encoder, the decoder comprises a first-stage decoder to a fourth-stage decoder, and the network structure of the defogging network adopts a multi-scale learning module to extract defogging-related multi-scale feature information; S3, inputting the image to be defogged into the encoder and the decoder to obtain a defogging feature map, and inputting the defogging feature map into an image restoration network to generate a final defogged image; and S4, training the defogging network by adopting a linear weighted combination of an L1 norm loss function, a perceptual loss function and a multi-scale structural similarity loss function as the total loss function, the trained network taking a single foggy image as input and outputting the corresponding defogged image.
- 2. The image defogging method based on multi-scale depth fusion according to claim 1, wherein the encoder is composed of five downsampling modules that reduce the input feature map size step by step, and the decoder is composed of four upsampling modules whose deconvolution strides are set to enlarge the feature map size step by step. The step S3 further includes: S31, extracting fog-related features with the multi-branch downsampling module in each stage of the encoder, reducing the feature map scale through a downsampling operation, and inputting the processed features to the next-stage encoder; S32, introducing a full-scale skip connection mechanism into each stage of the decoder, fusing the feature maps output by all five encoder stages with the output feature map of the previous decoder stage, and concatenating them along the channel dimension after spatial alignment to generate the defogging feature map of the current decoding layer; S33, inputting the feature map output by the first-stage decoder into an image restoration network, wherein the image restoration network adopts a progressive multi-scale convolution structure to generate the final defogged image.
- 3. The image defogging method based on multi-scale depth fusion according to claim 1, wherein in S1 the global atmospheric light value is randomly selected in the range 0.7 to 1.0, the atmospheric scattering coefficient value is randomly selected in the range 0.6 to 1.8, and a foggy image is generated according to the atmospheric scattering model I(x) = J(x)t(x) + A(1 − t(x)), wherein I(x) is the foggy image, J(x) is the fog-free image, t(x) is the transmittance, and A is the global atmospheric light.
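As an illustrative aside (not part of the claims), the synthesis step of claim 3 can be sketched in NumPy. The transmittance form t(x) = exp(−β·d(x)) and the [0, 1] image range are assumptions not spelled out in the claim itself, though they are the standard reading of the atmospheric scattering model given a known depth map d(x) and scattering coefficient β; the sampling ranges for A and β come directly from claim 3.

```python
import numpy as np

def synthesize_hazy(clear, depth, rng=None):
    """Render a foggy image from a fog-free image and its depth map using
    the atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x))."""
    rng = np.random.default_rng(rng)
    A = rng.uniform(0.7, 1.0)       # global atmospheric light, claim 3 range
    beta = rng.uniform(0.6, 1.8)    # atmospheric scattering coefficient
    t = np.exp(-beta * depth)       # transmittance from the known depth (assumed form)
    hazy = clear * t[..., None] + A * (1.0 - t[..., None])
    return np.clip(hazy, 0.0, 1.0), A, beta

# Example: a uniform grey image with a left-to-right depth ramp.
J = np.full((4, 4, 3), 0.5)
d = np.tile(np.linspace(0.0, 2.0, 4), (4, 1))
I, A, beta = synthesize_hazy(J, d, rng=0)
```

Pixels at zero depth keep their original value (t = 1), while distant pixels drift toward the atmospheric light A, which is how the paired training data of S1 gets its varying fog concentrations.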
- 4. The image defogging method based on multi-scale depth fusion according to claim 1, wherein the multi-branch downsampling module comprises three parallel branches, the first branch comprising 1 convolution unit, the second branch comprising 2 convolution units and the third branch comprising 3 convolution units, and the outputs of the branches are concatenated in the channel dimension to form a comprehensive feature representation; the convolution unit is formed by sequentially connecting a pointwise convolution, a depthwise convolution and a further pointwise convolution, with batch normalization and an activation function applied after each convolution layer; the first depthwise convolution of the first branch adopts a 3×3 convolution kernel with stride 2 and padding 1, realizing 2× downsampling of the feature map, and the dilation coefficients of the depthwise convolutions of the layers are 1, 2 and 4 in sequence, with stride 1.
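A minimal NumPy sketch of one reading of claim 4 follows (again illustrative, not part of the claims). It assumes that the first unit of every branch downsamples by 2 so that the branch outputs can be concatenated, and that the successive depthwise dilations are 1, 2 and 4; batch normalization and trained weights are omitted, with small random weights standing in.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pointwise(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def depthwise(x, w, stride=1, dilation=1):
    """Per-channel 3x3 convolution; padding is 1 at stride 2 (claim 4) and
    dilation-sized at stride 1, so spatial size is then preserved."""
    pad = 1 if stride == 2 else dilation
    ke = dilation * (w.shape[-1] - 1) + 1            # effective kernel extent
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    win = sliding_window_view(xp, (ke, ke), axis=(1, 2))
    win = win[:, ::stride, ::stride, ::dilation, ::dilation]
    return np.einsum('chwij,cij->chw', win, w)

def conv_unit(x, c_out, stride=1, dilation=1, seed=0):
    """Pointwise -> depthwise 3x3 -> pointwise, each followed by ReLU."""
    rng = np.random.default_rng(seed)
    x = np.maximum(pointwise(x, 0.1 * rng.standard_normal((c_out, x.shape[0]))), 0)
    x = np.maximum(depthwise(x, 0.1 * rng.standard_normal((c_out, 3, 3)),
                             stride=stride, dilation=dilation), 0)
    return np.maximum(pointwise(x, 0.1 * rng.standard_normal((c_out, c_out))), 0)

def multi_branch_downsample(x, c_out):
    """Branches of 1, 2 and 3 conv units, concatenated along channels."""
    b1 = conv_unit(x, c_out, stride=2)
    b2 = conv_unit(conv_unit(x, c_out, stride=2), c_out, dilation=2)
    b3 = conv_unit(conv_unit(conv_unit(x, c_out, stride=2), c_out, dilation=2),
                   c_out, dilation=4)
    return np.concatenate([b1, b2, b3], axis=0)
```

With an 8-channel 32×32 input and c_out = 8 per branch, the module yields a 24-channel 16×16 map: the parallel branches trade depth (and hence receptive field, widened further by dilation) for the "comprehensive feature representation" the claim describes.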
- 5. The image defogging method based on multi-scale depth fusion according to claim 1, wherein the full-scale skip connection mechanism, in each decoding layer, performs a max pooling operation on the feature maps of shallower encoders and a bilinear interpolation operation on the feature maps of deeper encoders, unifies the spatial size and the channel number of all feature maps, concatenates them along the feature dimension, and applies batch normalization and a rectified linear unit (ReLU) activation function to obtain the defogging feature map of the current decoding layer.
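The spatial-alignment part of claim 5 can be sketched as follows (illustrative only). Nearest-neighbour upsampling stands in for the bilinear interpolation named in the claim, and the 1×1 channel projection, batch normalization and ReLU are omitted to keep the sketch short.

```python
import numpy as np

def max_pool(x, k):
    """k x k max pooling with stride k; x is (C, H, W), H and W divisible by k."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def upsample_nearest(x, k):
    """Nearest-neighbour upsampling by factor k (stand-in for bilinear)."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def full_scale_fuse(encoder_maps, decoder_map, target_hw):
    """Align every encoder map plus the previous decoder map to target_hw,
    then concatenate along the channel axis."""
    th, _ = target_hw
    aligned = []
    for f in encoder_maps + [decoder_map]:
        h = f.shape[1]
        if h > th:                      # shallower, larger map: pool down
            f = max_pool(f, h // th)
        elif h < th:                    # deeper, smaller map: upsample
            f = upsample_nearest(f, th // h)
        aligned.append(f)
    return np.concatenate(aligned, axis=0)

# Five encoder scales (32 down to 2) fused at a hypothetical 4x4 decoder stage.
enc = [np.ones((4, s, s)) for s in (32, 16, 8, 4, 2)]
dec = np.ones((4, 2, 2))
fused = full_scale_fuse(enc, dec, (4, 4))
```

Each decoding layer thus sees every encoder scale at once, which is what distinguishes the full-scale skip connection from the layer-to-layer skips of a plain U-Net.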
- 6. The image defogging method based on multi-scale depth fusion according to claim 1, wherein the image restoration network comprises a five-layer convolution structure: the first layer adopts a 1×1 convolution kernel; the second layer adopts a 3×3 convolution kernel and the third layer a 5×5 convolution kernel, with the outputs of the two integrated by a bidirectional fusion mechanism; the fourth layer adopts a 7×7 convolution kernel to model the global fog concentration distribution, and its output is fed back to the fusion layer to form enhanced features; and the fifth layer uses a 3×3 convolution kernel to produce the final defogged image.
- 7. The image defogging method based on multi-scale depth fusion according to claim 1, wherein the L1 norm loss function calculates the sum of absolute differences between corresponding pixel values of the defogged image and the ground-truth fog-free image; the perceptual loss function calculates semantic differences based on the first-layer feature map of a pre-trained deep neural network; and the multi-scale structural similarity loss function calculates the similarity of luminance, contrast and structure at a plurality of scales respectively and forms a weighted sum.
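The loss combination of claims 1 and 7 can be sketched in NumPy as below (illustrative only). The perceptual term requires a pre-trained network, so it is passed in as a callable; the global (unwindowed) SSIM statistic, the scale weights and the combination weights are all placeholders, not values from the patent.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute pixel difference (claim 7's L1 term, averaged)."""
    return np.abs(pred - target).mean()

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Single-scale SSIM over the whole image (windowed SSIM is usual,
    but one global statistic keeps the sketch short)."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / \
           ((mx**2 + my**2 + c1) * (x.var() + y.var() + c2))

def ms_ssim_loss(pred, target, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of (1 - SSIM) over progressively downsampled scales."""
    loss = 0.0
    for w in weights:
        loss += w * (1.0 - ssim_global(pred, target))
        # 2x average pooling moves to the next, coarser scale
        pred = pred.reshape(pred.shape[0]//2, 2, pred.shape[1]//2, 2).mean(axis=(1, 3))
        target = target.reshape(target.shape[0]//2, 2, target.shape[1]//2, 2).mean(axis=(1, 3))
    return loss

def total_loss(pred, target, perceptual, lambdas=(1.0, 0.05, 0.3)):
    """Linear weighted combination of the three terms (claim 1, S4)."""
    w1, wp, wm = lambdas
    return w1 * l1_loss(pred, target) + wp * perceptual(pred, target) \
           + wm * ms_ssim_loss(pred, target)
```

When prediction and target coincide, every term vanishes, which is the sanity check one would run before wiring the loss into training.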
- 8. The image defogging method based on multi-scale depth fusion according to claim 2, wherein said S31 further comprises: extracting fog-related features in the first-stage encoder by using a first multi-branch downsampling module, reducing the feature map scale through a downsampling operation to obtain a first downsampling feature map, and inputting the processed first downsampling feature map to the second-stage encoder; performing a downsampling operation on the first downsampling feature map by using a second multi-branch downsampling module in the second-stage encoder to reduce the feature map scale and obtain a second downsampling feature map, and inputting the processed second downsampling feature map to the third-stage encoder; performing a downsampling operation on the second downsampling feature map by using a third multi-branch downsampling module in the third-stage encoder to reduce the feature map scale and obtain a third downsampling feature map, and inputting the processed third downsampling feature map to the fourth-stage encoder; performing a downsampling operation on the third downsampling feature map by using a fourth multi-branch downsampling module in the fourth-stage encoder to reduce the feature map scale and obtain a fourth downsampling feature map, and inputting the processed fourth downsampling feature map to the fifth-stage encoder; and performing a downsampling operation on the fourth downsampling feature map by using a fifth multi-branch downsampling module in the fifth-stage encoder to reduce the feature map scale and obtain a fifth downsampling feature map, and inputting the processed fifth downsampling feature map to the fourth-stage decoder.
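The five-stage halving described in claim 8 can be checked with the standard convolution output-size formula, using the 3×3 / stride 2 / padding 1 parameters from claim 4; the 256×256 input resolution is an assumption for illustration, not a value from the patent.

```python
def conv_out(size, k=3, stride=2, pad=1):
    """Spatial output size of a convolution: floor((H + 2p - k) / s) + 1."""
    return (size + 2 * pad - k) // stride + 1

# Five stride-2 downsampling stages (claims 4 and 8), from an assumed 256x256 input:
sizes = [256]
for _ in range(5):
    sizes.append(conv_out(sizes[-1]))
# sizes == [256, 128, 64, 32, 16, 8]
```

Each encoder stage exactly halves the spatial resolution, so the fifth downsampled map handed to the fourth-stage decoder is 1/32 of the input size.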
- 9. The image defogging method based on multi-scale depth fusion according to claim 2, wherein said S32 further comprises: inputting the first downsampling feature map, the second downsampling feature map, the third downsampling feature map, the fourth downsampling feature map and the fifth downsampling feature map into the fourth-stage decoder to obtain a fourth upsampling feature map; inputting the fourth upsampling feature map, the first downsampling feature map, the second downsampling feature map and the third downsampling feature map into the third-stage decoder to obtain a third upsampling feature map; inputting the third upsampling feature map, the first downsampling feature map and the second downsampling feature map into the second-stage decoder to obtain a second upsampling feature map; and inputting the second upsampling feature map into the first-stage decoder to obtain the defogging feature map.
Description
Image defogging method based on multi-scale depth fusion

Technical Field

The invention discloses an image defogging method based on multi-scale depth fusion, and belongs to the technical field of computer vision.

Background

With the wide application of computer vision and intelligent image processing technology, high-quality images have become the basic guarantee of key tasks such as target detection, autonomous driving and video surveillance. However, when shooting natural scenes, particles such as water vapor and dust suspended in the atmosphere scatter light, so that the acquired images suffer from low contrast, color deviation and blurred details, which seriously weakens the reliability and accuracy of subsequent high-level vision tasks. Particularly under complex meteorological conditions, the fog distribution exhibits spatial non-uniformity and concentration variability, placing higher demands on the robustness and generalization capability of a defogging algorithm. Image defogging technology aims to recover a clear, realistic fog-free scene from a single foggy image. Traditional methods fall mainly into two categories. The first is based on the atmospheric scattering physical model and achieves recovery by estimating a transmission map and global atmospheric light, such as the Dark Channel Prior (DCP); but such methods rely on strong statistical assumptions and easily produce halo artifacts in sky or low-texture regions. The second is based on image enhancement strategies such as histogram equalization or contrast stretching; these can improve visual appearance but struggle to reconstruct the real scene structure and illumination.
In recent years, deep-learning-driven end-to-end defogging networks (such as AOD-Net and DehazeNet) have freed themselves from explicit prior constraints, but they are limited by single-scale feature extraction and often fall short in edge sharpness preservation and detail texture recovery. In the prior art, although some research introduces encoder-decoder architectures and attempts multi-scale feature fusion (such as UNet 3+), these methods have notable defects: first, standard convolution is generally adopted in the downsampling process, so the receptive field is fixed and adaptive modeling capability for regions of different fog concentration is lacking; second, the skip connections only transfer features between adjacent layers and cannot fully integrate full-scale encoding information and cross-stage semantic associations; third, the loss function design mostly focuses on pixel-level error and neglects the collaborative optimization of high-level perceptual quality and structural similarity. These problems mean that existing methods often exhibit detail loss, color distortion or residual artifacts when processing dense-fog images, and struggle to meet the dual requirements of image fidelity and usability in practical applications. Therefore, there is a need for an image defogging method that deeply fuses multi-scale context information, has strong representational capability, and balances reconstruction accuracy with perceptual quality.

Disclosure of Invention

The invention aims to provide an image defogging method based on multi-scale depth fusion, which can effectively solve the problems in the background art.
An image defogging method based on multi-scale depth fusion comprises the following specific steps: S1, based on an atmospheric scattering model, randomly selecting a global atmospheric light value and an atmospheric scattering coefficient value and, using known depth information, generating a foggy image from a fog-free image, thereby establishing a training data set; S2, constructing a defogging network based on an encoder and a decoder, wherein the encoder comprises a first-stage encoder to a fifth-stage encoder, the decoder comprises a first-stage decoder to a fourth-stage decoder, and the network structure of the defogging network adopts a multi-scale learning module to extract defogging-related multi-scale feature information; S3, inputting the image to be defogged into the encoder and the decoder to obtain a defogging feature map, and inputting the defogging feature map into an image restoration network to generate a final defogged image; and S4, training the defogging network by adopting a linear weighted combination of an L1 norm loss function, a perceptual loss function and a multi-scale structural similarity loss function as the total loss function, the trained network taking a single foggy image as input and outputting the corresponding defogged image. Further, the encoder consists of five downsampling modules that gradually reduce the size of the input feature map, and the decoder consists of four upsampling modules whose deconvolution strides are set to gradually enlarge the feature map size.