CN-121981904-A - Hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion
Abstract
The application relates to the technical field of image processing, and in particular to a hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion. The method comprises: acquiring an image dehazing dataset comprising hazy images and corresponding clear haze-free images; constructing a U-shaped dehazing network model that hybridizes a CNN and a Transformer; training the model on the dataset with an L1 loss function as the optimization target, minimizing the difference between the dehazing result and the clear haze-free image; and inputting the hazy image to be processed into the trained hybrid network model to obtain a clear dehazed image. Through the hybrid network architecture and multi-scale feature interaction, the method realizes a redundancy-elimination mechanism over both the channel and spatial dimensions and, by integrating a bottleneck-feature design, improves dehazing accuracy and adaptability to different haze scenes.
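The serial CNN-then-Transformer stage described above (local convolution for inductive bias, then global self-attention for context) can be sketched minimally in NumPy. The depthwise 3x3 kernel and single-head attention here are illustrative assumptions for exposition, not the patented architecture.

```python
import numpy as np

def local_conv3x3(x, w):
    # x: (H, W, C) feature map; w: (3, 3, C) depthwise kernel (hypothetical shape)
    # Zero-padded 3x3 depthwise convolution: captures spatial locality with
    # translation-equivariant weights, i.e. the CNN inductive biases.
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] * w[i, j, :]
    return out

def global_self_attention(x):
    # x: (N, C) tokens; single-head scaled dot-product attention over all
    # positions, so every pixel can attend to every other (global context).
    C = x.shape[1]
    scores = x @ x.T / np.sqrt(C)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ x

def serial_block(x, w):
    # CNN stage first (local features), Transformer stage second (global context),
    # matching the serial encoder ordering described in the abstract.
    local = local_conv3x3(x, w)
    H, W, C = local.shape
    tokens = local.reshape(H * W, C)
    return global_self_attention(tokens).reshape(H, W, C)
```

In the encoder, several such blocks at decreasing resolutions would produce the multi-scale skip-connection features consumed by the decoder.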
Inventors
- YIN XUEHUI
- MIAO KANG
- LI HAONAN
- HUANG JIAWEI
Assignees
- 重庆邮电大学 (Chongqing University of Posts and Telecommunications)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-20
Claims (9)
- 1. A hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion, characterized by comprising the following steps: S1, acquiring an image dehazing dataset, wherein the image dehazing dataset comprises hazy images and corresponding clear haze-free images; S2, constructing a U-shaped dehazing network model that hybridizes a CNN and a Transformer, wherein the U-shaped dehazing network model comprises an encoder, a decoder, dehazing heads and a category weight module; the encoder adopts a serial CNN-Transformer structure to generate multi-scale skip-connection features, wherein the CNN extracts local features and the Transformer captures global context information, the CNN providing inductive biases comprising a spatial locality bias and a translation invariance bias; the decoder comprises a scale attention module, a spatial-domain-guided dual-fusion module and a global-local channel attention module, wherein the scale attention module performs interactive redundancy elimination on the multi-scale skip features and the decoder outputs a high-resolution feature map; the dehazing heads synthesize candidate clear images from the high-resolution feature map; the category weight module takes the lowest-resolution feature among the multi-scale skip features as input, obtains fusion weights through two convolution layers followed by global average pooling, and performs weighted fusion of the candidate images to produce the final dehazed output; S3, training the U-shaped dehazing network model on the image dehazing dataset with an L1 loss function as the optimization target, minimizing the difference between the dehazing result and the clear haze-free image; S4, inputting the hazy image to be processed into the trained hybrid network model to obtain a clear dehazed image.
- 2. The hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion according to claim 1, wherein the scale attention module addresses feature dilution and feature redundancy.
- 3. The hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion according to claim 1, wherein the spatial-domain-guided dual-fusion module optimizes the fusion of local features and global features.
- 4. The hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion according to claim 1, wherein the global-local channel attention module attends to channel importance and local spatial information simultaneously.
- 5. The hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion according to claim 1, wherein the dehazing heads are constructed based on the CL2S method.
- 6. The hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion according to claim 1, wherein the dehazing heads generate 3 candidate images and the category weight module generates 3 fusion weights; each candidate image is weighted by its corresponding fusion weight, and the three weighted images are then fused into the final output.
- 7. The hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion according to claim 1, wherein the dataset is preprocessed before training.
- 8. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion of any one of claims 1-7.
- 9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion according to any one of claims 1-7.
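The category weight module of claims 1 and 6 (lowest-resolution skip feature, two convolution layers, global average pooling, three fusion weights, weighted fusion of three candidates) can be sketched as follows. The 1x1 projections standing in for the two convolution layers, the softmax normalization, and the random weight initialization are assumptions for illustration; the patent does not specify these details.

```python
import numpy as np

def category_weights(bottleneck):
    # bottleneck: (H, W, C) lowest-resolution skip feature (the module's input).
    # The claimed "2-layer convolution" is approximated here by two 1x1
    # projections (per-pixel matmuls); kernel sizes are a hypothetical choice.
    C = bottleneck.shape[2]
    rng = np.random.default_rng(0)                # placeholder learned weights
    w1 = rng.standard_normal((C, C)) * 0.1
    w2 = rng.standard_normal((C, 3)) * 0.1
    h = np.maximum(bottleneck @ w1, 0.0)          # conv 1 + ReLU
    logits = (h @ w2).mean(axis=(0, 1))           # conv 2, then global average pooling
    e = np.exp(logits - logits.max())
    return e / e.sum()                            # 3 fusion weights (softmax assumed)

def fuse_candidates(candidates, weights):
    # candidates: list of 3 (H, W, 3) candidate dehazed images from the heads;
    # weighted sum produces the final dehazed output.
    return sum(w * img for w, img in zip(weights, candidates))
```

Because the weights are derived from the bottleneck feature rather than the final encoder feature, the fusion can condition on coarse, haze-category-level semantics, which is the distinction the description draws against CL2S.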
Description
Hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion
Technical Field
The application relates to the technical field of image processing, in particular to a hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion.
Background
Haze is a common natural phenomenon formed by particles suspended in the atmosphere. It reduces image contrast, distorts texture, and severely degrades the performance of high-level visual tasks such as object detection, recognition, and autonomous driving. Among existing mainstream image dehazing schemes, DehazeFormer alleviates gradient problems and exploits edge features by improving the LayerNorm, GELU, and spatial aggregation mechanisms of the Swin Transformer; VSPPA adopts block-level attention layers for feature perception to enhance locally correlated features; and MB-TaylorFormer approximates softmax attention through a Taylor expansion, capturing long-range pixel interactions by combining multi-scale patch embedding with deformable convolution. The DeHamer network uses global features extracted by the Transformer as conditional information to modulate the local features of the CNN, and low-cost hybrid networks simplify the CNN feature-extraction stage to improve efficiency. Existing hybrid models still suffer from feature dilution, coarse fusion granularity, and channel attention that insufficiently incorporates spatial information, so dehazing performance and adaptability to different haze scenes require further improvement.
Disclosure of Invention
In view of the above, the application discloses a hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion, which solves the problems in the prior art and comprises the following steps: S1, acquiring an image dehazing dataset comprising hazy images and corresponding clear haze-free images; S2, constructing a U-shaped dehazing network model that hybridizes a CNN and a Transformer, the model comprising an encoder, a decoder, dehazing heads and a category weight module; S3, training the U-shaped dehazing network model on the image dehazing dataset with an L1 loss function as the optimization target, minimizing the difference between the dehazing result and the clear haze-free image; S4, inputting the hazy image to be processed into the trained hybrid network model to obtain a clear dehazed image. Also disclosed are a computer device comprising a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the above hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion; and a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above hybrid network image dehazing method based on multi-scale interaction and spatial-domain-guided dual fusion.
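The training objective in step S3 is a plain L1 (mean absolute error) loss between the dehazed prediction and the clear ground truth. The toy example below illustrates L1-driven training on a stand-in "network" reduced to a single per-pixel gain parameter; the gain model, learning rate, and synthetic data are assumptions purely for demonstration, not the patented training setup.

```python
import numpy as np

def l1_loss(pred, target):
    # Mean absolute error between the dehazed prediction and the clear
    # ground-truth image, as used in step S3.
    return np.abs(pred - target).mean()

def train_step(a, hazy, clear, lr=0.1):
    # Hypothetical stand-in model: prediction = a * hazy (a single scalar gain).
    # Subgradient of the L1 objective w.r.t. a is mean(sign(pred - clear) * hazy).
    pred = a * hazy
    grad = (np.sign(pred - clear) * hazy).mean()
    return a - lr * grad

rng = np.random.default_rng(0)
hazy = rng.uniform(0.2, 1.0, size=(8, 8))   # synthetic hazy input
clear = 0.5 * hazy                          # ground truth generated with gain 0.5
a = 1.0                                     # initial parameter
for _ in range(200):
    a = train_step(a, hazy, clear)          # a converges toward 0.5
```

In the actual method the parameter vector is the full U-shaped network and the gradient comes from backpropagation, but the objective being minimized has the same form.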
The application provides a serial CNN-Transformer encoder design that gives the model both inductive bias and global context modeling capability, avoiding the performance limitations of a single model type. On this basis, a scale attention module is designed to eliminate redundancy over both the channel and spatial dimensions through multi-scale feature interaction, providing clean feature input. A spatial-domain-guided dual-fusion module and a global-local channel attention module are further designed: combined with spatial attention, they fuse local spatial information while attending to channel importance, realize optimal fusion of local and global features, fully exploit the advantages of both, and avoid the loss of important features. Compared with the CL2S method, which uses the final encoder features, using bottleneck features as the source of fusion weights exploits additional haze-category semantic information and optimizes the fusion effect. Together, these designs improve adaptability to different haze scenes.
Drawings
FIG. 1 is a schematic diagram of the U-shaped dehazing network model hybridizing a CNN and a Transformer in an embodiment of the application; FIG. 2 is a schematic diagram of dataset partitioning in an embodiment of the application; FIG. 3 shows comparison test results on the RESIDE-6K dataset in an embodiment of the application; FIG. 4 shows comparison test results on the O-HAZE dataset in an embodiment of the application; FIG. 5 shows comparison test results on the I-HAZE dataset in an embodiment of the application; FIG. 6 shows comparison test results on the NH-HAZE dataset in