
CN-121998873-A - Image defogging method and system based on self-adaptive convolution and texture prior

CN121998873A

Abstract

The invention discloses an image defogging method and system based on adaptive convolution and a texture prior, belonging to the technical field of computer vision. The method comprises: acquiring an original image; inputting the original image into an adaptive multi-scale convolution module of a defogging model for processing to obtain a coding feature map; optimizing the coding feature map through a texture restoration module of the defogging model, according to a discrete feature codebook obtained through pre-training, to obtain a texture restoration feature map; and processing the texture restoration feature map through a decoder of the defogging model to obtain a defogged target image. In this scheme, the adaptive multi-scale convolution module improves feature extraction for the non-uniform haze patches of complex real scenes, while the texture restoration module recovers the textures smoothed away during feature extraction, together improving the defogging result.

Inventors

  • ZHENG XINYE
  • CHENG YONGBIN
  • LUO JINSHENG
  • CUI YIRAN
  • YU YE
  • LU QIANG

Assignees

  • Hefei University of Technology (合肥工业大学)

Dates

Publication Date
2026-05-08
Application Date
2026-04-09

Claims (10)

  1. An image defogging method based on adaptive convolution and a texture prior, the method comprising: acquiring an original image; inputting the original image into an adaptive multi-scale convolution module of a defogging model for processing to obtain a coding feature map, wherein each convolution kernel in the adaptive multi-scale convolution module has a receptive field corresponding to its pixel point; optimizing the coding feature map through a texture restoration module of the defogging model, according to a discrete feature codebook obtained through pre-training, to obtain a texture restoration feature map, wherein the discrete feature codebook comprises a plurality of high-definition texture feature vectors; and processing the texture restoration feature map through a decoder of the defogging model to obtain a defogged target image.
  2. The method of claim 1, wherein the adaptive multi-scale convolution module comprises a geometry prediction module, a dynamic resampling module, and a feature modulation module; and inputting the original image into the adaptive multi-scale convolution module of the defogging model for processing to obtain a coding feature map comprises: inputting the original image into the geometry prediction module to obtain a height scale map and a width scale map, which respectively characterize the receptive-field expansion strength of each coordinate in the vertical and horizontal directions; inputting the height scale map and the width scale map into the dynamic resampling module to generate an adaptive sampling grid, and resampling and aggregating the original image features according to the adaptive sampling grid to obtain a sampling feature map; inputting the original image into the feature modulation module for processing to obtain a predicted modulation coefficient and a predicted bias coefficient; and modulating the sampling feature map according to the modulation coefficient and the bias coefficient to obtain the coding feature map.
  3. The method of claim 2, wherein inputting the original image into the geometry prediction module to obtain a height scale map and a width scale map comprises: extracting features of the original image through a feature extraction network to obtain a first feature map F_1; and obtaining the height scale map S_h and the width scale map S_w by the following formulas: S_h = α_h · σ(f_h(F_1)), S_w = α_w · σ(f_w(F_1)); wherein f_h denotes a height convolution mapping; f_w denotes a width convolution mapping; σ is the Sigmoid function; α_h is a preset height upper limit; α_w is a preset width upper limit; and F_1 is the first feature map.
  4. The method of claim 2, wherein inputting the height scale map and the width scale map into the dynamic resampling module to generate an adaptive sampling grid, and resampling and aggregating the original image features according to the adaptive sampling grid to obtain a sampling feature map, comprises: determining a sampling grid size according to the overall average of the height scale map and the width scale map over all coordinates; determining an adaptive sampling interval for each coordinate according to the height scale map, the width scale map, and the sampling grid size, so as to generate an adaptive sampling grid in which each coordinate has a corresponding adaptive sampling interval; performing bilinear-interpolation resampling of the original image features in a continuous space according to the adaptive sampling grid to obtain extended resampling features; and selecting a corresponding aggregation convolution kernel according to the sampling grid size, performing block aggregation on the resampling features, and restoring the original spatial resolution to obtain the sampling feature map.
  5. The method of claim 2, wherein modulating the sampling feature map according to the modulation coefficient and the bias coefficient to obtain the coding feature map comprises: obtaining the coding feature map by the following formula: F_enc = γ ⊙ F_s + β; wherein γ is the modulation coefficient; β is the bias coefficient; ⊙ denotes element-wise multiplication; F_s is the sampling feature map; and F_enc is the coding feature map.
  6. The method according to any one of claims 1 to 5, wherein optimizing the coding feature map through the texture restoration module of the defogging model according to the pre-trained discrete feature codebook to obtain the texture restoration feature map comprises: mapping the coding feature map from its original vector space to the feature vector space corresponding to the discrete feature codebook to obtain vectors to be processed; screening, from the discrete feature codebook, a high-definition texture feature vector similar to each vector to be processed; and optimizing the coding feature map according to the screened high-definition texture feature vectors to obtain the texture restoration feature map.
  7. The method of claim 6, wherein optimizing the coding feature map according to the high-definition texture feature vectors similar to each vector to be processed to obtain the texture restoration feature map comprises: mapping the high-definition texture feature vectors similar to each vector to be processed back to the original vector space to obtain a quantized feature map; inputting the coding feature map into an affine parameter prediction network to obtain a scaling factor and a bias factor; processing the quantized feature map according to the scaling factor and the bias factor to obtain a third feature map; and fusing the third feature map with the coding feature map to obtain the texture restoration feature map.
  8. The method according to any one of claims 1 to 5, further comprising: acquiring a sample foggy image and a sample real image; inputting the sample foggy image into the defogging model to obtain a defogged predicted image; obtaining a final loss function value through a preset loss function according to the predicted image and the sample real image; and updating parameters of the defogging model according to the final loss function value.
  9. The method of claim 8, wherein the preset loss function is the following formula: L = λ1·L_pix + λ2·L_ms + λ3·L_edge + λ4·L_sem + λ5·L_str; wherein L is the final loss function value, computed from the predicted image and the sample real image, and λ1 to λ5 are balance coefficients that change dynamically with the training round; L_pix is a pixel reconstruction loss characterizing the pixel-value difference between the predicted image and the sample real image; L_ms is a multi-scale perception loss characterizing the differences between feature maps of the predicted image and the sample real image at different levels; L_edge is a boundary perception loss characterizing the edge-intensity difference between the predicted image and the sample real image; L_sem is a semantic consistency constraint loss characterizing the distance between the predicted image and the sample foggy image in a deep feature space; and L_str is a structure preservation loss characterizing the gradient difference between feature maps of the predicted image and the sample real image.
  10. An image defogging system based on adaptive convolution and a texture prior, the system comprising: an original image acquisition unit configured to acquire an original image; a coding unit configured to input the original image into an adaptive multi-scale convolution module for processing to obtain a coding feature map, wherein each convolution kernel in the adaptive multi-scale convolution module has a receptive field corresponding to its pixel point; a texture restoration unit configured to optimize the coding feature map according to a discrete feature codebook obtained by pre-training to obtain a texture restoration feature map, wherein the discrete feature codebook comprises a plurality of high-definition texture feature vectors; and a decoding unit configured to process the texture restoration feature map through a decoder to obtain a defogged target image.
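To make the formulas of claims 3 and 5 concrete, the bounded scale-map prediction and the element-wise feature modulation can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the patented implementation; the function names and the default upper limits α_h = α_w = 4.0 are illustrative choices not given in the claims.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_scale_maps(conv_h, conv_w, h_max=4.0, w_max=4.0):
    """Claim 3 sketch: S_h = α_h · σ(f_h(F_1)), S_w = α_w · σ(f_w(F_1)).
    conv_h / conv_w stand in for the raw height/width convolution
    outputs; the Sigmoid bounds each per-pixel receptive-field
    expansion strength to (0, h_max) and (0, w_max)."""
    return h_max * sigmoid(conv_h), w_max * sigmoid(conv_w)

def modulate(f_s, gamma, beta):
    """Claim 5 sketch: F_enc = γ ⊙ F_s + β, an element-wise affine
    modulation of the sampling feature map."""
    return gamma * f_s + beta
```

With zero convolution outputs the Sigmoid evaluates to 0.5, so both scale maps sit at half their preset upper limits.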
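The codebook screening step of claim 6 amounts to a nearest-neighbour lookup in the learned discrete feature space, in the style of vector quantization. A minimal sketch, assuming Euclidean distance and a flat (K, D) codebook; the patent does not specify the distance metric.

```python
import numpy as np

def codebook_lookup(vectors, codebook):
    """For each vector to be processed, screen the nearest
    high-definition texture feature vector from the pre-trained
    discrete codebook (Euclidean nearest neighbour).

    vectors:  (N, D) vectors to be processed
    codebook: (K, D) discrete feature codebook
    returns:  (N, D) quantized vectors and (N,) chosen code indices
    """
    # squared distance between every vector and every code entry
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx
```

In claim 7 the quantized vectors would then be mapped back to the original space and fused with the coding feature map via predicted affine parameters.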
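Claim 9 combines five loss terms with balance coefficients that change per training round; the combination itself is a weighted sum, sketched below with hypothetical term names and values (the per-term losses would come from the network and are not specified numerically in the patent).

```python
def total_loss(losses, weights):
    """Claim 9 sketch: L = Σ λ_k · L_k. `losses` and `weights` map a
    term name to its value / balance coefficient; the coefficients
    may be re-scheduled every training round."""
    assert set(losses) == set(weights), "each term needs a coefficient"
    return sum(weights[k] * losses[k] for k in losses)

# hypothetical term names and values for illustration only
terms = {"pixel": 0.8, "multiscale": 0.4, "edge": 0.2,
         "semantic": 0.1, "structure": 0.3}
coeffs = {"pixel": 1.0, "multiscale": 0.5, "edge": 0.5,
          "semantic": 0.25, "structure": 0.5}
final = total_loss(terms, coeffs)
```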

Description

Image defogging method and system based on adaptive convolution and texture prior

Technical Field

The invention relates to the technical field of computer vision, and in particular to an image defogging method and system based on adaptive convolution and a texture prior.

Background

In haze weather, suspended atmospheric particles absorb and scatter light, so images captured outdoors suffer reduced visibility, degraded contrast, color distortion, and loss of detail; image defogging has therefore long been a fundamental low-level vision task of wide interest. Current image defogging typically uses convolutional neural networks or Transformer architectures, learning an end-to-end mapping from a foggy image to a sharp image by training on massive synthetic paired datasets, or defogging by predicting physical parameters. Existing methods still recover high-frequency detail poorly under non-uniform and extremely dense fog. During global defogging, most models have difficulty distinguishing dense-fog regions from thin-fog regions, which over-enhances the thin-fog regions or fails to defog the dense-fog regions. Moreover, when restoring texture details heavily occluded by dense fog, existing loss functions tend to produce overly smooth results, losing high-frequency texture and edge information, so the defogged image lacks realism and sharpness.

Disclosure of Invention

Based on the above, it is necessary to provide an image defogging method and system based on adaptive convolution and a texture prior, so as to improve the defogging effect of the image.
The application provides an image defogging method based on adaptive convolution and a texture prior, comprising the following steps: acquiring an original image; inputting the original image into an adaptive multi-scale convolution module of a defogging model for processing to obtain a coding feature map, wherein each convolution kernel in the adaptive multi-scale convolution module has a receptive field corresponding to its pixel point; optimizing the coding feature map through a texture restoration module of the defogging model, according to a discrete feature codebook obtained through pre-training, to obtain a texture restoration feature map, wherein the discrete feature codebook comprises a plurality of high-definition texture feature vectors; and processing the texture restoration feature map through a decoder of the defogging model to obtain a defogged target image. In an alternative embodiment, the adaptive multi-scale convolution module includes a geometry prediction module, a dynamic resampling module, and a feature modulation module; and inputting the original image into the adaptive multi-scale convolution module of the defogging model for processing to obtain a coding feature map comprises: inputting the original image into the geometry prediction module to obtain a height scale map and a width scale map, which respectively characterize the receptive-field expansion strength of each coordinate in the vertical and horizontal directions; inputting the height scale map and the width scale map into the dynamic resampling module to generate an adaptive sampling grid, and resampling and aggregating the original image features according to the adaptive sampling grid to obtain a sampling feature map; inputting the original image into the feature modulation module for processing to obtain a predicted modulation coefficient and a predicted bias coefficient; and modulating the sampling feature map according to the modulation coefficient and the bias coefficient to obtain the coding feature map. In an alternative embodiment, inputting the original image into the geometry prediction module to obtain a height scale map and a width scale map includes: extracting features of the original image through a feature extraction network to obtain a first feature map F_1; and obtaining the height scale map S_h and the width scale map S_w by the following formulas: S_h = α_h · σ(f_h(F_1)), S_w = α_w · σ(f_w(F_1)); wherein f_h denotes a height convolution mapping; f_w denotes a width convolution mapping; σ is the Sigmoid function; α_h is a preset height upper limit; α_w is a preset width upper limit; and F_1 is the first feature map. In an optional implementation, inputting the height scale map and the width scale map into the dynamic resampling module to generate an adaptive sampling grid, and resampling and aggregating the original image features according to the adaptive sampling grid to obtain a sampling feature map, includes: determining the size of a sampling grid according to the o
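The bilinear-interpolation resampling described above (and in claim 4) samples image features at the continuous coordinates of the adaptive grid. A minimal single-channel sketch of that interpolation step, assuming the continuous coordinates have already been derived from the scale maps and the image is at least 2×2:

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample img (H, W) at continuous coordinates (ys, xs) via
    bilinear interpolation, as used when resampling original image
    features on the adaptive sampling grid."""
    H, W = img.shape
    # integer corner below each coordinate, clipped so the +1
    # neighbour stays in bounds
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    dy, dx = ys - y0, xs - x0
    return (img[y0, x0] * (1 - dy) * (1 - dx)
            + img[y0 + 1, x0] * dy * (1 - dx)
            + img[y0, x0 + 1] * (1 - dy) * dx
            + img[y0 + 1, x0 + 1] * dy * dx)
```

In the full module, the sampled features would then be block-aggregated back to the original spatial resolution with an aggregation convolution kernel chosen from the sampling grid size.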