CN-122023161-A - O-type remote sensing and unmanned aerial vehicle image defogging system and method
Abstract
The invention discloses an O-type remote sensing and unmanned aerial vehicle image defogging method and system, belonging to the field of computer vision and image processing. The method first constructs a dual-branch O-shaped defogging network: the first branch performs global context modeling based on a Transformer mechanism, the second branch performs sequence modeling based on the Mamba state space model, and the two branches exchange features through a cyclic topology. Within the network, the first branch adopts a sparse enhanced attention mechanism that adaptively focuses on key regions of the image through a learnable sparsity enhancing operator, and the second branch integrates a hybrid vision state space module that effectively captures two-dimensional spatial context through a two-dimensional selective scanning operation. In addition, a frequency attention module is introduced in the encoding stage to enhance frequency domain features. The method effectively overcomes the shortcomings of existing methods in global modeling, local detail preservation, and exploitation of spatial structure, and achieves excellent performance on public remote sensing and unmanned aerial vehicle defogging datasets.
Inventors
- ZHOU HAO
- ZHANG JIAN
- WANG LE
- TAO TAO
- WANG TONG
- GUAN XIN
Assignees
- Anhui University of Technology (安徽工业大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-03
Claims (10)
- 1. An O-type remote sensing and unmanned aerial vehicle image defogging method, characterized by comprising the following steps: S1, constructing a defogging network, wherein the network adopts a dual-branch architecture comprising a first branch and a second branch, the first branch is constructed based on a Transformer mechanism to perform global context modeling, the second branch is constructed based on a Mamba model to perform sequence modeling, and the two branches exchange features through a topological structure with a circulating information flow; S2, training the network on a training data set comprising pairs of hazy and clear images to obtain a defogging mapping relation; S3, inputting the remote sensing or unmanned aerial vehicle image to be defogged into the trained network; S4, outputting the defogged image through forward propagation of the network.
- 2. The method of claim 1, wherein a sparse enhanced attention operation is performed in the first branch, the operation comprising generating attention weights based on input features to enhance attention to important areas of the image while selectively filtering the self-attention scores through a masking mechanism, the masking mechanism employing a learnable soft modulation map that is adaptively generated from the input features and acts on the self-attention scores element-wise prior to normalization.
- 3. The method of claim 2, wherein the step of generating the attention weights comprises introducing a sparsity enhancing operator that dynamically generates guidance signals from the input to adjust the importance of different channels or spatial positions, so as to adaptively impose sparsity constraints during attention computation.
- 4. The method of claim 1, wherein the second branch adopts a state space model (SSM) as its core and performs sequence modeling with stacked Mamba layers; the Mamba branch is integrated with a hybrid vision state space module, which processes input features through three parallel branches to convert two-dimensional features into a sequence representation.
- 5. The method of claim 4, wherein the hybrid vision state space module processes input features through three parallel branches, and the process of converting two-dimensional features into a sequence representation comprises: in the first branch, expanding the channel dimension by linear projection, then performing depthwise separable convolution, activation, and two-dimensional selective scanning; in the second branch, expanding the feature channels by a linear layer, then performing depthwise separable convolution, activation, and residual convolution; connecting and fusing the outputs of these two branches through a linear layer and applying layer normalization; meanwhile, in the third branch, expanding the channel number with a linear layer followed by activation; combining the resulting features with the fused features of the first two branches by element-wise multiplication; and reducing the channel number back to the original number by linear projection, yielding an output feature with the same spatial size as the input.
- 6. The method of claim 5, wherein the two-dimensional selective scan scans the two-dimensional feature map along a plurality of diagonal directions and converts it into sequences, so as to capture two-dimensional spatial context information.
- 7. The method according to claim 1, wherein a frequency attention module (FAM) is provided in the encoding stage of each of the dual branches, for extracting and enhancing the frequency domain information of features so as to improve the restoration of image edges and textures.
- 8. An O-type remote sensing and unmanned aerial vehicle image defogging system, adopting the method as set forth in any one of claims 1 to 7, characterized by comprising: a model building module for building and storing a defogging network model, the model comprising a first network branch and a second network branch that interact through a cyclic topology, the first network branch being realized based on a Transformer mechanism and the second network branch being realized based on a state space model; an image input module for receiving the remote sensing or unmanned aerial vehicle image to be defogged; a processing execution module for inputting the image into the defogging network model and executing forward computation; and a result output module for outputting the clear image after defogging.
- 9. The system of claim 8, wherein the first network branch includes a sparse enhanced attention unit for dynamically generating attention weights based on input features, the sparse enhanced attention unit including a learnable sparsity adjuster for adaptively directing the sparsification of the attention computation.
- 10. The system of claim 8, wherein the second network branch comprises a hybrid vision state space processing unit integrating a two-dimensional selective scanning component, configured to scan a two-dimensional feature map along a plurality of predetermined directions, and a residual convolution component.
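Claims 2 and 3 describe self-attention whose raw scores are gated element-wise by a learnable soft modulation map before normalization. A minimal NumPy sketch of that idea follows; the shapes, the sigmoid gate, and the mask-generator weights `Wm` are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_enhanced_attention(x, Wq, Wk, Wv, Wm):
    """Self-attention filtered by a learnable soft modulation map.

    x          : (n, d) sequence of n tokens with d channels
    Wq, Wk, Wv : (d, d) query/key/value projections
    Wm         : (d, n) weights of a hypothetical mask generator that
                 produces one modulation value per attended position.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[1])   # (n, n) raw attention scores
    # Soft modulation map in (0, 1), generated adaptively from the input
    # and applied element-wise BEFORE softmax normalization (claim 2).
    m = 1.0 / (1.0 + np.exp(-(x @ Wm)))      # (n, n) sigmoid gate
    scores = scores * m                      # damp unimportant pairs
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
n, d = 16, 8
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
Wm = rng.standard_normal((d, n)) * 0.1
out = sparse_enhanced_attention(x, Wq, Wk, Wv, Wm)
print(out.shape)  # (16, 8)
```

In a trained network the gate would be produced by learned layers; here it only illustrates where the modulation enters the attention computation.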
Description
O-type remote sensing and unmanned aerial vehicle image defogging system and method

Technical Field

The invention belongs to the technical field of computer vision and image processing, and particularly relates to an O-type remote sensing and unmanned aerial vehicle image defogging system and method.

Background

During the acquisition of remote sensing and unmanned aerial vehicle images, image quality is severely degraded by the scattering and absorption of suspended atmospheric particles, manifesting in particular as reduced contrast, lost detail, and color distortion. Such degradation directly affects the accuracy and reliability of subsequent high-level visual tasks such as object detection, land-cover classification, and environmental monitoring. Image defogging techniques have been developed to recover such degraded images. Conventional defogging methods are mostly based on physical models, such as the atmospheric scattering model, restoring images by estimating the transmittance and atmospheric light. However, these methods tend to be unstable and prone to introducing artifacts when dealing with complex, non-uniform haze distributions. In recent years, deep learning-based methods, particularly convolutional neural networks (CNNs), have made significant progress in defogging tasks by virtue of their strong feature learning capability. However, owing to the local receptive field of convolution kernels, CNNs struggle to model long-range dependencies across widely distributed haze regions. To break through this limitation, models based on the self-attention mechanism, such as Vision Transformer (ViT), were introduced. The self-attention mechanism can directly model the global relationship between any pair of pixels in the image, thereby improving the ability to capture wide-range context information.
However, the standard self-attention mechanism treats all regions of the image equally, has high computational complexity, struggles to focus on key regions with high haze concentration and severe degradation, and tends to produce blurring when processing fine textures. Recently, the state space model (SSM), particularly the Mamba architecture, has attracted attention for its linear complexity and strong long-range dependency modeling in sequence tasks. However, the standard Mamba model is designed for one-dimensional sequences; applying it to two-dimensional images requires flattening the image into a sequence, a process that destroys the image's inherent spatial locality and two-dimensional structure and leads to poor recovery of local detail. The invention therefore provides an O-type remote sensing and unmanned aerial vehicle image defogging system and method.

Disclosure of Invention

The invention aims to solve the above problems in the prior art by providing an O-type remote sensing and unmanned aerial vehicle image defogging system and method.
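The multi-direction scanning described in claim 6 addresses exactly this flattening problem: instead of one row-major flattening, the feature map is serialized along several diagonal traversals so a 1-D SSM sees different spatial neighborhoods. A small NumPy sketch of such a serialization (the exact scan order is an assumption; the patent only states "a plurality of diagonal directions"):

```python
import numpy as np

def diagonal_scan(fmap):
    """Serialize a 2-D feature map along diagonal orders.

    Returns a list of 1-D sequences, one per scan direction, so that a
    1-D state space model (e.g. Mamba) can be run over each sequence
    and the results merged, retaining more 2-D spatial context than a
    single row-major flattening would.
    """
    h, w = fmap.shape
    def diagonals(a):
        # Concatenate all diagonals, from bottom-left to top-right.
        return np.concatenate(
            [a.diagonal(offset) for offset in range(-(h - 1), w)])
    seqs = []
    for view in (fmap, np.fliplr(fmap)):  # two diagonal orientations
        d = diagonals(view)
        seqs.append(d)                    # forward traversal
        seqs.append(d[::-1])              # reverse traversal
    return seqs

x = np.arange(9).reshape(3, 3)
seqs = diagonal_scan(x)
for s in seqs:
    print(s)
```

Each of the four sequences visits every pixel exactly once; after the SSM pass, the sequences would be scattered back to their 2-D positions and merged.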
The invention first discloses an O-type remote sensing and unmanned aerial vehicle image defogging method, comprising the following steps: S1, constructing a defogging network, wherein the network adopts a dual-branch architecture comprising a first branch and a second branch, the first branch is constructed based on a Transformer mechanism to perform global context modeling, the second branch is constructed based on a Mamba model to perform sequence modeling, and the two branches exchange features through a topological structure with a circulating information flow; S2, training the network on a training data set comprising pairs of hazy and clear images to obtain a defogging mapping relation; S3, inputting the remote sensing or unmanned aerial vehicle image to be defogged into the trained network; S4, outputting the defogged image through forward propagation of the network. In the above method, a sparse enhanced attention operation is performed in the first branch, the operation comprising generating attention weights based on input features to enhance attention to important areas of the image while selectively filtering the self-attention scores through a masking mechanism; the masking mechanism employs a learnable soft modulation map that is adaptively generated from the input features and acts on the self-attention scores element-wise prior to normalization. In the method, the step of generating the attention weights comprises introducing a sparsity enhancing operator that dynamically generates guidance signals from the input to adjust the importance of different channels or spatial positions, so as to adaptively impose sparsity constraints during attention computation. In the method, the second branch adopts a state space model (SSM) as its core and performs sequence modeling with stacked Mamba layers; the Mamba branch is integrated with a hybrid vision state space module, which processes input features through three parallel branc
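The three-branch fusion of the hybrid vision state space module (claim 5) can be sketched in NumPy. The linear layers, the depthwise convolution, and the selective scan itself are stubbed with simple stand-ins, since the patent does not publish layer sizes or the concrete SSM parameterization:

```python
import numpy as np

rng = np.random.default_rng(1)

def linear(x, w):                 # pointwise (1x1) projection over channels
    return x @ w

def dwconv3(x):                   # toy 3x3 depthwise conv: neighborhood mean
    out = np.zeros_like(x)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(np.roll(x, dy, axis=0), dx, axis=1)
    return out / 9.0

def silu(x):
    return x / (1.0 + np.exp(-x))

def ss2d(x):                      # stand-in for the 2-D selective scan:
    h, w, c = x.shape             # row-major scan with exponential decay
    seq = x.reshape(h * w, c)
    state = np.zeros(c)
    out = np.empty_like(seq)
    for t in range(h * w):
        state = 0.9 * state + seq[t]
        out[t] = state
    return out.reshape(h, w, c)

def hybrid_vss_block(x, expand=2):
    h, w, c = x.shape
    ce = c * expand
    W_in1, W_in2, W_in3 = (rng.standard_normal((c, ce)) * 0.1 for _ in range(3))
    W_fuse = rng.standard_normal((2 * ce, ce)) * 0.1
    W_out = rng.standard_normal((ce, c)) * 0.1
    # Branch 1: expand -> depthwise conv -> activation -> 2-D selective scan
    b1 = ss2d(silu(dwconv3(linear(x, W_in1))))
    # Branch 2: expand -> depthwise conv -> activation -> residual convolution
    t = silu(dwconv3(linear(x, W_in2)))
    b2 = t + dwconv3(t)
    # Connect and fuse branches 1 and 2 via a linear layer, then layer-normalize
    f = linear(np.concatenate([b1, b2], axis=-1), W_fuse)
    f = (f - f.mean(-1, keepdims=True)) / (f.std(-1, keepdims=True) + 1e-6)
    # Branch 3: expand -> activation, then gate the fused features element-wise
    b3 = silu(linear(x, W_in3))
    y = f * b3
    # Project channels back to the input width: same spatial size as the input
    return linear(y, W_out)

x = rng.standard_normal((8, 8, 4))
y = hybrid_vss_block(x)
print(y.shape)  # (8, 8, 4)
```

The sketch only demonstrates the data flow of claim 5: expand, convolve, scan, fuse, gate, and project back, with the output matching the input's spatial size and channel count.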