CN-120031708-B - Multi-source remote sensing image dodging and color homogenizing method based on style migration
Abstract
The invention discloses a multi-source remote sensing image dodging and color homogenizing method based on style migration, and belongs to the technical field of computer vision and remote sensing science. The method comprises: building a remote sensing satellite cloud image dataset based on a standard optical model; constructing positive and negative prompt words from category semantic tags; feeding degraded cloud images into a noise estimation network as a condition to estimate the conditional noise distribution, and training a conditional diffusion model; constructing a remote sensing image semantic extraction model that combines a convolutional network with a Transformer, performing semantic extraction on the multi-source remote sensing images, and outputting a ground-object category label for each pixel; and constructing a Transformer-based style migration model from the content and style information of the multi-source remote sensing images, which refers to a reference image with suitable radiometric characteristics to achieve dodging and color homogenization of the remote sensing images. The method can be applied to achieve coordinated tone and moderate contrast between adjacent images, so that color images appear close to natural, true colors.
Inventors
- YU CHANGHUI
- MEI XIAOQING
- LI TINGBIN
- LIU XIAOAN
- XU YAOWEN
- YI YAOHUA
Assignees
- Wuhan University
Dates
- Publication Date: 20260508
- Application Date: 20250125
Claims (9)
- 1. A multi-source remote sensing image dodging and color homogenizing method based on style migration, characterized by comprising the following steps: synthesizing, based on a standard optical model, images containing cloud and fog of varying degrees to build a remote sensing satellite cloud and fog image dataset; constructing prompt words from category semantic tags to generate a text condition, obtaining degraded cloud images from the remote sensing satellite cloud and fog image dataset, and training a latent diffusion model on the text condition and the degraded cloud image condition, wherein the training process comprises: copying the latent diffusion model into two identical copies, a locked copy and a trainable copy; freezing the locked copy, that is, keeping its weights unchanged; and fine-tuning the trainable copy with the degraded cloud image condition, that is, feeding the degraded cloud image into the noise estimation network as an image condition to estimate the conditional noise distribution, the degraded cloud image condition being embedded into the noise estimation network through a trainable embedding layer for computation, wherein the training objective is: [formula not preserved in the source text]; the symbols of the objective, likewise not preserved, denote in order: the actual noise value, which follows a standard normal distribution; the noise value predicted by the model; the time step in the diffusion process; a token vector in the latent space; the state of the latent variable at that time step of the diffusion process; the embedding of the degraded cloud image condition; the embedding of the text condition; and the expectation; adding the output of the fine-tuned trainable copy to the output of the trained latent diffusion model, and outputting a cloud-removed remote sensing image through a VAE decoder; performing semantic extraction on the cloud-removed remote sensing image with a pre-constructed remote sensing image semantic extraction model, and outputting the content and style information of the multi-source remote sensing image; and constructing a Transformer-based style migration model from the content and style information of the multi-source remote sensing image, wherein the style migration model refers to a reference image with suitable radiometric characteristics to achieve dodging and color homogenization of the remote sensing image.
- 2. The style migration-based multi-source remote sensing image dodging and color homogenizing method of claim 1, wherein the standard optical model is: [formula not preserved in the source text]; the symbols, likewise not preserved, denote in order: the pixel coordinates of the image, the hazy image, the haze-free image to be restored, the transmittance, and the global atmospheric light component; under cloudy or foggy weather, the reflected energy is reduced, causing transmission attenuation and thereby lowering image brightness, while ambient illumination is scattered to form airlight, which raises image brightness and lowers saturation; the cloud image is formed when the camera captures both the light reflected by the object, weakened by the cloud, and the atmospheric light reflected by the cloud; a further term denotes the attenuation and scattering of light by the cloud.
- 3. The style migration-based multi-source remote sensing image dodging and color homogenizing method of claim 1, characterized in that the remote sensing image semantic extraction model is constructed as follows: constructing a convolutional-network-based encoder, wherein ResNet is adopted as the encoder to extract multi-scale semantic features, the encoder consists of four stages of residual blocks, and each stage downsamples the feature map by a scale factor of 2; constructing a lightweight Transformer-based decoder from three global-local attention Transformer blocks and a feature refinement head; wherein the global-local attention Transformer block has two parallel branches that extract the global and local contexts respectively: the local branch uses two parallel convolution layers with kernel sizes of 3 and 1 to extract the local context; the global branch captures the global context with a window-based multi-head self-attention mechanism, first expanding the channel dimension of the input 2D feature map threefold with a standard 1×1 convolution and then applying a window partition operation to split the resulting 1D sequence into the query Q, key K and value V vectors.
- 4. The style migration-based multi-source remote sensing image dodging and color homogenizing method of claim 3, wherein the formula of the weighted summation operation is: [formula not preserved in the source text]; the symbols, likewise not preserved, denote in order: the fused features, the features produced by the residual block, the features produced by the global-local Transformer block, and the weight.
- 5. The style migration-based multi-source remote sensing image dodging and color homogenizing method of claim 3, wherein constructing the remote sensing image semantic extraction model further comprises constructing a cross-window context interaction module, which captures the global context by fusing the two feature maps produced by a horizontal average pooling layer and a vertical average pooling layer, wherein the horizontal average pooling layer establishes a horizontal relationship between windows, and wherein, for any point in one window and the corresponding point in another window in that horizontal relationship, the dependency is modeled as: [four formulas not preserved in the source text]; the symbols, likewise not preserved, denote in order: an index, the window size, and a self-attention computation that models the dependency of pixel pairs within a local window.
- 6. The style migration-based multi-source remote sensing image dodging and color homogenizing method of claim 1, wherein the Transformer-based style migration model comprises: a Transformer encoder, into which the image sequence to be processed is fed, each layer of which comprises a multi-head self-attention module and a feed-forward network, and in which the input sequence is converted into a query Q, a key K and a value V; the multi-head self-attention module, which processes the different heads in parallel through a multi-head self-attention mechanism to compute attention and effectively encode the input sequence; and a step of encoding the content sequence to generate a feature sequence, namely generating the query Q from the content sequence and the key K and value V from the style sequence, and performing sequence translation, wherein the encoded content sequence is translated in a regressive manner according to the encoded style sequence, all sequence patches being input and the output predicted at one time, and the output sequence of the Transformer is further refined by a three-layer CNN decoder, namely convolution, ReLU activation and upsampling.
- 7. The style migration-based multi-source remote sensing image dodging and color homogenizing method of claim 6, wherein achieving dodging and color homogenization of the remote sensing image by the style migration model with reference to the reference image with suitable radiometric characteristics comprises: after embedding, feeding the input content sequence into the Transformer encoder, where it is converted into the query Q, key K and value V: [formula not preserved in the source text]; the accompanying symbols, likewise not preserved, denote the input content sequence, the sequence length, and the number of attention heads; the multi-head self-attention mechanism computes attention by processing the different heads in parallel, achieving effective encoding of the input sequence: [formula not preserved in the source text]; the Transformer decoder translates the encoded content sequence in a regressive manner according to the encoded style sequence; the input of the Transformer decoder comprises the encoded content sequence and the style sequence, the query being generated from the content sequence and the key and value from the style sequence: [formula not preserved in the source text]; the output sequence of the Transformer decoder is then computed: [formula not preserved in the source text]; the output sequence of the Transformer decoder has a shape [not preserved in the source text] determined by the height and width of the output and the number of channels; the output of the Transformer decoder is further refined by a three-layer CNN decoder, each layer of which scales up its input and comprises a 3×3 convolution, a ReLU activation function and 2× upsampling, finally yielding an output image at a resolution [not preserved in the source text] whose spatial resolution matches that of the input image and each pixel of which carries three color channels, thereby generating the dodged and color-homogenized remote sensing image.
- 8. A computer readable storage medium storing one or more programs, characterized in that the one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-7.
- 9. A computing device, comprising: one or more processors, one or more memories, and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-7.
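The formulas of claims 1 and 2 are not preserved in this text. As a reading aid only, the block below gives the forms commonly used for a noise-prediction objective with a text condition and an image condition, and for the atmospheric scattering model. Both agree with the symbol descriptions in the claims, but every symbol name is an assumption and not the patent's own notation.

```latex
% Claim 1 (assumed notation): conditional noise-prediction objective
\mathcal{L} = \mathbb{E}_{z_0,\, t,\, c_t,\, c_f,\, \epsilon \sim \mathcal{N}(0, 1)}
  \left[ \left\| \epsilon - \epsilon_\theta(z_t, t, c_t, c_f) \right\|_2^2 \right]

% Claim 2 (assumed notation): atmospheric scattering model
I(x) = J(x)\, t(x) + A \left( 1 - t(x) \right)
```

Under these assumptions, \epsilon is the actual noise, \epsilon_\theta the predicted noise, t the diffusion time step, z_0 the latent representation, z_t the latent variable at step t, c_f the degraded cloud image condition embedding, and c_t the text condition embedding. In the second line, x is the pixel coordinate, I(x) the hazy image, J(x) the haze-free image, t(x) the transmittance (unrelated to the time step t above), and A the global atmospheric light; J(x) t(x) is the transmission attenuation term and A (1 - t(x)) the airlight term.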
Description
Multi-source remote sensing image dodging and color homogenizing method based on style migration
Technical Field
The invention relates to a multi-source remote sensing image dodging and color homogenizing method based on style migration, and belongs to the technical field of computer vision and remote sensing science.
Background
With the rapid progress of remote sensing technology, remote sensing images play an increasingly important role in key fields such as urban planning, land resource management and environmental monitoring. As smart-city construction is promoted nationally, remote sensing technology not only supports the rational use of land resources but also provides strong support for high-quality economic development. Remote sensing images are a core tool for obtaining geospatial information, and their color consistency and authenticity are critical for subsequent analysis and application. However, owing to factors such as illumination conditions, sensor characteristics and atmospheric effects, the acquired images often show color differences and non-uniformity, which seriously degrade mosaicking results and the practical value of the images.
In the field of remote sensing image processing, research on dodging and color homogenizing methods is therefore of great significance. Such methods not only markedly improve the usability and analysis accuracy of remote sensing images, but are also particularly important when processing large-scale, multi-source, multi-temporal remote sensing images.
By eliminating color differences caused by different sensors, imaging conditions and environmental changes, dodging and color homogenizing methods ensure consistency of color and brightness across remote sensing images and provide a solid, high-quality data foundation for subsequent image analysis and application. At present, research on dodging and color homogenizing of multi-source remote sensing images still faces many challenges, such as interference from cloud and fog layers, pronounced color differences, an insufficient level of automation, difficulty in extracting seam lines, and unsatisfactory seam-line elimination.
Disclosure of Invention
The invention provides a multi-source remote sensing image dodging and color homogenizing method based on style migration, which solves the problems described in the background. The method is divided into three technical modules: cloud and fog removal, semantic extraction, and style migration. Using deep learning, the color and brightness of remote sensing images are adjusted intelligently to achieve dodging and color homogenization. First, a remote sensing cloud image dataset is built from crawled OSM images and a standard optical model, a conditional diffusion model is trained, and cloud and fog removal is applied to the multi-source remote sensing images as preprocessing. Next, semantic extraction is performed on the preprocessed images to achieve semantic segmentation. Finally, style migration is used to ensure consistent ground-object styles across different images and to achieve seamless mosaicking. The method optimizes the visual presentation of the images while preserving ground-object features, increasing the application value of remote sensing images.
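The dataset-building step can be illustrated with a minimal sketch that blends a clear image with airlight according to the widely used atmospheric scattering form I = J * t + A * (1 - t). The function names, the uniform random transmittance map, and the use of plain Python lists are all assumptions made for illustration; the text does not say how the transmittance maps are generated.

```python
import random


def synthesize_cloudy_image(clear, transmission, airlight=1.0):
    """Blend a clear image with airlight: I = J * t + A * (1 - t).

    clear:        H x W x C nested lists with values in [0, 1] (image J)
    transmission: H x W nested lists with values in [0, 1] (map t)
    airlight:     scalar global atmospheric light A (assumed achromatic)
    """
    cloudy = []
    for row_pixels, row_t in zip(clear, transmission):
        out_row = []
        for pixel, t in zip(row_pixels, row_t):
            # Attenuated scene radiance plus scattered airlight, per channel.
            out_row.append([c * t + airlight * (1.0 - t) for c in pixel])
        cloudy.append(out_row)
    return cloudy


def random_transmission(height, width, low=0.3, high=1.0, seed=0):
    """Hypothetical stand-in for a cloud mask: uniform random transmittance."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(width)]
            for _ in range(height)]


if __name__ == "__main__":
    # A tiny 3 x 4 RGB "image" with constant color, for demonstration only.
    clear = [[[0.2, 0.4, 0.1] for _ in range(4)] for _ in range(3)]
    t_map = random_transmission(3, 4)
    cloudy = synthesize_cloudy_image(clear, t_map, airlight=0.9)
    print(cloudy[0][0])
```

Lower transmittance pushes each pixel toward the airlight value, which brightens and desaturates the image. Varying the range passed to `random_transmission` is one simple way to obtain cloud and fog of different degrees; a spatially smooth mask would be more realistic than the independent per-pixel values used here.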
To solve the above technical problems, the invention adopts the following technical scheme: A multi-source remote sensing image dodging and color homogenizing method based on style migration comprises the following steps: synthesizing, based on a standard optical model, images containing cloud and fog of varying degrees to build a remote sensing satellite cloud and fog image dataset; constructing prompt words from category semantic tags to generate a text condition, feeding degraded cloud images from the remote sensing satellite cloud and fog image dataset into a noise estimation network as a condition to estimate the conditional noise distribution and generate a cloud control condition, training a latent diffusion model on the text condition and the cloud control condition, adding the result obtained after applying the cloud control condition to the result output by the latent diffusion model, and outputting a cloud-removed remote sensing image through a VAE decoder; performing semantic extraction on the cloud-removed, preprocessed remote sensing image with a pre-constructed remote sensing image semantic extraction model, and outputting the content and style information of the multi-source remote sensing image; and constructing a Transformer-based style migration model according to the content a