KR-20260067693-A - INFRARED AND VISIBLE LIGHT IMAGE FUSION DEVICE USING FEATURE DECOMPOSITION-BASED TRANSFORMER

Abstract

An infrared and visible light image synthesis device using a feature decomposition-based transformer according to an embodiment synthesizes a pair of visible light and infrared images, which contain common and unique information about a scene captured from different perspectives, to obtain a meaningful result image that is suitable for subsequent processing, and enables the acquisition of a high-quality synthesized image more effectively than existing methods. In addition, the embodiment can improve image quality by synthesizing a pair of visible light and infrared images containing unique information about a scene from different perspectives.

Inventors

  • 이철
  • 원해양
  • 비엔지아안
  • 김가현

Assignees

  • 주식회사 토즈

Dates

Publication Date
2026-05-13
Application Date
2024-11-06

Claims (10)

  1. An infrared and visible light image synthesis device using a feature decomposition-based transformer, comprising: a memory storing at least one instruction for infrared and visible light image synthesis using a feature decomposition-based transformer; and a processor that performs operations according to the instruction, wherein the processor extracts feature maps at multiple scales based on the intrinsic characteristics of visible light and infrared images, decomposes the extracted feature maps into common features and unique features through a feature decomposition-based transformer to estimate relevance maps and irrelevance maps, and generates a synthesized image using complementary information from the relevance maps and the irrelevance maps.
  2. The infrared and visible light image synthesis device of claim 1, wherein, to extract common features and unique features from the input images, the processor extracts a three-level feature pyramid from a pair of input infrared and visible light images using convolution layers, and wherein, at each level except the topmost level, a pair of feature maps is combined with the feature maps of the previous level through a 1×1 convolution layer and upsampling.
  3. The infrared and visible light image synthesis device of claim 2, wherein the feature map combined with the feature map of the previous level learns feature representations through a residual feature distillation block (RFDB) in the next convolution layer, wherein feature decomposition transformers (FDTs) decompose each input into common features and unique features in the spatial and channel domains, and wherein a reconstruction block synthesizes the feature maps to generate a result image.
  4. The infrared and visible light image synthesis device of claim 3, wherein the feature decomposition-based transformers (FDTs) capture local and global information to extract the information necessary for image synthesis, and decompose and extract common features in the spatial and channel domains of each input image so that unique features and common information are not lost.
  5. The infrared and visible light image synthesis device of claim 4, wherein the spatial decomposition transformer is composed of two transformers for extracting infrared and visible light features, and wherein each spatial decomposition transformer receives a feature map and then acquires common features and unique features.
  6. The infrared and visible light image synthesis device of claim 5, wherein the spatial decomposition transformer generates query features using a single 1×1 convolution layer, generates a key feature map through a gated bottleneck that exchanges information across modalities, since the input images contain common information of different modalities, and estimates a relevance map and an irrelevance map by reshaping the query features and the key feature map into HW×C and C×1 matrices, respectively.
  7. The infrared and visible light image synthesis device of claim 4, wherein the channel decomposition transformer utilizes global interactions, acquires query features for the visible light image using a single 1×1 convolution layer, generates a key feature map through a gated bottleneck, and estimates a relevance map and an irrelevance map by reshaping the query features and the key feature map into C×HW and HW×1 matrices, respectively.
  8. The infrared and visible light image synthesis device of claim 4, wherein the feature decomposition-based transformers (FDTs) acquire a common feature map and a unique feature map for each of the visible light image and the infrared image in order to independently extract common features and unique features in the spatial domain and the channel domain.
  9. The infrared and visible light image synthesis device of claim 8, wherein the feature decomposition-based transformers (FDTs) combine the common and unique features of each decomposition using a single 1×1 convolution layer to convey essential information within the feature maps of the spatial decomposition and the channel decomposition, and obtain fused common feature maps and unique feature maps for each modality.
  10. The infrared and visible light image synthesis device of claim 3, wherein the reconstruction block is configured based on self-fusion convolution (SFC) to effectively synthesize the decomposed common features and unique features, synthesizes the common feature maps for the infrared and visible light images through a single 1×1 convolution, and generates a final synthesized image using the synthesized common feature map and the two unique feature maps for infrared and visible light as inputs to an SFC block.
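
The decomposition described in claim 6 (a relevance map estimated from query features reshaped to HW×C and a key feature map reshaped to C×1, with an irrelevance map as its complement) can be illustrated with a minimal NumPy sketch. This is an interpretation for illustration only, not the patented implementation: the sigmoid normalization, the pooled key vector, and all array shapes are assumptions.

```python
import numpy as np

def spatial_decompose(feat, query, key):
    """Split a feature map into common and unique parts via a relevance map.

    feat, query: (H, W, C) arrays; key: (C,) vector standing in for a pooled
    gated-bottleneck output (an assumption made for this sketch).
    """
    h, w, c = feat.shape
    q = query.reshape(h * w, c)                 # HW x C, as in claim 6
    k = key.reshape(c, 1)                       # C x 1
    scores = q @ k                              # HW x 1 raw relevance scores
    relevance = 1.0 / (1.0 + np.exp(-scores))   # squash scores into (0, 1)
    irrelevance = 1.0 - relevance               # complementary map
    common = feat * relevance.reshape(h, w, 1)  # shared (common) content
    unique = feat * irrelevance.reshape(h, w, 1)  # modality-specific content
    return common, unique

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 4))
query = rng.standard_normal((8, 8, 4))
key = rng.standard_normal(4)
common, unique = spatial_decompose(feat, query, key)
# The two parts reconstruct the input exactly, since the maps sum to one.
assert np.allclose(common + unique, feat)
```

Under this reading, the channel decomposition transformer of claim 7 would apply the same idea with the reshapes transposed (C×HW and HW×1), producing a per-channel rather than per-pixel relevance map.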

Description

Infrared and Visible Light Image Fusion Device Using Feature Decomposition-Based Transformer

The technical concept of the present disclosure relates to an infrared and visible light image synthesis apparatus using a feature decomposition-based transformer, and more specifically, to an apparatus and method for generating a high-quality synthesized image by synthesizing a pair of visible light and infrared images containing common information and unique information of a scene.

Unless otherwise indicated in this specification, the contents described in this section are not prior art to the claims of this application and are not admitted to be prior art merely by inclusion in this section.

Heterogeneous image synthesis is a technology that generates more useful images by utilizing complementary information from multiple images acquired with different sensors or optical settings. It creates new images by combining different types of images, where heterogeneous images refer to, for example, images with different resolutions, frame rates, color spaces, or types of content. When synthesizing such images, it is important to consider the characteristics of each image to ensure a natural combination.

Infrared images, generated from the thermal radiation of objects, can detect objects under various lighting conditions, but they have low spatial resolution and fail to provide detailed information. In contrast, visible light images, generated by detecting light within the visible spectrum similar to human vision, contain rich detail; however, they are easily affected by environmental conditions such as weather and illumination levels.

Traditional model-based image synthesis techniques rely on hand-crafted mathematical synthesis rules to transform input images into specific transform domains and synthesize them.
However, designing synthesis rules manually is difficult, and such rules have limitations in effectively exploiting the complementary information of the two modalities. In contrast, recent deep learning-based image synthesis techniques demonstrate excellent synthesis performance based on the strong feature extraction capabilities of convolutional neural networks (CNNs). However, the limited receptive field of CNNs reduces their ability to capture globally important information. Generative adversarial network (GAN)-based synthesis algorithms generate synthetic images that preserve the pixel value distribution of each image. The most recently developed transformer-based synthesis algorithms can capture global interactions between input images through a self-attention mechanism. While they offer excellent synthesis performance, applying self-attention to each input image separately may prevent full utilization of the unique features of infrared and visible light images.

FIG. 1 is a diagram showing the configuration of an infrared and visible light image synthesis device using a feature decomposition-based transformer according to an embodiment. FIG. 2 is a diagram showing an image synthesis algorithm according to an embodiment. FIG. 3 is a diagram showing the structure of the spatial decomposition transformer proposed in an embodiment. FIG. 4 is a diagram showing the structure of the channel decomposition transformer proposed in an embodiment. FIG. 5 is a diagram showing (a) an infrared image and (b) a visible light image used in the embodiment, and (c) a synthesized image generated by an infrared and visible light image synthesis device using a feature decomposition-based transformer according to the embodiment. FIG. 6 is a diagram illustrating an infrared and visible light image synthesis method using a feature decomposition-based transformer.
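
To make the global-interaction property mentioned above concrete, the following sketch shows a minimal single-head self-attention over flattened image tokens. This is the generic textbook formulation, not the patented algorithm; the identity query/key/value projections are a simplification assumed here for brevity (real models learn projection matrices).

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention: every token attends to every other token,
    so the receptive field is global, unlike a local convolution kernel."""
    n, d = x.shape                                 # n tokens, d channels
    scores = x @ x.T / np.sqrt(d)                  # n x n pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: rows sum to 1
    return weights @ x                             # each output mixes all tokens

tokens = np.arange(12, dtype=float).reshape(4, 3)  # 4 flattened pixels, 3 channels
out = self_attention(tokens)
assert out.shape == tokens.shape
```

Because every output row is a weighted mixture over all input tokens, a single layer already relates distant image regions, which is the property the transformer-based fusion methods above exploit, at the cost claim 1 addresses: attention applied to each modality in isolation does not by itself separate common from unique content.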
Hereinafter, various embodiments of the present disclosure are described in conjunction with the accompanying drawings. As various embodiments of the present disclosure may be subject to various modifications and may have various forms, specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the various embodiments of the present disclosure to specific forms, and it should be understood that they include all modifications and/or equivalents and substitutions that fall within the spirit and scope of the various embodiments of the present disclosure. In relation to the description of the drawings, similar reference numerals have been used for similar components.

In various embodiments of the present disclosure, terms such as "comprising" or "having" are intended to indicate the existence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof. In various embodiments of the