US-20260127899-A1 - APPARATUS AND METHOD FOR MONOCULAR 3D OBJECT DETECTION USING WEATHER-ADAPTIVE DIFFUSION MODELS
Abstract
Disclosed herein are an apparatus and method for monocular 3D object detection using weather-adaptive diffusion models. The apparatus for monocular 3D object detection includes an encoder configured to encode input features of input images, a weather codebook configured to generate weather reference features, which contain knowledge of reference weather and indicate a degree of enhancement for the input features, a weather-adaptive diffusion model configured to obtain enhanced features by enhancing the input features with reference to the weather reference features, and a detection block configured to perform monocular 3D object detection using the enhanced features.
Inventors
- Hyungil Kim
- Seong Tae Kim
- Jung Uk Kim
- Youngmin Oh
Assignees
- ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Dates
- Publication Date
- 2026-05-07
- Application Date
- 2025-11-04
- Priority Date
- 2024-11-05
Claims (20)
- 1. A method for monocular 3D object detection, comprising: encoding input features of input images; generating, by a weather codebook, weather reference features that contain knowledge of reference weather and indicate a degree of enhancement for the input features; obtaining, by a weather-adaptive diffusion model, enhanced features by enhancing the input features with reference to the weather reference features; and performing, by a detection block, monocular 3D object detection using the enhanced features.
- 2. The method according to claim 1, wherein the weather codebook is trained through a weather codebook training phase that comprises: randomly initializing embedding weight parameters of the weather codebook; obtaining a first enhanced feature by passing a first feature through a convolutional layer; obtaining a first weather reference feature through a quantization process on each element of the first enhanced feature; performing softmax and global average pooling (GAP) on the first feature and the first weather reference feature to generate first and second probabilities, respectively, each representing the importance of each channel; and training the weather codebook using a clear knowledge embedding (CKE) loss function calculated as the Kullback-Leibler divergence between the first and second probabilities.
- 3. The method according to claim 2, wherein the weather codebook training phase further comprises: obtaining a second enhanced feature by passing a second feature through a convolutional layer; obtaining a second weather reference feature through the quantization process on each element of the second enhanced feature; and training the weather codebook using a weather-invariant guiding (WIG) loss function that guides the second feature to be memorized as a feature of the first weather using the first and second weather reference features.
- 4. The method according to claim 3, wherein, in the weather codebook training phase, the weather codebook is trained using a clear knowledge recalling (CKR) loss function that sums the CKE loss function and the WIG loss function.
- 5. The method according to claim 4, wherein the weather-adaptive diffusion model is trained by performing forward and reverse processes over multiple time steps of a fixed Markov chain, and uses only the reverse process in its inference phase.
- 6. The method according to claim 5, wherein the weather-adaptive diffusion model is diffused using noise obtained by subtracting the first feature from the second feature for each of the multiple time steps in the forward process.
- 7. The method according to claim 6, wherein, in the reverse process, the weather-adaptive diffusion model generates the enhanced features by removing noise using a conditional autoencoder that receives the second feature at each corresponding one of the multiple time steps and the weather reference features from the weather codebook.
- 8. The method according to claim 7, wherein the weather-adaptive diffusion model calculates similarity between the input features and the weather reference features and transfers the resulting enhancement to the input features.
- 9. The method according to claim 7, wherein the weather-adaptive diffusion model is trained using the following weather-adaptive enhancement (WAE) loss function, which enables estimation of fog variants through several forward and reverse processes: $\mathcal{L}_{\mathrm{wae}} = \mathbb{E}_{x^c,\,\epsilon_n \sim F,\,t}\left[\left\|\epsilon_n - \epsilon_\theta(x_t^c, t, x^r)\right\|_2^2\right]$ where $\epsilon_n$ is the noise obtained by subtracting the first feature from the second feature, $\epsilon_\theta$ is the estimated noise, $x_t^c$ is the first feature at the t-th time step, and $x^r$ is a weather reference feature.
- 10. The method according to claim 3, wherein: the reference weather is clear weather; the first feature is a clear feature and the second feature is a foggy feature; and the first enhanced feature is an enhanced feature for clear weather and the second enhanced feature is an enhanced feature for foggy weather.
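Viewed as a processing pipeline, claims 1 and 5-9 describe four stages: encode an input image, look up weather reference features in a codebook, denoise the input features toward those references, and run detection. The following NumPy sketch traces that flow end to end; every function body (the toy encoder, the scalar element-wise codebook, the fixed-step "enhancement" loop, and the threshold detector) is an illustrative assumption, not the patented implementation.

```python
import numpy as np

def encode(image):
    """Toy encoder: flatten an image into an input feature vector."""
    return image.reshape(-1).astype(float)

def reference_features(x, codebook):
    """Element-wise quantization: replace each feature element with its
    nearest codebook entry (claim 2 quantizes 'each element'; using a
    scalar codebook here is an assumption made for brevity)."""
    idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook[idx]

def enhance(x, x_ref, steps=4):
    """Stand-in for the reverse diffusion process: iteratively move the
    input feature toward the weather reference feature."""
    for _ in range(steps):
        x = x + 0.5 * (x_ref - x)   # toy denoising step
    return x

def detect(feat, threshold=0.5):
    """Toy detection head: count feature elements above a threshold."""
    return int((feat > threshold).sum())

# Usage: a uniform "foggy" image is pulled toward the clear-weather code.
image = np.full((2, 2), 0.9)
codebook = np.array([0.0, 1.0])
x = encode(image)
x_ref = reference_features(x, codebook)   # every element snaps to 1.0
enhanced = enhance(x, x_ref)
n_hits = detect(enhanced)                 # -> 4
```

The design point the claims make is that enhancement happens in feature space, guided by a memorized reference, rather than by restoring the image itself.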
- 11. An apparatus for monocular 3D object detection, comprising: an encoder configured to encode input features of input images; a weather codebook configured to generate weather reference features that contain knowledge of reference weather and indicate a degree of enhancement for the input features; a weather-adaptive diffusion model configured to obtain enhanced features by enhancing the input features with reference to the weather reference features; and a detection block configured to perform monocular 3D object detection using the enhanced features.
- 12. The apparatus according to claim 11, wherein the weather codebook is trained through a weather codebook training phase that comprises: randomly initializing embedding weight parameters of the weather codebook; obtaining a first enhanced feature by passing a first feature through a convolutional layer; obtaining a first weather reference feature through a quantization process on each element of the first enhanced feature; performing softmax and global average pooling (GAP) on the first feature and the first weather reference feature to generate first and second probabilities, respectively, each representing the importance of each channel; and training the weather codebook using a clear knowledge embedding (CKE) loss function calculated as the Kullback-Leibler divergence between the first and second probabilities.
- 13. The apparatus according to claim 12, wherein the weather codebook training phase further comprises: obtaining a second enhanced feature by passing a second feature through a convolutional layer; obtaining a second weather reference feature through the quantization process on each element of the second enhanced feature; and training the weather codebook using a weather-invariant guiding (WIG) loss function that guides the second feature to be memorized as a feature of the first weather using the first and second weather reference features.
- 14. The apparatus according to claim 13, wherein, in the weather codebook training phase, the weather codebook is trained using a clear knowledge recalling (CKR) loss function that sums the CKE loss function and the WIG loss function.
- 15. The apparatus according to claim 13, wherein the weather-adaptive diffusion model is trained by performing forward and reverse processes over multiple time steps of a fixed Markov chain, and uses only the reverse process in its inference phase.
- 16. The apparatus according to claim 15, wherein the weather-adaptive diffusion model is diffused using noise obtained by subtracting the first feature from the second feature for each of the multiple time steps in the forward process.
- 17. The apparatus according to claim 16, wherein, in the reverse process, the weather-adaptive diffusion model generates the enhanced features by removing noise using a conditional autoencoder that receives the second feature at each corresponding one of the multiple time steps and the weather reference features from the weather codebook.
- 18. The apparatus according to claim 17, wherein the weather-adaptive diffusion model calculates similarity between the input features and the weather reference features and transfers the resulting enhancement to the input features.
- 19. The apparatus according to claim 17, wherein the weather-adaptive diffusion model is trained using the following weather-adaptive enhancement (WAE) loss function, which enables estimation of fog variants through several forward and reverse processes: $\mathcal{L}_{\mathrm{wae}} = \mathbb{E}_{x^c,\,\epsilon_n \sim F,\,t}\left[\left\|\epsilon_n - \epsilon_\theta(x_t^c, t, x^r)\right\|_2^2\right]$ where $\epsilon_n$ is the noise obtained by subtracting the first feature from the second feature, $\epsilon_\theta$ is the estimated noise, $x_t^c$ is the first feature at the t-th time step, and $x^r$ is a weather reference feature.
- 20. The apparatus according to claim 13, wherein: the reference weather is clear weather; the first feature is a clear feature and the second feature is a foggy feature; and the first enhanced feature is an enhanced feature for clear weather and the second enhanced feature is an enhanced feature for foggy weather.
Description
CROSS-REFERENCE TO RELATED APPLICATION
The present application claims priority to and the benefit of Korean Patent Application No. 10-2024-0155302, filed on Nov. 5, 2024, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference.
BACKGROUND
1. Technical Field
The present disclosure relates to an apparatus and method for monocular 3D object detection using weather-adaptive diffusion models.
2. Description of Related Art
Monocular 3D object detection aims to detect 3D objects using only a single camera. Unlike LiDAR-based methods, which rely on expensive LiDAR sensors for depth estimation, and stereo-based methods, which require synchronized stereo cameras, monocular 3D object detection needs only monocular images, resulting in lower computational cost and fewer resources. Owing to these characteristics, monocular 3D object detection is being applied to various real-world applications such as autonomous vehicles and robotics. However, existing monocular 3D object detectors mainly target ideal autonomous driving environments with clear weather, so it is difficult to apply them to real-world situations with adverse weather conditions such as fog or rain. Among these, fog presents the greatest challenge: because fog is dense and diffuse and strongly scatters and absorbs light, it severely hampers object detection. Since monocular 3D object detection relies solely on visual information from monocular images, unlike LiDAR, it is important to design detectors that maintain performance even in low-visibility situations such as fog.
SUMMARY
The present disclosure is intended to overcome a problem with conventional 3D object detection technology: performance degrades when a detector is applied to real-world environments with extreme weather changes, such as snow, rain, and fog, that were absent from its training environment. An object of the present disclosure is therefore to provide an apparatus and method for monocular 3D object detection that are robust to weather.
In accordance with an aspect of the present disclosure, there is provided a method for monocular 3D object detection, which includes encoding input features of input images; generating, by a weather codebook, weather reference features that contain knowledge of reference weather and indicate a degree of enhancement for the input features; obtaining, by a weather-adaptive diffusion model, enhanced features by enhancing the input features with reference to the weather reference features; and performing, by a detection block, monocular 3D object detection using the enhanced features.
The weather codebook may be trained through a weather codebook training phase. The weather codebook training phase may include randomly initializing embedding weight parameters of the weather codebook; obtaining a first enhanced feature by passing a first feature through a convolutional layer; obtaining a first weather reference feature through a quantization process on each element of the first enhanced feature; performing softmax and global average pooling (GAP) on the first feature and the first weather reference feature to generate first and second probabilities, respectively, each representing the importance of each channel; and training the weather codebook using a clear knowledge embedding (CKE) loss function calculated as the Kullback-Leibler divergence between the first and second probabilities.
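The CKE loss described above can be sketched as follows. The disclosure lists "softmax and global average pooling" without fixing their order; this sketch assumes a (C, H, W) feature layout, pools over the spatial axes first, then applies a softmax over channels, and measures the Kullback-Leibler divergence between the two resulting channel-importance distributions.

```python
import numpy as np

def channel_probs(feat):
    """Channel-importance distribution: global average pooling over the
    spatial axes, then a numerically stable softmax over channels.
    feat has assumed shape (C, H, W)."""
    gap = feat.mean(axis=(1, 2))          # GAP -> one scalar per channel
    e = np.exp(gap - gap.max())
    return e / e.sum()

def cke_loss(clear_feat, ref_feat, eps=1e-12):
    """Clear knowledge embedding loss: KL divergence between the channel
    distributions of the clear feature and the weather reference feature."""
    p = channel_probs(clear_feat)
    q = channel_probs(ref_feat)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Usage: identical features give zero loss; boosting one channel of the
# reference shifts its distribution and produces a positive loss.
f = np.ones((3, 4, 4))
g = f.copy()
g[0] += 1.0
zero = cke_loss(f, f)   # -> 0.0
pos = cke_loss(f, g)
```

Because KL divergence is zero only when the two distributions coincide, minimizing this loss pulls the codebook's reference features toward the channel statistics of clear-weather features.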
The weather codebook training phase may further include obtaining a second enhanced feature by passing a second feature through a convolutional layer; obtaining a second weather reference feature through the quantization process on each element of the second enhanced feature; and training the weather codebook using a weather-invariant guiding (WIG) loss function that guides the second feature to be memorized as a feature of the first weather using the first and second weather reference features.
In the weather codebook training phase, the weather codebook may be trained using a clear knowledge recalling (CKR) loss function that sums the CKE loss function and the WIG loss function.
The weather-adaptive diffusion model may be trained by performing forward and reverse processes over multiple time steps of a fixed Markov chain, and may use only the reverse process in its inference phase. The weather-adaptive diffusion model may be diffused using noise obtained by subtracting the first feature from the second feature for each of the multiple time steps in the forward process. In the reverse process, the weather-adaptive diffusion model may generate the enhanced features by removing noise using a conditional autoencoder that receives the second feature at each corresponding one of the multiple time steps and the weather reference features from the weather codebook. The weather-adaptive diffusion model may calculate similarity betwee
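The forward process and the WAE objective can be sketched in NumPy as follows, under stated assumptions: the fixed Markov chain is replaced by a simple linear blending schedule for illustration, and `eps_pred` stands in for the conditional autoencoder's noise estimate; neither is the disclosed implementation.

```python
import numpy as np

def forward_diffuse(x_clear, x_foggy, t, T):
    """Toy forward step: the 'noise' is the fog direction
    eps_n = x_foggy - x_clear (second feature minus first feature),
    and the clear feature is blended toward the foggy one. A linear
    schedule replaces the fixed Markov chain for illustration."""
    eps_n = x_foggy - x_clear
    x_t = x_clear + (t / T) * eps_n
    return x_t, eps_n

def wae_loss(eps_n, eps_pred):
    """Weather-adaptive enhancement loss: squared L2 distance between
    the true fog noise and the estimated noise."""
    return float(np.sum((eps_n - eps_pred) ** 2))

# Usage: at the final time step the diffused clear feature reaches the
# foggy one, and a perfect noise estimate drives the loss to zero.
x_c = np.zeros(4)
x_f = np.ones(4)
x_T, eps_n = forward_diffuse(x_c, x_f, t=8, T=8)
loss = wae_loss(eps_n, eps_n)   # -> 0.0
```

The intuition matches the disclosure: training the noise estimator on the clear-to-foggy direction lets the reverse process subtract fog-like corruption from input features at inference time.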