KR-20260067052-A - Apparatus and Method for Monocular 3D Object Detection Using Weather-Adaptive Diffusion Model
Abstract
A monocular 3D object detection device and method using a weather-adaptive diffusion model are provided. A monocular 3D object detection device according to one embodiment of the present invention comprises: an encoder that encodes input features of an input image; a weather codebook that generates a weather reference feature including knowledge of a reference weather that indicates the degree of enhancement of the input feature; a weather-adaptive diffusion model that enhances the input feature by referring to the weather reference feature to obtain an enhanced feature; and a detection block that performs monocular 3D object detection using the enhanced feature.
Inventors
- 김형일
- 김성태
- 김정욱
- 오영민
Assignees
- 한국전자통신연구원
Dates
- Publication Date
- 20260512
- Application Date
- 20241105
Claims (20)
- A monocular 3D object detection method comprising: an encoding step of encoding an input feature from an input image; a step of generating, using a weather codebook, a weather reference feature that includes knowledge of a reference weather and indicates the degree of enhancement required for the input feature; a step of obtaining an enhanced feature by enhancing the input feature with reference to the weather reference feature in a weather-adaptive diffusion model; and a step of performing, in a detection block, monocular 3D object detection using the enhanced feature.
- The monocular 3D object detection method of claim 1, wherein the weather codebook has been trained through a weather codebook learning step comprising: randomly initializing embedding weight parameters of the weather codebook; obtaining a first enhanced feature by passing a first feature through a convolution layer; obtaining a first weather reference feature by applying a quantization process to each element of the first enhanced feature; generating a first probability and a second probability, each representing the importance of each channel, by performing softmax and global average pooling (GAP) on the first feature and the first weather reference feature, respectively; and training the weather codebook using a CKE loss calculated from the Kullback-Leibler divergence between the first probability and the second probability.
- The monocular 3D object detection method of claim 2, wherein the weather codebook learning step further comprises: obtaining a second enhanced feature by passing a second feature through a convolution layer; obtaining a second weather reference feature by applying a quantization process to each element of the second enhanced feature; and training the weather codebook using a WIG loss function that, using the first weather reference feature and the second weather reference feature, guides the second feature to be remembered as the first weather feature.
- The monocular 3D object detection method of claim 3, wherein the weather codebook learning step trains the weather codebook using a CKR loss function that is the sum of the CKE loss function and the WIG loss function.
- The monocular 3D object detection method of claim 4, wherein the weather-adaptive diffusion model is trained by performing a forward process and a reverse process over a plurality of time steps of a fixed Markov chain, and uses only the reverse process in the inference stage.
- The monocular 3D object detection method of claim 5, wherein the weather-adaptive diffusion model performs diffusion, at each of the plurality of time steps of the forward process, using noise obtained by subtracting the first feature from the second feature.
- The monocular 3D object detection method of claim 6, wherein, in the reverse process, the weather-adaptive diffusion model removes noise and generates the enhanced feature at each of the plurality of time steps using a conditional autoencoder that receives the second feature of the corresponding time step and the weather reference feature from the weather codebook.
- The monocular 3D object detection method of claim 7, wherein the weather-adaptive diffusion model calculates the similarity between the input feature and the weather reference feature and transfers the enhancement to the input feature.
- The monocular 3D object detection method of claim 7, wherein the weather-adaptive diffusion model is trained using a weather-adaptive enhancement (WAE) loss function that enables the estimation of fog deformation through multiple forward and backward processes, the WAE loss function being defined in terms of the noise obtained by subtracting the second feature from the first feature, the estimated noise, the first feature at the t-th time step, and the weather reference feature.
- The monocular 3D object detection method of any one of claims 2 to 9, wherein the reference weather is clear weather, the first feature is a clear feature, the second feature is a foggy feature, the first enhanced feature is an enhanced feature for clear weather, and the second enhanced feature is an enhanced feature for foggy weather.
- A monocular 3D object detection device comprising: an encoder that encodes an input feature from an input image; a weather codebook that generates a weather reference feature that includes knowledge of a reference weather and indicates the degree of enhancement required for the input feature; a weather-adaptive diffusion model that obtains an enhanced feature by enhancing the input feature with reference to the weather reference feature; and a detection block that performs monocular 3D object detection using the enhanced feature.
- The monocular 3D object detection device of claim 11, wherein the weather codebook has been trained through a weather codebook learning step comprising: randomly initializing embedding weight parameters of the weather codebook; obtaining a first enhanced feature by passing a first feature through a convolution layer; obtaining a first weather reference feature by applying a quantization process to each element of the first enhanced feature; generating a first probability and a second probability, each representing the importance of each channel, by performing softmax and global average pooling (GAP) on the first feature and the first weather reference feature, respectively; and training the weather codebook using a CKE loss calculated from the Kullback-Leibler divergence between the first probability and the second probability.
- The monocular 3D object detection device of claim 12, wherein the weather codebook learning step further comprises: obtaining a second enhanced feature by passing a second feature through a convolution layer; obtaining a second weather reference feature by applying a quantization process to each element of the second enhanced feature; and training the weather codebook using a WIG loss function that, using the first weather reference feature and the second weather reference feature, guides the second feature to be remembered as the first weather feature.
- The monocular 3D object detection device of claim 13, wherein the weather codebook learning step trains the weather codebook using a CKR loss function that is the sum of the CKE loss function and the WIG loss function.
- The monocular 3D object detection device of claim 13, wherein the weather-adaptive diffusion model is trained by performing a forward process and a reverse process over a plurality of time steps of a fixed Markov chain, and uses only the reverse process in the inference stage.
- The monocular 3D object detection device of claim 15, wherein the weather-adaptive diffusion model performs diffusion, at each of the plurality of time steps of the forward process, using noise obtained by subtracting the first feature from the second feature.
- The monocular 3D object detection device of claim 16, wherein, in the reverse process, the weather-adaptive diffusion model removes noise and generates the enhanced feature at each of the plurality of time steps using a conditional autoencoder that receives the second feature of the corresponding time step and the weather reference feature from the weather codebook.
- The monocular 3D object detection device of claim 17, wherein the weather-adaptive diffusion model calculates the similarity between the input feature and the weather reference feature and transfers the enhancement to the input feature.
- The monocular 3D object detection device of claim 17, wherein the weather-adaptive diffusion model is trained using a weather-adaptive enhancement (WAE) loss function that enables the estimation of fog deformation through multiple forward and backward processes, the WAE loss function being defined in terms of the noise obtained by subtracting the second feature from the first feature, the estimated noise, the first feature at the t-th time step, and the weather reference feature.
- The monocular 3D object detection device of any one of claims 12 to 19, wherein the reference weather is clear weather, the first feature is a clear feature, the second feature is a foggy feature, the first enhanced feature is an enhanced feature for clear weather, and the second enhanced feature is an enhanced feature for foggy weather.
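As a non-authoritative illustration of the weather codebook learning step recited in claims 2 and 12, the following Python sketch computes a CKE-style loss for one clear-weather feature. It is a minimal sketch under stated assumptions, not the claimed implementation: the convolution layer is omitted, quantization is assumed to be a nearest-neighbour lookup under squared Euclidean distance, GAP is assumed to be applied before softmax, the direction of the Kullback-Leibler divergence is assumed, and every name (quantize, channel_probs, kl_divergence, cke_loss) and every size is hypothetical.

```python
import math
import random


def quantize(feature, codebook):
    """Replace each spatial vector with its nearest codebook entry.

    Assumption: "nearest" means smallest squared Euclidean distance.
    """
    def nearest(vec):
        return min(codebook, key=lambda e: sum((a - b) ** 2 for a, b in zip(vec, e)))
    return [list(nearest(vec)) for vec in feature]


def channel_probs(feature):
    """Global average pooling over spatial positions, then softmax over channels."""
    n = len(feature)
    channels = len(feature[0])
    pooled = [sum(vec[c] for vec in feature) / n for c in range(channels)]
    peak = max(pooled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in pooled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cke_loss(clear_feature, codebook):
    """KL divergence between channel importances of a feature and its quantized version."""
    reference = quantize(clear_feature, codebook)  # first weather reference feature
    return kl_divergence(channel_probs(clear_feature), channel_probs(reference))


# Hypothetical sizes: 8 codebook entries, 4 channels, 6 spatial positions.
rng = random.Random(0)
codebook = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]  # random initialization
clear_feature = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
loss = cke_loss(clear_feature, codebook)
```

In this sketch a feature is stored as a list of per-position channel vectors, so the loss is zero when every position already coincides with a codebook entry and grows as the quantized feature's channel importances drift away from those of the clear feature. An actual training loop would update the embedding weights by gradient descent on this value, which is outside the scope of the sketch.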
Description
Apparatus and Method for Monocular 3D Object Detection Using Weather-Adaptive Diffusion Model
The present invention relates to a monocular 3D object detection device and method using a weather-adaptive diffusion model.
Monocular 3D object detection aims to detect objects in 3D using only a single camera. Unlike LiDAR-based methods, which rely on expensive LiDAR sensors for depth estimation, and stereo-based methods, which require synchronized stereo cameras, monocular 3D object detection requires only a single image and therefore offers lower computational cost and fewer resource requirements. Because of these characteristics, monocular 3D object detection is being applied in various real-world applications such as autonomous vehicles and robotics.
However, existing monocular 3D object detection systems primarily focus on ideal autonomous driving environments, such as clear weather. Consequently, it is difficult to apply these detectors to real-world situations in which adverse weather conditions, such as fog or rain, exist. Among these conditions, fog presents the greatest challenge, because its dense and diffuse nature strongly scatters and absorbs light and thereby hinders object detection. Since monocular 3D object detection, unlike LiDAR, relies solely on visual information from monocular images, it is important to design the detector so that it achieves enhanced performance even in situations with poor visibility, such as fog.
FIG. 1 is a block diagram showing the configuration of a weather-adaptive diffusion model device according to one embodiment of the present invention.
FIG. 2 is a flowchart showing the operation flow during inference of a weather-adaptive diffusion model method according to one embodiment of the present invention.
FIG. 3 shows the operation flow during inference of a weather-adaptive diffusion model method according to one embodiment of the present invention.
FIG. 4 conceptually illustrates the process of learning a weather codebook using the WIG loss function.
FIG. 5 conceptually shows how the CKE loss function is used so that the weather codebook can remember knowledge of clear weather.
FIG. 6 shows the learning process of the weather-adaptive diffusion model of the present invention.
The aforementioned objectives of the present invention, as well as other objectives, advantages, and features, and the methods for achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and can be implemented in various different forms. The following embodiments are provided merely to inform those skilled in the art of the purpose, structure, and effects of the invention, and the scope of the rights of the present invention is defined by the claims.
Meanwhile, the terms used in this specification are for describing the embodiments and are not intended to limit the invention. In this specification, the singular form includes the plural form unless specifically stated otherwise. As used in this specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more other components, steps, actions, and/or elements in addition to those mentioned.
The following description uses clear weather and foggy weather as example weather types, but it is equally applicable to other weather conditions such as rainy weather.
In the present invention, the following two main aspects are considered for a weather-robust monocular 3D object detection device: (1) how to quantify the degree of improvement required for the input image, and (2) how to guide the model toward an improved input image representation.
To address these two main aspects, the monocular 3D object detection device of the present invention comprises an encoder (110), a weather codebook (Z), a weather-adaptive diffusion model (120), and a detection block (130), as illustrated in FIG. 1. The encoder (110) encodes input features from an input image. The weather codebook (Z) is a new concept that addresses how to quantify the degree of improvement required for an input image when the weather is unknown. The weather codebook (Z) learns knowledge about clear weather during the learning phase and transmits it to the weather-adaptive diffusion model (120) to improve weather-related content. Through the weather codebook (Z), weather reference features, which serve as reference knowledge for appropriate improvement according to weather conditions, can be transmitted to the weather-adaptive diffusion model (120) regardless of the weather in the given image. The weather-adaptive diffusion model (120) enhances the feature representation according to weather conditions, thereby enabling monocular 3D object detection to adapt to various weather conditions. The detection block (130) perfo