US-20260127714-A1 - METHOD AND SYSTEM FOR DETERMINING AUTO-EXPOSURE FOR HIGH-DYNAMIC RANGE OBJECT DETECTION USING NEURAL NETWORK

US 20260127714 A1

Abstract

An auto-exposure control is proposed for high dynamic range images, along with a neural network for exposure selection that is trained jointly, end-to-end, with an object detector and an image signal processing (ISP) pipeline. A corresponding method and system for high dynamic range object detection are also provided.

Inventors

  • Emmanuel Luc Julien Onzon
  • Felix Heide
  • Fahim Mannan

Assignees

  • TORC CND ROBOTICS, INC.

Dates

Publication Date
2026-05-07
Application Date
2025-12-30

Claims (20)

  1. A computing system for high dynamic range (HDR) object detection of an autonomous vehicle, comprising at least one processor in communication with at least one memory device, the at least one processor programmed to: form an object detection system, the object detection system including: an auto-exposure neural network configured to receive a low dynamic range (LDR) image acquired by an LDR sensor and predict an exposure value of the LDR sensor; an image signal processing (ISP) pipeline configured to process the LDR image based on the predicted exposure value; and an object detection neural network configured to detect objects in the LDR image based on the processed LDR image and the predicted exposure value; and train the object detection system by generating a training dataset, the training dataset including at least one simulated LDR image and corresponding ground truth output from the object detection neural network with the at least one simulated LDR image as an input to the auto-exposure neural network, the at least one simulated LDR image generated based on an HDR raw image and at least one predicted exposure value by the auto-exposure neural network, the ground truth annotated using the HDR raw image.
  2. The computing system of claim 1, wherein the at least one processor is further programmed to: generate the training dataset by: receiving a first HDR image of a first frame, the HDR raw image being of a second frame immediately adjacent to the first frame; simulating a first simulated LDR image with a random exposure shift, based on the first HDR image; predicting, by the auto-exposure neural network, the at least one predicted exposure value, by inputting the first simulated LDR image with the random exposure shift into the auto-exposure neural network; and generating a second simulated LDR image based on the HDR raw image and the at least one predicted exposure value.
  3. The computing system of claim 2, wherein the at least one processor is further programmed to: determine a base exposure based on the HDR raw image; and apply the random exposure shift to the base exposure.
  4. The computing system of claim 1, wherein the at least one processor is further programmed to: train the object detection system using a first loss associated with a region proposal network in the object detection neural network, the region proposal network outputting regions of interest including candidates of objects in an input image to the object detection system.
  5. The computing system of claim 4, wherein the at least one processor is further programmed to: train the object detection system using a second loss associated with the regions of interest.
  6. The computing system of claim 5, wherein the at least one processor is further programmed to: train the object detection system using a total loss as a weighted sum of the first loss and the second loss.
  7. The computing system of claim 1, wherein the ground truth includes classifications and locations of objects in the HDR raw image.
  8. The computing system of claim 1, wherein the at least one processor is further programmed to: simulate noise in the at least one simulated LDR image, based on the HDR raw image; and generate the at least one simulated LDR image by adding the simulated noise to the at least one simulated LDR image.
  9. The computing system of claim 8, wherein the at least one processor is further programmed to: simulate the noise by randomly varying a variance of the noise.
  10. The computing system of claim 8, wherein the at least one processor is further programmed to: simulate the noise by determining a variance of the noise including a variance of spatially-correlated noise and a variance of spatially-uncorrelated noise.
  11. The computing system of claim 1, wherein the at least one processor is further programmed to: train the object detection system by: updating at least one of weights or biases of the object detection neural network; updating parameters of the ISP pipeline; and updating at least one of weights or biases of the auto-exposure neural network.
  12. A computer-implemented method for high dynamic range (HDR) object detection of an autonomous vehicle, the method comprising: forming an object detection system, the object detection system including: an auto-exposure neural network configured to receive a low dynamic range (LDR) image acquired by an LDR sensor and predict an exposure value of the LDR sensor; an image signal processing (ISP) pipeline configured to process the LDR image based on the predicted exposure value; and an object detection neural network configured to detect objects in the LDR image based on the processed LDR image and the predicted exposure value; and training the object detection system by generating a training dataset, the training dataset including at least one simulated LDR image and corresponding ground truth output from the object detection neural network with the at least one simulated LDR image as an input to the auto-exposure neural network, the at least one simulated LDR image generated based on an HDR raw image and at least one predicted exposure value by the auto-exposure neural network, the ground truth annotated using the HDR raw image.
  13. The method of claim 12, wherein generating the training dataset further comprises: receiving a first HDR image of a first frame, the HDR raw image being of a second frame immediately adjacent to the first frame; simulating a first simulated LDR image with a random exposure shift, based on the first HDR image; predicting, by the auto-exposure neural network, the at least one predicted exposure value, by inputting the first simulated LDR image with the random exposure shift into the auto-exposure neural network; and generating a second simulated LDR image based on the HDR raw image and the at least one predicted exposure value.
  14. The method of claim 12, wherein training the object detection system further comprises: training the object detection system using a total loss as a weighted sum of a first loss and a second loss, the first loss associated with a region proposal network in the object detection neural network, the region proposal network outputting regions of interest including candidates of objects in an input image to the object detection system, and the second loss associated with the regions of interest.
  15. The method of claim 12, wherein the ground truth includes classifications and locations of objects in the HDR raw image.
  16. One or more non-transitory computer-readable storage media for high dynamic range (HDR) object detection of an autonomous vehicle, the one or more non-transitory computer-readable storage media comprising a plurality of instructions stored thereon that, in response to being executed, cause a system to: form an object detection system, the object detection system including: an auto-exposure neural network configured to receive a low dynamic range (LDR) image acquired by an LDR sensor and predict an exposure value of the LDR sensor; an image signal processing (ISP) pipeline configured to process the LDR image based on the predicted exposure value; and an object detection neural network configured to detect objects in the LDR image based on the processed LDR image and the predicted exposure value; and train the object detection system by generating a training dataset, the training dataset including at least one simulated LDR image and corresponding ground truth output from the object detection neural network with the at least one simulated LDR image as an input to the auto-exposure neural network, the at least one simulated LDR image generated based on an HDR raw image and at least one predicted exposure value by the auto-exposure neural network, the ground truth annotated using the HDR raw image.
  17. The one or more non-transitory computer-readable storage media of claim 16, wherein the plurality of instructions further cause the system to generate the training dataset by: receiving a first HDR image of a first frame, the HDR raw image being of a second frame immediately adjacent to the first frame; simulating a first simulated LDR image with a random exposure shift, based on the first HDR image; predicting, by the auto-exposure neural network, the at least one predicted exposure value, by inputting the first simulated LDR image with the random exposure shift into the auto-exposure neural network; and generating a second simulated LDR image based on the HDR raw image and the at least one predicted exposure value.
  18. The one or more non-transitory computer-readable storage media of claim 16, wherein the plurality of instructions further cause the system to: train the object detection system using a total loss as a weighted sum of a first loss and a second loss, the first loss associated with a region proposal network in the object detection neural network, the region proposal network outputting regions of interest including candidates of objects in an input image to the object detection system, and the second loss associated with the regions of interest.
  19. The one or more non-transitory computer-readable storage media of claim 16, wherein the ground truth includes classifications and locations of objects in the HDR raw image.
  20. The one or more non-transitory computer-readable storage media of claim 16, wherein the plurality of instructions further cause the system to: train the object detection system by: updating at least one of weights or biases of the object detection neural network; updating parameters of the ISP pipeline; and updating at least one of weights or biases of the auto-exposure neural network.
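The claims above describe simulating an LDR capture from an HDR raw image at a chosen exposure (claims 1, 8) and training with a weighted sum of region-proposal and ROI losses (claims 6, 14, 18). The two steps can be sketched as follows; this is a minimal illustrative sketch, not the patent's actual implementation: the function names, the 12-bit default bit depth, and the simple additive-Gaussian read-noise model are all assumptions made here for clarity.

```python
import random

def simulate_ldr(hdr_pixels, exposure, full_well=2**12 - 1, noise_std=0.0, rng=None):
    """Simulate a low-dynamic-range capture of HDR radiance values:
    scale by the chosen exposure, optionally add Gaussian read noise,
    then clip and quantize to the sensor bit depth. The clipping at
    full_well is what discards dynamic range in a single capture."""
    rng = rng or random.Random(0)
    out = []
    for v in hdr_pixels:
        v = v * exposure
        if noise_std > 0.0:
            v += rng.gauss(0.0, noise_std)
        out.append(min(max(round(v), 0), full_well))
    return out

def total_loss(rpn_loss, roi_loss, w_rpn=1.0, w_roi=1.0):
    """Total training loss as a weighted sum of the region proposal
    network loss and the regions-of-interest loss (claims 6, 14, 18)."""
    return w_rpn * rpn_loss + w_roi * roi_loss
```

For example, a pixel at radiance 1e6 saturates to the 12-bit full-well value 4095 regardless of detail above the clip point, which is why the ground truth is annotated on the HDR raw image rather than on the simulated LDR frame.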

Description

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 19/373,298 filed on Oct. 29, 2025, which is a continuation of U.S. patent application Ser. No. 17/722,261 filed on Apr. 15, 2022 (ALX-009-US). ALX-009-US is now U.S. Pat. No. 12,482,068 issued on Nov. 25, 2025. ALX-009-US claims benefit from U.S. provisional patent application Ser. No. 63/175,505, filed on Apr. 15, 2021 (ALX-009-US-prov). ALX-009-US is also a continuation-in-part of U.S. patent application Ser. No. 17/712,727 filed on Apr. 4, 2022, which is now U.S. Pat. No. 11,783,231 issued on Oct. 10, 2023 (ALX-004-US-CON2). ALX-004-US-CON2 is a continuation of U.S. patent application Ser. No. 16/927,741 filed on Jul. 13, 2020, which is now U.S. Pat. No. 11,295,176 issued on Apr. 5, 2022 (ALX-004-US-CON1). ALX-004-US-CON1 is a continuation of U.S. patent application Ser. No. 16/025,776 filed on Jul. 2, 2018, which is now U.S. Pat. No. 10,713,537 issued on Jul. 14, 2020 (ALX-004-US). ALX-004-US claims benefit from U.S. provisional patent application Ser. No. 62/528,054 filed on Jul. 1, 2017 (ALX-004-US-prov). The entire contents of the above-noted patents and applications are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to a system and method for auto-exposure selection and control employing a neural network, and in particular for determining the auto-exposure for high-dynamic-range object detection.

BACKGROUND OF THE INVENTION

Computer vision systems have to measure and analyze a wide range of luminances, from no ambient illumination at night to a bright sunny day, which may exceed 280 dB expressed as a ratio of the highest to the lowest luminance values. While a typical range of luminance for an ordinary outdoor scene is about 120 dB, there are numerous situations when this range may be much wider.
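The decibel figures used throughout this background follow the convention common in imaging, where dynamic range in dB is 20·log10(Lmax/Lmin). A quick illustrative check (the helper function and sample luminance ratios are assumptions made here, not values from the specification):

```python
import math

def dynamic_range_db(l_max, l_min):
    """Dynamic range in decibels: 20 * log10(Lmax / Lmin)."""
    return 20.0 * math.log10(l_max / l_min)

# A 10^6:1 luminance ratio corresponds to the ~120 dB of a typical
# outdoor scene; a 12-bit single capture spans only about 72 dB.
```

Under this convention, the 280 dB figure above corresponds to a luminance ratio of 10^14:1 between the brightest and darkest scene points.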
For example, exiting a tunnel may include various scene regions with almost no ambient illumination, the Sun, and scene points with intermediate luminances, all in one image. Capturing this wide dynamic range of luminances has been an open challenge for image sensors, with today's conventional CMOS image sensors being capable of acquiring only about 60-70 dB in a single capture. This constraint of existing image sensors poses a fundamental problem for low-level and high-level vision tasks in uncontrolled scenarios, and for various industrial applications that make decisions relying on computer vision modules in the wild, including outdoor robotics, drones, self-driving vehicles, driver assistance systems, navigation, and remote sensing, to name a few. To overcome this limitation, prior art vision pipelines rely on high dynamic range (HDR) sensors that acquire multiple captures of the same scene with different exposures. Numerous prior art works explore different HDR sensor designs and acquisition strategies, with sequential capture methods and sensors that split each pixel into two sub-pixels being the most successfully deployed HDR sensor architectures. Although modern HDR image sensors are capable of capturing up to 140 dB at moderate resolutions, e.g., the OnSemi™ AR0820AT image sensor, a multi-capture acquisition approach comes with fundamental limitations. Because exposures have different durations or start at different times, capturing a dynamic scene results in motion artefacts, which need to be eliminated. Custom sensor architectures also come at the cost of reduced fill factor, and hence resolution, as well as higher production cost, compared to conventional intensity sensors. Moreover, capturing HDR images not only requires a sensor that can measure the scene but also necessitates high-quality optics for HDR acquisition, without glare and lens flare.

High Dynamic Range Imaging.
As existing sensors are not capable of capturing the entire dynamic range of luminance values in real-world scenes in a single shot, HDR imaging methods employ multiplexing strategies to recover this dynamic range from multiple measurements with different exposures. For static scenes, conventional HDR acquisition methods rely on temporal multiplexing by sequentially capturing low dynamic range (LDR) images, also referred to as standard dynamic range (SDR) images in this application, for different exposures and then combining them by exposure bracketing. These methods suffer from motion artefacts for dynamic scenes, with a large volume of prior art being focused on post-capture stitching, optical flow, and deep learning. While these methods are successful for photography, they are not suitable for real-time applications, for example robotics. For safety-critical applications, including autonomous driving, recent prior art work that hallucinates HDR content from LDR images is also not an alternative for detection and navigation stacks that must measure the real world.

Adaptive Camera Control.

Although auto-exposure control, or exposure control, is fundamental to acquisition