US-12621564-B2 - Selective operating mode switching for visible and infrared imaging
Abstract
In various examples, an image processing pipeline may switch between different operating or switching modes based on speed of ego-motion and/or the active gear (e.g., park vs. drive) of a vehicle or other ego-machine in which an RGB/IR camera is being used. For example, a first operating or switching mode that toggles between IR and RGB imaging modes at a fixed frame rate or interval may be used when the vehicle is in motion, in a particular gear (e.g., drive), and/or traveling above a threshold speed. In another example, a second operating or switching mode that toggles between IR and RGB imaging modes based on detected light intensity may be used when the vehicle is stationary, in park (or out of gear), and/or traveling below a threshold speed.
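The mode-selection logic summarized above can be sketched as follows. This is an illustrative, non-normative sketch only; the function and mode names, the gear encoding, and the concrete speed threshold are assumptions not specified in the patent.

```python
from enum import Enum


class SwitchingMode(Enum):
    FIXED_RATE = 1       # toggle RGB/IR at a fixed frame rate (vehicle in motion)
    LIGHT_ADAPTIVE = 2   # toggle based on detected light intensity (vehicle parked)


# Hypothetical threshold; the patent refers only to "a threshold speed".
SPEED_THRESHOLD_KPH = 5.0


def select_switching_mode(speed_kph: float, gear: str) -> SwitchingMode:
    """Pick an imaging switching mode from ego-machine speed and active gear."""
    if gear == "drive" or speed_kph >= SPEED_THRESHOLD_KPH:
        # In motion / in drive: alternate RGB and IR at a fixed interval.
        return SwitchingMode.FIXED_RATE
    # Stationary / in park: let detected light intensity drive the toggling.
    return SwitchingMode.LIGHT_ADAPTIVE
```

A downstream pipeline could call this once per control cycle and reconfigure the camera only when the returned mode changes.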
Inventors
- Sakthivel Sivaraman
- Rajath Shetty
- Animesh Khemka
- Niranjan Avadhanam
Assignees
- NVIDIA CORPORATION
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2023-10-25
Claims (20)
- 1 . One or more processors comprising processing circuitry to: operate an image processing pipeline of an ego-machine in a first imaging switching mode that toggles between an RGB imaging mode and an infrared (IR) imaging mode at a fixed rate; and based at least on a speed or a state of the ego-machine, switch to operating the image processing pipeline in a second imaging switching mode that toggles between the RGB imaging mode and the IR imaging mode based at least on an amount of detected light intensity; select, while operating in at least one individual imaging switching mode of the first imaging switching mode or the second imaging switching mode, one or more machine learning models that are applicable to the at least one individual imaging switching mode from a plurality of supported machine learning models; and perform one or more detection tasks using the one or more machine learning models.
- 2 . The one or more processors of claim 1 , wherein the processing circuitry is further to switch, based at least on detecting that the speed of the ego-machine is above a threshold, to operating the image processing pipeline in the first imaging switching mode that toggles between the RGB imaging mode and the IR imaging mode at the fixed rate.
- 3 . The one or more processors of claim 1 , wherein the processing circuitry is further to switch, based at least on determining that the state engaged by the ego-machine is a drive gear, to operating the image processing pipeline in the first imaging switching mode that toggles between the RGB imaging mode and the IR imaging mode at the fixed rate.
- 4 . The one or more processors of claim 1 , wherein the processing circuitry is further to operate the image processing pipeline in the first imaging switching mode based at least on generating one or more frames of RGB image data and one or more frames of IR image data, and determining whether to perform at least one detection task of the one or more detection tasks on the one or more frames of RGB image data or on the one or more frames of IR image data based at least on a second amount of detected light intensity.
- 5 . The one or more processors of claim 1 , wherein the processing circuitry is further to operate the image processing pipeline in the first imaging switching mode based at least on generating and applying first image data to a first set of the plurality of supported machine learning models, and operate the image processing pipeline in the second imaging switching mode based at least on generating and applying second image data to a second set of the plurality of supported machine learning models that is different from the first set.
- 6 . The one or more processors of claim 1 , wherein the processing circuitry is further to operate the image processing pipeline in the second imaging switching mode based at least on generating and applying a representation of one or more RGB image frames to a first set of the plurality of supported machine learning models, and generating and applying a representation of one or more IR image frames to a second set of the plurality of supported machine learning models that is different from the first set.
- 7 . The one or more processors of claim 1 , wherein the processing circuitry is further to operate the image processing pipeline in the first imaging switching mode based at least on generating and applying a representation of one or more RGB image frames and one or more IR image frames to at least one individual machine learning model of the plurality of supported machine learning models.
- 8 . The one or more processors of claim 1 , wherein the processing circuitry is further to operate the image processing pipeline in the first imaging switching mode based at least on using a common sensor to generate one or more IR image frames and one or more RGB image frames, monitor an operator of the ego-machine based at least on executing a first detection task of the one or more detection tasks using the one or more IR image frames generated by the common sensor, and monitor a non-operator occupant of the ego-machine based at least on executing a second detection task of the one or more detection tasks using the one or more RGB image frames generated by the common sensor.
- 9 . The one or more processors of claim 1 , wherein the processing circuitry is further to: based at least on the ego-machine being in a state corresponding to a rate of velocity below a designated threshold, operate the image processing pipeline in the first imaging switching mode to generate first image data and perform at least one of child presence detection, three-dimensional (3D) pose estimation, or monocular depth estimation of the one or more detection tasks on the first image data; and based at least on the ego-machine being in motion, operate the image processing pipeline in the second imaging switching mode to generate second image data and perform at least one of driver distraction detection, driver drowsiness detection, driver hands on wheel detection, driver 3D pose estimation, occupant pose estimation, or monocular depth estimation of the one or more detection tasks on the second image data.
- 10 . The one or more processors of claim 1 , wherein the one or more processors are comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system for generating synthetic data; a system for generating synthetic data using AI; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
- 11 . A system comprising one or more hardware processors to: switch, by an image processing pipeline, between a first switching mode that produces infrared (IR) illumination at a fixed rate, and a second switching mode that produces the IR illumination at a dynamic rate determined based at least on an amount of detected light intensity; select, while operating in at least one individual switching mode of the first switching mode or the second switching mode, one or more machine learning models from a plurality of supported machine learning models based at least on the at least one individual switching mode being in operation; and execute one or more detection tasks using the one or more machine learning models.
- 12 . The system of claim 11 , wherein the one or more hardware processors are further to switch, based at least on detecting that a speed of an ego-machine associated with the image processing pipeline is above a threshold, to operating the image processing pipeline in the first switching mode that produces the IR illumination at the fixed rate.
- 13 . The system of claim 11 , wherein the one or more hardware processors are further to switch, based at least on detecting that a speed of an ego-machine associated with the image processing pipeline is below a threshold, to operating the image processing pipeline in the second switching mode that produces the IR illumination at the dynamic rate.
- 14 . The system of claim 11 , wherein, in the first switching mode, the one or more hardware processors are further to generate one or more frames of RGB image data and one or more frames of IR image data, and determine whether to perform at least one detection task of the one or more detection tasks on the one or more frames of RGB image data or on the one or more frames of IR image data based at least on a second amount of detected light intensity.
- 15 . The system of claim 11 , wherein, in the first switching mode, the one or more hardware processors are further to generate and apply first image data to a first set of the plurality of supported machine learning models, wherein, in the second switching mode, the one or more hardware processors are further to generate and apply second image data to a second set of the plurality of supported machine learning models that is different from the first set.
- 16 . The system of claim 11 , wherein, in the second switching mode, the one or more hardware processors are further to generate and apply a representation of one or more RGB image frames to a first set of the plurality of supported machine learning models, and generate and apply a representation of one or more IR image frames to a second set of the plurality of supported machine learning models that is different from the first set.
- 17 . The system of claim 11 , wherein, in the first switching mode, the one or more hardware processors are further to generate and apply a representation of one or more RGB image frames and one or more IR image frames to at least one individual machine learning model of the plurality of supported machine learning models.
- 18 . The system of claim 11 , wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system for generating synthetic data; a system for generating synthetic data using AI; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
- 19 . A method comprising: operating an image processing pipeline of an ego-machine that switches, based at least on a speed or an active gear of the ego-machine, between a first switching mode that causes infrared (IR) illumination to be produced at a fixed rate and a second switching mode that causes the IR illumination to be produced at a dynamic rate based at least on an amount of detected light intensity; and selecting, while operating in at least one individual switching mode of the first switching mode or the second switching mode, one or more machine learning models from a plurality of supported machine learning models based at least on the at least one individual switching mode being active.
- 20 . The method of claim 19 , wherein the method is performed by at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system for generating synthetic data; a system for generating synthetic data using AI; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS This application is related to U.S. patent application Ser. No. 18/494,010, entitled “Infrared Illumination Control for Visible and Infrared Imaging Modes,” filed on Oct. 25, 2023. BACKGROUND Autonomous and semi-autonomous vehicles and other machines rely on machine learning approaches—such as those using deep neural networks (DNNs)—to analyze images of an interior space (e.g., cabin, cockpit) for a number of important purposes. An Occupant Monitoring System (OMS) is an example of a system that may be used within a vehicle cabin to perform real-time assessments of occupant or operator presence, gaze, alertness, and/or other conditions. For example, OMS sensors (such as, but not limited to, RGB sensors, infrared (IR) sensors, depth sensors, cameras, and/or other optical sensors) may be used to track an occupant's or operator's gaze direction, head pose, and/or blinking. This information may be used to determine a level of attentiveness of the occupant or operator (e.g., to detect drowsiness, fatigue, and/or distraction), and/or to take responsive action to prevent harm to the occupant, operator, or surrounding environment—e.g., by redirecting their attention to a potential hazard, pulling the vehicle over, and/or the like. For example, DNNs may be used to detect that an operator is falling asleep at the wheel, based on the operator's downward gaze toward the floor of the vehicle or prolonged periods without blinking, and the detection may lead to an adjustment in the speed and/or direction of the vehicle (e.g., pulling the vehicle over to the side of the road) or an auditory alert to the operator. Conventional OMSs have a variety of challenges and drawbacks.
For example, some OMSs seek to perform tasks such as driver distraction detection, drowsiness detection, hands-on-wheel detection, three-dimensional (3D) pose estimation, monocular depth estimation, and/or child presence detection with high accuracy under different operating conditions to prevent overly frequent, redundant, or unnecessary alerts to the occupants. However, some operating conditions may lead to depth ambiguities and/or occlusions in sensor data that can limit the accuracy of these tasks. In one example, OMSs often need to perform in the presence of rapid, frequent, and drastic changes in illumination, which may occur in scenarios such as when a vehicle enters or exits a tunnel or parking lot. However, changes in illumination can negatively impact depth perception and detection accuracy, and can even cause failures in detection altogether—for example, when the image is too dark to detect the desired features during operation in the visible light spectrum. OMSs have begun using RGB/IR cameras that can capture both visible light (in the red, green, and blue, or “RGB,” spectrums) and IR light. These cameras are equipped with sensors that are sensitive to both types of light, allowing them to record images or videos in both the visible and IR spectrums. Most conventional techniques tend to operate in a single mode, relying on either RGB or IR images for their respective tasks, thereby limiting their ability to operate accurately in the presence of changes in scene illumination. One conventional technique uses an ambient light sensor to detect ambient light and switches between using RGB pixel sensors and IR pixel sensors when the amount of detected ambient light falls below a threshold. However, there are often intermediate and low levels of ambient light at which detection accuracy in the RGB and/or IR domains is negatively impacted.
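The conventional ambient-light technique described above amounts to a simple threshold test. A minimal sketch, assuming a light reading in lux and a hypothetical threshold value (the patent describes only "a threshold," not a concrete level):

```python
def choose_imaging_mode(ambient_lux: float, threshold_lux: float = 10.0) -> str:
    """Conventional single-threshold selection between RGB and IR sensing.

    Below the threshold, the scene is assumed too dark for reliable RGB
    capture, so the IR pixel sensors are used instead.
    """
    return "IR" if ambient_lux < threshold_lux else "RGB"
```

As the passage notes, a single hard threshold like this leaves a band of intermediate light levels where neither mode performs well, which motivates the richer switching modes of the disclosure.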
One existing technique attempts to handle changes in scene illumination by switching on an IR emitter to provide IR illumination that boosts the signal-to-noise ratio in the resulting images. However, IR emitters can have several degrading effects on a resulting image. IR light acts as contamination for color reproduction and can result in color errors when producing RGB images. Furthermore, IR light can lower the dynamic range of the camera by introducing extra illumination in bright areas. One currently available RGB/IR camera offers an operating mode known as ABAB mode, which switches an IR emitter on and off and alternates between generating “A” frames (IR images) and “B” frames (RGB images). However, this mode generates streams of both IR and RGB image data, one of which is usually not used, effectively wasting the computational resources, power, bandwidth, and memory used to generate that stream. As such, there is a need for improved sensing and/or detection techniques that generate image data more accurately and/or with sharper detail, and/or that facilitate improved detection accuracy from such image data, in the presence of changes in scene illumination. SUMMARY Embodiments of the present disclosure relate to operating modes for visible and infrared imaging. For example, systems and methods are disclosed that toggle between IR and RGB imaging modes (e.g., toggling IR illu
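The ABAB alternation described in the background can be modeled as a generator that pairs each frame index with the emitter state and frame type. This is an illustrative sketch only; the function name and tuple layout are assumptions, and a real camera driver would expose this through its own API.

```python
def abab_frame_stream(n_frames: int):
    """Yield (frame_index, emitter_on, frame_type) tuples for ABAB mode.

    Even frames are "A" frames (IR emitter on, IR image); odd frames are
    "B" frames (emitter off, RGB image). Both streams are always produced,
    which is the wasted-resource behavior the disclosure aims to avoid.
    """
    for i in range(n_frames):
        if i % 2 == 0:
            yield i, True, "A"    # IR emitter on -> IR image
        else:
            yield i, False, "B"   # IR emitter off -> RGB image
```

Downstream code that only consumes one frame type still pays for capturing, transferring, and storing the other, which motivates the speed- and light-aware switching modes claimed above.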