US-20260126551-A1 - SENSOR CALIBRATION USING PROJECTED TARGETING FOR VEHICLE OCCUPANT MONITORING

Abstract

In various examples, systems and methods are provided for sensor calibration using projected targeting for vehicle occupant monitoring. A target projector may be used to cause a projection of a target to appear at predefined points on boundaries of gaze regions within a vehicle interior. Region mapping data that includes 3D coordinates of the predefined points on the boundaries of the gaze regions is generated in the coordinate system of the target projector by pointing the target projector at each of the predefined points. One or more sensors may be calibrated based at least on a transformation of the region mapping data from the coordinate system of the target projector to a coordinate system of the one or more sensors.

Inventors

  • Jia Chi Wu
  • Dae Jin Kim
  • Nishant Puri
  • Rajath Bellipady Shetty
  • Anshul Jain

Assignees

  • NVIDIA CORPORATION

Dates

Publication Date
2026-05-07
Application Date
2024-11-06

Claims (20)

  1. One or more processors comprising processing circuitry to: control a target projector to cause a projection of a target to appear at predefined points defining a boundary of a region within an environment; determine a three-dimensional (3D) position corresponding to a location of the projection of the target; generate region mapping data in a coordinate system of the target projector comprising 3D coordinates of the predefined points defining the boundary of the region based at least on the 3D position; and calibrate at least one sensor located within the environment based at least on a transformation of the region mapping data from the coordinate system of the target projector to a coordinate system of the at least one sensor.
  2. The one or more processors of claim 1, wherein the predefined points defining the boundary of the region correspond to one or more labeled surfaces within the environment.
  3. The one or more processors of claim 1, wherein the processing circuitry is further to determine a position and an orientation of the at least one sensor in the coordinate system based at least on a fiducial marker on the target projector.
  4. The one or more processors of claim 1, wherein the target projector comprises a range-finding sensor, wherein the processing circuitry is further to determine the 3D position corresponding to the location of the projection of the target based at least on a distance measured by the range-finding sensor.
  5. The one or more processors of claim 4, wherein the 3D position corresponding to the location of the projection of the target includes at least one of: an azimuth component, an elevation component, or the distance measured by the range-finding sensor.
  6. The one or more processors of claim 1, wherein the processing circuitry is further to: control the target projector to cause a projection of a target to appear at one or more points corresponding to a location of the at least one sensor; and determine an offset between the 3D position corresponding to the location of the projection of the target at the one or more points and an expected position of the at least one sensor.
  7. The one or more processors of claim 6, wherein the processing circuitry is further to update, based at least on the offset, one or more of the region mapping data or calibration of the at least one sensor.
  8. The one or more processors of claim 6, wherein the processing circuitry is further to: determine whether the offset satisfies a threshold; and validate, based at least on a determination that the offset satisfies the threshold, at least one of the region mapping data or calibration of the at least one sensor.
  9. The one or more processors of claim 1, wherein the processing circuitry is further to: control the target projector to cause the projection of the target to appear at predefined points defining a boundary of a second region within the environment; generate additional region mapping data in the coordinate system of the target projector based at least on the 3D position corresponding to the location of the projection of the target, wherein the additional region mapping data comprises 3D coordinates of the predefined points defining the boundary of the second region; and calibrate the at least one sensor located within the environment based at least on a transformation of the additional region mapping data from the coordinate system of the target projector to the coordinate system of the at least one sensor.
  10. The one or more processors of claim 1, wherein the processing circuitry is further to restore a localization of the at least one sensor in the region mapping data after moving the target projector by: generating a partial region mapping scan comprising 3D positions corresponding to the location of the projection of the target when directed towards at least three predefined points defining the boundary of the region; localizing the at least one sensor in the partial region mapping scan based at least on one or more fiducial markers on the target projector; and aligning the region mapping data with the partial region mapping scan.
  11. The one or more processors of claim 1, wherein the one or more processors are comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system implementing one or more vision language models (VLMs); a system implementing one or more multi-modal language models; a system for generating synthetic data; a system for generating synthetic data using AI; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
  12. A system comprising one or more processors to: control a target projector to cause a projected target to appear at a set of predefined points on a boundary of a region; generate region mapping data in a first coordinate system based at least on a three-dimensional (3D) position corresponding to a location of the projected target, wherein the region mapping data comprises 3D coordinates of the set of predefined points on the boundary of the region; and calibrate at least one sensor based at least on a transformation of the region mapping data from the first coordinate system to a second coordinate system.
  13. The system of claim 12, wherein the one or more processors are further to: control the target projector to cause the projected target to appear at a second set of predefined points on a boundary of a second region; generate additional region mapping data in the first coordinate system based at least on the 3D position corresponding to the location of the projected target, wherein the additional region mapping data comprises 3D coordinates of the second set of predefined points on the boundary of the second region; and calibrate the at least one sensor based at least on a transformation of the additional region mapping data from the first coordinate system to the second coordinate system.
  14. The system of claim 12, wherein the 3D position includes a distance measured by a range-finding sensor of the target projector.
  15. The system of claim 12, wherein the one or more processors are further to localize the at least one sensor based at least on a fiducial marker on the target projector.
  16. The system of claim 12, wherein the one or more processors are further to validate one or more of the region mapping data or calibration of the at least one sensor based at least on an offset between the projected target and a known position of the at least one sensor.
  17. The system of claim 12, wherein, after calibration of the at least one sensor, the one or more processors are further to: control the target projector to cause the projected target to appear on a surface of an interior space within the boundary of the region; capture an image of the interior space using the at least one sensor, wherein the image captures a gaze of an occupant responsive to projection of the projected target; determine a position in a 3D space corresponding to the location of the projected target; and label the image of the occupant of the interior space based at least on the position in the 3D space.
  18. The system of claim 12, wherein the at least one sensor comprises at least one of: an RGB optical sensor, an IR optical sensor, or an RGB-IR optical sensor.
  19. The system of claim 12, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system implementing one or more vision language models (VLMs); a system implementing one or more multi-modal language models; a system for generating synthetic data; a system for generating synthetic data using AI; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
  20. A method comprising: calibrating one or more sensors based at least on a transformation of region mapping data from a coordinate system of a target projector to a coordinate system of the one or more sensors, wherein the region mapping data includes 3D coordinates of predefined points on a boundary of a region in the coordinate system of the target projector.
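
Read together, independent claims 1, 12, and 20 describe a rigid-body change of frame: boundary points captured in the target projector's coordinate system are mapped into the sensor's coordinate system, and claims 6-8 gate the result on an offset threshold. The following is a minimal Python/NumPy sketch of those two steps; the function names, example coordinates, placeholder pose, and 1 cm threshold are illustrative assumptions, not values from the publication.

```python
import numpy as np

def transform_region_mapping(points_projector: np.ndarray,
                             R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map Nx3 region-boundary points from the target-projector frame to a
    sensor frame via a rotation R (3x3) and translation t (3,)."""
    return points_projector @ R.T + t

def offset_satisfies_threshold(measured: np.ndarray, expected: np.ndarray,
                               threshold_m: float = 0.01) -> bool:
    """Claims 6-8: compare the projected-target position measured at the
    sensor's location against the expected sensor position; validate only
    if the offset is within a threshold (value here is an assumption)."""
    return float(np.linalg.norm(measured - expected)) <= threshold_m

# Hypothetical region mapping data: four boundary points of a gaze region,
# expressed in the projector (cabin) frame, in meters.
boundary_pts = np.array([[0.42, -0.10, 1.05],
                         [0.58, -0.10, 1.05],
                         [0.58,  0.12, 1.02],
                         [0.42,  0.12, 1.02]])
R = np.eye(3)                      # placeholder sensor pose: axes aligned
t = np.array([-0.30, 0.05, 0.00])  # placeholder sensor offset, meters
pts_in_sensor_frame = transform_region_mapping(boundary_pts, R, t)
```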

Description

BACKGROUND

Autonomous and semi-autonomous vehicles rely on machine learning approaches, such as those using deep neural networks (DNNs), to analyze images of an interior space (e.g., cabin or cockpit) of a vehicle or other machine. An Occupant Monitoring System (OMS) is an example of a system that may be used within a vehicle cabin to perform real-time assessments of occupant or operator presence, gaze, alertness, and/or other conditions. For example, OMS sensors (such as, but not limited to, red green blue (RGB) sensors, infrared (IR) sensors, depth sensors, cameras, and/or other optical sensors) may be used to track an occupant's or an operator's gaze direction, head pose, and/or blinking. This gaze information may be used to determine a level of attentiveness of the occupant or operator (e.g., to detect drowsiness, fatigue, and/or distraction), and/or to take responsive action to prevent harm to the occupant or operator (e.g., by redirecting their attention to a potential hazard, pulling the vehicle over, and/or the like). For example, DNNs may be used to detect that an operator is falling asleep at the wheel, based on the operator's downward gaze toward the floor of the vehicle, and the detection may lead to an adjustment in the speed and direction of the vehicle (e.g., pulling it over to the side of the road) or an auditory alert to the operator. OMSs often rely on training DNNs with a high volume of training image data that reflects the facial features of different persons, to help increase the accuracy of gaze predictions across all persons.

SUMMARY

Embodiments of the present disclosure relate to sensor calibration using projected targeting for vehicle occupant monitoring. Systems and methods are disclosed that may be used for, among other things, calibrating vehicle or machine occupant monitoring system sensors with respect to region mapping data in a coordinate system of a target projector. The coordinate system of the target projector may serve as the in-cabin frame of reference, which may be referred to herein as the cabin coordinate system. In contrast to conventional calibration systems, the systems and methods presented in this disclosure use a target projector to generate region mapping data with three-dimensional (3D) position information for boundary points of regions, and calibrate sensors based, at least in part, on the region mapping data.

In some embodiments, the target projector may cause a projection of a target to appear at predefined points on boundaries of the regions. The target projector may include a robotic target projector (e.g., a gimbal-mounted robotic laser and/or laser range finder). The target projector may include a range-finding sensor (e.g., a laser range finder, an ultrasonic range finder, etc.) to determine a distance from the target projector to the target point where the projected target appears. A representation of the projection point location of the projected target (e.g., in polar coordinates: azimuth, elevation, and distance) may be transformed to Cartesian coordinates with respect to the target projector. Accordingly, when the target projector is controlled to produce a projected target at a projection point on an interior surface of the cabin, the 3D coordinates of that projected target in the coordinate system of the target projector (and the cabin coordinate system) may be readily ascertained.
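
As a concrete illustration of the polar-to-Cartesian step above: a gimbal azimuth, elevation, and range-finder distance fix can be converted into 3D coordinates in the projector's frame. This is a sketch only; the axis convention and the example reading are assumptions, since the publication does not fix them.

```python
import math

def polar_to_cartesian(azimuth_rad: float, elevation_rad: float,
                       distance_m: float) -> tuple[float, float, float]:
    """Convert a projected-target fix (azimuth, elevation, range) into
    Cartesian coordinates in the target projector's frame.
    Assumed convention: x forward, y left, z up, angles in radians."""
    x = distance_m * math.cos(elevation_rad) * math.cos(azimuth_rad)
    y = distance_m * math.cos(elevation_rad) * math.sin(azimuth_rad)
    z = distance_m * math.sin(elevation_rad)
    return (x, y, z)

# Hypothetical fix: target 1.2 m away, 15 degrees right, 5 degrees down.
point_3d = polar_to_cartesian(math.radians(-15.0), math.radians(-5.0), 1.2)
```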
One or more sensors may be calibrated based, at least in part, on a transformation of the region mapping data from the coordinate system of the target projector to a coordinate system of the sensor. Fiducial marker(s) may be included on the target projector (e.g., on its base) to facilitate localization of the one or more sensors in the coordinate system of the target projector. The one or more sensors may capture an image of the fiducial marker(s), and a rotation-translation transform may be derived for the one or more sensors that accounts for the pose (e.g., the rotation and translation) of the sensors. Based on a sensor's rotation-translation transform, the coordinates of the fiducial marker(s) detected in two-dimensional (2D) captured images may be referenced with respect to the coordinate system of the target projector. The accuracy of the region mapping data and the calibration of the one or more sensors may be evaluated by controlling the target projector to point at a known reference, and comparing the 3D coordinates of the projected target in the coordinate system of the target projector, when pointed at the known reference, with an expected position of the known reference (e.g., based on the region mapping data and/or the determined rotation-translation transform).

BRIEF DESCRIPTION OF THE DRAWINGS

The present systems and methods for sensor calibration using projected targeting for vehicle occupant monitoring are described in detail below with reference to the attached drawing figures, wherein: FIG. 1 is an illustration of an example fl
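
Returning to the fiducial-based localization described in the summary above: deriving the rotation-translation transform from detected fiducial corners is the classic perspective-n-point (PnP) problem. The sketch below uses OpenCV's solvePnP; the marker layout, corner detections, and camera intrinsics are all illustrative assumptions rather than values from the publication.

```python
import cv2
import numpy as np

# Assumed square fiducial on the projector base: 3D corner positions in the
# projector (cabin) frame, in meters.
marker_corners_3d = np.array([[-0.05, -0.05, 0.0],
                              [ 0.05, -0.05, 0.0],
                              [ 0.05,  0.05, 0.0],
                              [-0.05,  0.05, 0.0]], dtype=np.float64)

# 2D pixel locations of the same corners as detected in the sensor image
# (e.g., by an ArUco/AprilTag detector), plus assumed camera intrinsics.
marker_corners_2d = np.array([[412., 301.], [509., 298.],
                              [512., 396.], [409., 399.]], dtype=np.float64)
K = np.array([[900.,   0., 640.],
              [  0., 900., 360.],
              [  0.,   0.,   1.]])
dist_coeffs = np.zeros(5)  # assume negligible lens distortion

# Recover the rotation-translation transform expressing the projector frame
# in the sensor's camera frame, then invert it to localize the sensor in
# the projector (cabin) coordinate system.
ok, rvec, tvec = cv2.solvePnP(marker_corners_3d, marker_corners_2d,
                              K, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)
sensor_position_in_projector_frame = (-R.T @ tvec).ravel()
```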