
US-20260127894-A1 - IMAGE ANALYSIS FOR OBJECT LOCALIZATION


Abstract

Techniques for detecting, based at least in part on a first image obtained at a first time, a first indication of a first object included in the first image obtained using a vehicle sensor. The techniques can further include generating, based at least in part on the first indication, a first feature space representation of the first object included in the first image. The techniques can further include comparing the first feature space representation of the first object with a set of stored feature space representations. Responsive to the comparing, the techniques can further include assigning a first unique identifier to the first feature space representation; generating, based on the first indication, a first distance of the first object from the vehicle sensor; and associating the first distance, the first unique identifier, and the first time in memory.
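The flow recited above (detect an object, derive a feature space representation, compare it with stored representations, assign a unique identifier, and associate a distance, identifier, and timestamp in memory) can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the names (`TrackStore`, `process_detection`), the cosine-similarity matching, and the threshold value are all assumptions, and detection, embedding, and distance estimation are taken as given inputs.

```python
import math
from dataclasses import dataclass, field

@dataclass
class TrackStore:
    """Holds stored feature space representations and the association records."""
    features: dict = field(default_factory=dict)  # unique id -> feature vector
    records: list = field(default_factory=list)   # (distance, unique id, time)
    next_id: int = 0

def cosine_similarity(a, b):
    """Similarity between two feature vectors (illustrative comparison metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def process_detection(store, feature, distance, timestamp, threshold=0.9):
    """Compare a new representation with stored ones; reuse the matching
    identifier, or generate a fresh one if nothing matches; then associate
    (distance, identifier, time) in memory."""
    best_id, best_sim = None, threshold
    for uid, stored in store.features.items():
        sim = cosine_similarity(feature, stored)
        if sim >= best_sim:
            best_id, best_sim = uid, sim
    if best_id is None:                 # no stored match: new unique identifier
        best_id = store.next_id
        store.next_id += 1
    store.features[best_id] = feature   # keep the latest representation
    store.records.append((distance, best_id, timestamp))
    return best_id
```

With this sketch, two detections whose feature vectors are nearly identical receive the same identifier (the re-identification case of a previously seen object), while a dissimilar vector receives a new one.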

Inventors

  • Paresh Malalur
  • Onkar Trivedi
  • Dheeptha Badrinarayanan
  • Sandeep Badrinath

Assignees

  • Cambridge Mobile Telematics Inc.

Dates

Publication Date
May 7, 2026
Application Date
Nov. 5, 2025

Claims (20)

  1. A method comprising: detecting, based at least in part on a first image obtained at a first time, a first indication of a first object included in the first image obtained using a vehicle sensor; generating, based at least in part on the first indication, a first feature space representation of the first object included in the first image; comparing the first feature space representation of the first object with a set of stored feature space representations; responsive to the comparing: assigning a first unique identifier to the first feature space representation; generating, based on the first indication, a first distance of the first object from the vehicle sensor; and associating the first distance, the first unique identifier, and the first time in memory.
  2. The method of claim 1, wherein the first indication is generated using a You Only Look Once (YOLO) model and a portion of a Mask Region-based Convolutional Neural Network (Mask-RCNN) model.
  3. The method of claim 2, wherein the YOLO model generates a confidence score associated with a bounding box.
  4. The method of claim 1, wherein the first indication is generated using a Re3 model.
  5. The method of claim 1, wherein the first indication is generated using a You Only Look Once (YOLO) model, a portion of a Mask Region-based Convolutional Neural Network (Mask-RCNN) model, and a Re3 model.
  6. The method of claim 1, wherein assigning the first unique identifier comprises generating the first unique identifier, wherein the first unique identifier does not match a unique identifier already associated with the set of stored feature space representations.
  7. The method of claim 1, wherein assigning the first unique identifier comprises assigning a unique identifier that was associated with a stored feature space representation included in the set of stored feature space representations to the first feature space representation.
  8. The method of claim 1, further comprising: detecting, based at least in part on a second image obtained at a second time, a second indication of the first object included in the second image obtained using the vehicle sensor; generating, based at least in part on the second indication, a second feature space representation of the first object included in the second image; comparing the second feature space representation of the first object with the first feature space representation of the first object; responsive to the comparing: assigning the first unique identifier to the second feature space representation; generating, based on the second indication, a second distance of the first object from the vehicle sensor; and associating the second distance, the first unique identifier, and the second time in memory.
  9. The method of claim 1, further comprising: detecting, based at least in part on the first image obtained at the first time, a second indication of a second object included in the first image obtained using the vehicle sensor; generating, based at least in part on the second indication, a second feature space representation of the second object included in the first image; comparing the second feature space representation of the second object with the set of stored feature space representations; responsive to the comparing: assigning a second unique identifier to the second feature space representation; generating, based on the second indication, a second distance of the second object from the vehicle sensor; and associating the second distance, the second unique identifier, and the first time in memory.
  10. A system comprising: one or more storage media storing instructions; and one or more processors configured to execute the instructions to cause the system to perform operations comprising: detecting, based at least in part on a first image obtained at a first time, a first indication of a first object included in the first image obtained using a vehicle sensor; generating, based at least in part on the first indication, a first feature space representation of the first object included in the first image; comparing the first feature space representation of the first object with a set of stored feature space representations; responsive to the comparing: assigning a first unique identifier to the first feature space representation; generating, based on the first indication, a first distance of the first object from the vehicle sensor; and associating the first distance, the first unique identifier, and the first time in memory.
  11. The system of claim 10, wherein the first indication includes at least a bounding box around the first object or a pixel-wise mask for the first object.
  12. The system of claim 11, wherein the first indication includes the bounding box and the pixel-wise mask.
  13. The system of claim 10, wherein the first indication is generated using a You Only Look Once (YOLO) model and a portion of a Mask Region-based Convolutional Neural Network (Mask-RCNN) model.
  14. The system of claim 10, wherein the instructions cause the system to perform operations further comprising: detecting, based at least in part on a second image obtained at a second time, a second indication of the first object included in the second image obtained using the vehicle sensor; generating, based at least in part on the second indication, a second feature space representation of the first object included in the second image; comparing the second feature space representation of the first object with the first feature space representation of the first object; responsive to the comparing: assigning the first unique identifier to the second feature space representation; generating, based on the second indication, a second distance of the first object from the vehicle sensor; and associating the second distance, the first unique identifier, and the second time in memory.
  15. One or more non-transitory computer-readable storage media storing instructions that, upon execution by one or more processors of a system, cause the system to perform operations comprising: detecting, based at least in part on a first image obtained at a first time, a first indication of a first object included in the first image obtained using a vehicle sensor; generating, based at least in part on the first indication, a first feature space representation of the first object included in the first image; comparing the first feature space representation of the first object with a set of stored feature space representations; responsive to the comparing: assigning a first unique identifier to the first feature space representation; generating, based on the first indication, a first distance of the first object from the vehicle sensor; and associating the first distance, the first unique identifier, and the first time in memory.
  16. The non-transitory computer-readable storage media of claim 15, wherein the first indication is generated using a Re3 model.
  17. The non-transitory computer-readable storage media of claim 15, wherein the first indication is generated using a You Only Look Once (YOLO) model, a portion of a Mask Region-based Convolutional Neural Network (Mask-RCNN) model, and a Re3 model.
  18. The non-transitory computer-readable storage media of claim 15, wherein assigning the first unique identifier comprises generating the first unique identifier, wherein the first unique identifier does not match a unique identifier already associated with the set of stored feature space representations.
  19. The non-transitory computer-readable storage media of claim 15, wherein assigning the first unique identifier comprises assigning a unique identifier that was associated with a stored feature space representation included in the set of stored feature space representations to the first feature space representation.
  20. The non-transitory computer-readable storage media of claim 15, wherein the instructions cause the system to perform operations further comprising: detecting, based at least in part on the first image obtained at the first time, a second indication of a second object included in the first image obtained using the vehicle sensor; generating, based at least in part on the second indication, a second feature space representation of the second object included in the first image; comparing the second feature space representation of the second object with the set of stored feature space representations; responsive to the comparing: assigning a second unique identifier to the second feature space representation; generating, based on the second indication, a second distance of the second object from the vehicle sensor; and associating the second distance, the second unique identifier, and the first time in memory.
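The independent claims each recite generating a distance of the object from the vehicle sensor based on the indication (e.g., a bounding box). One common way to estimate such a distance from a single camera, shown here purely as an illustrative sketch and not as the claimed method, is the pinhole-camera relation d = f · H / h, where f is the focal length in pixels, H an assumed real-world object height, and h the bounding-box height in pixels. The function name and parameters are hypothetical.

```python
def distance_from_bbox(bbox_height_px, real_height_m, focal_length_px):
    """Pinhole-camera distance estimate: distance = f * H / h.

    bbox_height_px  -- height of the detected bounding box in pixels
    real_height_m   -- assumed real-world height of the object class (meters)
    focal_length_px -- camera focal length expressed in pixels
    """
    if bbox_height_px <= 0:
        raise ValueError("bounding box height must be positive")
    return focal_length_px * real_height_m / bbox_height_px
```

For example, a 1.5 m tall vehicle rear that spans 150 pixels in an image from a camera with a 1000-pixel focal length yields an estimated distance of 10 m. Real systems would refine such an estimate (e.g., with a learned monocular depth model, as the description suggests), since assumed object heights and lens distortion introduce error.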

Description

CROSS-REFERENCES TO RELATED APPLICATIONS

This application claims priority to and the benefit of U.S. Provisional Ser. No. 63/716,757, filed Nov. 6, 2024, the entire contents of which are hereby incorporated by reference for all purposes.

BACKGROUND OF THE INVENTION

Modern vehicle safety systems increasingly rely on advanced sensing technologies to monitor and assess driving conditions, vehicle dynamics, and environmental factors in real time. Techniques for accurately tracking objects using information generated from sensing technologies are needed for further advancement.

BRIEF SUMMARY OF THE INVENTION

Implementations may include techniques for detecting, based at least in part on a first image obtained at a first time, a first indication of a first object included in the first image obtained using a vehicle sensor. The techniques can further include generating, based at least in part on the first indication, a first feature space representation of the first object included in the first image. The techniques can further include comparing the first feature space representation of the first object with a set of stored feature space representations. Responsive to the comparing, the techniques can further include assigning a first unique identifier to the first feature space representation; generating, based on the first indication, a first distance of the first object from the vehicle sensor; and associating the first distance, the first unique identifier, and the first time in memory.

These and other aspects, features, and implementations can be expressed as methods, apparatus, systems, components, program products, means or steps for performing a function, and in other ways. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example of an object location determination system, according to certain embodiments.
FIG. 2 illustrates an example of an object detection system, according to certain embodiments.
FIG. 3 illustrates a first example process performed by an object location determination system, according to certain embodiments.
FIG. 4 illustrates a second example process performed by an object location determination system, according to certain embodiments.
FIG. 5 illustrates a third example process performed by an object location determination system, according to certain embodiments.
FIG. 6 illustrates a fourth example process performed by an object location determination system, according to certain embodiments.
FIG. 7 illustrates a block diagram of an exemplary computer apparatus, according to certain embodiments.
FIG. 8 illustrates an example of vehicle tracking at two different times, according to certain embodiments.

DETAILED DESCRIPTION OF THE INVENTION

Embodiments described herein are directed to techniques for detecting and tracking objects over time given one or more images. Image-based systems using cameras (e.g., front-facing cameras) and image processing algorithms can capture visual data before, during, and/or after events (e.g., a drive, a crash, and/or hard braking). Such systems are typically designed to detect and track objects such as vehicles, pedestrians, and obstacles within a vehicle's field of view. However, conventional approaches to video-based object detection and tracking suffer from technical limitations, including difficulties in persistently tracking objects across frames, handling occlusions, and maintaining consistent object identification in dynamic environments. Existing models, such as those for object detection, tracking, and identification, are frequently optimized for isolated tasks and single-image analysis, lacking a robust, integrated pipeline for associating visual data with real-world coordinates and ensuring temporal consistency across images.

As a result, there is a need for improved techniques that can more accurately and reliably detect, track, and/or identify objects across multiple image frames, even in the presence of occlusions and/or challenging lighting conditions, while also enabling the projection of object locations into real-world spatial coordinates for enhanced incident analysis (e.g., post-event analysis). The techniques described herein address these needs and can improve vehicle safety, post-event analysis, and object tracking capabilities. They enable a range of technical improvements in video-based object detection, tracking, and scene reconstruction, particularly for challenging real-world applications (e.g., vehicle crash analysis using dashcam footage). The techniques can integrate multiple machine learning models for object identification, feature association across frames, and monocular depth estimation. The models may include a You Only Look Once (YOLO) model for fast and reliable object detection, a Mask-RCNN model for high-fidelity segmen