US-12626403-B2 - Detecting objects of interests in panoramic images of an environment
Abstract
Examples described herein provide a method for generating a three-dimensional (3D) model of an object of interest using panoramic images of an environment. The method includes detecting, using a trained machine learning model, the object of interest in a panoramic image of the environment. The method further includes determining 3D coordinates for the object of interest. The method further includes combining the 3D coordinates for the object of interest with an existing 3D model of the object of interest to create a revised 3D model of the object of interest.
Inventors
- Heiko Bauer
- Changyu Du
Assignees
- FARO TECHNOLOGIES, INC.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2024-04-04
Claims (20)
- 1. A system comprising: a panoramic camera to capture panoramic images of the environment; and a processing system communicatively coupled to the panoramic camera, the processing system comprising: a memory comprising computer readable instructions; and a processing device for executing the computer readable instructions, the computer readable instructions controlling the processing device to perform operations for generating a three-dimensional (3D) model of an object of interest using panoramic images of an environment, the operations comprising: detecting, using a trained machine learning model, the object of interest in a panoramic image of the environment; determining 3D coordinates for the object of interest; wherein detecting the object of interest in the panoramic image of the environment comprises generating, by the trained machine learning model, a bounding box for the object of interest, wherein determining the 3D coordinates for the object of interest comprises back projecting the bounding box to the panoramic image to determine spherical coordinates for the object of interest and using the spherical coordinates to determine the 3D coordinates for the object of interest using 3D data captured by a 3D coordinate measurement device; and combining the 3D coordinates for the object of interest with an existing 3D model of the object of interest to create a revised 3D model of the object of interest.
- 2. The system of claim 1, further comprising a three-dimensional (3D) coordinate measurement device to capture 3D data about an environment, the 3D coordinate measurement device being communicatively coupled to the processing system.
- 3. The system of claim 1, wherein the operations further comprise capturing the panoramic image of the environment using a panoramic camera, wherein the panoramic camera has a substantially 360-degree field of view.
- 4. The system of claim 1, wherein the operations further comprise training the machine learning model to detect the object of interest in the panoramic image of the environment.
- 5. The system of claim 4, wherein training the machine learning model comprises: receiving a plurality of training panoramic images; for each of the plurality of training panoramic images: generating a training cubemap representation comprising six two-dimensional (2D) training perspective images, and associating a label with an object of interest in at least one of the 2D training perspective images; and training the machine learning model using as input the 2D training perspective images and the associated labels, wherein the machine learning model generates a bounding box around the object of interest.
- 6. The system of claim 1, wherein combining the 3D coordinates for the object of interest with the existing 3D model of the object of interest is based at least in part on a feature point descriptor for the object of interest.
- 7. The system of claim 6, wherein the feature point descriptor is a signature of histograms of orientations feature point descriptor.
- 8. A method comprising: detecting, using a trained machine learning model, a first object of interest in a first panoramic image of an environment to generate a first bounding box; detecting, using the trained machine learning model, a second object of interest in a second panoramic image of the environment to generate a second bounding box; generating a first frustum based at least in part on the first bounding box; generating a second frustum based at least in part on the second bounding box; performing frustum filtering based at least in part on the first frustum and the second frustum, wherein the frustum filtering is based at least in part on a distance between a panoramic camera and an intersection of the first frustum and the second frustum; performing feature matching filtering based at least in part on the first panoramic image and the second panoramic image; and determining whether the first object of interest and the second object of interest are the same object of interest based at least in part on results of the frustum filtering and results of the feature matching filtering.
- 9. The method of claim 8, wherein the first object of interest and the second object of interest are considered to be the same object of interest responsive to the distance satisfying the threshold.
- 10. The method of claim 8, wherein the first object of interest and the second object of interest are considered to be different objects of interest responsive to the distance failing to satisfy the threshold.
- 11. The method of claim 8, wherein the frustum filtering is based at least in part on an estimated distance between a target of interest and a panoramic camera, and the intersection of the first frustum and the second frustum.
- 12. The method of claim 11, wherein: the first object of interest and the second object of interest are considered to be the same object of interest, responsive to the intersection of the first frustum and the second frustum not exceeding the estimated distance, and the first object of interest and the second object of interest are considered to be different objects of interest, responsive to the intersection of the first frustum and the second frustum exceeding the estimated distance.
- 13. The method of claim 8, wherein the frustum filtering is based at least in part on a predicted category of the object of interest.
- 14. The method of claim 8, wherein the feature matching filtering comprises: cropping a first portion of the first panoramic image within the first bounding box; and cropping a second portion of the second panoramic image within the second bounding box.
- 15. The method of claim 14, wherein the feature matching filtering further comprises performing feature point detection on the first portion of the first panoramic image and the second portion of the second panoramic image.
- 16. The method of claim 15, wherein the feature matching filtering further comprises generating feature point descriptors for detected feature points.
- 17. The method of claim 16, wherein the feature matching filtering further comprises matching the first portion of the first panoramic image and the second portion of the second panoramic image.
- 18. The method of claim 17, wherein the feature matching filtering further comprises evaluating the matching to determine whether the first bounding box and the second bounding box contain the same object of interest.
- 19. A method comprising: detecting, using a trained machine learning model, a first object of interest in a first panoramic image of an environment to generate a first bounding box; detecting, using the trained machine learning model, a second object of interest in a second panoramic image of the environment to generate a second bounding box; generating a first frustum based at least in part on the first bounding box; generating a second frustum based at least in part on the second bounding box; performing frustum filtering based at least in part on the first frustum and the second frustum, wherein the frustum filtering is based at least in part on an estimated distance between a target of interest and a panoramic camera, and an intersection of the first frustum and the second frustum; performing feature matching filtering based at least in part on the first panoramic image and the second panoramic image; and determining whether the first object of interest and the second object of interest are the same object of interest based at least in part on results of the frustum filtering and results of the feature matching filtering.
- 20. The method of claim 19, wherein: the first object of interest and the second object of interest are considered to be the same object of interest, responsive to the intersection of the first frustum and the second frustum not exceeding the estimated distance, and the first object of interest and the second object of interest are considered to be different objects of interest, responsive to the intersection of the first frustum and the second frustum exceeding the estimated distance.
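By way of illustration, the frustum filtering recited in claims 8 through 12 can be sketched by approximating each frustum with its central ray and treating the "intersection" as the midpoint of the shortest segment between the two rays. This is a simplified sketch under those assumptions, not the claimed implementation; all function and variable names here are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closest_point_between_rays(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two rays, given as
    (origin, unit direction) pairs; a cheap stand-in for the
    frustum-intersection point used by the filter."""
    w0 = [a - b for a, b in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # rays (nearly) parallel: no usable intersection
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * dd for o, dd in zip(o1, d1)]
    p2 = [o + t * dd for o, dd in zip(o2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]

def same_object(o1, d1, o2, d2, max_distance):
    """Frustum-filter sketch: treat the two detections as the same
    object only if the rays 'intersect' within max_distance of the
    first camera position (cf. the estimated-distance test of
    claim 12)."""
    p = closest_point_between_rays(o1, d1, o2, d2)
    if p is None:
        return False
    dist = math.sqrt(sum((pi - oi) ** 2 for pi, oi in zip(p, o1)))
    return dist <= max_distance
```

Detections surviving this geometric test would then still have to pass the feature matching filtering (claims 14 through 18) before being declared the same object.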
Description
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 63/457,218, filed Apr. 5, 2023 and entitled “DETECTING OBJECTS OF INTERESTS IN PANORAMIC IMAGES OF AN ENVIRONMENT,” the contents of which are incorporated by reference herein in their entirety.
BACKGROUND
The subject matter disclosed herein relates to the use of a three-dimensional (3D) coordinate measurement device, such as a laser scanner time-of-flight (TOF) coordinate measurement device referred to as a “TOF scanner,” “3D laser scanner,” or “laser scanner.” A 3D laser scanner of this type steers a beam of light to a non-cooperative target, such as a diffusely scattering surface of an object. A distance meter in the device measures a distance to the object, and angular encoders measure the angles of rotation of two axles in the device. The measured distance and two angles enable a processor in the device to determine the 3D coordinates of the target. A TOF laser scanner is a scanner in which the distance to a target point is determined based on the speed of light in air between the scanner and the target point.
Laser scanners are typically used for scanning closed or open spaces such as interior areas of buildings, industrial installations, and tunnels. They can also be used, for example, in industrial applications and accident reconstruction applications. A laser scanner optically scans and measures objects in a volume around the scanner through the acquisition of data points representing object surfaces within the volume. Such data points are obtained by transmitting a beam of light onto the objects and collecting the reflected or scattered light to determine the distance, two angles (i.e., an azimuth angle and a zenith angle), and optionally a gray-scale value. This raw scan data is collected, stored, and sent to a processor or processors to generate a 3D image representing the scanned area or object.
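The conversion described above, in which one measured distance and two encoder angles yield a 3D coordinate, can be sketched as follows. The function name and the angle conventions (zenith measured from the +z axis, azimuth from the +x axis) are assumptions for illustration; real scanners additionally apply per-device calibration corrections.

```python
import math

def spherical_to_cartesian(distance, azimuth, zenith):
    """Convert a TOF scanner measurement (range plus two encoder
    angles, in radians) into Cartesian 3D coordinates."""
    x = distance * math.sin(zenith) * math.cos(azimuth)
    y = distance * math.sin(zenith) * math.sin(azimuth)
    z = distance * math.cos(zenith)
    return (x, y, z)

# A point 10 m away in the horizontal plane (zenith = 90 degrees,
# azimuth = 0) lies on the +x axis of the scanner frame.
print(spherical_to_cartesian(10.0, 0.0, math.pi / 2))
```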
One application where 3D scanners are used is to scan an environment. While existing 3D coordinate measurement devices are suitable for their intended purposes, what is needed is a 3D coordinate measurement device having certain features of embodiments described herein.
BRIEF DESCRIPTION
In one embodiment, a method for generating a three-dimensional (3D) model of an object of interest using panoramic images of an environment is provided. The method includes detecting, using a trained machine learning model, the object of interest in a panoramic image of the environment. The method further includes determining 3D coordinates for the object of interest. The method further includes combining the 3D coordinates for the object of interest with an existing 3D model of the object of interest to create a revised 3D model of the object of interest.
In addition to one or more of the features described herein, or as an alternative, further embodiments of the system can include a panoramic camera to capture panoramic images of the environment, and a processing system communicatively coupled to the panoramic camera. The processing system includes a memory having computer readable instructions, and a processing device for executing the computer readable instructions. The computer readable instructions control the processing device to perform operations for generating a three-dimensional (3D) model of an object of interest using panoramic images of an environment. The operations include detecting, using a trained machine learning model, the object of interest in a panoramic image of the environment. The operations further include determining 3D coordinates for the object of interest. The operations further include combining the 3D coordinates for the object of interest with an existing 3D model of the object of interest to create a revised 3D model of the object of interest.
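The determination of 3D coordinates from a detection can be sketched as follows, assuming an equirectangular panorama layout in which azimuth spans the image width and zenith spans the image height, and a range value looked up from co-registered scan data. All names and the layout convention are illustrative assumptions, not the disclosed implementation.

```python
import math

def pixel_to_spherical(u, v, width, height):
    """Map an equirectangular-panorama pixel (u, v) to spherical
    angles: azimuth in [-pi, pi], zenith in [0, pi]."""
    azimuth = (u / width) * 2.0 * math.pi - math.pi
    zenith = (v / height) * math.pi
    return azimuth, zenith

def bbox_center_to_3d(bbox, width, height, range_m):
    """Back-project the center of a detector bounding box
    (u_min, v_min, u_max, v_max) to a 3D point, given a range
    value from the 3D coordinate measurement device."""
    u = 0.5 * (bbox[0] + bbox[2])
    v = 0.5 * (bbox[1] + bbox[3])
    azimuth, zenith = pixel_to_spherical(u, v, width, height)
    x = range_m * math.sin(zenith) * math.cos(azimuth)
    y = range_m * math.sin(zenith) * math.sin(azimuth)
    z = range_m * math.cos(zenith)
    return (x, y, z)
```

For example, a box centered in a 2048 x 1024 panorama back-projects to a point straight ahead of the camera at the looked-up range.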
In another embodiment, a method for training a machine learning model to detect objects in panoramic images is provided. The method includes receiving a plurality of training panoramic images. The method further includes, for each of the plurality of training panoramic images: generating a training cubemap representation having six two-dimensional (2D) training perspective images, and associating a label with an object of interest in at least one of the 2D training perspective images. The method further includes training the machine learning model using as input the 2D training perspective images and the associated labels, wherein the machine learning model generates a bounding box around the object of interest.
In another embodiment, a system is provided. The system includes a panoramic camera to capture a plurality of training panoramic images of the environment. The system further includes a processing system communicatively coupled to the panoramic camera. The processing system includes a memory having computer readable instructions and a processing device for executing the computer readable instructions. The computer readable instructio
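The cubemap representation described above renders six 90-degree perspective faces from each panorama, so that a conventional 2D detector can be trained on undistorted views. A minimal sketch of the per-pixel sampling map is below; the face ordering, orientation conventions, and names are assumptions for illustration.

```python
import math

# Face orientations as (yaw, pitch) of the optical axis, in radians:
# front, right, back, left, up, down (assumed ordering).
FACES = {
    "front": (0.0, 0.0),
    "right": (math.pi / 2, 0.0),
    "back": (math.pi, 0.0),
    "left": (-math.pi / 2, 0.0),
    "up": (0.0, math.pi / 2),
    "down": (0.0, -math.pi / 2),
}

def face_pixel_to_pano(u, v, face, face_size, pano_w, pano_h):
    """For one pixel of a 90-degree-FOV cube face, compute the
    equirectangular panorama pixel it samples from (y is up)."""
    # Normalized camera-plane coordinates in [-1, 1]
    nx = 2.0 * (u + 0.5) / face_size - 1.0
    ny = 2.0 * (v + 0.5) / face_size - 1.0
    yaw, pitch = FACES[face]
    x, y, z = nx, -ny, 1.0  # ray in the face's local frame, z forward
    # Rotate by pitch about the x axis, then by yaw about the y axis
    y, z = (y * math.cos(pitch) + z * math.sin(pitch),
            -y * math.sin(pitch) + z * math.cos(pitch))
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))
    azimuth = math.atan2(x, z)                             # [-pi, pi]
    zenith = math.acos(y / math.sqrt(x * x + y * y + z * z))  # [0, pi]
    pu = (azimuth + math.pi) / (2.0 * math.pi) * pano_w
    pv = zenith / math.pi * pano_h
    return pu, pv
```

Building each face by evaluating this map at every face pixel (with interpolation) yields the six 2D training perspective images on which the labels and bounding boxes are defined.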