US-20260127754-A1 - SIMULTANEOUS NAVIGATION AND RECONSTRUCTION VIA MONOCULAR DEPTH ESTIMATION
Abstract
Provided are systems and techniques for automated navigation of vehicles, such as drones. The systems generally include processing unit(s) that, collectively, perform several steps. Such steps include generating metric depth estimates, using a pre-trained model, for each pixel in received image(s) from a monocular camera, or in transformed image(s) based on the received image(s). Such steps may also include generating a pose estimate from visual odometry, then generating a truncated signed distance function representation of an environment based on the absolute depth estimates and the pose estimate. The steps may include creating and/or updating a local map based on the truncated signed distance function representation. The steps may include planning a collision-free route towards a goal based on the local map. This may include using motion primitives, which may be generated in a single offline step and stored in a trajectory library.
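Read as a processing loop, the pipeline summarized in the abstract can be sketched roughly as follows. This is an editorial illustration only; every name in it (estimate_metric_depth, estimate_pose, tsdf_map, plan_route) is a hypothetical placeholder passed in by the caller, not the patented implementation.

```python
# Editorial sketch of the navigation loop described in the abstract.
# All collaborators are hypothetical placeholders supplied by the caller.

def navigation_step(image, estimate_metric_depth, estimate_pose, tsdf_map, plan_route, goal):
    """One cycle: image -> metric depth -> pose -> TSDF update -> local map -> plan."""
    depth = estimate_metric_depth(image)       # per-pixel metric depth from a pre-trained model
    pose = estimate_pose(image)                # pose estimate from visual odometry
    tsdf_map.integrate(depth, pose)            # fuse into the truncated signed distance function
    local_map = tsdf_map.local_slice(pose)     # create/update the local map around the vehicle
    return plan_route(local_map, pose, goal)   # collision-free route toward the goal
```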
Inventors
- Nathaniel Simon
- Anirudha Majumdar
Assignees
- THE TRUSTEES OF PRINCETON UNIVERSITY
Dates
- Publication Date
- 20260507
- Application Date
- 20241120
Claims (17)
- 1 . A system for navigation, comprising: one or more processing units configured to, collectively: receive one or more images from a monocular camera; generate metric depth estimates for each pixel in the one or more images or one or more transformed images based on the one or more images, using a pre-trained model; generate a pose estimate from visual odometry; generate a truncated signed distance function representation of an environment based on the absolute depth estimates and the pose estimate; and update a local map based on the truncated signed distance function representation.
- 2 . The system of claim 1 , wherein the one or more processing units are further configured to, collectively, generate the one or more transformed images such that the one or more transformed images appear to have been taken with a same camera used to train the pre-trained model.
- 3 . The system of claim 1 , wherein the one or more processing units are further configured to, collectively, discretize the environment into blocks, and store blocks containing surfaces in a hashmap.
- 4 . The system of claim 1 , wherein the one or more processing units are further configured to, collectively, plan a collision-free route towards a goal based on the local map.
- 5 . The system of claim 4 , wherein planning the collision-free route includes using motion primitives.
- 6 . The system of claim 5 , wherein the motion primitives are generated in a single offline step and stored in a trajectory library.
- 7 . The system of claim 6 , wherein the motion primitives are defined to have a yaw rate that is zero at the beginning and end of each motion primitive.
- 8 . The system of claim 7 , wherein a library of motion primitives is generated by varying a maximum yaw rate.
- 9 . The system of claim 4 , wherein planning the collision-free route includes utilizing A*, Probabilistic Roadmaps (PRM), rapidly-exploring random tree (RRT), RRT*, or Trajectory Hybrid Optimal Frenet.
- 10 . The system of claim 1 , wherein the one or more processing units are disposed on a vehicle.
- 11 . The system of claim 1 , wherein the one or more images received from the monocular camera include a first distorted image, and the one or more processing units are further configured to, collectively, extract multiple depth images from the first distorted image using a virtual camera rotation scheme.
- 12 . A drone comprising the system of claim 1 .
- 13 . The drone of claim 12 , wherein the drone is a micro aerial vehicle (MAV).
- 14 . The drone of claim 12 , wherein the drone is a drone other than a micro aerial vehicle (MAV).
- 15 . A method for navigation, comprising: receiving one or more images from a monocular camera; generating metric depth estimates for each pixel in the one or more images or one or more transformed images based on the one or more images, using a pre-trained model; generating a pose estimate from visual odometry; generating a truncated signed distance function representation of an environment based on the absolute depth estimates and the pose estimate; and updating a local map based on the truncated signed distance function representation.
- 16 . The method of claim 15 , further comprising repeatedly performing the receiving, generating and updating steps as a vehicle including the monocular camera moves within the environment.
- 17 . The method of claim 16 , wherein the vehicle is moving at least 0.5 m/s through the environment.
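Claims 5 through 8 recite motion primitives that are generated offline, stored in a trajectory library, begin and end with zero yaw rate, and are varied by a maximum yaw rate. The sketch below is an editorial illustration of one way such a library could be built (constant forward speed and a half-sine yaw-rate profile that is zero at both endpoints); it is a simplified example, not the claimed implementation.

```python
import numpy as np

def motion_primitive(max_yaw_rate, speed=1.0, duration=1.0, dt=0.02):
    """One forward-arc primitive whose yaw rate is zero at the start and end.

    A half-sine yaw-rate profile peaks at max_yaw_rate mid-trajectory and
    returns to zero, so consecutive primitives can be chained smoothly.
    """
    t = np.arange(0.0, duration + dt, dt)
    yaw_rate = max_yaw_rate * np.sin(np.pi * t / duration)  # zero at t=0 and t=duration
    yaw = np.cumsum(yaw_rate) * dt                          # integrate yaw rate -> heading
    x = np.cumsum(speed * np.cos(yaw)) * dt                 # integrate heading -> position
    y = np.cumsum(speed * np.sin(yaw)) * dt
    return np.stack([x, y, yaw], axis=1)                    # (N, 3) trajectory samples

# Single offline step: build a trajectory library by sweeping the maximum yaw rate.
library = [motion_primitive(r) for r in np.linspace(-2.0, 2.0, 11)]  # rad/s
```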
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority to U.S. Provisional Patent Application No. 63/600,866, filed Nov. 20, 2023, the contents of which are incorporated by reference herein in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Grant Nos. DGE-2039656 and 2044149 awarded by the National Science Foundation, and Grant No. N00014-23-1-2148 awarded by the Office of Naval Research (ONR). The government has certain rights in the invention.

TECHNICAL FIELD

The present disclosure relates to techniques for controlling drones, and specifically to techniques for controlling monocular robots or vehicles, such as, e.g., Micro Aerial Vehicle (MAV) platforms (≤100 g).

BACKGROUND

A major challenge in deploying the smallest Micro Aerial Vehicle (MAV) platforms (≤100 g) is their inability to carry sensors that provide high-resolution metric depth information (e.g., LiDAR or stereo cameras). Current systems rely on end-to-end learning or heuristic approaches that directly map images to control inputs, and struggle to fly fast in unknown environments.

BRIEF SUMMARY

In various aspects, a system for navigation may be provided. The system may include one or more processing units configured to, collectively, perform various tasks. Some or all of the processing unit(s) may be disposed on a vehicle, such as a drone, which may be, e.g., a flying drone, such as a micro aerial vehicle (MAV). The vehicle may be some other vehicle besides a MAV, including, e.g., cars, on-ground delivery drones, etc.

The tasks may include receiving one or more images from a monocular camera. The tasks may include generating metric depth estimates for each pixel in the received image(s), or in transformed image(s) based on the received image(s), using a pre-trained model. The tasks may include generating a pose estimate from visual odometry. The tasks may include generating a truncated signed distance function representation of an environment based on the absolute depth estimates and the pose estimate. The tasks may include generating or updating a local map based on the truncated signed distance function representation.

The received image(s) from the monocular camera may include a first distorted image, and the tasks performed by the processing unit(s) may include extracting multiple depth images from the first distorted image using a virtual camera rotation scheme. The tasks may include generating the transformed image(s) such that the transformed image(s) appear to have been taken with a same camera used to train the pre-trained model. The tasks may include discretizing the environment into blocks. The tasks may include storing blocks containing surfaces in a hashmap (an illustrative sketch of such a block-hashed map appears below).

The tasks may include planning a collision-free route towards a goal based on the local map. Planning the collision-free route may include using motion primitives. The motion primitives may be generated in a single offline step and stored in a trajectory library. The motion primitives may be defined to have a yaw rate that is zero at the beginning and end of each motion primitive. A library of motion primitives may be generated by varying a maximum yaw rate. Planning the collision-free route may include utilizing A*, Probabilistic Roadmaps (PRM), rapidly-exploring random tree (RRT), RRT*, or Trajectory Hybrid Optimal Frenet.
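Claim 1, claim 3, and the summary above describe fusing metric depth and pose into a truncated signed distance function (TSDF), discretizing the environment into blocks, and storing only blocks that contain surfaces in a hashmap. The following is an editorial sketch of such a block-hashed TSDF in Python (per-point integration into fixed-size voxel blocks keyed by block index, with the standard weighted running-average update); it is a simplification offered for illustration, not the patented implementation.

```python
import numpy as np
from collections import defaultdict

BLOCK_SIZE = 8       # voxels per block edge (illustrative value)
VOXEL_SIZE = 0.10    # meters per voxel (illustrative value)
TRUNCATION = 0.30    # TSDF truncation distance, in meters (illustrative value)

class TSDFBlockMap:
    """Sparse TSDF map: only blocks near observed surfaces are allocated."""

    def __init__(self):
        # Hashmap from integer block index (bx, by, bz) to per-voxel (tsdf, weight).
        self.blocks = defaultdict(
            lambda: np.zeros((BLOCK_SIZE, BLOCK_SIZE, BLOCK_SIZE, 2), dtype=np.float32)
        )

    def integrate_point(self, surface_point, camera_origin):
        """Update voxels along the ray near one back-projected depth measurement."""
        surface_point = np.asarray(surface_point, dtype=float)
        camera_origin = np.asarray(camera_origin, dtype=float)
        direction = surface_point - camera_origin
        depth = np.linalg.norm(direction)
        direction /= depth
        # Sample voxels in the truncation band around the measured surface.
        for d in np.arange(depth - TRUNCATION, depth + TRUNCATION, VOXEL_SIZE):
            voxel = np.floor((camera_origin + d * direction) / VOXEL_SIZE).astype(int)
            block_idx, local = np.divmod(voxel, BLOCK_SIZE)
            sdf = np.clip(depth - d, -TRUNCATION, TRUNCATION)  # truncated signed distance
            tsdf, weight = self.blocks[tuple(block_idx)][tuple(local)]
            # Weighted running average, as in standard TSDF fusion.
            self.blocks[tuple(block_idx)][tuple(local)] = (
                (tsdf * weight + sdf) / (weight + 1.0),
                weight + 1.0,
            )
```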
In various aspects, a method for navigation may be provided. The method may include receiving one or more images from a monocular camera. The method may include generating metric depth estimates for each pixel in the received image(s), or in transformed image(s) that are based on the received image(s), using a pre-trained model. The method may include generating a pose estimate from visual odometry. The method may include generating a truncated signed distance function representation of an environment based on the absolute depth estimates and the pose estimate. The method may include generating or updating a local map based on the truncated signed distance function representation. The method may include repeatedly performing the receiving, generating, and updating steps as a vehicle including the monocular camera moves within the environment (e.g., at forward velocities of at least 0.5 m/s).

BRIEF DESCRIPTION OF FIGURES

FIG. 1 is a schematic illustration of a system.

FIG. 2 is a flowchart of a method.

FIG. 3 is a flowchart of a method for planning a route.

FIG. 4 is a flowchart of steps performed by an embodiment of a vehicle.

FIG. 5 is a schematic illustration of a method.

FIG. 6 is an illustration of point cloud distances for Crazyflie ZoeDepth and Azure Kinect images.

FIGS. 7-10 are plots of the trajectories of all 15 trials in 5 unique environments, the goal positions (circles), and the crash locations (stars), including moving around a first corner (FIG. 7), moving around a second corner (FIG. 8), and moving through three different hallway paths for Mo
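The method above begins with per-pixel metric depth estimation from a single monocular frame using a pre-trained model, and FIG. 6 compares point clouds derived from ZoeDepth estimates on Crazyflie images against Azure Kinect depth. As an editorial illustration of that step only, the sketch below queries ZoeDepth through its published torch.hub interface; the patent does not mandate any particular model, the file path is a placeholder, and this usage is offered as an assumption rather than as the patented implementation.

```python
import torch
from PIL import Image

# Illustrative only: load a pre-trained metric monocular depth model (ZoeDepth).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True).to(device).eval()

# A single frame from the monocular camera ("frame.png" is a placeholder path).
frame = Image.open("frame.png").convert("RGB")

# infer_pil returns per-pixel metric depth in meters as a numpy array,
# i.e. the absolute depth estimates that feed the TSDF integration step.
depth_m = model.infer_pil(frame)
print(depth_m.shape, depth_m.min(), depth_m.max())
```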