US-20260129172-A1 - FLOW-GUIDED ONLINE STEREO RECTIFICATION

US 20260129172 A1

Abstract

An autonomy computing system and a method of an autonomous vehicle for rectifying stereo images include a memory storing computer executable instructions and a processor coupled to the memory. The processor, upon execution of the computer executable instructions, is configured to: receive a first image and a second image captured using respective cameras in a stereo camera pair; predict a rotation matrix between the first image and the second image by: extracting a first feature map and a second feature map; applying positional feature enhancement on the feature maps to derive a pair of enhanced feature maps; computing a correlation volume across the enhanced feature maps; determining a set of likely matches between the enhanced feature maps; computing a predicted relative pose; and computing the rotation matrix. The system and method further include calibrating the stereo camera pair to rectify the first image and the second image based on the rotation matrix.

Inventors

  • Felix Heide
  • Anush Kumar
  • Shile Li
  • Omid Hosseini Jafari
  • Fahim MANNAN

Assignees

  • TORC ROBOTICS, INC.

Dates

Publication Date
2026-05-07
Application Date
2025-12-29

Claims (20)

  1. An autonomy computing system of an autonomous vehicle for rectifying a stereo camera pair of the autonomous vehicle, the stereo camera pair including a first camera and a second camera separated by a baseline distance, the autonomy computing system comprising at least one processor in communication with at least one memory device, the at least one processor programmed to: receive a first image and a second image, the first image captured by the first camera, the second image captured by the second camera; predict, using a neural network model, a rotation matrix between the first image and the second image; and calibrate the stereo camera pair by rectifying, using differentiable rectification, the first image and the second image, based on the rotation matrix.
  2. The autonomy computing system of claim 1, wherein the at least one processor is further programmed to: estimate rectification homographies of a pose represented by the rotation matrix, the rectification homographies including i) a first rectification rotation and ii) a second rectification rotation, the first rectification rotation indicating a relative rotation of the first image and the second rectification rotation indicating a relative rotation of the second image; and rectify, using the rectification homographies, the first image and the second image.
  3. The autonomy computing system of claim 2, wherein the pose includes a translation vector, the at least one processor further programmed to: rotate the translation vector into a rotated translation vector.
  4. The autonomy computing system of claim 3, wherein the at least one processor is further programmed to: half-rotate the translation vector into a half-rotated translation vector such that t_half = R_half t, where t is the translation vector, t_half is the half-rotated translation vector, and R_half is a half rotation matrix generated based on the rotation matrix.
  5. The autonomy computing system of claim 3, wherein the at least one processor is further programmed to: generate a unit vector along an axis of the rotated translation vector; determine a direction vector indicating a rotation aligning the rotated translation vector with the unit vector; and determine the rectification homographies, based on the direction vector and the rotation matrix.
  6. The autonomy computing system of claim 2, wherein the at least one processor is further programmed to: rectify the first image and the second image by: converting a two-dimensional (2D) pixel of the first image or the second image in first pixel coordinates to normalized camera coordinates, based on the rectification homographies; converting the normalized camera coordinates into rectified camera coordinates by applying the rectification homographies to the normalized camera coordinates; projecting, using intrinsics of a respective camera of the first image or the second image, the rectified camera coordinates to second pixel coordinates; and assigning a pixel value at the first pixel coordinates to a pixel at the second pixel coordinates.
  7. The autonomy computing system of claim 1, wherein the at least one processor is further programmed to: train the neural network model by optimizing a loss function including an optical flow of the rectified first image and the rectified second image.
  8. The autonomy computing system of claim 1, wherein the at least one processor is further programmed to: calibrate the stereo camera pair while the autonomous vehicle is operating.
  9. One or more non-transitory computer-readable storage media for rectifying a stereo camera pair, the stereo camera pair including a first camera and a second camera separated by a baseline distance, the one or more non-transitory computer-readable storage media comprising instructions stored thereon that, in response to being executed, cause a system to: receive a first image and a second image, the first image captured by the first camera, the second image captured by the second camera; predict, using a neural network model, a rotation matrix between the first image and the second image; and calibrate the stereo camera pair by rectifying, using differentiable rectification, the first image and the second image, based on the rotation matrix.
  10. The one or more non-transitory computer-readable storage media of claim 9, wherein the instructions further cause the system to: estimate rectification homographies of a pose represented by the rotation matrix, the rectification homographies including i) a first rectification rotation and ii) a second rectification rotation, the first rectification rotation indicating a relative rotation of the first image and the second rectification rotation indicating a relative rotation of the second image; and rectify, using the rectification homographies, the first image and the second image.
  11. The one or more non-transitory computer-readable storage media of claim 10, wherein the pose includes a translation vector, and the instructions further cause the system to: rotate the translation vector into a rotated translation vector.
  12. The one or more non-transitory computer-readable storage media of claim 11, wherein the instructions further cause the system to: half-rotate the translation vector into a half-rotated translation vector such that t_half = R_half t, where t is the translation vector, t_half is the half-rotated translation vector, and R_half is a half rotation matrix generated based on the rotation matrix.
  13. The one or more non-transitory computer-readable storage media of claim 11, wherein the instructions further cause the system to: generate a unit vector along an axis of the rotated translation vector; determine a direction vector indicating a rotation aligning the rotated translation vector with the unit vector; and determine the rectification homographies, based on the direction vector and the rotation matrix.
  14. The one or more non-transitory computer-readable storage media of claim 10, wherein the instructions further cause the system to: rectify the first image and the second image by: converting a two-dimensional (2D) pixel of the first image or the second image in first pixel coordinates to normalized camera coordinates, based on the rectification homographies; converting the normalized camera coordinates into rectified camera coordinates by applying the rectification homographies to the normalized camera coordinates; projecting, using intrinsics of a respective camera of the first image or the second image, the rectified camera coordinates to second pixel coordinates; and assigning a pixel value at the first pixel coordinates to a pixel at the second pixel coordinates.
  15. The one or more non-transitory computer-readable storage media of claim 9, wherein the instructions further cause the system to: train the neural network model by optimizing a loss function including an optical flow of the rectified first image and the rectified second image.
  16. The one or more non-transitory computer-readable storage media of claim 9, wherein the instructions further cause the system to: calibrate the stereo camera pair while an autonomous vehicle associated with the stereo camera pair is operating.
  17. A computer-implemented method of rectifying a stereo camera pair of an autonomous vehicle, the stereo camera pair including a first camera and a second camera separated by a baseline distance, the method comprising: receiving a first image and a second image, the first image captured by the first camera, the second image captured by the second camera; predicting, using a neural network model, a rotation matrix between the first image and the second image; and calibrating the stereo camera pair by rectifying, using differentiable rectification, the first image and the second image, based on the rotation matrix.
  18. The computer-implemented method of claim 17, further comprising: estimating rectification homographies of a pose represented by the rotation matrix, the rectification homographies including i) a first rectification rotation and ii) a second rectification rotation, the first rectification rotation indicating a relative rotation of the first image and the second rectification rotation indicating a relative rotation of the second image; and rectifying, using the rectification homographies, the first image and the second image.
  19. The computer-implemented method of claim 18, wherein the pose includes a translation vector, the method further comprising: rotating the translation vector into a rotated translation vector.
  20. The computer-implemented method of claim 19, further comprising: half-rotating the translation vector into a half-rotated translation vector such that t_half = R_half t, where t is the translation vector, t_half is the half-rotated translation vector, and R_half is a half rotation matrix generated based on the rotation matrix.
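The half rotation R_half of claims 4, 12, and 20 can be illustrated in code. The sketch below assumes an axis-angle construction (halving the rotation angle about the same axis), which is one common way to derive a half rotation from a rotation matrix; the claims do not specify the construction, and the use of numpy/scipy here is purely illustrative.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def half_rotation(R: np.ndarray) -> np.ndarray:
    """Return R_half such that applying it twice reproduces R.

    Uses the axis-angle (rotation vector) representation: keep the
    rotation axis and halve the rotation angle.
    """
    rotvec = Rotation.from_matrix(R).as_rotvec()  # axis * angle
    return Rotation.from_rotvec(rotvec / 2.0).as_matrix()


# Hypothetical relative rotation between the two cameras (10 degrees about y)
R = Rotation.from_euler("y", 10, degrees=True).as_matrix()
R_half = half_rotation(R)

# t_half = R_half t, as in claims 4, 12, and 20
t = np.array([1.0, 0.0, 0.0])  # hypothetical translation along the baseline
t_half = R_half @ t
```

By construction, composing the half rotation with itself recovers the full rotation, which is the property the half-rotated translation vector relies on.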
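The per-pixel rectification pipeline of claims 6 and 14 (pixel coordinates to normalized camera coordinates, application of the rectification rotation, and reprojection through the camera intrinsics) can be sketched as follows. The intrinsic matrix K, the function name, and the use of a single 3x3 rectification rotation are illustrative assumptions, not the claimed implementation.

```python
import numpy as np


def rectify_pixel(uv, K: np.ndarray, H_rect: np.ndarray) -> np.ndarray:
    """Map one 2D pixel through the rectification steps of the claims.

    uv     : (u, v) pixel in first pixel coordinates
    K      : 3x3 camera intrinsic matrix
    H_rect : 3x3 rectification rotation/homography for this camera
    Returns the second pixel coordinates the source pixel value is
    assigned to.
    """
    p = np.array([uv[0], uv[1], 1.0])
    x_norm = np.linalg.inv(K) @ p   # pixel -> normalized camera coordinates
    x_rect = H_rect @ x_norm        # normalized -> rectified camera coordinates
    x_rect = x_rect / x_rect[2]     # re-homogenize
    p_new = K @ x_rect              # project back with the camera intrinsics
    return p_new[:2]


# Hypothetical intrinsics; with an identity homography the pixel is unchanged
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
uv_new = rectify_pixel((320.0, 240.0), K, np.eye(3))
```

In practice this mapping would be applied to every pixel (typically via an inverse warp with interpolation), but the single-pixel form above matches the step order recited in the claims.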
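Claims 7 and 15 recite training the neural network model by optimizing a loss function that includes an optical flow of the rectified image pair. One plausible reading is that, for a correctly rectified pair, correspondences lie on the same scanline, so the vertical component of the flow between the rectified images should vanish. The sketch below implements that reading as an assumption; it is not the patent's stated loss.

```python
import numpy as np


def vertical_flow_loss(flow: np.ndarray) -> float:
    """Mean absolute vertical flow between a rectified stereo pair.

    flow : (H, W, 2) array, flow[..., 0] horizontal, flow[..., 1] vertical.
    A well-rectified pair has near-zero vertical flow everywhere, so this
    term penalizes residual misalignment while leaving horizontal
    disparity (which encodes depth) unpenalized.
    """
    return float(np.mean(np.abs(flow[..., 1])))


# Perfectly rectified pair: pure horizontal disparity, zero loss
flow = np.zeros((4, 6, 2))
flow[..., 0] = 12.0
loss = vertical_flow_loss(flow)
```

In a differentiable-rectification setup, a loss of this form could be backpropagated through the warping step to the pose-predicting network, which is consistent with the claims' use of the term "differentiable rectification."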

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 19/357,478, entitled “FLOW-GUIDED ONLINE STEREO RECTIFICATION,” filed on Oct. 14, 2025, which is a continuation application of U.S. patent application Ser. No. 18/817,540, entitled “FLOW-GUIDED ONLINE STEREO RECTIFICATION,” filed on Aug. 28, 2024. U.S. patent application Ser. No. 18/817,540 is now U.S. Pat. No. 12,470,683, issued on Nov. 11, 2025, and claims priority to and the benefit of U.S. Provisional Patent Application Ser. No. 63/667,026, entitled “FLOW GUIDED ONLINE STEREO RECTIFICATION,” filed on Jul. 2, 2024. The contents of the above-noted patents and applications are incorporated herein in their entirety by reference.

TECHNICAL FIELD

The field of the disclosure relates generally to image processing and, more specifically, to rectifying stereo images.

BACKGROUND OF THE INVENTION

Many systems, including autonomous vehicles, make use of stereo camera systems. During use, stereo camera systems can become misaligned due to vibrations or environmental factors, leading to deviations from the base calibration. This issue is exacerbated with wide baseline camera systems, such as those used on wider vehicles like autonomous trucks. The mounting structures for large baseline camera systems may stretch, twist, or otherwise deform due to temperature and stress gradients. Misalignment of stereo camera systems can result in poor performance of downstream tasks, such as depth estimation, object detection, and semantic segmentation, among others. This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure described or claimed below. This description is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure.
Accordingly, it should be understood that these statements are to be read in this light and not as admissions of prior art.

SUMMARY OF THE INVENTION

One general aspect includes an autonomous vehicle. The autonomous vehicle includes a stereo camera pair disposed on the autonomous vehicle, the stereo camera pair may include a first camera and a second camera separated by a baseline distance, the first camera and the second camera configured to capture a first image and a second image, respectively; and at least one memory device storing computer executable instructions.
The vehicle also includes at least one processor coupled to the at least one memory device and the stereo camera pair, the at least one processor, upon execution of the computer executable instructions, configured to: receive the first image and the second image captured using respective cameras in the stereo camera pair; predict, using a neural network model, a rotation matrix between the first image and the second image by: extracting a first feature map and a second feature map based on the first image and the second image; applying positional feature enhancement on the first feature map and the second feature map to derive a first enhanced feature map and a second enhanced feature map; computing a correlation volume across the first enhanced feature map and the second enhanced feature map; determining a set of likely matches between the first enhanced feature map and the second enhanced feature map based on the correlation volume; computing a predicted relative pose based on the set of likely matches; and computing the rotation matrix based on the predicted relative pose. The at least one processor is further configured to calibrate the stereo camera pair by employing differentiable rectification to rectify the first image and the second image based on the rotation matrix. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. One general aspect includes a computer-implemented method of calibrating a stereo camera pair. The computer-implemented method includes capturing a first image and a second image using respective cameras in a stereo camera pair.
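The correlation-volume step described above (all-pairs similarity between the two enhanced feature maps) can be sketched as follows. The feature-map shapes, the dot-product similarity, and the 1/sqrt(C) scaling are illustrative assumptions; the disclosure does not fix a particular similarity or normalization.

```python
import numpy as np


def correlation_volume(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """All-pairs correlation between two feature maps.

    f1, f2 : (H, W, C) feature maps for the two enhanced images.
    Returns a (H, W, H, W) volume where entry [i, j, k, l] is the scaled
    dot product between the feature at (i, j) in f1 and (k, l) in f2.
    Likely matches correspond to large entries in this volume.
    """
    h, w, c = f1.shape
    a = f1.reshape(h * w, c)
    b = f2.reshape(h * w, c)
    corr = (a @ b.T) / np.sqrt(c)  # scaled dot-product similarity
    return corr.reshape(h, w, h, w)


# Toy example with random features (shapes chosen for illustration only)
rng = np.random.default_rng(0)
f1 = rng.normal(size=(4, 5, 8))
vol = correlation_volume(f1, f1)
```

From such a volume, the most likely matches can be read off per location (e.g., an argmax or soft-argmax over the last two axes), which is the kind of input a pose regressor needs to compute the relative pose.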
The calibrating also includes predicting, using a neural network model, a rotation matrix between the first image and the second image by: extracting a first feature map and a second feature map based on the first image and the second image, applying positional feature enhancement on the first fe