US-12620129-B2 - Pose quantization-based keyframe pruning for simultaneous localization and mapping
Abstract
Embodiments of the present invention relate to techniques for managing keyframe data in a Simultaneous Localization and Mapping (SLAM) system of an Augmented Reality (AR) device. The method involves obtaining a plurality of keyframes, each linked to pose data comprising spatial and orientation data derived from raw data captured by sensors. The pose data for each keyframe is quantized according to predefined parameters, creating a structured pose grid of quantized cells. The technique includes analyzing the quantized pose data to identify cells in which the number of keyframes exceeds a predetermined threshold. Redundant keyframes are pruned from memory, improving the SLAM system's efficiency by reducing computational load and memory usage. This selective pruning ensures that the AR device retains a comprehensive and accurate environmental map while operating within the constraints of limited system resources.
Inventors
- Georg Halmetschlager-Funek
- Nikolaj Kuntner
- Simon Schreiberhuber
Assignees
- SNAP INC.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2024-05-21
Claims (20)
- 1 . A method for managing keyframe data in a Simultaneous Localization and Mapping (SLAM) system of an Augmented Reality (AR) device, the method comprising: obtaining, by the AR device, a plurality of keyframes, each keyframe comprising at least an image linked to an instance of pose data, wherein the pose data comprises spatial data in three dimensions (X, Y, and Z) derived from one or more spatial sensing systems, and orientation data in three dimensions derived from data obtained from one or more orientation sensing systems; quantizing, for each keyframe, the pose data according to predefined quantization parameters to generate quantized pose data, wherein the quantized pose data includes quantized spatial indices and quantized orientation components; analyzing the quantized pose data of the keyframes to determine that a number of keyframes having identical quantized spatial indices in all three dimensions and matching quantized orientation components exceeds a predetermined threshold; and pruning the number of keyframes by deleting from memory at least one keyframe from the number of keyframes when the predetermined threshold is exceeded.
- 2 . The method of claim 1 , wherein the quantization parameters include at least a first quantization parameter for the spatial data and at least a second quantization parameter for the orientation data, the first quantization parameter for the spatial data defining a spatial grid size segmenting space in the three dimensions into a plurality of discrete spatial cells of a specified size, and the second quantization parameter for the orientation data defining angular grid cell sizes that segment the orientation space into a plurality of discrete orientation cells.
- 3 . The method of claim 1 , wherein the spatial sensing system comprises at least one sensor selected from the group consisting of accelerometers, gyroscopes, magnetometers, Global Positioning System (GPS) receivers, and camera systems, and wherein the spatial data is derived from measurements obtained from any combination of one or more of the selected sensors, including image data from a camera system.
- 4 . The method of claim 1 , wherein the orientation sensing system comprises at least one sensor selected from the group consisting of gyroscopes, magnetometers, inertial measurement units (IMUs), and camera systems, and wherein the orientation data is derived from measurements obtained from any combination of one or more of the selected sensors, including image data from the camera systems.
- 5 . The method of claim 1 , wherein pruning the number of keyframes when the predetermined threshold is exceeded further comprises selecting at least one keyframe for removal based on criteria comprising image clarity, feature richness of the image linked to the pose data, and temporal proximity of a timestamp of the keyframe to other keyframes, wherein keyframes with lower image clarity, lesser feature richness, or closer timestamp proximity to other keyframes are prioritized for removal.
- 6 . The method of claim 1 , further comprising: dynamically adjusting the quantization parameters based on factors comprising a variability of spatial features, a rate of change in orientation data, and diversity of lighting conditions, and wherein the adjustment of the quantization parameters is performed to balance the granularity of the quantized pose data with a computational load and memory usage of the AR device.
- 7 . The method of claim 1 , wherein quantizing the pose data further comprises: incorporating lighting conditions as an additional dimension in the quantization process, wherein the lighting conditions are quantified based on a metric that characterizes the illumination of the environment, and wherein the quantized pose data includes a lighting condition index that distinguishes keyframes based on the quantified lighting conditions.
- 8 . An augmented reality (AR) device configured to manage keyframe data in a Simultaneous Localization and Mapping (SLAM) system, the AR device comprising: a display; a processor; a spatial sensing system; an orientation sensing system; and a memory storing instructions thereon, which, when executed by the processor, cause the AR device to perform operations comprising: obtaining, by the AR device, a plurality of keyframes, each keyframe comprising at least an image linked to an instance of pose data, wherein the pose data comprises spatial data in three dimensions (X, Y, and Z) derived from the spatial sensing system, and orientation data in three dimensions obtained from the orientation sensing system; quantizing, for each keyframe, the pose data according to predefined quantization parameters to generate quantized pose data, wherein the quantized pose data includes quantized spatial indices and quantized orientation components; analyzing the quantized pose data of the keyframes to determine that a number of keyframes having identical quantized spatial indices in all three dimensions and matching quantized orientation components exceeds a predetermined threshold; and pruning the number of keyframes by deleting from memory at least one keyframe from the number of keyframes when the predetermined threshold is exceeded.
- 9 . The AR device of claim 8 , wherein the quantization parameters include at least a first quantization parameter for the spatial data and at least a second quantization parameter for the orientation data, the first quantization parameter for the spatial data defining a spatial grid size segmenting space in the three dimensions into a plurality of discrete spatial cells of a specified size, and the second quantization parameter for the orientation data defining angular grid cell sizes that segment the orientation space into a plurality of discrete orientation cells.
- 10 . The AR device of claim 8 , wherein the spatial sensing system comprises at least one sensor selected from the group consisting of accelerometers, gyroscopes, magnetometers, Global Positioning System (GPS) receivers, and camera systems, and wherein the spatial data is derived from measurements obtained from any combination of one or more of the selected sensors, including visual data from a camera system used to estimate the pose.
- 11 . The AR device of claim 8 , wherein the orientation sensing system comprises at least one sensor selected from the group consisting of gyroscopes, magnetometers, inertial measurement units (IMUs), and camera systems, and wherein the orientation data is derived from measurements obtained from any combination of one or more of the selected sensors, including image data from the camera systems.
- 12 . The AR device of claim 8 , wherein pruning the number of keyframes when the predetermined threshold is exceeded further comprises selecting at least one keyframe for removal based on criteria comprising image clarity, feature richness of the image linked to the pose data, and temporal proximity of a timestamp of the keyframe to other keyframes, wherein keyframes with lower image clarity, lesser feature richness, or closer timestamp proximity to other keyframes are prioritized for removal.
- 13 . The AR device of claim 8 , further comprising: dynamically adjusting the quantization parameters based on factors including, but not limited to, variability of spatial features, a rate of change in orientation data, and diversity of lighting conditions, and wherein the adjustment of the quantization parameters is performed to balance the granularity of the quantized pose data with the computational load and memory usage of the AR device.
- 14 . The AR device of claim 8 , wherein quantizing the pose data further comprises: incorporating lighting conditions as an additional dimension in the quantization process, wherein the lighting conditions are quantified based on a metric that characterizes the illumination of the environment, and wherein the quantized pose data includes a lighting condition index that distinguishes keyframes based on the quantified lighting conditions.
- 15 . A non-transitory computer-readable medium storing instructions thereon, which, when executed by one or more processors of an augmented reality (AR) device, cause the AR device to perform operations comprising: obtaining, by the AR device, a plurality of keyframes, each keyframe comprising at least an image linked to an instance of pose data, wherein the pose data comprises spatial data in three dimensions (X, Y, and Z) derived from one or more spatial sensing systems, and orientation data in three dimensions obtained from one or more orientation sensing systems; quantizing, for each keyframe, the pose data according to predefined quantization parameters to generate quantized pose data, wherein the quantized pose data includes quantized spatial indices and quantized orientation components; analyzing the quantized pose data of the keyframes to determine that a number of keyframes having identical quantized spatial indices in all three dimensions and matching quantized orientation components exceeds a predetermined threshold; and pruning the number of keyframes by deleting from memory at least one keyframe from the number of keyframes when the predetermined threshold is exceeded.
- 16 . The non-transitory computer-readable medium of claim 15 , wherein the quantization parameters include at least a first quantization parameter for the spatial data and at least a second quantization parameter for the orientation data, the first quantization parameter for the spatial data defining a spatial grid size segmenting space in the three dimensions into a plurality of discrete spatial cells of a specified size, and the second quantization parameter for the orientation data defining angular grid cell sizes that segment the orientation space into a plurality of discrete orientation cells.
- 17 . The non-transitory computer-readable medium of claim 15 , wherein the spatial sensing system comprises at least one sensor selected from the group consisting of accelerometers, gyroscopes, magnetometers, Global Positioning System (GPS) receivers, and camera systems, and wherein the spatial data is derived from measurements obtained from any combination of one or more of the selected sensors, including visual data from a camera system used to estimate the pose.
- 18 . The non-transitory computer-readable medium of claim 15 , wherein the orientation sensing system comprises at least one sensor selected from the group consisting of gyroscopes, magnetometers, inertial measurement units (IMUs), and camera systems, and wherein the orientation data is derived from measurements obtained from any combination of one or more of the selected sensors, including image data from the camera systems.
- 19 . The non-transitory computer-readable medium of claim 15 , wherein pruning the number of keyframes when the predetermined threshold is exceeded further comprises selecting at least one keyframe for removal based on criteria comprising image clarity, feature richness of the image linked to the pose data, and temporal proximity of a timestamp of the keyframe to other keyframes, wherein keyframes with lower image clarity, lesser feature richness, or closer timestamp proximity to other keyframes are prioritized for removal.
- 20 . The non-transitory computer-readable medium of claim 15 , further comprising: dynamically adjusting the quantization parameters based on the complexity of the environment as determined by the AR device, wherein the complexity is assessed based on factors including, but not limited to, a measure of variability of spatial features, a rate of change in orientation data, a measure of diversity of lighting conditions, a degree of blur in captured images, and estimated covariances in sensor data, wherein the adjustment of the quantization parameters is performed to balance the granularity of the quantized pose data with the computational load and memory usage of the AR device.
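The core technique of independent claim 1 (quantizing each keyframe's pose into a joint spatial/orientation grid and deleting surplus keyframes from over-full cells) can be sketched in a few lines. This is a minimal illustration, not the claimed implementation: the keyframe record layout, the 0.5 m spatial and 15° angular cell sizes, the threshold of 3, and the keep-the-earliest retention policy are all assumptions made for the example.

```python
import math
from collections import defaultdict

SPATIAL_CELL = 0.5   # metres per spatial grid cell (illustrative)
ANGULAR_CELL = 15.0  # degrees per orientation grid cell (illustrative)
MAX_PER_CELL = 3     # predetermined threshold of keyframes per cell (illustrative)

def quantize_pose(pose):
    """Map a continuous 6-DoF pose (x, y, z, roll, pitch, yaw) to
    quantized spatial indices plus quantized orientation components,
    yielding one discrete grid key per keyframe."""
    x, y, z, roll, pitch, yaw = pose
    spatial = tuple(math.floor(v / SPATIAL_CELL) for v in (x, y, z))
    orientation = tuple(math.floor(a / ANGULAR_CELL) for a in (roll, pitch, yaw))
    return spatial + orientation

def prune_keyframes(keyframes):
    """Group keyframes by quantized pose; where a cell holds more than
    MAX_PER_CELL keyframes, the surplus is pruned (here: later entries)."""
    cells = defaultdict(list)
    for kf in keyframes:
        cells[quantize_pose(kf["pose"])].append(kf)
    kept = []
    for members in cells.values():
        kept.extend(members[:MAX_PER_CELL])  # keyframes beyond the threshold are deleted
    return kept
```

Because quantization collapses nearby poses onto a single grid key, many keyframes captured while the wearer stands still fall into one cell and all but the threshold number are pruned, while a keyframe captured elsewhere in the map survives untouched.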
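Dependent claims 5, 12, and 19 add selection criteria for choosing which keyframes in a full cell to delete: lower image clarity, lesser feature richness, and closer timestamp proximity to other keyframes are prioritized for removal. A sketch of one way to combine those criteria; the field names and scoring weights are illustrative assumptions, not taken from the patent.

```python
def removal_priority(kf, cell_members):
    """Higher score = more worth keeping. Combines image clarity,
    feature richness, and the temporal gap to the nearest other
    keyframe in the same cell; the weights are illustrative."""
    gap = min(abs(kf["t"] - o["t"]) for o in cell_members if o is not kf)
    return 1.0 * kf["clarity"] + kf["n_features"] / 100.0 + 0.1 * gap

def select_survivors(cell_keyframes, threshold):
    """Keep the `threshold` highest-priority keyframes in a full cell;
    blurry, feature-poor, temporally clumped keyframes are removed first."""
    if len(cell_keyframes) <= threshold:
        return list(cell_keyframes)
    ranked = sorted(cell_keyframes,
                    key=lambda kf: removal_priority(kf, cell_keyframes),
                    reverse=True)
    return ranked[:threshold]
```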
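Claims 6-7 (and their device and medium counterparts) describe dynamically adjusting the quantization parameters to scene conditions, and incorporating a lighting-condition index as an additional dimension of the pose grid. The scaling law and the four-bin luminance quantization below are illustrative assumptions, not formulas from the patent.

```python
def adjust_cell_sizes(base_spatial, base_angular, feature_variability, rotation_rate):
    """Shrink grid cells (finer granularity) in feature-rich or
    fast-rotating scenes and leave them coarse otherwise, trading
    map detail against computational load and memory usage."""
    spatial = base_spatial / (1.0 + feature_variability)
    angular = base_angular / (1.0 + rotation_rate)
    return spatial, angular

def lighting_index(mean_luminance, n_bins=4):
    """Quantize an illumination metric (mean luminance in [0, 1]) into a
    lighting-condition index appended to the pose grid key, so keyframes
    of the same place under different light occupy distinct cells."""
    return min(int(mean_luminance * n_bins), n_bins - 1)
```

Appending `lighting_index(...)` to the tuple returned by a pose-quantization step keeps, e.g., a daytime and a nighttime keyframe of the same corridor as separate cells rather than pruning one as redundant.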
Description
TECHNICAL FIELD
The present application pertains to the field of augmented reality (AR) and mixed-reality devices. More specifically, the subject matter of the present application relates to a technique for enhancing the efficiency of Simultaneous Localization and Mapping (SLAM) algorithms and systems. This is achieved through a method of pose quantization-based keyframe pruning, which continuously refines the observed and stored keyframe data used by SLAM algorithms to map and interact with real-world environments.
BACKGROUND
Augmented reality (AR) and mixed-reality devices are designed to overlay digital content onto a real-world view of an environment or scene in a way that is interactive and contextually relevant. These devices operate by integrating a combination of camera systems and motion sensors to perceive and “understand” the real-world environment around them. The camera systems capture visual data, which can include images and videos of the surrounding area, while the motion sensors provide data on the device's movements and orientation in three-dimensional space. The motion sensors typically include accelerometers, gyroscopes, and sometimes magnetometers. Accelerometers measure the rate of change of velocity with respect to time, allowing the device to detect linear acceleration along the X, Y, and Z axes. Gyroscopes measure the rate of rotation around the device's three physical axes, providing angular velocity data that helps determine orientation changes. Magnetometers, functioning as digital compasses, measure the strength and direction of the magnetic field, aiding in the determination of the device's heading relative to the Earth's magnetic North. Together, these sensors feed data into a Simultaneous Localization and Mapping (SLAM) algorithm, a core component of AR and mixed-reality systems.
The SLAM algorithm enables the device to perform two essential functions concurrently: it localizes the device within the real-world environment by determining its position and orientation, and it maps the structure of the environment in real time. This dual capability allows AR and mixed-reality devices to place digital objects in the physical world with accuracy and consistency, as the device understands both its own movement and the layout of the space around it.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. To easily identify the discussion of any particular element or operation, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced. Some non-limiting examples are illustrated in the figures of the accompanying drawings, in which:
- FIG. 1 is a diagram illustrating an example of an individual wearing an augmented reality (AR) device, in the specific form of AR glasses, consistent with an embodiment of the invention, while walking through a real-world environment dotted with trees and subtly turning their head from side to side to engage with their surroundings.
- FIG. 2 is a diagram illustrating a two-dimensional table, where each cell in the table indicates a number of data points relating to spatial data, each data point associated with a keyframe that has been obtained by an AR device executing an algorithm for Simultaneous Localization and Mapping (SLAM), consistent with some examples.
- FIG. 3 is a diagram illustrating a bar chart, where each bar in the chart indicates a number of keyframes obtained by an AR device executing a SLAM algorithm, grouped by their orientation data (e.g., yaw), consistent with some examples.
- FIG. 4 is a diagram illustrating a mapping pipeline for a SLAM algorithm or system, consistent with some examples.
- FIG. 5 is a diagram illustrating a flow chart corresponding to a method for pruning keyframe data, according to some examples.
- FIG. 6 is a block diagram illustrating an example of the functional components (e.g., hardware components) of an AR device (e.g., AR glasses) with which the methods and techniques described herein may be implemented, consistent with some examples.
- FIG. 7 is a block diagram illustrating a software architecture, which can be installed on any one or more of the devices described herein.
DETAILED DESCRIPTION
Presented herein are techniques for enhancing the efficiency of Simultaneous Localization and Mapping (SLAM) algorithms used in augmented reality (AR) and mixed-reality devices. More specifically, described herein are techniques for the continuous pruning of data (e.g., keyframes) by employing pose quantization-based keyframe pruning, which significantly reduces the computational load and memory requirements of an AR device while maintaining the integrity of the environmental map. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of the various aspects of different embodiments of t