US-20260126861-A1 - TRACTABLE BODY-BASED AR SYSTEM INPUT

US 20260126861 A1

Abstract

A hand-tracking platform generates gesture components for use as user inputs into an application of an Augmented Reality (AR) system. In some examples, the hand-tracking platform generates real-world scene environment frame data based on gestures being made by a user of the AR system using a camera component of the AR system. The hand-tracking platform recognizes a gesture component based on the real-world scene environment frame data and generates gesture component data based on the gesture component. The application utilizes the gesture component data as user input in a user interface of the application.
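
As a rough illustration of the pipeline the abstract describes (camera frames in, gesture-component data out, consumed by an application as user input), the following Python sketch shows the data flow. All names here (the duck-typed camera and application objects, recognize_gesture_component, GestureComponentData) are hypothetical illustrations and are not drawn from the specification.

```python
# Hypothetical sketch of the hand-tracking pipeline summarized in the
# abstract: camera frames -> gesture component recognition -> gesture
# component data -> application user input. Names are illustrative only.
from dataclasses import dataclass

@dataclass
class GestureComponentData:
    component_id: str   # e.g., "pinch" or "swipe-left" (illustrative)
    confidence: float   # recognizer confidence in [0, 1]

def recognize_gesture_component(frame):
    """Stub for the recognizer; a real system would run hand-tracking
    model inference on the frame here."""
    return None  # no component recognized in this stub

def run_pipeline(camera, application):
    # camera.frames() stands in for real-world scene frame data produced
    # by the camera component; each recognized component is handed to the
    # application as user input for its user interface.
    for frame in camera.frames():
        component = recognize_gesture_component(frame)
        if component is not None:
            application.handle_input(component)
```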

Inventors

  • Attila Alvarez
  • Márton Gergely Kajtár
  • Peter Pocsi
  • Jennica Pounds
  • David Retek
  • Zsolt Robotka

Assignees

  • SNAP INC.

Dates

Publication Date
2026-05-07
Application Date
2025-12-29

Claims (20)

  1. A method, comprising: generating, using a camera component of an Augmented Reality (AR) system, video frame data of a gesture being made by a user of the AR system; recognizing gesture components based on the video frame data; recognizing gestures based on the gesture components using gesture identification models identifying specific gestures and a previously determined gesture grammar; generating gesture input event data based on the recognized gestures; and utilizing the gesture input event data as user input in a user interface of an application of the AR system.
  2. The method of claim 1, further comprising: recognizing symbols based on the gesture components using symbol models identifying at least one of specific characters, words, and commands and a previously determined symbol grammar; generating symbol input event data based on the recognized symbols; and utilizing the symbol input event data as user input in the user interface of the application of the AR system.
  3. The method of claim 1, further comprising: generating skeletal model data using the video frame data; and wherein recognizing the gesture components comprises recognizing the gesture components based on the skeletal model data.
  4. The method of claim 1, wherein recognizing gestures further comprises comparing gesture components to the gesture identification models identifying the specific gestures.
  5. The method of claim 2, wherein recognizing symbols further comprises comparing the gesture components to the symbol models identifying the specific characters, words, and commands.
  6. The method of claim 1, wherein recognizing gestures further comprises recognizing the gestures using artificial intelligence methodologies and one or more gesture models previously generated using machine learning methodologies.
  7. The method of claim 1, wherein the AR system is a head-worn apparatus.
  8. A machine comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the machine to perform operations comprising: generating, using a camera component of an Augmented Reality (AR) system, video frame data of a gesture being made by a user of the AR system; recognizing gesture components based on the video frame data; recognizing gestures based on the gesture components using gesture identification models identifying specific gestures and a previously determined gesture grammar; generating gesture input event data based on the recognized gestures; and utilizing the gesture input event data as user input in a user interface of an application of the AR system.
  9. The machine of claim 8, wherein the operations further comprise: recognizing symbols based on the gesture components using symbol models identifying at least one of specific characters, words, and commands and a previously determined symbol grammar; generating symbol input event data based on the recognized symbols; and utilizing the symbol input event data as user input in the user interface of the application of the AR system.
  10. The machine of claim 8, wherein the operations further comprise: generating skeletal model data using the video frame data; and wherein recognizing the gesture components comprises recognizing the gesture components based on the skeletal model data.
  11. The machine of claim 8, wherein recognizing gestures further comprises comparing gesture components to the gesture identification models identifying the specific gestures.
  12. The machine of claim 9, wherein recognizing symbols further comprises comparing the gesture components to the symbol models identifying the specific characters, words, and commands.
  13. The machine of claim 8, wherein recognizing gestures further comprises recognizing the gestures using artificial intelligence methodologies and one or more gesture models previously generated using machine learning methodologies.
  14. The machine of claim 8, wherein the AR system is a head-worn apparatus.
  15. A machine-storage medium including instructions that, when executed by a machine, cause the machine to perform operations comprising: generating, using a camera component of an Augmented Reality (AR) system, video frame data of a gesture being made by a user of the AR system; recognizing gesture components based on the video frame data; recognizing gestures based on the gesture components using gesture identification models identifying specific gestures and a previously determined gesture grammar; generating gesture input event data based on the recognized gestures; and utilizing the gesture input event data as user input in a user interface of an application of the AR system.
  16. The machine-storage medium of claim 15, wherein the operations further comprise: recognizing symbols based on the gesture components using symbol models identifying at least one of specific characters, words, and commands and a previously determined symbol grammar; generating symbol input event data based on the recognized symbols; and utilizing the symbol input event data as user input in the user interface of the application of the AR system.
  17. The machine-storage medium of claim 15, wherein the operations further comprise: generating skeletal model data using the video frame data; and wherein recognizing the gesture components comprises recognizing the gesture components based on the skeletal model data.
  18. The machine-storage medium of claim 15, wherein recognizing gestures further comprises comparing gesture components to the gesture identification models identifying the specific gestures.
  19. The machine-storage medium of claim 16, wherein recognizing symbols further comprises comparing the gesture components to the symbol models identifying the specific characters, words, and commands.
  20. The machine-storage medium of claim 15, wherein recognizing gestures further comprises recognizing the gestures using artificial intelligence methodologies and one or more gesture models previously generated using machine learning methodologies.
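
Claims 1 and 2 (and their machine and machine-storage-medium counterparts) recite recognizing gestures and symbols from recognized gesture components by consulting gesture identification models, symbol models, and previously determined grammars. The sketch below illustrates one way a component-sequence grammar lookup could work; the grammar contents, function names, and matching strategy are hypothetical illustrations, not the implementation claimed here.

```python
# Illustrative gesture/symbol grammars: each maps an ordered tuple of
# gesture components to a recognized gesture, character, or command.
# All entries are hypothetical and not taken from the specification.
GESTURE_GRAMMAR = {
    ("pinch", "drag", "release"): "move-object",
    ("point", "tap"): "select",
}

SYMBOL_GRAMMAR = {
    ("index-up", "circle"): "O",            # a character
    ("flat-hand", "swipe-left"): "delete",  # a command
}

def recognize(components):
    """Return (kind, value) for the longest matching suffix of the
    component sequence, or None if no grammar entry matches."""
    for start in range(len(components)):
        suffix = components[start:]
        if suffix in GESTURE_GRAMMAR:
            return ("gesture", GESTURE_GRAMMAR[suffix])
        if suffix in SYMBOL_GRAMMAR:
            return ("symbol", SYMBOL_GRAMMAR[suffix])
    return None

# Example: a pinch-drag-release sequence yields a gesture input event.
print(recognize(("pinch", "drag", "release")))  # ("gesture", "move-object")
```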

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 17/964,770, filed Oct. 12, 2022, which application is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates generally to user interfaces and more particularly to user interfaces used in augmented and virtual reality.

BACKGROUND

A head-worn device may be implemented with a transparent or semi-transparent display through which a user of the head-worn device can view the surrounding environment. Such devices enable a user to see through the transparent or semi-transparent display to view the surrounding environment, and to also see objects (e.g., virtual objects such as a rendering of a 2D or 3D graphic model, images, video, text, and so forth) that are generated for display to appear as a part of, and/or overlaid upon, the surrounding environment. This is typically referred to as "augmented reality" or "AR." A head-worn device may additionally completely occlude a user's visual field and display a virtual environment through which a user may move or be moved. This is typically referred to as "virtual reality" or "VR." In a hybrid form, a view of the surrounding environment is captured using cameras, and that view is then displayed, along with augmentation, to the user on displays that occlude the user's eyes. As used herein, the term AR refers to augmented reality, virtual reality, and any hybrid of these technologies unless the context indicates otherwise.

A user of the head-worn device may access and use computer software applications to perform various tasks or engage in an entertaining activity. Performing the tasks or engaging in the entertaining activity may require entry of various commands and text into the head-worn device. Therefore, it is desirable to have a mechanism for entering commands and text.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced.

FIG. 1 is a perspective view of a head-worn device, in accordance with some examples.
FIG. 2 illustrates a further view of the head-worn device of FIG. 1, in accordance with some examples.
FIG. 3 is a diagrammatic representation of a machine within which a set of instructions may be executed for causing the machine to perform any one or more of the methodologies discussed herein, in accordance with some examples.
FIG. 4A is a collaboration diagram of a hand-tracking platform for an AR system, in accordance with some examples.
FIG. 4B illustrates a data structure, in accordance with some examples.
FIG. 5 is a sequence diagram of an AR gesture application process used by an AR system to provide an AR application to a user, in accordance with some examples.
FIG. 6 is a block diagram showing a software architecture within which the present disclosure may be implemented, in accordance with some examples.
FIG. 7 is a block diagram illustrating a networked system including details of a head-worn AR system, in accordance with some examples.
FIG. 8 is a block diagram showing an example messaging system for exchanging data (e.g., messages and associated content) over a network, in accordance with some examples.

DETAILED DESCRIPTION

AR systems implemented on a head-worn device such as glasses are limited when it comes to available user input modalities.
As compared to other mobile devices, such as mobile phones, it is more complicated for a user of an AR system to indicate user intent and invoke an action or application. When using a mobile phone, a user may go to a home screen and tap a specific icon to start an application. However, because an AR system lacks a physical input device such as a touchscreen or keyboard, such interactions are not as easily performed. Typically, users can indicate their intent by pressing a limited number of hardware buttons or using a small touchpad. Therefore, it is desirable to have input modalities that allow a greater range of inputs that a user can employ to indicate intent. Computer vision-based hand-tracking provides such input modalities.

An example of a hand-tracking input modality that may be utilized with AR systems is hand-tracking combined with Direct Manipulation of Virtual Objects (DMVO). In DMVO methodologies, a user is provided with a user interface that is displayed to the user in an AR overlay having a 2D or 3D rendering. The rendering is of a 2D or 3D graphic model in which virtual objects located in the model correspond to interactive elements of the user interface. In this way, the user perceives the virtual objects as objects within an overlay in the user's field of view of the real-world scene environment while wearing the AR system, or perceives the virtual objects as objects within a virtual environment.
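
To make the DMVO correspondence between rendered virtual objects and interactive user-interface elements concrete, here is a minimal sketch assuming a simple axis-aligned hit test between a tracked fingertip position and a virtual object's 3D bounds. The classes, names, and coordinates are illustrative assumptions, not the system disclosed above.

```python
# Hypothetical DMVO-style hit test: a tracked fingertip position is
# checked against the 3D bounds of virtual objects that stand in for
# interactive UI elements. Illustrative only.
from dataclasses import dataclass

@dataclass
class VirtualObject:
    name: str  # e.g., "ok-button" (a hypothetical UI element)
    min_corner: tuple  # (x, y, z) lower bound of the object's box
    max_corner: tuple  # (x, y, z) upper bound of the object's box

    def contains(self, point):
        # True if the point lies inside the axis-aligned bounding box.
        return all(lo <= c <= hi for lo, c, hi
                   in zip(self.min_corner, point, self.max_corner))

def hit_test(fingertip, objects):
    """Return the first virtual object the fingertip is touching."""
    for obj in objects:
        if obj.contains(fingertip):
            return obj
    return None

# Example: the tracked fingertip lands inside the button's bounds.
button = VirtualObject("ok-button", (0.0, 0.0, 0.0), (0.1, 0.05, 0.02))
print(hit_test((0.05, 0.02, 0.01), [button]))  # prints the button object
```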