US-20260126885-A1 - RECORDING FOLLOWING BEHAVIORS BETWEEN VIRTUAL OBJECTS AND USER AVATARS IN AR EXPERIENCES
Abstract
Described are recording tools for generating following behaviors and creating interactive AR experiences. The following recording application enables a user with little or no programming skills to virtually connect virtual objects to other elements, including virtual avatars representing fellow users, thereby creating an interactive story in which multiple elements are apparently and persistently connected. The following interface includes methods for selecting objects and instructions for connecting a virtual object to a target object. In one example, the recording application presents on the display a virtual tether between the objects until a connecting action is detected. The following interface is presented on the display as an overlay, in the foreground relative to the physical environment.
Inventors
- Lei Zhang
- Ava Robinson
- Daekun Kim
- Youjean Cho
- Yu Jiang Tham
- Rajan Vaish
- Andrés Monroy-Hernández
Assignees
- SNAP INC.
Dates
- Publication Date: 2026-05-07
- Application Date: 2025-12-31
Claims (20)
- 1. A method of generating interactive experiences, comprising: detecting a spoken name using a microphone coupled to an electronic device, wherein the electronic device comprises a camera, a display, and a memory; presenting on the display a virtual object based on the spoken name, wherein the virtual object is located at a virtual object position relative to a physical environment; capturing frames of video data using the camera; estimating a current electronic device position relative to the virtual object position based on the frames of video data; receiving a user input associated with a connecting action and a virtual target; presenting on the display the virtual target at a target location based on the user input, wherein the target location is based on the current electronic device position; and presenting on the display a virtual tether extending from the virtual object position to the target location, in accordance with the connecting action.
- 2. The method of claim 1, wherein presenting the virtual tether comprises presenting the virtual object persistently together with the virtual target.
- 3. The method of claim 1, further comprising: storing in the memory an instruction comprising the spoken name, the virtual object, the connecting action, and the virtual target.
- 4. The method of claim 1, wherein presenting the virtual object comprises: in response to detecting the spoken name, establishing the virtual object position based on the current electronic device position relative to the physical environment.
- 5. The method of claim 1, wherein presenting the virtual target comprises: varying the target location based on the current electronic device position relative to the physical environment.
- 6. The method of claim 1, further comprising: presenting a question and one or more answer buttons, wherein the question is associated with a scene, and wherein receiving the user input comprises receiving an answer selected from the one or more answer buttons; and storing in the memory an instruction comprising the spoken name, the virtual object, the answer, and a scene identifier associated with the scene.
- 7. The method of claim 1, wherein receiving the user input comprises: detecting a selecting action relative to the virtual object, wherein the selecting action comprises at least one of tapping a finger on the display, pressing the finger on the display, or placing on the display a cursor near the virtual object position.
- 8. The method of claim 1, wherein receiving the user input comprises: detecting a selecting action relative to the virtual object position; and detecting the connecting action relative to the display, wherein the connecting action comprises at least one of tapping a finger near the target location, sliding the finger toward the target location, or sliding a cursor toward the target location.
- 9. The method of claim 8, further comprising: in response to detecting the connecting action, executing a connect response comprising an action selected from a response group consisting of presenting on the display a transcription of a name associated with the virtual target, playing a sound through a speaker, and playing a message through the speaker.
- 10. The method of claim 1, further comprising: storing in the memory an instruction comprising the spoken name, the virtual object, the connecting action, and the virtual target; receiving a playback input; retrieving the instruction from the memory; and presenting on the display an interactive experience based on the instruction.
- 11. An electronic device, comprising: a microphone, a camera, a display, a processor, and a memory operative to store programming, wherein execution of the programming by the processor configures the electronic device to perform functions, including functions to: detect a spoken name using the microphone; present on the display a virtual object based on the spoken name, wherein the virtual object is located at a virtual object position relative to a physical environment; capture frames of video data using the camera; estimate a current electronic device position relative to the virtual object position based on the frames of video data; receive a user input associated with a connecting action and a virtual target; present on the display the virtual target at a target location based on the user input, wherein the target location is based on the current electronic device position; and present on the display a virtual tether extending from the virtual object position to the target location, in accordance with the connecting action.
- 12. The electronic device of claim 11, wherein execution of the programming by the processor configures the electronic device to perform further functions, including further functions to: store in the memory an instruction comprising the spoken name, the virtual object, the connecting action, and the virtual target.
- 13. The electronic device of claim 11, wherein execution of the programming by the processor configures the electronic device to perform further functions, including further functions to: present a question and one or more answer buttons, wherein the question is associated with a scene, and wherein the function to receive the user input comprises a function to receive an answer selected from the one or more answer buttons; and store in the memory an instruction comprising the spoken name, the virtual object, the answer, and a scene identifier associated with the scene.
- 14. The electronic device of claim 11, wherein the function to receive the user input comprises further functions to: detect a selecting action relative to the virtual object, wherein the selecting action comprises at least one of tapping a finger on the display, pressing the finger on the display, or placing on the display a cursor near the virtual object position; detect a selecting action relative to the virtual object position; and detect the connecting action relative to the display, wherein the connecting action comprises at least one of tapping the finger near the target location, sliding the finger toward the target location, or sliding a cursor toward the target location.
- 15. The electronic device of claim 11, wherein execution of the programming by the processor configures the electronic device to perform further functions, including further functions to: store in the memory an instruction comprising the spoken name, the virtual object, the connecting action, and the virtual target; receive a playback input; retrieve the instruction from the memory; and present on the display an interactive experience based on the instruction.
- 16. A non-transitory computer-readable medium operative to store program code that, when executed, is operative to cause a processor coupled to an electronic device to perform the steps of: detecting a spoken name using a microphone coupled to an electronic device, wherein the electronic device comprises a camera, a display, and a memory; presenting on the display a virtual object based on the spoken name, wherein the virtual object is located at a virtual object position relative to a physical environment; capturing frames of video data using the camera; estimating a current electronic device position relative to the virtual object position based on the frames of video data; receiving a user input associated with a connecting action and a virtual target; presenting on the display the virtual target at a target location based on the user input, wherein the target location is based on the current electronic device position; and presenting on the display a virtual tether extending from the virtual object position to the target location, in accordance with the connecting action.
- 17. The non-transitory computer-readable medium of claim 16, wherein the program code when executed is operative to cause the processor to perform the further steps of: storing in the memory an instruction comprising the spoken name, the virtual object, the connecting action, and the virtual target.
- 18. The non-transitory computer-readable medium of claim 16, wherein the program code when executed is operative to cause the processor to perform the further steps of: presenting a question and one or more answer buttons, wherein the question is associated with a scene, and wherein receiving the user input comprises receiving an answer selected from the one or more answer buttons; and storing in the memory an instruction comprising the spoken name, the virtual object, the answer, and a scene identifier associated with the scene.
- 19. The non-transitory computer-readable medium of claim 16, wherein the program code when executed is operative to cause the processor to perform the further steps of: detecting a selecting action relative to the virtual object, wherein the selecting action comprises at least one of tapping a finger on the display, pressing the finger on the display, or placing on the display a cursor near the virtual object position.
- 20. The non-transitory computer-readable medium of claim 16, wherein the program code when executed is operative to cause the processor to perform the further steps of: detecting a selecting action relative to the virtual object position; and detecting the connecting action relative to the display, wherein the connecting action comprises at least one of tapping a finger near the target location, sliding the finger toward the target location, or sliding a cursor toward the target location.
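Claim 1 recites a device-level sequence: detect a spoken name, anchor a virtual object in the physical scene, estimate the device pose from camera frames, accept a connecting input naming a virtual target, and render a tether between the two. The TypeScript sketch below is a minimal illustration of that sequence, not the patent's implementation; the types, the stubbed pose estimator, and the console "renderer" are all hypothetical stand-ins for a real AR engine.

```typescript
// A minimal sketch of the claim-1 flow, with hypothetical types and stubs.
// Real visual tracking, rendering, and speech input are outside this sketch.

type Vec3 = { x: number; y: number; z: number };

interface VirtualObject { name: string; position: Vec3 }
interface VirtualTarget { name: string; position: Vec3 }

// Instruction record in the spirit of claims 3 and 10: enough to replay later.
interface Instruction {
  spokenName: string;
  objectName: string;
  connectingAction: "tap" | "slide";
  targetName: string;
}

const memory: Instruction[] = []; // stands in for the device memory

// Stub for claim 1's "estimating a current electronic device position"
// from frames of video data.
function estimateDevicePose(frame: Uint8Array): Vec3 {
  return { x: 0, y: 1.5, z: 0 }; // a visual tracking system would go here
}

// Place the named object at a position derived from the device pose.
function presentVirtualObject(spokenName: string, devicePose: Vec3): VirtualObject {
  return { name: spokenName, position: { ...devicePose, z: devicePose.z - 1 } };
}

// "Render" a tether as a line segment between object and target positions.
function presentTether(obj: VirtualObject, target: VirtualTarget): void {
  console.log(
    `tether: (${obj.position.x},${obj.position.y},${obj.position.z}) -> ` +
    `(${target.position.x},${target.position.y},${target.position.z})`
  );
}

// One pass through the claimed sequence.
function recordFollowingBehavior(spokenName: string, frame: Uint8Array): void {
  const devicePose = estimateDevicePose(frame);
  const obj = presentVirtualObject(spokenName, devicePose);

  // Stub for "receiving a user input associated with a connecting action
  // and a virtual target" (e.g., a tap or slide toward the target).
  const connectingAction = "slide" as const;
  const target: VirtualTarget = { name: "avatar", position: { x: 1, y: 1.5, z: -1 } };

  presentTether(obj, target);
  memory.push({ spokenName, objectName: obj.name, connectingAction, targetName: target.name });
}

recordFollowingBehavior("dragon", new Uint8Array(0));
```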
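Claims 3, 6, 10, and 15 describe storing an instruction (spoken name, virtual object, connecting action or answer, virtual target or scene identifier) and later retrieving it to replay the interactive experience. One hedged way to model that record-and-replay loop is sketched below; the field names and the replay behavior are illustrative assumptions, not the patent's data format.

```typescript
// A sketch of claim 6/10-style instruction storage and playback.
// Field names and the replay behavior are illustrative assumptions.

interface SceneInstruction {
  spokenName: string; // trigger word that spawned the object
  objectName: string; // which virtual object to present
  answer: string;     // answer selected from the on-screen buttons (claim 6)
  sceneId: string;    // identifier of the scene the question belongs to
}

class InstructionStore {
  private instructions: SceneInstruction[] = [];

  // Claim 6: store the spoken name, object, answer, and scene identifier.
  record(instr: SceneInstruction): void {
    this.instructions.push(instr);
  }

  // Claim 10: on a playback input, retrieve the stored instructions and
  // present the interactive experience they describe.
  replay(sceneId: string): void {
    for (const instr of this.instructions.filter(i => i.sceneId === sceneId)) {
      console.log(
        `scene ${instr.sceneId}: say "${instr.spokenName}" to present ` +
        `${instr.objectName} (recorded answer: ${instr.answer})`
      );
    }
  }
}

// Usage: record one authoring step, then play the scene back.
const store = new InstructionStore();
store.record({ spokenName: "dragon", objectName: "dragon",
               answer: "follow the avatar", sceneId: "scene-1" });
store.replay("scene-1");
```

Keeping the stored instruction declarative — what was said, what was connected, and in which scene — is what lets a non-programmer's recording session be retrieved and replayed later, as claims 10 and 15 describe.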
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. application Ser. No. 17/901,611, filed on Sep. 1, 2022, the contents of which are incorporated fully herein by reference.

TECHNICAL FIELD

Examples set forth in the present disclosure relate to the field of augmented reality (AR) experiences for electronic devices, including portable electronic devices and wearable devices such as eyewear. More particularly, but not by way of limitation, the present disclosure describes recording applications that enable users to virtually connect virtual objects to other elements, including avatars representing fellow users, when creating an interactive AR story experience.

BACKGROUND

Many types of computers and electronic devices available today, such as mobile devices (e.g., smartphones, tablets, and laptops), handheld devices, and wearable devices (e.g., smart glasses, digital eyewear, headwear, headgear, and head-mounted displays), include a variety of cameras, sensors, wireless transceivers, input systems, and displays. Users sometimes refer to information on these devices during physical activities such as exercise.

The so-called "Internet of Things" (IoT) refers to and includes physical products that are embedded with sensors, software, and other technologies that enable them to connect and exchange data with other devices over a network, often the Internet. For example, IoT products are used in home automation to control lighting, heating and air conditioning, media and security systems, and camera systems. A number of IoT-enabled devices have been provided that function as smart home hubs to connect different smart home products. IoT devices have been used in a number of other applications as well. Application layer protocols and supporting frameworks have been provided for implementing such IoT applications. For example, some IoT products include an application programming interface (API) that allows the IoT product to pair with and otherwise communicate with other products and electronic devices, such as portable computers. Artificial intelligence has also been combined with the Internet of Things infrastructure to achieve more efficient IoT network operations, improve human-machine interactions, and enhance data management and analytics.

Virtual reality (VR) technology generates a complete virtual environment, including realistic images, sometimes presented on a VR headset or other head-mounted display. VR experiences allow a user to move through the virtual environment and interact with virtual objects. AR is a type of VR technology that combines real objects in a physical environment with virtual objects and displays the combination to a user. The combined display gives the impression that the virtual objects are authentically present in the environment, especially when the virtual objects appear and behave like the real objects. Cross reality (XR) is generally understood as an umbrella term referring to systems that include or combine elements from AR, VR, and mixed reality (MR) environments.

Automatic speech recognition (ASR) is a field of computer science, artificial intelligence, and linguistics that involves receiving spoken words and converting them into audio data suitable for processing by a computing device. Human speech received as an analog audio signal can be converted to a digital audio signal and analyzed using ASR to identify the discrete words in the speech.
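By way of illustration only, the sketch below shows one common way a display-equipped device can listen for a trigger word, using the browser Web Speech API; the keyword "dragon" and the handler body are hypothetical, and the disclosure does not prescribe any particular ASR engine.

```typescript
// Minimal browser sketch: detect a spoken keyword with the Web Speech API.
// Assumes a browser that exposes SpeechRecognition (often as the prefixed
// webkitSpeechRecognition); the keyword "dragon" is a placeholder.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.lang = "en-US";
recognition.continuous = true;      // keep listening across utterances
recognition.interimResults = false; // report only finalized transcripts

recognition.onresult = (event: any) => {
  // Each result holds one or more alternatives; take the top transcript
  // of the most recent result.
  const transcript = event.results[event.results.length - 1][0].transcript
    .trim()
    .toLowerCase();
  if (transcript.includes("dragon")) {
    console.log("Spoken name detected:", transcript);
    // An AR layer would instantiate the matching virtual object here.
  }
};

recognition.start();
```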
The identified words can be used to execute actions or commands that are useful in controlling and interacting with computer software applications. ASR processing may be used by computers, handheld devices, wearable devices, telephone systems, automobiles, and a wide variety of other devices to facilitate human-computer interactions.

BRIEF DESCRIPTION OF THE DRAWINGS

Features of the various examples described will be readily understood from the following detailed description, in which reference is made to the figures. A reference numeral is used with each element in the description and throughout the several views of the drawing. When a plurality of similar elements is present, a single reference numeral may be assigned to like elements, with an added upper- or lower-case letter referring to a specific element. The various elements shown in the figures are not drawn to scale unless otherwise indicated. The dimensions of the various elements may be enlarged or reduced in the interest of clarity. The several figures depict one or more implementations and are presented by way of example only and should not be construed as limiting. Included in the drawing are the following figures:

- FIG. 1A is a side view (right) of an eyewear device suitable for use in an example following behaviors system;
- FIG. 1B is a perspective, partly sectional view of optical components and electronics in a portion of the eyewear device illustrated in FIG. 1A;
- FIG. 1C is a side view (left) of the eyewear device of FIG. 1A;
- FIG