US-12625557-B2 - Detecting head gestures using inertial measurement unit signals
Abstract
In one embodiment, a method includes presenting a suggestion to a user of a head-mounted device, by the head-mounted device via an assistant xbot, during a dialog session between the user and the assistant xbot, wherein the suggestion is associated with a plurality of actions to be performed by an assistant system associated with the assistant xbot; accessing, by the head-mounted device during the dialog session, signals from inertial measurement unit (IMU) sensors of the head-mounted device; determining, by an on-device head-gesture detection model and based only on the signals from the IMU sensors, a head gesture performed by the user during the dialog session; and executing, by the assistant system executing on the head-mounted device, a first action from the plurality of actions, wherein the first action is selected based on the head gesture determined during the dialog session.
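As a reader's aid (not part of the patent record), here is a minimal Python sketch of the flow the abstract describes: buffer low-rate IMU samples during a dialog session, run a lightweight on-device classifier over a window of those samples, and dispatch one of the suggestion's actions based on the detected gesture. Every name, threshold, and window length below is an illustrative assumption, and the crude heuristic classifier merely stands in for the claimed head-gesture detection model.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto

class HeadGesture(Enum):
    NOD = auto()    # up-down sagittal nod, read as acceptance
    SHAKE = auto()  # left-right transverse shake, read as rejection
    NONE = auto()   # no deliberate gesture detected

@dataclass
class ImuSample:
    accel: tuple  # 3-axis accelerometer reading
    gyro: tuple   # 3-axis gyroscope reading

SAMPLE_RATE_HZ = 26      # claim 6: sampling frequency not greater than 26 Hz
WINDOW_SECONDS = 1.5     # hypothetical window length
WINDOW = int(SAMPLE_RATE_HZ * WINDOW_SECONDS)

def classify_window(window) -> HeadGesture:
    """Stand-in for the on-device head-gesture detection model.

    A crude heuristic: dominant pitch-axis rotation reads as a nod,
    dominant yaw-axis rotation as a shake. The patent instead claims a
    small neural network over the raw IMU sequence (claims 7 and 8).
    """
    pitch = sum(abs(s.gyro[0]) for s in window)
    yaw = sum(abs(s.gyro[2]) for s in window)
    if max(pitch, yaw) < 5.0:  # hypothetical activity threshold
        return HeadGesture.NONE
    return HeadGesture.NOD if pitch > yaw else HeadGesture.SHAKE

def await_gesture_response(imu_stream, accept_action, decline_action):
    """Consume IMU samples during a dialog session and act on the result."""
    buffer = deque(maxlen=WINDOW)
    for sample in imu_stream:
        buffer.append(sample)
        if len(buffer) == WINDOW:
            gesture = classify_window(buffer)
            if gesture is HeadGesture.NOD:
                return accept_action()   # execute the suggested task
            if gesture is HeadGesture.SHAKE:
                return decline_action()  # do not execute the task
```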
Inventors
- Shervin Ghasemlou
- Devashish Prasad Joshi
- Rongzhou Shen
- Riza Kazemi
Assignees
- META PLATFORMS, INC.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2024-05-10
Claims (20)
- 1 . A method, comprising: detecting, by a head-mounted device, a placement of the head-mounted device on a head of a user; in response to detecting a user activity performed by the user and without receiving another input from the user, determining a suggestion based on the user activity; presenting, by the head-mounted device, the suggestion; determining, by a head-gesture detection model associated with the head-mounted device and based on at least one signal from at least one inertial measurement unit (IMU) sensor of the head-mounted device, a head gesture performed by the user; while the user activity is ongoing and in accordance with a determination that the head gesture performed by the user corresponds to an acceptance of the suggestion, executing, by an assistant system associated with the head-mounted device, an action to be performed by the assistant system based on the head gesture and the suggestion; in response to detecting a conclusion of the user activity performed by the user and without receiving another input from the user, determining a different suggestion based on the conclusion of the user activity; presenting, by the head-mounted device, the different suggestion; determining, by the head-gesture detection model associated with the head-mounted device and based on at least one signal from at least one IMU sensor of the head-mounted device, the head gesture performed by the user; and in accordance with a determination that the head gesture performed by the user corresponds to an acceptance of the different suggestion, executing, by the assistant system associated with the head-mounted device, a different action to be performed by the assistant system based on the head gesture and the different suggestion.
- 2 . The method of claim 1 , wherein detecting the placement of the head-mounted device on the head of the user is based on a machine-learning model comprising one or more finite state machines.
- 3 . The method of claim 1 , wherein presenting the suggestion comprises presenting the suggestion to the user during a dialog session between the user and an assistant chatbot associated with the assistant system.
- 4 . The method of claim 1 , wherein determining the suggestion is further based on contextual information associated with the user.
- 5 . The method of claim 1 , further comprising: in response to detecting a second user activity, distinct from the user activity, performed by the user and without receiving another input from the user, determining a second suggestion based on the second user activity; presenting, by the head-mounted device, the second suggestion; determining, by the head-gesture detection model and based on at least one second signal from the at least one IMU sensor, a second head gesture performed by the user; and while the second user activity is ongoing and in accordance with a determination that the second head gesture performed by the user corresponds to an acceptance of the second suggestion, executing, by the assistant system, a second action to be performed by the assistant system based on the second head gesture and the second suggestion.
- 6 . The method of claim 1 , wherein the at least one signal from the at least one IMU sensor is sampled at a frequency not greater than 26 Hz.
- 7 . The method of claim 1 , wherein the head-gesture detection model is based on one or more neural networks that are operable to learn contexts and track relationships in sequential data associated with the at least one signal from the at least one IMU sensor.
- 8 . The method of claim 1 , wherein a size of the head-gesture detection model is no greater than 1 MB.
- 9 . The method of claim 1 , wherein the action comprises at least one of: executing a task associated with the suggestion; not executing the task associated with the suggestion; executing the task associated with the suggestion in a particular way; and activating the assistant system.
- 10 . The method of claim 1 , wherein: the head gesture comprises an up-down sagittal head nod; and the action comprises executing a task associated with the suggestion.
- 11 . The method of claim 1 , wherein: the head gesture comprises a left-right transverse head shake; and the action comprises not executing a task associated with the suggestion.
- 12 . The method of claim 1 , wherein: the action comprises a first action of a plurality of actions to be performed by the assistant system; and the action is selected from the plurality of actions based on the head gesture.
- 13 . The method of claim 12 , wherein the plurality of actions comprise at least one of: executing a task associated with the suggestion; not executing the task associated with the suggestion; executing the task associated with the suggestion in a particular way; and activating the assistant system.
- 14 . The method of claim 1 , further comprising: accessing, from a database associated with the assistant system, a plurality of pairs between head gestures and indications corresponding to the head gestures; and identifying a first pair from the plurality of pairs, wherein the first pair is between the head gesture and an indication corresponding to the head gesture, and wherein the action is selected from a plurality of actions based on the indication corresponding to the head gesture.
- 15 . The method of claim 14 , wherein the plurality of pairs between the head gestures and the indications corresponding to the head gestures are personalized to the user.
- 16 . The method of claim 14 , wherein the plurality of pairs between the head gestures and the indications corresponding to the head gestures are customized for a region with which the user is associated.
- 17 . The method of claim 1 , wherein the head gesture comprises at least one of: an up-down sagittal head nod; a left-right transverse head shake; a one-way head tilt along a frontal plane associated with the head of the user; a side-to-side head tilt across the frontal plane; and a one-way transverse head rotation.
- 18 . The method of claim 1 , further comprising: activating the head-gesture detection model in response to the user initiating a dialog session between the user and an assistant chatbot associated with the assistant system.
- 19 . A system, comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: detect, by a head-mounted device, a placement of the head-mounted device on a head of a user; in response to detecting a user activity performed by the user and without receiving another input from the user, determine a suggestion based on the user activity; present, by the head-mounted device, the suggestion; determine, by a head-gesture detection model associated with the head-mounted device and based on at least one signal from at least one inertial measurement unit (IMU) sensor of the head-mounted device, a head gesture performed by the user; while the user activity is ongoing and in accordance with a determination that the head gesture performed by the user corresponds to an acceptance of the suggestion, execute, by an assistant system associated with the head-mounted device, an action to be performed by the assistant system based on the head gesture and the suggestion; in response to detecting a conclusion of the user activity performed by the user and without receiving another input from the user, determine a second suggestion, distinct from the suggestion, based on the conclusion of the user activity; present, by the head-mounted device, the second suggestion; determine, by the head-gesture detection model associated with the head-mounted device and based on at least one signal from at least one IMU sensor of the head-mounted device, the head gesture performed by the user; and in accordance with a determination that the head gesture performed by the user corresponds to an acceptance of the second suggestion, execute, by the assistant system associated with the head-mounted device, a second action to be performed by the assistant system based on the head gesture and the second suggestion.
- 20 . One or more computer-readable non-transitory storage media embodying software that is operable when executed to: detect, by a head-mounted device, a placement of the head-mounted device on a head of a user; in response to detecting a user activity performed by the user and without receiving another input from the user, determine a suggestion based on the user activity; cause the head-mounted device to present the suggestion; determine, by a head-gesture detection model associated with the head-mounted device and based on at least one signal from at least one inertial measurement unit (IMU) sensor of the head-mounted device, a head gesture performed by the user; while the user activity is ongoing and in accordance with a determination that the head gesture performed by the user corresponds to an acceptance of the suggestion, execute, by an assistant system associated with the head-mounted device, an action to be performed by the assistant system based on the head gesture and the suggestion; in response to detecting a conclusion of the user activity performed by the user and without receiving another input from the user, determine a second suggestion, distinct from the suggestion, based on the conclusion of the user activity; cause the head-mounted device to present the second suggestion; determine, by the head-gesture detection model associated with the head-mounted device and based on at least one signal from at least one IMU sensor of the head-mounted device, the head gesture performed by the user; and in accordance with a determination that the head gesture performed by the user corresponds to an acceptance of the second suggestion, execute, by the assistant system associated with the head-mounted device, a second action to be performed by the assistant system based on the head gesture and the second suggestion.
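Stepping outside the claim language for a moment: claims 6 through 8 bound the detection model (IMU signals sampled at no more than 26 Hz, a neural network that tracks relationships in sequential data, a total size of at most 1 MB), and claims 14 through 16 describe a table of gesture-indication pairs that may be personalized per user or customized per region. The PyTorch sketch below shows one way such a model and table could be sized and wired; the architecture, hyperparameters, and table entries are assumptions for illustration, not the patented implementation.

```python
import torch
from torch import nn

class HeadGestureNet(nn.Module):
    """Tiny recurrent classifier over windows of 6-axis IMU samples."""

    def __init__(self, num_gestures: int = 6, hidden: int = 64):
        super().__init__()
        # GRU tracks relationships in the sequential IMU data (claim 7)
        self.gru = nn.GRU(input_size=6, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_gestures)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, 6) -> logits over gesture classes
        _, last_hidden = self.gru(x)
        return self.head(last_hidden.squeeze(0))

model = HeadGestureNet()

# Claim 8: the model must fit in 1 MB (float32 weights assumed here).
size_mb = sum(p.numel() for p in model.parameters()) * 4 / 2**20
assert size_mb <= 1.0, f"model is {size_mb:.2f} MB, over the 1 MB budget"

# Claim 6: at most 26 Hz, so ~1.5 s of data is a 39-sample window.
logits = model(torch.randn(1, 39, 6))

# Claims 14-16: gesture -> indication pairs, mapped onward to actions.
# These hypothetical tables could be swapped per user or per region.
GESTURE_TO_INDICATION = {"nod": "accept", "shake": "reject", "tilt": "clarify"}
INDICATION_TO_ACTION = {
    "accept": "execute the suggested task",
    "reject": "do not execute the task",
    "clarify": "execute the task in a particular way",
}
```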
Description
CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of priority under 35 U.S.C. § 120 as a continuation of U.S. patent application Ser. No. 18/061,838, filed Dec. 5, 2022, now U.S. Pat. No. 11,983,329, issued May 14, 2024, the disclosure of which is hereby incorporated by reference in its entirety for all purposes.

FIELD

This disclosure generally relates to databases and file management within network environments, and in particular relates to hardware and software for smart assistant systems.

BACKGROUND

An assistant system can provide information or services on behalf of a user based on a combination of user input, location awareness, and the ability to access information from a variety of online sources (such as weather conditions, traffic congestion, news, stock prices, user schedules, retail prices, etc.). The user input may include text (e.g., online chat), especially in an instant messaging application or other applications, voice, images, motion, or a combination of them. The assistant system may perform concierge-type services (e.g., making dinner reservations, purchasing event tickets, making travel arrangements) or provide information based on the user input. The assistant system may also perform management or data-handling tasks based on online information and events without user initiation or interaction. Examples of tasks that may be performed by an assistant system include schedule management (e.g., sending an alert to a dinner date that a user is running late due to traffic conditions, updating schedules for both parties, and changing the restaurant reservation time). The assistant system may be enabled by the combination of computing devices, application programming interfaces (APIs), and the proliferation of applications on user devices.

A social-networking system, which may include a social-networking website, may enable its users (such as persons or organizations) to interact with it and with each other through it. The social-networking system may, with input from a user, create and store in the social-networking system a user profile associated with the user. The user profile may include demographic information, communication-channel information, and information on personal interests of the user. The social-networking system may also, with input from a user, create and store a record of the user's relationships with other users of the social-networking system, as well as provide services (e.g., profile/news-feed posts, photo sharing, event organization, messaging, games, or advertisements) to facilitate social interaction between or among users. The social-networking system may send content or messages related to its services over one or more networks to a mobile or other computing device of a user. A user may also install software applications on a mobile or other computing device for accessing the user's profile and other data within the social-networking system. The social-networking system may generate a personalized set of content objects to display to a user, such as a newsfeed of aggregated stories of other users connected to the user.

SUMMARY OF PARTICULAR EMBODIMENTS

In particular embodiments, the assistant system may assist a user to obtain information or services.
The assistant system may enable the user to interact with the assistant system via user inputs of various modalities (e.g., audio, voice, text, image, video, gesture, motion, location, orientation) in stateful and multi-turn conversations to receive assistance from the assistant system. As an example and not by way of limitation, the assistant system may support mono-modal inputs (e.g., only voice inputs), multi-modal inputs (e.g., voice inputs and text inputs), hybrid/multi-modal inputs, or any combination thereof. User inputs provided by a user may be associated with particular assistant-related tasks, and may include, for example, user requests (e.g., verbal requests for information or performance of an action), user interactions with an assistant application associated with the assistant system (e.g., selection of UI elements via touch or gesture), or any other type of suitable user input that may be detected and understood by the assistant system (e.g., user movements detected by the client device of the user). The assistant system may create and store a user profile comprising both personal and contextual information associated with the user.

In particular embodiments, the assistant system may analyze the user input using natural-language understanding (NLU). The analysis may be based on the user profile of the user for more personalized and context-aware understanding. The assistant system may resolve entities associated with the user input based on the analysis. In particular embodiments, the assistant system may interact with different agents to obtain information or services that are associated with the resolved entities. The assistant system may generate a response for the user regarding the information or services by using natural-language generation (NLG).
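The summary above outlines a pipeline of multimodal input handling, NLU, entity resolution, and agent dispatch. As a toy, hedged illustration of that control flow only (all function names, intents, and agents below are hypothetical stand-ins; the patent does not disclose these interfaces):

```python
from typing import Callable

def nlu_parse(utterance: str) -> tuple:
    """Toy NLU: keyword-based intent detection plus slot extraction."""
    if "remind" in utterance:
        return "set_reminder", {"text": utterance}
    return "general_qa", {"query": utterance}

# Each agent stands in for a service the assistant can call on the
# user's behalf once entities are resolved.
AGENTS: dict[str, Callable[[dict], str]] = {
    "set_reminder": lambda slots: f"Reminder saved: {slots['text']}",
    "general_qa": lambda slots: f"Here is what I found about: {slots['query']}",
}

def handle_user_input(utterance: str) -> str:
    # NLU -> (entity resolution elided) -> agent dispatch -> response
    intent, slots = nlu_parse(utterance)
    return AGENTS[intent](slots)

print(handle_user_input("remind me to call mom"))
```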