US-12625559-B2 - Gesture detection using external sensors

Abstract

The technology provides for a system for determining a gesture provided by a user. In this regard, one or more processors of the system may receive image data from one or more visual sensors of the system capturing a motion of the user, and may receive motion data from one or more wearable computing devices worn by the user. The one or more processors may recognize, based on the image data, a portion of the user's body that corresponds to a gesture to perform a command. The one or more processors may also determine one or more correlations between the image data and the received motion data. Based on the recognized portion of the user's body and the one or more correlations between the image data and the received motion data, the one or more processors may detect the gesture.

Inventors

  • Katherine Blair Huffman
  • Gregory Granito

Assignees

  • GOOGLE LLC

Dates

Publication Date
2026-05-12
Application Date
2024-09-11

Claims (20)

  1. A method, comprising: receiving, by one or more processors, image data; analyzing, by the one or more processors, the image data to detect a movement of one or more portions of a body of a user; using, by the one or more processors, one or more machine learning models to recognize that the detected movement corresponds to a gesture by the user; generating, by the one or more processors, a time-based series of motion data including positions for the detected movement corresponding to multiple points of the one or more portions of the body of the user; identifying, by the one or more processors based on the time-based series of motion data, the gesture; and causing, by the one or more processors, a computing device to perform a command in response to the identified gesture.
  2. The method of claim 1, wherein the time-based series of motion data includes a set of velocities for the one or more portions of the body of the user.
  3. The method of claim 1, wherein the time-based series of motion data includes a set of accelerations for the one or more portions of the body of the user.
  4. The method of claim 1, wherein the analyzing comprises arranging frames of the image data chronologically.
  5. The method of claim 4, wherein each of the frames has a respective timestamp, and the frames are arranged according to their respective timestamps.
  6. The method of claim 4, further comprising: using, by the one or more processors, image or video segmentation to separate portions of one or more images that correspond to the one or more portions of the body of the user from portions that correspond to other objects or background.
  7. The method of claim 1, wherein the one or more machine learning models include one or more pattern recognition models used to recognize the one or more portions of the body of the user that correspond to the gesture to perform the command.
  8. The method of claim 1, wherein the one or more machine learning models include one or more object recognition models to recognize the one or more portions of the body of the user that correspond to the gesture to perform the command.
  9. The method of claim 1, wherein the one or more portions of the body of the user include a hand of the user, and the time-based series of motion data includes a time-based series of positions for the hand.
  10. The method of claim 9, wherein the multiple points of the hand correspond to an outline of the hand.
  11. A system, comprising: a computing device; and one or more processors configured to: receive image data; analyze the image data to detect a movement of one or more portions of a body of a user; use one or more machine learning models to recognize that the detected movement corresponds to a gesture by the user; generate a time-based series of motion data including positions for the detected movement corresponding to multiple points of the one or more portions of the body of the user; identify, based on the time-based series of motion data, the gesture; and cause the computing device to perform a command in response to the identified gesture.
  12. The system of claim 11, wherein the time-based series of motion data includes a set of velocities for the one or more portions of the body of the user.
  13. The system of claim 11, wherein the time-based series of motion data includes a set of accelerations for the one or more portions of the body of the user.
  14. The system of claim 11, wherein the image data is analyzed by arranging frames of the image data chronologically.
  15. The system of claim 14, wherein each of the frames has a respective timestamp, and the frames are arranged according to their respective timestamps.
  16. A non-transitory computer readable medium on which instructions are stored, the instructions, when executed by one or more processors, cause the one or more processors to perform a method, the method comprising: receiving image data; analyzing the image data to detect a movement of one or more portions of a body of a user; using one or more machine learning models to recognize that the detected movement corresponds to a gesture by the user; generating a time-based series of motion data including positions for the detected movement corresponding to multiple points of the one or more portions of the body of the user; identifying, based on the time-based series of motion data, the gesture; and causing a computing device to perform a command in response to the identified gesture.
  17. The non-transitory computer readable medium of claim 16, wherein the one or more machine learning models include one or more pattern recognition models used to recognize the one or more portions of the body of the user that correspond to the gesture to perform the command.
  18. The non-transitory computer readable medium of claim 16, wherein the one or more machine learning models include one or more object recognition models to recognize the one or more portions of the body of the user that correspond to the gesture to perform the command.
  19. The non-transitory computer readable medium of claim 16, wherein the one or more portions of the body of the user include a hand of the user, and the time-based series of motion data includes a time-based series of positions for the hand.
  20. The non-transitory computer readable medium of claim 19, wherein the multiple points of the hand correspond to an outline of the hand.
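
Taken together, claims 1-10 describe a vision-driven pipeline: order frames by timestamp, track multiple points on a portion of the body across the frames, derive velocities and accelerations from the position series, and identify the gesture from the resulting series. The following is a minimal sketch of that pipeline in Python using only NumPy; `Frame` and `detect_keypoints` are hypothetical placeholders standing in for the camera interface and the machine learning models the claims leave unspecified, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class Frame:
    timestamp: float      # capture time in seconds
    image: np.ndarray     # H x W x 3 pixel array


def build_motion_series(frames: List[Frame],
                        detect_keypoints: Callable[[np.ndarray], np.ndarray]):
    """Generate a time-based series of positions, velocities, and
    accelerations for multiple tracked points (cf. claims 1-3)."""
    # Arrange frames chronologically by their timestamps (cf. claims 4-5).
    ordered = sorted(frames, key=lambda f: f.timestamp)
    times = np.array([f.timestamp for f in ordered])
    # (T, K, 2): K tracked points per frame, e.g. a hand outline (cf. claim 10).
    positions = np.stack([detect_keypoints(f.image) for f in ordered])
    # Finite-difference velocities and accelerations from the position series.
    dt = np.diff(times)[:, None, None]                     # (T-1, 1, 1)
    velocities = np.diff(positions, axis=0) / dt           # (T-1, K, 2)
    accelerations = np.diff(velocities, axis=0) / dt[1:]   # (T-2, K, 2)
    return times, positions, velocities, accelerations
```

A downstream classifier, such as the pattern or object recognition models of claims 7 and 8, would consume the returned series to identify the gesture and trigger the command of claim 1.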

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 18/491,837, filed Oct. 23, 2023, which is a continuation of U.S. application Ser. No. 17/969,823, filed Oct. 20, 2022, and issued as U.S. Pat. No. 11,822,731 on Nov. 21, 2023, which is a continuation of U.S. application Ser. No. 17/692,833, filed Mar. 11, 2022, and issued as U.S. Pat. No. 11,507,198 on Nov. 22, 2022, which is a continuation of U.S. application Ser. No. 17/139,241, filed Dec. 31, 2020, and issued as U.S. Pat. No. 11,301,052 on Apr. 12, 2022, which is a continuation of U.S. application Ser. No. 16/373,901, filed Apr. 3, 2019, now U.S. Pat. No. 10,908,695, issued Feb. 2, 2021, the entire disclosures of which are incorporated by reference herein.

BACKGROUND

Computing devices such as desktop and laptop computers have various user interfaces that allow users to interact with them. For example, such interfaces may include a keyboard, a mouse, a touchpad, a touch screen, buttons, etc. A user may control various functions of a computing device, and of the user applications installed on it, through these interfaces. However, interactions with these interfaces can be inconvenient or unnatural, such as manipulating a three-dimensional object on the screen by typing on a keyboard or clicking a mouse.

For wearable devices such as smartwatches and head-mounted devices, interfaces such as a keyboard and mouse may be impractical or impossible due to the form factors of the wearable devices. For example, a virtual keyboard on a smartwatch may be too small for some users to reliably operate. As such, wearable devices may be designed to enable user interactions that are more convenient and natural, such as by voice, touch, or gesture. To do so, wearable devices are equipped with various sensors, such as microphones and inertial measurement units (IMUs), which users may use to interact with the device. An IMU typically includes an accelerometer and a gyroscope.

BRIEF SUMMARY

The present disclosure provides for receiving, by one or more processors, image data from one or more visual sensors capturing a motion of a user; receiving, by the one or more processors, motion data from one or more wearable computing devices worn by the user; recognizing, by the one or more processors based on the image data, a portion of the user's body that corresponds to a gesture to perform a command; determining, by the one or more processors, one or more correlations between the image data and the received motion data; and detecting, by the one or more processors, the gesture based on the recognized portion of the user's body and the one or more correlations between the image data and the received motion data.

Determining the one or more correlations may further include synchronizing timestamps associated with the image data and timestamps associated with the received motion data. The method may further comprise: determining, by the one or more processors, a first coordinate system from a perspective of the one or more visual sensors; determining, by the one or more processors, a second coordinate system from a perspective of the one or more wearable computing devices; and determining, by the one or more processors, one or more transformations between the first coordinate system and the second coordinate system, wherein determining the one or more correlations further includes determining the one or more transformations.
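
As an illustration of this correlation step, the sketch below pairs image-derived samples with wearable motion samples by nearest timestamp and maps points between the two coordinate systems with a rigid transform. It is a sketch under stated assumptions, not the disclosed implementation: the `tolerance` value and the calibration step that would supply the rotation `R` and translation `t` are hypothetical.

```python
import numpy as np


def synchronize(image_ts: np.ndarray, motion_ts: np.ndarray,
                tolerance: float = 0.02) -> list:
    """Pair each image timestamp with the nearest wearable motion
    timestamp within `tolerance` seconds; returns (i, j) index pairs."""
    pairs = []
    for i, t in enumerate(image_ts):
        j = int(np.argmin(np.abs(motion_ts - t)))
        if abs(float(motion_ts[j]) - float(t)) <= tolerance:
            pairs.append((i, j))
    return pairs


def camera_to_wearable(points_cam: np.ndarray,
                       R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map 3-D points from the visual sensor's coordinate system into the
    wearable device's coordinate system via a rigid transform (R, t)."""
    return points_cam @ R.T + t
```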
The method may further comprise determining, by the one or more processors, where the recognized portion of the user's body includes a hand of the user, a position for one or more fingers of the hand, wherein detecting the gesture is further based on the position of the one or more fingers. The method may further comprise generating, by the one or more processors, a time-based series of motion data for the recognized portion of the user's body based on the image data, the generated time-based series of motion data including at least one of a time-based series of positions, a time-based series of velocities, and a time-based series of accelerations. The received motion data may include a time-based series of inertial measurements, and determining the one or more correlations may include matching the time-based series of motion data generated based on the image data to the time-based series of inertial measurements. The method may further comprise determining, by the one or more processors, depth information for the motion of the user based on the received motion data, wherein detecting the gesture is further based on the depth information. The method may further comprise determining, by the one or more processors, an orientation of the one or more wearable computing devices based on the received motion data, wherein detecting the gesture is further based on the orientation of the one or more wearable computing devices.
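
One plausible reading of "matching" the image-derived series to the inertial measurements is a normalized correlation of acceleration profiles, sketched below under the assumption that both series have already been resampled onto a common time base; a score near 1.0 would suggest the tracked body part and the wearable device are moving together.

```python
import numpy as np


def correlation_score(img_accel: np.ndarray, imu_accel: np.ndarray) -> float:
    """Compare acceleration-magnitude profiles from the two sources;
    inputs are (T, 3) arrays sampled on a common time base."""
    a = np.linalg.norm(img_accel, axis=1)
    b = np.linalg.norm(imu_accel, axis=1)
    a = (a - a.mean()) / (a.std() + 1e-9)   # zero-mean, unit-variance
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean(a * b))            # near 1.0 when profiles agree
```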