US-12625479-B1 - Proactive presentment of personalized activity recommendations
Abstract
Techniques for presenting personalized content to a user are described. A smart home system may determine activity data (including present activity data and device usage history) specific to interactions involving a first device, and generate activity embedding data representing the activity data. The smart home system may also receive a first list of content for presentment, and generate a second list of content using the first list and the activity embedding data. The smart home system may send the second list to the first device. The first device may determine a third list of content based on the second list and an identity of a user presently interacting with the first device, such that the third list of content is specific to interactions involving the first device and the user. The first device may then present at least one instance of content represented in the third list.
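The staged list refinement described in the abstract (a device-level first list narrowed server-side into a second list, then narrowed on-device into a user-specific third list) can be sketched as follows. This is a minimal illustrative sketch only: the data shapes, function names, and identifiers (`UsageEvent`, `second_list`, `third_list`, the device and user IDs) are assumptions for illustration, not the patent's implementation.

```python
from dataclasses import dataclass


@dataclass
class UsageEvent:
    """One record of a smart home interaction observed at a device."""
    device_id: str   # device that received the user input
    content_id: str  # e.g., a smart home device name or routine identifier
    user_id: str     # resolved identity of the interacting user


def second_list(first_list, activity, device_id):
    """Server side: keep only content with usage history at this device."""
    used_here = {e.content_id for e in activity if e.device_id == device_id}
    return [c for c in first_list if c in used_here]


def third_list(second, activity, device_id, user_id):
    """Device side: narrow further to the recognized user's own history."""
    used_by_user = {e.content_id for e in activity
                    if e.device_id == device_id and e.user_id == user_id}
    return [c for c in second if c in used_by_user]


activity = [
    UsageEvent("kitchen_echo", "coffee_maker", "alice"),
    UsageEvent("kitchen_echo", "kitchen_lights", "bob"),
    UsageEvent("bedroom_echo", "bedroom_lamp", "alice"),
]
first = ["coffee_maker", "kitchen_lights", "bedroom_lamp"]
second = second_list(first, activity, "kitchen_echo")
print(second)                                                  # ['coffee_maker', 'kitchen_lights']
print(third_list(second, activity, "kitchen_echo", "alice"))   # ['coffee_maker']
```

In this sketch the second list is device-specific (any user at the kitchen device) while the third list is device- and user-specific, mirroring the progressive narrowing the abstract describes.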
Inventors
- Sven Eberhardt
- Maisie Wang
- Kunal Pramod Ghogale
- Emmett Barton
- Dustin D Clark
Assignees
- AMAZON TECHNOLOGIES, INC.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2022-03-31
Claims (20)
- 1 . A computing system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the computing system to: determine activity data representing at least: a first usage history indicating a first smart home device performed a first action in response to a first user input received by a first device, and a second usage history indicating the first smart home device performed a second action in response to a second user input received by a second device; based on the activity data, determine a list of smart home devices that performed actions in response to user inputs received by the first device, instead of the second device; send, to the first device, the list of smart home devices; in response to receiving the list of smart home devices, determine, by the first device, user data representing characteristics of a user of the first device; using the user data, determine, by the first device and from the list of smart home devices, one or more smart home devices that performed one or more actions in response to user inputs of the user to the first device; after determining the one or more smart home devices that performed one or more actions in response to user inputs of the user to the first device, determine, by the first device, occurrence of a trigger event indicating the user is near or engaging with the first device; and in response to determining occurrence of the trigger event, display, by the first device, a representation of at least one of the one or more smart home devices.
- 2 . The computing system of claim 1 , wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to, by the first device: determine image data using at least one camera of the first device; process the image data to determine a portion of the image data corresponding to a face; process the portion of the image data corresponding to the face to determine first facial feature embedding data representing facial features of the face; determine the user data to be stored facial feature embedding data corresponding to the first facial feature embedding data; determine usage history data associated with the user data; and determine the one or more smart home devices that performed one or more actions in response to user inputs of the user to the first device using the list of smart home devices and the usage history data.
- 3 . The computing system of claim 1 , wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to, by the first device: using the user data, determine the user corresponds to a first user type; and based on determining the user corresponds to the first user type, determine the one or more smart home devices that performed one or more actions in response to user inputs of the user to the first device to omit a third smart home device represented in the list of smart home devices, wherein the third smart home device corresponds to a smart home device type inappropriate to be controlled by a user of the first user type.
- 4 . The computing system of claim 1 , wherein the trigger event corresponds to one or more of: determining image data includes a representation of a face; determining audio data includes speech directed to the first device; determining the user has navigated to a graphical user interface screen of the first device; determining a period of time has elapsed; detecting motion of an object within a vicinity of the first device; detecting a wireless signal emitted from a third device; detecting a service set identifier of a wearable device; detecting a palm of the user; receiving second data representing a height of the user; receiving third data representing a gait of the user; receiving fourth data representing at least one of a hair color and a hair length; and receiving fifth data indicating a fourth device has detected the user.
- 5 . A computing device comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the computing device to: receive a list of content generated based on activity data associated with the computing device; determine user data representing characteristics of a user of the computing device; using the user data, determine, from the list of content, one or more instances of content that was generated or output in response to user inputs of the user to the computing device; after determining the one or more instances of content that was generated or output in response to user inputs of the user to the computing device, determine occurrence of a trigger event indicating the user is near or engaging with the computing device; and in response to determining occurrence of the trigger event, present an output requesting an input requesting performance of an action corresponding to at least one of the one or more instances of content that was generated or output in response to user inputs of the user to the computing device.
- 6 . The computing device of claim 5 , wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing device to: determine usage history data associated with the user data; and determine the one or more instances of content that was generated or output in response to user inputs of the user to the computing device using the list of content and the usage history data.
- 7 . The computing device of claim 6 , wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing device to: receive image data using at least one camera of the computing device; perform facial detection processing on the image data to determine a portion of the image data corresponds to a face; process the portion of the image data corresponding to the face to determine first facial feature embedding data representing facial features of the face; determine stored facial feature embedding data corresponding to the first facial feature embedding data; and determine the usage history data to be associated with the stored facial feature embedding data.
- 8 . The computing device of claim 6 , wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing device to: receive audio data corresponding to a spoken user input; process the audio data to determine first speech characteristic embedding data representing speech characteristics of the spoken user input; determine stored speech characteristic embedding data corresponding to the first speech characteristic embedding data; and determine the usage history data to be associated with the stored speech characteristic embedding data.
- 9 . The computing device of claim 5 , wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing device to: based on the user data, determine the user corresponds to a first user type; and determine the one or more instances of content that was generated or output in response to user inputs of the user to the computing device further based on determining the user corresponds to the first user type.
- 10 . The computing device of claim 5 , wherein the list of content comprises a device name corresponding to a smart home device that is controllable using the computing device.
- 11 . The computing device of claim 5 , wherein the list of content comprises an identifier corresponding to a group of actions capable of being performed in response to a single user input.
- 12 . The computing device of claim 5 , wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing device to: receive input data corresponding to a user input, the user input corresponding to the input requesting performance of an action corresponding to at least one of the one or more instances of content that was generated or output in response to user inputs of the user to the computing device; and commence performance of skill processing in response to the input data.
- 13 . The computing device of claim 5 , wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing device to: determine a period of time has elapsed since commencing display of first content based on past content that was generated or output in response to user inputs of the user to the computing device; and based on determining the period of time has elapsed, determine the one or more instances of content that was generated or output in response to user inputs of the user to the computing device and present the output requesting the input requesting performance of an action corresponding to at least one of the one or more instances of content that was generated or output in response to user inputs of the user to the computing device.
- 14 . The computing device of claim 5 , wherein: the computing device corresponds to a first computing device; the activity data represents at least: a first usage history indicating first content was generated or output in response to a first user input received by the first computing device, and a second usage history indicating second content was generated or output in response to a second user input received by a second computing device; and the list of content represents content that was generated or output in response to user inputs received by the first computing device, instead of the second computing device.
- 15 . A computer-implemented method performed by a computing device, the computer-implemented method comprising: receiving a list of content generated based on activity data associated with the computing device; determining user data representing characteristics of a user of the computing device; using the user data, determining from the list of content, one or more instances of content that was generated or output in response to user inputs of the user to the computing device; after determining the one or more instances of content that was generated or output in response to user inputs of the user to the computing device, determining occurrence of a trigger event indicating the user is near or engaging with the computing device; and in response to determining occurrence of the trigger event, presenting an output requesting an input requesting performance of an action corresponding to at least one of the one or more instances of content.
- 16 . The computer-implemented method of claim 15 , further comprising: determining usage history data associated with the user data; and determining the one or more instances of content that was generated or output in response to user inputs of the user to the computing device using the list of content and the usage history data.
- 17 . The computer-implemented method of claim 16 , further comprising: receiving image data using at least one camera of the computing device; performing facial detection processing on the image data to determine a portion of the image data corresponds to a face; processing the portion of the image data corresponding to the face to determine first facial feature embedding data representing facial features of the face; determining stored facial feature embedding data corresponding to the first facial feature embedding data; and determining the usage history data to be associated with the stored facial feature embedding data.
- 18 . The computer-implemented method of claim 16 , further comprising: receiving audio data corresponding to a spoken user input; processing the audio data to determine first speech characteristic embedding data representing speech characteristics of the spoken user input; determining stored speech characteristic embedding data corresponding to the first speech characteristic embedding data; and determining the usage history data to be associated with the stored speech characteristic embedding data.
- 19 . The computer-implemented method of claim 15 , further comprising: based on the user data, determining the user corresponds to a first user type; and determining the one or more instances of content that was generated or output in response to user inputs of the user to the computing device further based on determining the user corresponds to the first user type.
- 20 . The computer-implemented method of claim 15 , wherein the list of content comprises at least one of: a device name corresponding to a smart home device that is controllable using the computing device; or an identifier corresponding to a group of actions capable of being performed in response to a single user input.
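Several of the claims above (2, 7, 8, 17, 18) turn on matching freshly computed facial-feature or speech-characteristic embedding data against stored embedding data to resolve which user is present. A common way to realize such a match is nearest-neighbor comparison under cosine similarity with a rejection threshold; the sketch below illustrates that general technique. The function names, threshold value, and toy embeddings are illustrative assumptions, not details taken from the patent.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def match_user(probe, stored, threshold=0.8):
    """Return the user ID whose stored embedding best matches the probe,
    or None if no similarity clears the threshold (unrecognized user)."""
    best_id, best_score = None, threshold
    for user_id, emb in stored.items():
        score = cosine(probe, emb)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id


# Toy enrollment database: one stored embedding per known user.
stored = {
    "alice": [0.9, 0.1, 0.2],
    "bob":   [0.1, 0.9, 0.3],
}
probe = [0.88, 0.15, 0.18]  # embedding from a freshly captured face or utterance
print(match_user(probe, stored))  # alice
```

Once a user ID is resolved this way, the device can look up that user's usage history data and filter the received list of content accordingly, as the claims describe.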
Description
BACKGROUND

Natural language processing systems have progressed to the point where humans can interact with computing devices using their voices and natural language textual input. Such systems employ techniques to identify the words spoken and written by a human user based on the various qualities of received input data. Speech recognition combined with natural language understanding processing techniques enable speech-based user control of computing devices to perform tasks based on the user's spoken inputs. Speech recognition and natural language understanding processing techniques may be referred to collectively or separately herein as spoken language understanding (SLU) processing. SLU processing may be used by computers, hand-held devices, telephone computer systems, kiosks, and a wide variety of other devices to improve human-computer interactions.

BRIEF DESCRIPTION OF DRAWINGS

For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.
- FIG. 1A is a conceptual diagram illustrating a system for generating and presenting a list of options and/or other content personalized to a device and user thereof, according to embodiments of the present disclosure.
- FIG. 1B is a conceptual diagram illustrating an example layout of devices in an environment, according to embodiments of the present disclosure.
- FIG. 1C is a conceptual diagram illustrating processing that may be performed by a user recognition component of a device, according to embodiments of the present disclosure.
- FIG. 2 is a conceptual diagram illustrating another example configuration of the system for generating and presenting a list of content, according to embodiments of the present disclosure.
- FIG. 3 is a conceptual diagram illustrating another example configuration of the system for generating and presenting a list of content, according to embodiments of the present disclosure.
- FIG. 4 is a conceptual diagram illustrating example system components that may be used to process a user input, according to embodiments of the present disclosure.
- FIG. 5 is a conceptual diagram of components of a device, according to embodiments of the present disclosure.
- FIG. 6 is a block diagram conceptually illustrating example components of a smart home system, according to embodiments of the present disclosure.
- FIG. 7 is a schematic diagram of an illustrative architecture in which sensor data is combined to recognize one or more users, according to embodiments of the present disclosure.
- FIG. 8 is a system flow diagram illustrating speech-based user recognition processing, according to embodiments of the present disclosure.
- FIG. 9 is a block diagram conceptually illustrating example components of a device, according to embodiments of the present disclosure.
- FIG. 10 is a block diagram conceptually illustrating example components of a system, according to embodiments of the present disclosure.
- FIG. 11 illustrates an example of a computer network for use with the overall system, according to embodiments of the present disclosure.

DETAILED DESCRIPTION

Automatic speech recognition (ASR) is a field of computer science, artificial intelligence, and linguistics concerned with transforming audio data associated with speech into a token or other textual representation of that speech. Similarly, natural language understanding (NLU) is a field of computer science, artificial intelligence, and linguistics concerned with enabling computers to derive meaning from natural language inputs (such as spoken inputs). ASR and NLU are often used together as part of a language processing component of a system. Text-to-speech (TTS) is a field of computer science concerning transforming textual and/or other data into audio data that is synthesized to resemble human speech. Natural language generation (NLG) is a field of artificial intelligence concerned with automatically transforming data into natural language (e.g., English) content.

An environment (e.g., a house, an apartment, an office, etc.) may include one or more smart home devices. As used herein, a “smart home device” refers to a device that may be controlled by another device or system in response to receiving a user input (e.g., a spoken input or GUI input). Example smart home devices include, but are not limited to, light switches, TVs, plugs, outlets, light bulbs, motion sensors, speakers, door locks, window locks, garage doors, ovens, temperature sensors, and thermostats. A user may utilize one or more smart devices of an environment on a frequent basis and may do so in a recognizable pattern. For example, the user may routinely utilize a smart speaker to play a radio station at a particular time of day. For further example, a user may routinely dim smart lights in a living room and turn on a TV in the living room at a particular time of day on a particular day of the week. The present disclosure provides techniques for displaying
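The description's examples of routine usage (a radio station every morning, dimming the living room lights on certain evenings) suggest mining usage history for content repeatedly invoked at a particular time, then surfacing it proactively when that time recurs. The sketch below illustrates one simple way to do that by counting (content, hour) pairs; the function name, event shape, and `min_count` cutoff are illustrative assumptions rather than the disclosure's actual method.

```python
from collections import Counter


def routine_suggestions(events, current_hour, min_count=3):
    """events: (content_id, hour) pairs drawn from usage history.

    Suggest content the user has invoked during this hour of the day
    at least min_count times, i.e., content used in a recognizable pattern.
    """
    counts = Counter(events)
    return [content for (content, hour), n in counts.items()
            if hour == current_hour and n >= min_count]


# Toy history: radio at 8am five times, dimming lights at 8pm four
# times, and a one-off TV command at 8pm (too rare to be a routine).
history = ([("radio_station", 8)] * 5
           + [("living_room_dim", 20)] * 4
           + [("tv_on", 20)])
print(routine_suggestions(history, 8))    # ['radio_station']
print(routine_suggestions(history, 20))   # ['living_room_dim']
```

A proactive recommendation built this way would then be gated on a trigger event (e.g., a detected face or nearby motion) before being displayed, per the claims above.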