
US-12620395-B2 - Generating a group automated assistant session to provide content to a plurality of users via headphones

US 12620395 B2

Abstract

Systems and methods for creating a group automated assistant session and processing requests intended for the users included in the group. A plurality of users can indicate intentions to create a group session, which includes selecting an automated assistant from the automated assistants executing on the users' devices and providing the selected automated assistant with audio data captured by microphones of the user devices. In response, the selected automated assistant processes the audio data and generates a response that is provided via one or more speakers of the device that is executing the selected automated assistant. Further, fulfillment data is provided to the automated assistants executing on the other devices and, in response to being provided the fulfillment data, each automated assistant causes audio data that is responsive to the request to be rendered via one or more speakers of its respective device.

Inventors

  • Victor Carbune
  • Matthew Sharifi

Assignees

  • GOOGLE LLC

Dates

Publication Date
2026-05-05
Application Date
2022-12-05

Claims (20)

  1. A method, implemented by one or more processors, comprising: receiving a first indication from a first user of intent to join a group automated assistant session, wherein the first indication is provided via a first client device executing a first automated assistant, and wherein the first client device is paired with a first headphone, and wherein audio responses of the first automated assistant are rendered to the first user via the first headphone; receiving a second indication from a second user of intent to join the group automated assistant session, wherein the second indication is provided via a second client device executing a second automated assistant, wherein the second client device is paired with a second headphone, and wherein audio responses of the second automated assistant are rendered to the second user via the second headphone; selecting, when the first automated assistant is executing on the first client device and is not executing on the second client device, the first automated assistant as a primary automated assistant for the group automated assistant session; in response to receiving the first indication, the second indication, and a group invocation phrase: creating the group automated assistant session, wherein while the group automated assistant session is active, the primary automated assistant is configured to receive group requests from both the first user and the second user, and cause group responses to be rendered to both the first user and the second user; receiving, by the primary automated assistant during the group automated assistant session, audio data captured by a first microphone associated with the first client device and/or captured by a second microphone associated with the second client device; processing, by the primary automated assistant, the audio data to determine that the audio data includes a group request, wherein determining that the audio data processed by the primary automated assistant includes the group request comprises determining that the audio data includes the group invocation phrase; generating, by the primary automated assistant, an audio response that is responsive to the group request; causing the audio response to be rendered via the first headphone in response to the group request; and providing, to the second automated assistant, fulfillment data that causes the second automated assistant to render the audio response via the second headphone in response to the group request.
  2. The method of claim 1, wherein the group automated assistant session is further created in response to the first microphone and/or the second microphone capturing audio data that includes a group invocation phrase.
  3. The method of claim 1, wherein at least one of the first indication and the second indication is an active group session mode.
  4. The method of claim 1, wherein at least one of the first indication and the second indication indicates that the first client device is within a threshold distance from the second client device.
  5. The method of claim 4, further comprising: determining that the first client device and the second client device are no longer within the threshold distance; and ending the group automated assistant session.
  6. The method of claim 1, wherein the group request is for audio playback, and wherein the audio response is playback of audio data to the first user and the second user.
  7. The method of claim 1, further comprising: receiving, by a third client device executing a third automated assistant, a third indication from a third user of intent to join the group automated assistant session, wherein the third client device is paired with a third headphone, and wherein audio responses of the third automated assistant are rendered to the third user via the third headphone.
  8. The method of claim 7, further comprising: providing, via the third headphone, a summary of one or more previous requests and/or responses of the group automated assistant session.
  9. The method of claim 1, wherein at least one of the first indication and the second indication is a calendar entry of the first user and/or the second user.
  10. The method of claim 1, further comprising: receiving, by the primary automated assistant during the group automated assistant session, audio data captured by the first microphone; processing, by the primary automated assistant, the audio data to determine that the audio data includes a private request, wherein the private request is intended only for the first user; generating, by the primary automated assistant, an audio response that is responsive to the private request; and causing the audio response to be rendered via the first headphone in response to the private request without providing, to the second automated assistant, fulfillment data.
  11. The method of claim 1, wherein the first indication and the second indication are received by a third device in communication with the first client device and the second client device.
  12. The method of claim 1, wherein the first indication and the second indication are received by the first client device.
  13. A system, comprising: a first client device, wherein the first client device is executing a first automated assistant, wherein the first client device is paired with a first headphone, and wherein audio responses of the first automated assistant are rendered to a first user via the first headphone; a second client device, wherein the second client device is executing a second automated assistant, wherein the second client device is paired with a second headphone, and wherein audio responses of the second automated assistant are rendered to a second user via the second headphone; and a group automated assistant client, wherein the group automated assistant client is configured to: receive a first indication from the first client device of intent to join a group automated assistant session; receive a second indication from the second client device of intent to join the group automated assistant session; select, when the first automated assistant is executing on the first client device and is not executing on the second client device, a primary automated assistant for the group automated assistant session; and in response to receiving the first indication, the second indication, and a group invocation phrase: create the group automated assistant session, wherein while the group automated assistant session is active, the primary automated assistant is configured to receive group requests from both the first user and the second user, and cause group responses to be rendered to both the first user and the second user, wherein in receiving the group request, the primary automated assistant is to process audio data and determine, based on processing the audio data, that the audio data includes the group invocation phrase.
  14. The system of claim 13, wherein the primary automated assistant is the first automated assistant.
  15. The system of claim 13, wherein the group automated assistant client is executing on one of the first client device and/or the second client device.
  16. The system of claim 13, wherein the group automated assistant client is executing on a remote device in communication with the first client device and the second client device.
  17. The system of claim 13, wherein the primary automated assistant is further configured to: receive, during the group automated assistant session, audio data captured by a first microphone associated with the first client device and/or captured by a second microphone associated with the second client device; process the audio data to determine that the audio data includes a group request; generate an audio response that is responsive to the group request; cause the audio response to be rendered via the first headphone in response to the group request; and cause the audio response to be rendered via the second headphone in response to the group request.
  18. At least one non-transitory computer-readable medium comprising instructions that, in response to execution of the instructions by one or more processors, cause the one or more processors to perform the following operations: receiving a first indication from a first user of intent to join a group automated assistant session, wherein the first indication is provided via a first client device executing a first automated assistant, and wherein the first client device is paired with a first headphone, and wherein audio responses of the first automated assistant are rendered to the first user via the first headphone; receiving a second indication from a second user of intent to join the group automated assistant session, wherein the second indication is provided via a second client device executing a second automated assistant, wherein the second client device is paired with a second headphone, and wherein audio responses of the second automated assistant are rendered to the second user via the second headphone; selecting, when the first automated assistant is executing on the first client device and is not executing on the second client device, the first automated assistant as a primary automated assistant for the group automated assistant session; in response to receiving the first indication, the second indication, and a group invocation phrase: creating the group automated assistant session, wherein while the group automated assistant session is active, the primary automated assistant is configured to receive group requests from both the first user and the second user, and cause group responses to be rendered to both the first user and the second user; receiving, by the primary automated assistant during the group automated assistant session, audio data captured by a first microphone associated with the first client device and/or captured by a second microphone associated with the second client device; processing, by the primary automated assistant, the audio data to determine that the audio data includes a group request, wherein determining that the audio data processed by the primary automated assistant includes the group request comprises determining that the audio data includes the group invocation phrase; generating, by the primary automated assistant, an audio response that is responsive to the group request; causing the audio response to be rendered via the first headphone in response to the group request; and providing, to the second automated assistant, fulfillment data that causes the second automated assistant to render the audio response via the second headphone in response to the group request.
  19. The at least one non-transitory computer-readable medium of claim 18, wherein the first indication and the second indication are received by a third device in communication with the first client device and the second client device.
  20. The at least one non-transitory computer-readable medium of claim 18, wherein the first indication and the second indication are received by the first client device.

Description

BACKGROUND

Humans can engage in human-to-computer dialogs with interactive software applications referred to herein as “automated assistants” (also referred to as “chat bots,” “interactive personal assistants,” “intelligent personal assistants,” “personal voice assistants,” “conversational agents,” etc.). For example, a human (which, when interacting with an automated assistant, may be referred to as a “user”) may provide an explicit input (e.g., commands, queries, and/or requests) to the automated assistant that can cause the automated assistant to generate and provide responsive output, to control one or more Internet of things (IoT) devices, and/or to perform one or more other functionalities (e.g., assistant actions). This explicit input provided by the user can be, for example, spoken natural language input (i.e., spoken utterances), which may in some cases be converted into text (or another semantic representation) and then further processed, and/or typed natural language input. In some cases, automated assistants may include automated assistant clients that are executed locally by assistant devices and that are engaged directly by users, as well as cloud-based counterpart(s) that leverage the virtually limitless resources of the cloud to help automated assistant clients respond to users' inputs. For example, an automated assistant client can provide, to the cloud-based counterpart(s), audio data of a spoken utterance of a user (or a text conversion thereof), and optionally data indicative of the user's identity (e.g., credentials). The cloud-based counterpart may perform various processing on the explicit input to return result(s) to the automated assistant client, which may then provide corresponding output to the user. In other cases, automated assistants may be executed exclusively locally by the assistant devices that users engage directly, to reduce latency.
SUMMARY

Implementations disclosed herein relate to selecting an automated assistant to utilize in a group environment, whereby each user in the group can interact within the same automated assistant session, via individual headphones, while the group session is active. A first client device, executing an automated assistant that is providing audio output via one or more external speakers (e.g., headphones), can determine that a user of a second device, which is also executing an automated assistant that can provide audio output via one or more other external speakers, intends to join the automated assistant session of the first device (or to create a new automated assistant session in conjunction with the first device). The group can be created and one of the automated assistants can be selected as the primary automated assistant, which can process requests that are received from a user of the first device and/or a user of the second device while the session is active. While the session is active, audio data that is captured by microphones associated with the first device and audio data that is captured by microphones associated with the second device can both be processed by the primary automated assistant. When the captured audio data includes a request that is intended for the group, the primary automated assistant can process the request and provide a response that can be rendered by both the speakers associated with the first device and the speakers associated with the second device. In some implementations, the primary automated assistant can directly provide a response via the speakers of both devices. In some implementations, the primary automated assistant can provide fulfillment data to the secondary automated assistant (i.e., the automated assistant that was not selected as the primary automated assistant) for generating a response that can be rendered via the speakers associated with the device that is executing the secondary automated assistant.
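The flow described above amounts to: designate one assistant as primary, have the primary answer a group request locally, and fan the same fulfillment out to the secondary assistants for rendering on their own headphones. The following is a minimal Python sketch of that flow; the class names, the invocation phrase "OK everyone," and the string-based stand-in for fulfillment data are illustrative assumptions, not anything specified by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Assistant:
    """Hypothetical per-device automated assistant client."""
    device_id: str
    # Audio responses "rendered" via this device's paired headphones.
    rendered: list = field(default_factory=list)

    def render(self, audio_response: str) -> None:
        self.rendered.append(audio_response)

@dataclass
class GroupSession:
    """One assistant is primary; the rest receive fulfillment data."""
    primary: Assistant
    secondaries: list

    def handle_audio(self, audio_data: str) -> None:
        # Treat audio as a group request only if it contains the
        # (assumed) group invocation phrase; otherwise ignore it here.
        if "OK everyone" not in audio_data:
            return
        request = audio_data.replace("OK everyone,", "").strip()
        response = f"response to: {request}"  # stand-in for real fulfillment
        # Render via the primary device's headphones...
        self.primary.render(response)
        # ...and provide fulfillment data to each secondary assistant,
        # which renders the same response via its own headphones.
        for assistant in self.secondaries:
            assistant.render(response)

a, b = Assistant("phone-1"), Assistant("phone-2")
session = GroupSession(primary=a, secondaries=[b])
session.handle_audio("OK everyone, play our playlist")
```

After `handle_audio` runs, both devices hold the same response, which mirrors the claim language: the primary renders locally and the fulfillment data causes the secondary to render the identical audio response.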
For example, a user may be engaged with an automated assistant that is executing on a mobile device via headphones that allow the automated assistant to provide audio output to the user. The user may utter one or more requests that are captured by a microphone associated with the mobile device (e.g., a microphone of the mobile device and/or a microphone that is a component of the headphone(s)). In response, the automated assistant can process the utterance and cause one or more actions to be performed, such as rendering an audio response to a query via the headphones and/or causing one or more other applications to perform an action. For example, the user can utter “OK Assistant, play my playlist,” which can be captured as audio data by one or more microphones of the user's device (or headphones). The request can be fulfilled by one or more components of the automated assistant. For example, the automated assistant can provide fulfillment data to a music player application, which can cause the user's playlist of music to be rendered via the headphones. Also, for example, the user c