EP-4327199-B1 - ACTIVE LISTENING FOR ASSISTANT SYSTEMS
Inventors
- STEWART, Ryan Frederick
- CUI, Zhenhua
- GRAVES, Ian
- GURUNATH, Pramod
- HILAIEL, Lloyd
- SRINIVAS, Krishna Chaitanya Gopisetty
- CHEN, Yuanhui
- HANSON, Michael Robert
- LIU, Baiyang
- LIU, Honglei
- SURKOV, Alexey Gennadyevich
- LEVISON, David
- MOHAMED, Ahmed Magdy Hamed
- DIRAFZOON, Alireza
- BEARMAN, Amy Lawson
- PU, Yiming
- LIU, Bing
- DE PAOLI, Christopher
- BALMES, Christopher E.
- WIGDOR, Daniel John
- SAVENKOV, Denis
- NORTHUP, Eric Robert
- RAMANAN, Tara
- MOSKEY, Gabrielle Catherine
- VENKATESH, Ganesh
- ZHOU, Hao
- XU, Hu
- SHALOWITZ, Ilana Orly
- RUSHING, Jackson
- BLAKELEY, John Jacob
- KAHN, Jeremy Gillmor
- KITCHENS, Jessica
- LI, Jihang
- NOERTKER, Samuel Steele
- YU, Jinsong
- VINCENT, Joshuah
- DENNEY, Justin
- ARCHIE, Kyle
- PARENT, Mark
- FEISZLI, Matthew Dan
- JHUNJHUNWALA, Megha
- TIWARI, Megha
- GLUECK, Michael
- FLORES, Nicholas Jorge
- MARTINSON, Leif Haven
- KHEMKA, Piyush
- SETHI, Pooja
- MOON, Seungwhan
- SANTOSA, Stephanie
- GOEL, Swati
- GAN, Xin
- BLACK, Heath William
- CHALAND, Christophe
- ZUO, Zhengping
- DABAS, Rohin
Dates
- Publication Date: 2026-05-06
- Application Date: 2022-04-20
Claims (14)
- A method comprising, at a computing system comprising a client system, a companion device paired to the client system, and an assistant system connected to the client system and the companion device, wherein a companion application, executing on the companion device, is associated with an assistant xbot, executing on the client system, that interacts with a user of the client system, including receiving user inputs and presenting outputs: receiving, by the assistant system from the client system via the companion application, a first user input comprising a wake word associated with the assistant xbot, wherein the companion device is in a locked state when the wake word is received, wherein the companion application is executing as a background application on the companion device, and wherein the companion application is allowed by the companion device to access a computing capacity lower than a threshold capacity; setting, by the assistant system via the companion application, the assistant xbot into a listening mode, wherein a continuous non-visual feedback is provided via the client system while the assistant xbot is in the listening mode, the continuous non-visual feedback being based on one or more of sound, vibration, or haptics; providing, by the assistant system via the companion device, a continuous visual feedback while the assistant xbot is in the listening mode; in response to providing the continuous visual feedback: executing, by the assistant system, the companion application as a foreground application in response to the continuous non-visual feedback; and increasing, by the assistant system, the computing capacity the companion application is allowed to access to a computing capacity greater than the threshold capacity, without unlocking the companion device, to run the assistant stack; receiving, by the assistant system from the client system via the companion application, a second user input comprising a user utterance while the assistant xbot is in the listening mode; determining, by the assistant system, that the second user input has ended based on a completion of the user utterance, for which the assistant xbot may detect silence; and setting, by the assistant system, the assistant xbot into an inactive mode, wherein the non-visual feedback is discontinued via the client system while the assistant xbot is in the inactive mode.
- The method of Claim 1, wherein the companion device is in a locked state when the wake word is received.
- The method of Claim 1 or 2, wherein the assistant xbot is associated with a companion application executing on the companion device.
- The method of Claim 3, wherein the companion application is executing as a background application on the companion device, and wherein the companion application is allowed by the companion device to access a computing capacity lower than a threshold capacity; and/or preferably further comprising: executing the companion application as a foreground application in response to the continuous non-visual feedback; and increasing the computing capacity the companion application is allowed to access to a computing capacity greater than the threshold capacity.
- The method according to any of the preceding Claims, wherein the client system comprises a smart phone, smart glasses, augmented-reality (AR) glasses, a virtual-reality (VR) headset, or a smart watch.
- The method according to any of the preceding Claims, further comprising: providing, via the client system in response to the first user input, an initial non-visual feedback before the continuous non-visual feedback, wherein the initial non-visual feedback indicates the initiation of the listening mode of the assistant xbot; and/or preferably wherein the continuous non-visual feedback is based on one or more of sound, vibration, or haptics.
- The method according to any of the preceding Claims, further comprising: providing, via the client system, a continuous visual feedback while the assistant xbot is in the listening mode; and/or preferably wherein the continuous visual feedback is based on one or more of an icon associated with the assistant xbot, a visual indication of the listening mode, or light.
- The method of any preceding claim, wherein the continuous visual feedback is based on one or more of an icon associated with the assistant xbot, a visual indication of the listening mode, or light.
- One or more computer-readable non-transitory storage media embodying software that, when executed on a computing system comprising a client system, a companion device paired to the client system, and an assistant system connected to the client system and the companion device, wherein a companion application, executing on the companion device, is associated with an assistant xbot, executing on the client system, that interacts with a user of the client system, including receiving user inputs and presenting outputs, causes the computing system to execute the steps of: receive, by the assistant system from the client system via the companion application, a first user input comprising a wake word associated with the assistant xbot, wherein the companion device is in a locked state when the wake word is received, wherein the companion application is executing as a background application on the companion device, and wherein the companion application is allowed by the companion device to access a computing capacity lower than a threshold capacity; set, by the assistant system via the companion application, the assistant xbot into a listening mode, wherein a continuous non-visual feedback is provided via the client system while the assistant xbot is in the listening mode, the continuous non-visual feedback being based on one or more of sound, vibration, or haptics; provide, by the assistant system via the companion device, a continuous visual feedback while the assistant xbot is in the listening mode; in response to providing the continuous visual feedback: execute, by the assistant system, the companion application as a foreground application in response to the continuous non-visual feedback; and increase, by the assistant system, the computing capacity the companion application is allowed to access to a computing capacity greater than the threshold capacity, without unlocking the companion device, to run the assistant stack; receive, by the assistant system from the client system via the companion application, a second user input comprising a user utterance while the assistant xbot is in the listening mode; determine, by the assistant system, that the second user input has ended based on a completion of the user utterance, for which the assistant xbot may detect silence; and set, by the assistant system, the assistant xbot into an inactive mode, wherein the non-visual feedback is discontinued via the client system while the assistant xbot is in the inactive mode.
- The media of Claim 9, wherein the client system comprises one of a smart phone, smart glasses, augmented-reality (AR) glasses, a virtual-reality (VR) headset, or a smart watch.
- The media of Claim 9 or 10, wherein the software is further operable when executed to: provide, via the client system in response to the first user input, an initial non-visual feedback before the continuous non-visual feedback, wherein the initial non-visual feedback indicates the initiation of the listening mode of the assistant xbot.
- A system comprising a client system, a companion device paired to the client system, and an assistant system connected to the client system and the companion device, wherein a companion application, executing on the companion device, is associated with an assistant xbot, executing on the client system, that interacts with a user of the client system, including receiving user inputs and presenting outputs, the system having: one or more processors; and a non-transitory memory coupled to the processors comprising instructions that, when executed by the processors, cause the processors to execute the steps of: receive, by the assistant system from the client system via the companion application, a first user input comprising a wake word associated with the assistant xbot, wherein the companion device is in a locked state when the wake word is received, wherein the companion application is executing as a background application on the companion device, and wherein the companion application is allowed by the companion device to access a computing capacity lower than a threshold capacity; set, by the assistant system via the companion application, the assistant xbot into a listening mode, wherein a continuous non-visual feedback is provided via the client system while the assistant xbot is in the listening mode, the continuous non-visual feedback being based on one or more of sound, vibration, or haptics; provide, by the assistant system via the companion device, a continuous visual feedback while the assistant xbot is in the listening mode; in response to providing the continuous visual feedback: execute, by the assistant system, the companion application as a foreground application in response to the continuous non-visual feedback; and increase, by the assistant system, the computing capacity the companion application is allowed to access to a computing capacity greater than the threshold capacity, without unlocking the companion device, to run the assistant stack; receive, by the assistant system from the client system via the companion application, a second user input comprising a user utterance while the assistant xbot is in the listening mode; determine, by the assistant system, that the second user input has ended based on a completion of the user utterance, for which the assistant xbot may detect silence; and set, by the assistant system, the assistant xbot into an inactive mode, wherein the non-visual feedback is discontinued via the client system while the assistant xbot is in the inactive mode.
- The system of Claim 12, wherein the client system comprises one of a smart phone, smart glasses, augmented-reality (AR) glasses, a virtual-reality (VR) headset, or a smart watch.
- The system of Claim 12 or 13, wherein the processors are further operable when executing the instructions to: provide, via the client system in response to the first user input, an initial non-visual feedback before the continuous non-visual feedback, wherein the initial non-visual feedback indicates the initiation of the listening mode of the assistant xbot.
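The flow recited in the claims above - a wake word received while the companion device is locked and the companion application runs in the background, entry into a listening mode with continuous feedback, promotion of the companion application to the foreground with an increased computing-capacity allowance, and silence-based end-of-utterance detection returning the xbot to an inactive mode - can be sketched as a small state machine. The following Python sketch is purely illustrative; all class names, fields, and capacity values are hypothetical and are not taken from the patented implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class XbotState(Enum):
    INACTIVE = auto()
    LISTENING = auto()


@dataclass
class CompanionApp:
    """Hypothetical companion application on the paired, locked device."""
    foreground: bool = False
    capacity: float = 0.1          # fraction of device compute allowed
    capacity_threshold: float = 0.5

    def promote(self) -> None:
        # Move to the foreground and raise the compute allowance above the
        # threshold without unlocking the device, so the assistant stack
        # can run (mirrors the "increasing the computing capacity" step).
        self.foreground = True
        self.capacity = 0.9


@dataclass
class AssistantXbot:
    app: CompanionApp
    state: XbotState = XbotState.INACTIVE
    feedback: list = field(default_factory=list)  # emitted feedback events

    def on_wake_word(self) -> None:
        if self.state is XbotState.INACTIVE:
            self.state = XbotState.LISTENING
            # Continuous non-visual feedback (sound / vibration / haptics)
            # on the client system, plus continuous visual feedback on the
            # companion device, while in the listening mode.
            self.feedback.append("non_visual_continuous")
            self.feedback.append("visual_continuous")
            self.app.promote()

    def on_silence(self) -> None:
        # End of the user utterance detected via silence: return to the
        # inactive mode and discontinue the non-visual feedback.
        if self.state is XbotState.LISTENING:
            self.state = XbotState.INACTIVE
            self.feedback.append("feedback_discontinued")


xbot = AssistantXbot(app=CompanionApp())
xbot.on_wake_word()
assert xbot.state is XbotState.LISTENING
assert xbot.app.foreground and xbot.app.capacity > xbot.app.capacity_threshold
xbot.on_silence()
assert xbot.state is XbotState.INACTIVE
```

The sketch models only the mode transitions and feedback events; a real system would additionally gate the capacity increase on device policy and route the feedback through the respective device output channels.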
Description
TECHNICAL FIELD
This disclosure generally relates to databases and file management within network environments, and in particular relates to hardware and software for smart assistant systems.
BACKGROUND
An assistant system can provide information or services on behalf of a user based on a combination of user input, location awareness, and the ability to access information from a variety of online sources (such as weather conditions, traffic congestion, news, stock prices, user schedules, retail prices, etc.). The user input may include text (e.g., online chat), especially in an instant messaging application or other applications, voice, images, motion, or a combination of them. The assistant system may perform concierge-type services (e.g., making dinner reservations, purchasing event tickets, making travel arrangements) or provide information based on the user input. The assistant system may also perform management or data-handling tasks based on online information and events without user initiation or interaction. Examples of tasks that may be performed by an assistant system include schedule management (e.g., sending an alert to a dinner date that a user is running late due to traffic conditions, updating schedules for both parties, and changing the restaurant reservation time). The assistant system may be enabled by the combination of computing devices, application programming interfaces (APIs), and the proliferation of applications on user devices. US2018308486A1 describes systems and processes for operating an automated assistant. In one example process, an electronic device provides an audio output via a speaker of the electronic device. While providing the audio output, the electronic device receives, via a microphone of the electronic device, a natural language speech input.
The electronic device derives a representation of user intent based on the natural language speech input and the audio output, identifies a task based on the derived user intent, and performs the identified task. A social-networking system, which may include a social-networking website, may enable its users (such as persons or organizations) to interact with it and with each other through it. The social-networking system may, with input from a user, create and store in the social-networking system a user profile associated with the user. The user profile may include demographic information, communication-channel information, and information on personal interests of the user. The social-networking system may also, with input from a user, create and store a record of relationships of the user with other users of the social-networking system, as well as provide services (e.g., profile/news feed posts, photo-sharing, event organization, messaging, games, or advertisements) to facilitate social interaction between or among users. The social-networking system may send over one or more networks content or messages related to its services to a mobile or other computing device of a user. A user may also install software applications on a mobile or other computing device of the user for accessing a user profile of the user and other data within the social-networking system. The social-networking system may generate a personalized set of content objects to display to a user, such as a newsfeed of aggregated stories of other users connected to the user.
SUMMARY OF PARTICULAR EMBODIMENTS
In particular embodiments, the assistant system may assist a user to obtain information or services. The assistant system may enable the user to interact with the assistant system via user inputs of various modalities (e.g., audio, voice, text, image, video, gesture, motion, location, orientation) in stateful and multi-turn conversations to receive assistance from the assistant system.
As an example and not by way of limitation, the assistant system may support mono-modal inputs (e.g., only voice inputs), multi-modal inputs (e.g., voice inputs and text inputs), hybrid/multi-modal inputs, or any combination thereof. User inputs provided by a user may be associated with particular assistant-related tasks, and may include, for example, user requests (e.g., verbal requests for information or performance of an action), user interactions with an assistant application associated with the assistant system (e.g., selection of UI elements via touch or gesture), or any other type of suitable user input that may be detected and understood by the assistant system (e.g., user movements detected by the client device of the user). The assistant system may create and store a user profile comprising both personal and contextual information associated with the user. In particular embodiments, the assistant system may analyze the user input using natural-language understanding (NLU). The analysis may be based on the user profile of the user for more personalized and context-aware understanding. The assistant system may resolve entities associated with the user input based
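The summary above describes analyzing user input with natural-language understanding (NLU), personalized by the user profile, and resolving entities associated with the input. A deliberately minimal, hypothetical sketch of that step - simple keyword-based intent classification plus dictionary-based entity resolution against a user-profile alias map; not the actual NLU stack of the assistant system - might look like:

```python
import re

# Hypothetical, toy intent rules: keyword -> intent label.
INTENT_RULES = {
    "reserve": "make_reservation",
    "weather": "get_weather",
    "remind": "set_reminder",
}


def classify_intent(utterance: str) -> str:
    """Return the first intent whose keyword appears in the utterance."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    for keyword, intent in INTENT_RULES.items():
        if keyword in tokens:
            return intent
    return "unknown"


def resolve_entities(utterance: str, profile_aliases: dict) -> dict:
    """Resolve profile-specific aliases (e.g., 'mom') to canonical entities,
    which is where the user profile personalizes the understanding."""
    lowered = utterance.lower()
    return {alias: entity
            for alias, entity in profile_aliases.items()
            if alias in lowered}


intent = classify_intent("Remind me to call mom at noon")
entities = resolve_entities("Remind me to call mom at noon",
                            {"mom": "contact:jane_doe"})
# intent   -> "set_reminder"
# entities -> {"mom": "contact:jane_doe"}
```

A production NLU component would of course use trained models for intent classification and entity linking rather than keyword matching; the sketch only illustrates the two outputs - an intent and a set of resolved entities - that downstream task execution consumes.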