EP-4736001-A1 - SCRIPT EDITOR FOR ROUTINE CREATION

EP 4736001 A1

Abstract

A method (400) includes receiving a natural language prompt (202) from a user comprising a command to generate a code script (205) for an automated assistant (200) to perform a routine (108) that includes multiple discrete actions (110). The method further includes processing, by a pre-trained large language model (LLM (225)), the natural language prompt to generate the code script as an LLM output, and processing the code script to determine the code script is incomplete, thereby rendering the code script unsuitable for the automated assistant to fulfill performance of the routine. Based on determining the code script is incomplete, the method includes issuing a user prompt soliciting the user to provide additional information needed to complete the code script and receiving user input of the additional information needed to complete the code script. The method includes supplementing the code script with the additional information to render completed code script.
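The abstract's generate-check-complete loop can be sketched in a few lines of Python. The `<angle-bracket>` placeholder syntax, the function names, and the script format below are illustrative assumptions, not the patent's actual implementation:

```python
import re

def find_missing_slots(script: str) -> list:
    """Model 'slots lacking a fixed value' as <angle-bracket> placeholders."""
    return re.findall(r"<(\w+)>", script)

def complete_script(script: str, ask_user) -> str:
    """Solicit the user for each missing value and supplement the script."""
    for slot in find_missing_slots(script):
        value = ask_user("Please provide a value for '%s'." % slot)
        script = script.replace("<%s>" % slot, value)
    return script

# Example: an LLM-generated draft with two unfilled slots.
draft = "assistant.set_alarm(time=<alarm_time>); assistant.play(<playlist>)"
answers = iter(["7:00 AM", "Morning Mix"])
completed = complete_script(draft, lambda prompt: next(answers))
```

In a real deployment, `ask_user` would be the dialog step the abstract describes (a spoken or graphical prompt); here it is stubbed with canned answers so the loop can be exercised end to end.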

Inventors

  • GOODMAN, MICHAEL ANDREW
  • GOYAL, DEEPAK

Assignees

  • Google LLC

Dates

Publication Date
2026-05-06
Application Date
2024-06-18

Claims (20)

  1. A computer-implemented method (400) executed on data processing hardware (510) that causes the data processing hardware (510) to perform operations comprising: receiving a natural language prompt (202) from a user comprising a command to generate a code script (205) for an automated assistant (200) to perform a routine (108), the routine (108) comprising multiple discrete actions (110) specified by the natural language prompt (202); processing, by a pre-trained large language model (LLM (225)), the natural language prompt (202) to generate the code script (205) as an LLM output; processing the code script (205) generated by the pre-trained LLM (225) to determine the code script (205) is incomplete, thereby rendering the code script (205) unsuitable for the automated assistant (200) to fulfill performance of the routine (108); based on determining the code script (205) is incomplete: issuing a user prompt soliciting the user to provide additional information needed to complete the code script (205); and receiving user input of the additional information needed to complete the code script (205); and supplementing the code script (205) with the additional information to render completed code script (205), the completed code script (205) rendered suitable for the automated assistant (200) to fulfill performance of the routine (108) when triggered by an initiator for the routine (108).
  2. The computer-implemented method (400) of claim 1, wherein the operations further comprise: processing the code script (205) generated by the pre-trained LLM (225) to identify the multiple discrete actions (110) to be performed by the automated assistant (200) to fulfill the routine (108); and presenting, for output in a graphical user interface displayed on a screen of a user device (10) associated with the user, corresponding graphical representations for the multiple discrete actions (110) identified for the automated assistant (200) to perform to fulfill the routine (108).
  3. The computer-implemented method (400) of claim 2, wherein selection of a corresponding one of the graphical representations presented for display in the graphical user interface causes the GUI to present options for editing the discrete action corresponding to the graphical representation selected by the user.
  4. The computer-implemented method (400) of any of claims 1-3, wherein: processing the code script (205) generated by the pre-trained LLM (225) to determine the code script (205) is incomplete comprises identifying a presence of an ambiguity in the code script (205); and issuing the user prompt comprises issuing the user prompt to solicit the user to provide additional information to resolve the ambiguity identified in the code script (205).
  5. The computer-implemented method (400) of any of claims 1-4, wherein: processing the code script (205) generated by the pre-trained LLM (225) to determine the code script (205) is incomplete comprises determining that the code script (205) includes a slot that lacks a fixed value; and issuing the user prompt comprises issuing the user prompt to solicit the user to provide additional information that includes the fixed value for the slot in the code script (205) that lacked the fixed value.
  6. The computer-implemented method (400) of any of claims 1-5, wherein: processing the code script (205) generated by the pre-trained LLM (225) to determine the code script (205) is incomplete comprises determining that the code script (205) fails to convey the initiator for the routine (108); and issuing the user prompt comprises issuing the user prompt to solicit the user to provide additional information that includes the initiator for the routine (108).
  7. The computer-implemented method (400) of any of claims 1-6, wherein issuing the user prompt comprises: generating, using a text-to-speech engine, synthesized speech of an utterance requesting the user to provide the additional information; and engaging a dialog with the user via the user device (10) by instructing the user device (10) to audibly output the synthesized speech of the utterance requesting the user to provide the additional information.
  8. The computer-implemented method (400) of claim 7, wherein receiving the user input of the additional information comprises receiving audio data characterizing the additional information spoken by the user.
  9. The computer-implemented method (400) of any of claims 1-8, wherein issuing the user prompt comprises generating a graphical representation for output from a user device (10) associated with the user that solicits the user to provide the additional information.
  10. The computer-implemented method (400) of any of claims 1-9, wherein receiving the natural language prompt (202) comprises receiving audio data (106) characterizing an utterance (104) of the natural language prompt (202) spoken by the user.
  11. The computer-implemented method (400) of any of claims 1-10, wherein the operations further comprise: obtaining a set of user features (290) associated with the user; and determining, using the set of user features (290) associated with the user, a user prompt embedding (294) for the user, wherein processing the natural language prompt (202) to generate the code script (205) as the LLM output comprises processing, by the LLM (225), the natural language prompt (202) conditioned on the user prompt embedding (294) for the user to generate the code script (205) as the LLM output.
  12. The computer-implemented method (400) of claim 11, wherein the set of user features (290) associated with the user comprises at least one of: user preferences; past interactions between the user and the automated assistant (200); or available peripheral devices associated with the user and capable of receiving commands from the automated assistant (200) to perform actions (110).
  13. The computer-implemented method (400) of claim 11, wherein the user prompt embedding (294) comprises a soft prompt configured to guide the LLM (225) to generate the code script (205) while parameters of the LLM (225) are held fixed.
  14. The computer-implemented method (400) of any of claims 1-13, wherein a training process (300) trains the LLM (225) by: receiving a training dataset (320) of training routines (330), each training routine (330) comprising: a corresponding training natural language prompt (202) specifying one or more discrete actions (110) for an automated assistant routine; and corresponding ground-truth code script (205) paired with the corresponding training natural language prompt (202); for each training routine (330) in the training dataset (320): processing, using the LLM (225), the corresponding training natural language prompt (202) to generate a corresponding predicted code script (205) for the routine (108) as output from the LLM (225); and determining a training loss (352) based on the corresponding predicted code script (205) and the corresponding ground-truth code script (205) paired with the corresponding training natural language prompt (202); and training the LLM (225) to learn how to predict the ground-truth code scripts (205) from the corresponding training natural language prompts (202) based on the training losses (352) determined for the training routines (330) in the training dataset (320).
  15. The computer-implemented method (400) of claim 14, wherein: each training routine (330) further comprises a corresponding training user prompt embedding (294) generated from a corresponding set of training user features (290); and processing the corresponding training natural language prompt (202) comprises processing, using the LLM (225), the corresponding training natural language prompt (202) conditioned on the corresponding training user prompt embedding (294) to generate the corresponding predicted code script (205) for the routine (108) as output from the LLM (225).
  16. A system (100) comprising: data processing hardware (510); and memory hardware (520) in communication with the data processing hardware (510), the memory hardware (520) storing instructions that when executed on the data processing hardware (510) cause the data processing hardware (510) to perform operations comprising: receiving a natural language prompt (202) from a user comprising a command to generate a code script (205) for an automated assistant (200) to perform a routine (108), the routine (108) comprising multiple discrete actions (110) specified by the natural language prompt (202); processing, by a pre-trained large language model (LLM (225)), the natural language prompt (202) to generate the code script (205) as an LLM output; processing the code script (205) generated by the pre-trained LLM (225) to determine the code script (205) is incomplete, thereby rendering the code script (205) unsuitable for the automated assistant (200) to fulfill performance of the routine (108); based on determining the code script (205) is incomplete: issuing a user prompt soliciting the user to provide additional information needed to complete the code script (205); and receiving user input of the additional information needed to complete the code script (205); and supplementing the code script (205) with the additional information to render completed code script (205), the completed code script (205) rendered suitable for the automated assistant (200) to fulfill performance of the routine (108) when triggered by an initiator for the routine (108).
  17. The system (100) of claim 16, wherein the operations further comprise: processing the code script (205) generated by the pre-trained LLM (225) to identify the multiple discrete actions (110) to be performed by the automated assistant (200) to fulfill the routine (108); and presenting, for output in a graphical user interface displayed on a screen of a user device (10) associated with the user, corresponding graphical representations for the multiple discrete actions (110) identified for the automated assistant (200) to perform to fulfill the routine (108).
  18. The system (100) of claim 17, wherein selection of a corresponding one of the graphical representations presented for display in the graphical user interface causes the GUI to present options for editing the discrete action corresponding to the graphical representation selected by the user.
  19. The system (100) of any of claims 16-18, wherein: processing the code script (205) generated by the pre-trained LLM (225) to determine the code script (205) is incomplete comprises identifying a presence of an ambiguity in the code script (205); and issuing the user prompt comprises issuing the user prompt to solicit the user to provide additional information to resolve the ambiguity identified in the code script (205).
  20. The system (100) of any of claims 16-19, wherein: processing the code script (205) generated by the pre-trained LLM (225) to determine the code script (205) is incomplete comprises determining that the code script (205) includes a slot that lacks a fixed value; and issuing the user prompt comprises issuing the user prompt to solicit the user to provide additional information that includes the fixed value for the slot in the code script (205) that lacked the fixed value.
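Claims 4-6 (and 19-20) enumerate three distinct ways a generated script can be incomplete: an ambiguity, a slot lacking a fixed value, and a missing initiator. A minimal checker over a hypothetical dict-shaped script representation might look like the following; the dict layout is an assumption made for illustration, since the claims do not fix a script format:

```python
def check_completeness(script: dict) -> list:
    """Collect the reasons a generated code script is unsuitable for execution."""
    problems = []
    if not script.get("initiator"):                     # claim 6: no trigger for the routine
        problems.append("missing initiator")
    for action in script.get("actions", []):
        for slot, value in action.get("slots", {}).items():
            if value is None:                           # claim 5: slot lacks a fixed value
                problems.append("slot '%s' lacks a fixed value" % slot)
            elif isinstance(value, list):               # claim 4: several candidates = ambiguity
                problems.append("slot '%s' is ambiguous" % slot)
    return problems

draft = {
    "initiator": None,
    "actions": [{"name": "play_music",
                 "slots": {"artist": ["Holst", "Horst"], "volume": None}}],
}
issues = check_completeness(draft)
```

Each returned reason maps naturally to one of the claimed user prompts: the assistant would ask for an initiator, a fixed slot value, or a disambiguation, respectively.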

Description

Script Editor for Routine Creation

TECHNICAL FIELD

[0001] This disclosure relates to a script editor for routine creation.

BACKGROUND

[0002] Automated assistants (also known as “personal assistant modules”, “mobile assistants”, “digital assistants”, or “chat bots”) may be interacted with by a user via a variety of computing devices, such as smart phones, tablet computers, wearable devices, automobile systems, standalone personal assistant devices, and so forth. The automated assistants receive input from the user (e.g., typed and/or spoken natural language input) and respond with responsive content (e.g., visual and/or audible natural language output).

[0003] Automated assistants can perform a routine of multiple actions in response to, for example, receiving a particular command (e.g., a shortcut command). For example, in response to receiving a spoken utterance of “Good Night”, an automated assistant can cause a sequence of actions to be performed, such as causing networked lights to be turned off, tomorrow’s weather forecast to be rendered, and a user’s agenda for tomorrow to be rendered. An automated assistant routine can be particularized to a user and/or to an ecosystem of client devices, and the user can be provided control to manually add certain actions to certain routines.

[0004] Large language models (LLMs) that generate text in response to a user input are becoming increasingly popular as generative artificial intelligence (AI) grows in popularity. Certain LLMs are pre-trained to generate code recommendations from contextual information.

SUMMARY

[0005] One aspect of the disclosure provides a computer-implemented method that when executed on data processing hardware causes the data processing hardware to perform operations that include receiving a natural language prompt from a user comprising a command to generate a code script for an automated assistant to perform a routine.
The routine includes multiple discrete actions specified by the natural language prompt. The operations further include processing, by a pre-trained large language model (LLM), the natural language prompt to generate the code script as an LLM output, and processing the code script generated by the pre-trained LLM to determine the code script is incomplete, thereby rendering the code script unsuitable for the automated assistant to fulfill performance of the routine. Based on determining the code script is incomplete, the operations include issuing a user prompt soliciting the user to provide additional information needed to complete the code script and receiving user input of the additional information needed to complete the code script. The operations also include supplementing the code script with the additional information to render completed code script. The completed code script is rendered suitable for the automated assistant to fulfill performance of the routine when triggered by an initiator for the routine.

[0006] In some implementations, the operations also include processing the code script generated by the pre-trained LLM to identify the multiple discrete actions to be performed by the automated assistant to fulfill the routine and presenting, for output in a graphical user interface displayed on a screen of a user device associated with the user, corresponding graphical representations for the multiple discrete actions identified for the automated assistant to perform to fulfill the routine. Here, selection of a corresponding one of the graphical representations presented for display in the graphical user interface may cause the GUI to present options for editing the discrete action corresponding to the graphical representation selected by the user.
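Paragraph [0006] describes parsing the generated script into its discrete actions and rendering each as a selectable, editable graphical representation. Assuming a simple `assistant.<verb>(...)` call syntax for the script (an assumption for illustration; the disclosure does not fix a script format), the extraction step feeding the GUI might be sketched as:

```python
import re

def extract_actions(code_script: str) -> list:
    """Pull assistant.<verb>(args) calls out of a generated script so each
    discrete action can be shown as an editable card in a GUI."""
    return [{"action": verb, "args": args.strip()}
            for verb, args in re.findall(r"assistant\.(\w+)\(([^)]*)\)", code_script)]

script = "assistant.lights_off(room='all'); assistant.read_forecast(day='tomorrow')"
cards = extract_actions(script)
```

Each dict in `cards` would back one graphical representation; selecting a card would open an editor pre-populated with that action's arguments, matching the behavior described for the GUI.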
[0007] In some examples, processing the code script generated by the pre-trained LLM to determine the code script is incomplete includes identifying a presence of an ambiguity in the code script, and issuing the user prompt includes issuing the user prompt to solicit the user to provide additional information to resolve the ambiguity identified in the code script. Additionally or alternatively, processing the code script generated by the pre-trained LLM to determine the code script is incomplete may include determining that the code script includes a slot that lacks a fixed value, and issuing the user prompt may include issuing the user prompt to solicit the user to provide additional information that includes the fixed value for the slot in the code script that lacked the fixed value. In yet another example, processing the code script generated by the pre-trained LLM to determine the code script is incomplete includes determining that the code script fails to convey the initiator for the routine, and issuing the user prompt includes issuing the user prompt to solicit the user to provide additional information that includes the initiator for the routine.

[0008] In some implementations, issuing the user prompt includes generating, using a text-to-speech engine, synthesized speech of an utterance requesting the user to provide the additional information.