US-20260127232-A1 - MERGING GENERATIVE MODEL PROMPTS BASED ON CONTEXT
Abstract
Implementations are described herein for accounting for preferences and/or attributes of multiple users and/or computing devices in a context that is shared between the multiple users and/or multiple computing devices and a generative model-powered automated assistant. Data indicative of preferences and/or attributes of user(s) and/or their computing device(s) can be assembled into “merged” input prompts, e.g., along with a natural language query issued by one of the users. The merged input prompts may then be processed using generative model(s) to generate output that is conditioned on the preferences and/or attributes of the user(s) and/or their computing device(s).
Inventors
- Matthew Sharifi
- Victor Carbune
Assignees
- GOOGLE LLC
Dates
- Publication Date
- 20260507
- Application Date
- 20241101
Claims (20)
- 1 . A method implemented using one or more processors, comprising: receiving a natural language query; determining, based on one or more signals provided by one or more computing devices, that a first user is in a shared context with at least a second user; determining which user, of the first user and the second user, provided the natural language query; determining a first user prompt for the first user and a second user prompt for the second user, wherein the first user prompt conveys one or more known preferences of the first user and the second user prompt conveys one or more known preferences of the second user; determining, based on determining which user, of the first user and the second user, provided the natural language query, one or more weights for the first user prompt and/or the second user prompt; assembling, based on the one or more weights for the first user prompt and/or the second user prompt, into a merged input prompt, data indicative of: the natural language query, the first user prompt, and the second user prompt; processing the merged input prompt using one or more generative models to generate output that is conditioned on the first and second user prompts, and that includes content responsive to the natural language query; and causing the content to be rendered at one or more output devices.
- 2 . The method of claim 1 , wherein the shared context comprises a shared physical environment.
- 3 . The method of claim 2 , wherein the one or more signals comprise a wireless signal generated by a mobile device carried by the first or second user.
- 4 . The method of claim 2 , wherein the one or more signals comprise contemporaneous detection of one or more biometrics of the first user and one or more biometrics of the second user.
- 5 . The method of claim 1 , wherein the shared context comprises a multi-participant message exchange thread in which the first and second users are participants.
- 6 . The method of claim 5 , wherein the multi-participant message exchange thread comprises a text messaging thread.
- 7 . The method of claim 1 , wherein the first user prompt comprises one or more natural language statements that convey one or more of the known preferences of the first user.
- 8 . The method of claim 1 , further comprising: retrieving one or more digital files created or interacted with by the first user; assembling, into a user preference generation prompt, data indicative of or derived from the one or more digital files; and processing the user preference generation prompt using one or more of the generative models to generate data indicative of the first user prompt.
- 9 . The method of claim 8 , wherein one or more of the digital files comprises a digital image, digital audio, or digital video.
- 10 . The method of claim 1 , further comprising: assembling, into a user preference generation prompt, data indicative of or derived from one or more past natural language queries issued by the first user; and processing the user preference generation prompt using one or more of the generative models to generate data indicative of the first user prompt.
- 11 . The method of claim 1 , further comprising: assembling, into a user preference generation prompt, data indicative of or derived from one or more past search engine queries issued by the first user; and processing the user preference generation prompt using one or more of the generative models to generate data indicative of the first user prompt.
- 12 . The method of claim 1 , further comprising: determining one or more device prompts for one or more computing devices available in the shared context, wherein the one or more device prompts convey one or more attributes of the one or more computing devices available in the shared context; and assembling, into the merged input prompt, data indicative of the one or more device prompts.
- 13 . The method of claim 12 , wherein the one or more attributes comprise one or more of: one or more preferences for operating one or more of the computing devices available in the shared context to render content; one or more states of one or more sensors of one or more of the computing devices available in the shared context; or one or more resource constraints of one or more of the computing devices available in the shared context.
- 14 . The method of claim 1 , further comprising determining respective weights for the first and second user prompts, wherein the assembling is based on the respective weights.
- 15 . The method of claim 14 , wherein the respective weights for the first and second user prompts are determined based on relative proximities of the first and second users to a shared audio or vision sensor.
- 16 . The method of claim 15 , wherein the respective weights for the first and second user prompts are determined based on which of the first or second user issued the natural language query.
- 17 . The method of claim 16 , wherein the assembling comprises allocating different numbers of tokens to each of the first and second user prompts based on the respective weights.
- 18 . The method of claim 17 , wherein the assembling comprises: assembling, into a summarization input prompt, data indicative of the first user prompt and a target length constraint, wherein the target length constraint is selected based on one or more of the respective weights for the first and second user input prompts; and processing the summarization input prompt using one or more of the generative models to generate a summary of the first user prompt that satisfies the target length constraint.
- 19 . The method of claim 14 , further comprising assembling, into the merged input prompt, data indicative of relative priorities to be assigned to known preferences conveyed in the first and second user prompts, wherein the relative priorities are determined based on the respective weights.
- 20 . The method of claim 1 , wherein the assembling comprises: assembling, as a prompt merging input prompt, data indicative of: the first and second user prompts, and a request to combine the first and second user prompts into the merged input prompt while resolving any conflicts between the first and second user prompts; and processing the prompt merging input prompt using one or more of the generative models to generate at least a portion of the merged input prompt.
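The weight-based token allocation recited in claims 14–17 can be illustrated with a minimal sketch. This is a hypothetical implementation, not language from the application: the function name and the specific proportional scheme are assumptions, and any allocation that favors higher-weighted prompts (such as that of the querying user, per claim 16) would fit the claims.

```python
# Hypothetical sketch of claims 14 and 17: each user prompt receives a share
# of a fixed token budget in proportion to its weight, so higher-weighted
# prompts contribute more detail to the merged input prompt.

def allocate_tokens(weights: dict[str, float], budget: int) -> dict[str, int]:
    """Split `budget` tokens among user prompts in proportion to their weights."""
    total = sum(weights.values())
    return {user: int(budget * w / total) for user, w in weights.items()}

# Per claim 16, the user who issued the query might be weighted more heavily:
weights = {"first_user": 2.0, "second_user": 1.0}
allocation = allocate_tokens(weights, budget=300)
# allocation == {"first_user": 200, "second_user": 100}
```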
Description
BACKGROUND

Generative models such as unimodal or multimodal large language models (LLMs) can be used to process sequences of input tokens to generate sequences of output tokens. Generative models are applicable across a wide range of tasks. For example, generative models are increasingly being used to power automated assistants (also referred to as “virtual assistants” or “chatbots”), which enable humans (referred to as “users” when interacting with automated assistants) to participate in natural language dialogs with automated assistants. In many instances, an automated assistant powered by generative model(s) may act as a participant in a context that is shared among multiple users. A shared context may be, for instance, a shared physical environment where co-present users can interact with a shared assistant device, a shared virtual environment such as a message exchange thread and/or video conference call, etc.

SUMMARY

Implementations described herein relate to accounting for preferences and/or attributes of multiple users and/or computing devices in a context that is shared between the multiple users and a generative model. More particularly, but not exclusively, implementations are described herein for assembling data indicative of preferences and/or attributes of user(s) and/or their computing device(s) into “merged” input prompts, e.g., along with a natural language query issued by one of the users. The merged input prompts may then be processed using generative model(s) to generate output that is conditioned on the preferences and/or attributes of the user(s) and/or their computing device(s). In various implementations, what will be referred to herein as “user prompts” may be determined/formulated for one or more users in a shared context. These user prompts may be used to condition generative model(s) to generate output that accounts for the users' preference(s) and/or attribute(s).
For example, preferences and/or attributes of a user may, with user consent, be inferred from various electronic sources, such as explicit statement(s) from the user, past queries submitted to automated assistant(s), past search engine queries, digital files created or interacted with by the user, electronic correspondence (e.g., emails, text messages) sent and/or received by the user, social media posts of the user, web browsing history, past online bookings, past travel trajectories, etc. These preferences and/or attributes may be used to formulate a user prompt of the user. In some implementations, user prompts may be formulated as natural language statements, such as “I like jazz but let's avoid rock style” or “I like Chinese cuisine but try to eat vegetarian if at all possible, and I prefer public transit over driving or walking.” In other implementations, user prompts may be formulated in other ways such as structured text (e.g., XML, JSON, etc.). In some implementations, a generative model such as an LLM or similar may be used to process data obtained from the various electronic sources to formulate a single user prompt that summarizes the user's preferences and/or attributes.

Similarly, in various implementations, what will be referred to herein as “device prompts” may be determined/formulated for one or more computing devices that are operated by one or more users in a shared context. These device prompts may include various attributes of the computing devices, such as user preferences for how the devices are used (e.g., “I prefer not to use this device for video playback”), one or more capabilities and/or states of the device (e.g., display or no display, muted or unmuted, volume level, amount of memory, display size, etc.), position coordinates of the device, and so forth.
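The "user preference generation prompt" of claims 10 and 11 (assembling past queries and asking a generative model to distill them into a user prompt) can be sketched as below. The function name is hypothetical, and `generate` is a placeholder for an LLM call, not a real API.

```python
# Hedged sketch of claims 10-11: past queries issued by the user are assembled
# into a prompt that asks a generative model to summarize them as a single
# natural language user prompt.

def build_preference_generation_prompt(past_queries: list[str]) -> str:
    """Assemble past queries into a user preference generation prompt."""
    history = "\n".join(f"- {q}" for q in past_queries)
    return (
        "Summarize this user's preferences as one natural language statement, "
        f"based on their past queries:\n{history}"
    )

prompt = build_preference_generation_prompt(
    ["vegetarian restaurants near me", "best jazz albums 2024"]
)
# user_prompt = generate(prompt)  # hypothetical; might yield, e.g.,
#                                # "I like jazz and prefer vegetarian food."
```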
Like the user prompts, these device prompts may be formulated in some implementations as natural language, such as “I prefer not to use this device for video playback” or “this device is currently muted.” In other implementations, device prompts may be formulated in other ways such as structured text (e.g., XML, JSON, etc.). In some implementations, a generative model such as an LLM or similar may be used to process data obtained from various electronic sources (e.g., the device itself) to formulate a single device prompt that summarizes the device's attributes. In some implementations, device prompts can be inferred for various devices. For example, a device prompt for a particular device can be inferred based on the usage history of the particular device, such as which content a user typically consumes via that particular device and/or other details about the device and/or the content that the user typically consumes via that particular device.

Data indicative of one or more user prompt(s) and/or one or more device prompt(s) may be assembled into what will be referred to as a “merged input prompt,” e.g., along with various other data. This other data may include, for instance, a natural language query issued by a user to an automated assistant.
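As one illustration of the structured-text alternative mentioned above, a device prompt might be encoded as JSON. Every field name here is an assumption chosen for illustration; the disclosure does not prescribe a schema.

```python
# Illustrative structured device prompt (all field names assumed), encoding the
# kinds of attributes described above: usage preferences, sensor/device state,
# and capabilities.
import json

device_prompt = json.dumps({
    "device": "living-room display",
    "preferences": ["avoid using this device for video playback"],
    "state": {"muted": True, "volume": 0},
    "capabilities": {"display": True, "display_size_in": 55},
})
# device_prompt can then be assembled into the merged input prompt alongside
# user prompts and the natural language query.
```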