EP-4742084-A1 - EFFICIENT ALIGNMENT OF GENERATIVE RESPONSE(S) TO NATURAL LANGUAGE INPUT(S)


Abstract

Implementations relate to receiving natural language (NL) input associated with a client device; and generating response(s) to the NL input. Generating the response(s) includes: determining, based on the NL input, a base prompt; processing, using a first generative model (GM), first GM input to generate corresponding first GM output, the first GM input including the base prompt; determining, based on the first GM output, a plurality of extended prompts; obtaining, based on filtering the plurality of extended prompts, a subset of extended prompts; for each extended prompt of the subset: processing, using the first GM or a second GM, second GM input to generate corresponding second GM output, the second GM input including the respective extended prompt, and determining, based on the second GM output, a respective candidate response corresponding to the respective extended prompt; and obtaining, based on filtering the candidate response(s), the response(s) to the NL input.

Inventors

  • WEISZ, Ágoston
  • PETROVSKI, Igor
  • AKERLUND, Oscar
  • GUPTA, Khyatti
  • SLUZHAEV, Evgeny
  • WANG, Oliver
  • BALDRIDGE, Jason
  • URIA, Benigno
  • GLADCHENKO, Evgeny

Assignees

  • GOOGLE LLC

Dates

Publication Date
2026-05-13
Application Date
2025-10-23

Claims (15)

  1. A method implemented by one or more processors, the method comprising: receiving natural language (NL) input associated with a client device; and generating one or more responses that are responsive to the NL input, wherein generating the one or more responses that are responsive to the NL input comprises: determining, based on the NL input, a base prompt; processing, using a first generative model (GM), first GM input to generate corresponding first GM output, the first GM input comprising the base prompt; determining, based on the corresponding first GM output, a plurality of extended prompts; obtaining, based on filtering the plurality of extended prompts, a subset of extended prompts; for each extended prompt of the subset of extended prompts: processing, using the first GM or a second GM, second GM input to generate corresponding second GM output, the second GM input comprising the respective extended prompt, and determining, based on the corresponding second GM output, a respective candidate response corresponding to the respective extended prompt; and obtaining, based on filtering the one or more candidate responses, the one or more responses that are responsive to the NL input.
  2. The method of claim 1, further comprising: causing the client device to render at least one of the one or more responses that are responsive to the NL input, and optionally: wherein the one or more responses rendered by the client device are primary responses and the method further comprises: causing the client device to cache, in a local memory of the client device, one or more secondary responses, wherein the secondary responses comprise at least one of the one or more responses that are responsive to the NL input other than the primary responses; and responsive to an input from a user of the client device, causing the client device to further render at least one of the secondary responses.
  3. The method of claim 1 or claim 2, wherein the first GM is a first large language model (LLM).
  4. The method of any one of the preceding claims, wherein the second GM is an image generation model, and wherein each of the one or more candidate responses is an image.
  5. The method of any one of claims 1 to 3, wherein: the second GM is a video generation model, and wherein each of the one or more candidate responses is a portion of video data; or the second GM is an audio generation model, and wherein each of the one or more candidate responses is a portion of audio data; or the second GM is a second LLM, and wherein each of the one or more candidate responses is a portion of text data.
  6. The method of any one of the preceding claims, wherein the first GM and the second GM are components of an end-to-end GM.
  7. The method of any one of the preceding claims, wherein the plurality of extended prompts includes N1 extended prompts, wherein N1 is a positive integer.
  8. The method of claim 7, wherein N1 is a fixed integer, or wherein N1 is a dynamic integer that is based on one or more of: a token limit for the first GM, a temporal constraint for the first GM, and/or a computational constraint for the first GM.
  9. The method of any one of the preceding claims, wherein determining the base prompt comprises: processing, using the first GM, the second GM, or a third GM, third GM input to generate corresponding third GM output, the third GM input comprising the NL input; and determining, based on the corresponding third GM output, the base prompt.
  10. The method of claim 9, wherein the third GM is a third LLM, and/or wherein the third GM is a component of an end-to-end GM which also comprises the first GM and/or the second GM.
  11. The method of any one of the preceding claims, further comprising: filtering, using a first evaluation model, the plurality of extended prompts, wherein filtering the plurality of extended prompts comprises: for each extended prompt of the plurality of extended prompts: processing, using the first evaluation model, first evaluation input to generate corresponding first evaluation output, the first evaluation input comprising the respective extended prompt and the base prompt, and determining, based on the corresponding first evaluation output, a respective entailment score corresponding to the respective extended prompt; and determining, based on the respective entailment score, whether the respective extended prompt is to be included in the subset of extended prompts, wherein optionally: the first evaluation model is a component of an end-to-end GM which also comprises the first GM and/or the second GM.
  12. The method of claim 11, wherein determining whether the respective extended prompt is to be included in the subset of extended prompts comprises: comparing the respective entailment score to a threshold entailment score; responsive to the respective entailment score being greater than or equal to the threshold entailment score, determining that the respective extended prompt is to be included in the subset of extended prompts; and responsive to the respective entailment score being less than the threshold entailment score, determining that the respective extended prompt is not to be included in the subset of extended prompts, and/or wherein determining whether the respective extended prompt is to be included in the subset of extended prompts comprises: for each extended prompt of the plurality of extended prompts: ranking the respective extended prompt based on the respective entailment score corresponding to the respective extended prompt; and determining that the N2 highest ranking extended prompts of the plurality of extended prompts are to be included in the subset of extended prompts, wherein N2 is a positive fixed integer.
  13. The method of any one of the preceding claims, further comprising: filtering, using a second evaluation model, the one or more candidate responses, wherein filtering the one or more candidate responses comprises: for each candidate response of the one or more candidate responses: processing, using the second evaluation model, second evaluation input to generate corresponding second evaluation output, the second evaluation input comprising the respective candidate response as well as the base prompt and/or the respective extended prompt which corresponds to the respective candidate response, and determining, based on the corresponding second evaluation output, a respective alignment score corresponding to the respective candidate response; and determining, based on the respective alignment score, whether the respective candidate response is to be included in the one or more responses that are responsive to the NL input.
  14. A system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the at least one processor to be operable to: receive natural language (NL) input associated with a client device; and generate one or more responses that are responsive to the NL input, wherein the instructions to generate the one or more responses that are responsive to the NL input comprise instructions to: determine, based on the NL input, a base prompt; process, using a first generative model (GM), first GM input to generate corresponding first GM output, the first GM input comprising the base prompt; determine, based on the corresponding first GM output, a plurality of extended prompts; obtain, based on filtering the plurality of extended prompts, a subset of extended prompts; for each extended prompt of the subset of extended prompts: process, using the first GM or a second GM, second GM input to generate corresponding second GM output, the second GM input comprising the respective extended prompt, and determine, based on the corresponding second GM output, a respective candidate response corresponding to the respective extended prompt; and obtain, based on filtering the one or more candidate responses, the one or more responses that are responsive to the NL input.
  15. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to be operable to: receive natural language (NL) input associated with a client device; and generate one or more responses that are responsive to the NL input, wherein, in generating the one or more responses that are responsive to the NL input, the at least one processor is operable to: determine, based on the NL input, a base prompt; process, using a first generative model (GM), first GM input to generate corresponding first GM output, the first GM input comprising the base prompt; determine, based on the corresponding first GM output, a plurality of extended prompts; obtain, based on filtering the plurality of extended prompts, a subset of extended prompts; for each extended prompt of the subset of extended prompts: process, using the first GM or a second GM, second GM input to generate corresponding second GM output, the second GM input comprising the respective extended prompt, and determine, based on the corresponding second GM output, a respective candidate response corresponding to the respective extended prompt; and obtain, based on filtering the one or more candidate responses, the one or more responses that are responsive to the NL input.
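The two filtering strategies recited in claims 11 and 12 — admitting an extended prompt when its entailment score meets a threshold, or ranking prompts by score and keeping the N2 highest — can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, and the entailment scores are assumed to have already been produced by the first evaluation model (the model call itself is out of scope here).

```python
def filter_by_threshold(scored_prompts, threshold):
    """Keep extended prompts whose entailment score meets the threshold.

    scored_prompts: list of (extended_prompt, entailment_score) pairs, where
    the score is assumed to come from a first evaluation model (claim 11).
    """
    return [prompt for prompt, score in scored_prompts if score >= threshold]


def filter_top_n(scored_prompts, n2):
    """Keep the N2 highest-ranking extended prompts (second branch of claim 12)."""
    ranked = sorted(scored_prompts, key=lambda pair: pair[1], reverse=True)
    return [prompt for prompt, _score in ranked[:n2]]
```

Either function (or a combination of both) can serve as the "obtaining, based on filtering the plurality of extended prompts, a subset of extended prompts" step of claim 1.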

Description

BACKGROUND

Various generative model(s) (GM(s)) have been proposed that can be used to process natural language (NL) content and/or other input(s), to generate output that reflects generative content that is responsive to the input(s). For example, large language models (LLM(s)) have been developed that can be used to process NL content and/or other input(s), to generate LLM output that reflects generative NL content and/or other generative content that is responsive to the input(s). As another example, image generation models have been developed that can be used to process NL content and/or other input(s), to generate visual outputs such as image data that is responsive to the input(s). In some instances, GM(s) can be used to process NL input that is associated with a client device in order to generate response(s) that are responsive to the NL input (which, for example, could be rendered at the client device). However, the quality of these response(s) can be affected by the quality of the GM(s) used in generating them and the quality of the underlying training data used to train these GM(s). GM(s) are typically trained on enormous amounts of diverse data including data from, but not limited to, webpages, electronic books, software code, electronic news articles, and machine translation data. Accordingly, these GM(s) leverage the underlying data on which they were trained in performing various NL processing (NLP) tasks. Characteristics (e.g., accuracy, breadth, and/or quantity) of training data for GM(s) can lead to trained GM(s) which are unable to reliably provide high-quality response(s) to some kinds of NL input. For these and other reasons, it can be desirable to generate response(s) using techniques which maximize the ability of trained GM(s) to provide accurate, high-quality response(s) which are aligned with (e.g., successfully responsive to) NL input in a computationally efficient manner.
SUMMARY

Implementations described herein relate to efficient generation of response(s) to natural language (NL) input which are aligned with (e.g., successfully responsive to) the NL input. More particularly, but not exclusively, according to the techniques described herein, generative model(s) (GM(s)) can be used in determining generative response(s) to NL input (e.g., including a user query for completion of a generative task), and these generative response(s) can be determined accurately and efficiently (e.g., with respect to computational and network resources). Processor(s) of a system can: receive NL input associated with a client device; and generate one or more responses that are responsive to the NL input. In other words, the system can be configured to receive an input including NL input (referred to herein as "free form NL input" interchangeably), e.g., from a user of the client device. The NL input may request completion of one or more generative tasks. The system can further be configured to generate one or more responses (e.g., one, two, four, eight or any fixed or variable number of responses) which are responsive to the NL input, for example by utilizing one or more GMs. As a specific example, the free form NL input may be a user query requesting a generative task of "Design a battery cell suitable for use in an electric vehicle", which could be received at a client device of the user. In this specific example, the system may utilize one or more GMs to generate one or more images (i.e., one or more responses) which illustrate possible battery cell designs (e.g., illustrating structure(s), dimension(s), material(s), etc.) suitable for use in an electric vehicle. In generating the one or more responses that are responsive to the NL input, the processor(s) can further: determine, based on the NL input, a base prompt.
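The overall flow described in this summary — derive a base prompt, expand it into extended prompts, filter the extended prompts, generate one candidate response per surviving prompt, then filter the candidates — can be sketched in a few lines. This is a minimal control-flow sketch, not the patented implementation: each argument is a hypothetical pluggable callable standing in for a GM or evaluation-model invocation.

```python
def generate_responses(nl_input, derive_base_prompt, first_gm, second_gm,
                       prompt_filter, response_filter):
    """Sketch of the summarized pipeline; all callables are stand-ins."""
    # Determine, based on the NL input, a base prompt.
    base_prompt = derive_base_prompt(nl_input)
    # Use the first GM to produce a plurality of extended prompts.
    extended_prompts = first_gm(base_prompt)
    # Filter the extended prompts down to a subset (e.g., by entailment score).
    subset = prompt_filter(base_prompt, extended_prompts)
    # Use the first or a second GM to produce one candidate per extended prompt.
    candidates = [second_gm(prompt) for prompt in subset]
    # Filter the candidates to obtain the response(s) to the NL input.
    return response_filter(base_prompt, candidates)
```

In practice the two filtering callables would wrap the first and second evaluation models, and the GM callables would wrap calls to an LLM, image, video, or audio generation model, per the claims above.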
In some scenarios, the NL input may include parameter(s) and/or variable(s) for one or more generative tasks, but may also contain other information which is not necessary or desirable for efficient processing of the NL input and/or efficient completion of the generative task(s). The base prompt may preserve these core parameter(s) and/or variable(s) for the generative task(s), whilst removing the extraneous or otherwise unnecessary parts of the NL input. Returning to the above example, where the NL input is to "Design a battery cell suitable for use in an electric vehicle", the base prompt may be a condensed (e.g., shortened) form of the NL input such as "Battery cell design for electric vehicle", or "Electric vehicle battery cell". Put another way, the base prompt may preserve at least the core parameters specifying that the generative task relates to designing a "battery cell" for an "electric vehicle", but may remove other aspects of the NL input, e.g., "...suitable for..." which are not necessary or desirable for efficiently processing the prompt using one or more GMs. It will be appreciated that the base prompt may take other forms which retain core parameters. In additional or alter