US-12619603-B2 - Generating a distilled generative response engine trained on distillation data generated with a language model program
Abstract
The present technology provides a distilled generative response engine for generating responses to Internet search queries. The present technology utilizes a language model program that is made up of a collection of conditional dependencies that branch into many different sequences of prompts that are configured to transform answers to different types of search queries. This design of the language model program facilitates rapid iteration to improve responses. The language model program is used to generate distillation data that is used to train the distilled generative response engine.
Inventors
- Alex Tachard Passos
- Michael Janner
Assignees
- OpenAI OpCo, LLC
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-05-31
Claims (20)
- 1 . A method comprising: selecting sources from search results resulting from a search query; writing headers to be used by a generative response engine when generating a final response to the search query; evaluating conditions in a language model program; selecting at least one guiding prompt associated with a condition among the conditions in the language model program when the condition is relevant to the search query or selected sources; providing, to the generative response engine, a prompt that is made up of: the search query and the sources from the search results that are responsive to the search query, the at least one guiding prompt from the language model program, wherein the at least one guiding prompt is configured to guide the generative response engine to present the final response to the search query including the headers; receiving the final response from the generative response engine, wherein the generative response engine was guided to generate the final response by the at least one guiding prompt from the language model program; and performing a language model distillation by training the generative response engine to produce the final response from the prompt without the benefit of the at least one guiding prompt from the language model program, wherein the final response is used as distillation data, wherein the result of the language model distillation is a distilled generative response engine that generates responses to search queries.
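The method of claim 1 can be sketched as a short program. This is a minimal illustrative sketch, not the patented implementation: all names (`evaluate_conditions`, `build_prompt`, `collect_distillation_example`) are hypothetical, the "language model program" is reduced to a flat list of (condition, guiding prompt) pairs, and the generative response engine is a stub.

```python
# Hypothetical sketch of claim 1: build a guided (teacher) prompt, capture
# the guided final response, and pair it with an unguided prompt as one
# distillation training example.

def evaluate_conditions(program, query, sources):
    """Select guiding prompts whose condition is relevant to the query or sources."""
    return [prompt for condition, prompt in program if condition(query, sources)]

def build_prompt(query, sources, guiding_prompts, headers):
    """Assemble the prompt given to the generative response engine."""
    parts = [f"Query: {query}"]
    parts += [f"Source: {s}" for s in sources]
    parts += [f"Header: {h}" for h in headers]
    parts += list(guiding_prompts)
    return "\n".join(parts)

def collect_distillation_example(program, query, sources, headers, engine):
    """Return one (input, target) pair: the target is the guided final
    response; the input omits the guiding prompts, per the distillation step."""
    guiding = evaluate_conditions(program, query, sources)
    final_response = engine(build_prompt(query, sources, guiding, headers))
    student_prompt = build_prompt(query, sources, [], headers)
    return student_prompt, final_response

# Trivial stub program and stub engine for illustration only.
program = [(lambda q, s: "recipe" in q, "Format the answer as numbered steps.")]
engine = lambda prompt: "RESPONSE:" + prompt.splitlines()[0]
x, y = collect_distillation_example(
    program, "best pancake recipe", ["source A"], ["Ingredients"], engine)
```

Note that the training input `x` deliberately lacks the guiding prompt that shaped the target `y`; that asymmetry is what lets the distilled engine internalize the guidance.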
- 2 . The method of claim 1 , wherein the language model program provides a tree of prompts and, based on an initial response from the generative response engine, uses one of the tree of prompts from the language model program as the guiding prompt.
- 3 . The method of claim 1 , further comprising: determining that the language model program is complete when the language model program does not contain additional relevant guiding prompts to provide to the generative response engine.
- 4 . The method of claim 1 , wherein the language model program is configured to iterate through nodes in a collection of conditional transformations to determine if the condition applies, and when the condition applies, applying a respective conditional transformation relevant to the condition, wherein the respective conditional transformation is used to create the guiding prompt.
- 5 . The method of claim 4 , wherein the language model program completes when transformations of the relevant conditions have been applied.
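Claims 4 and 5 describe iterating over nodes of conditional transformations and completing once every relevant transformation has been applied. A toy sketch of that control flow, with hypothetical names and trivial stand-in conditions and transforms:

```python
# Hypothetical sketch of claims 4-5: walk the nodes, apply each
# transformation whose condition holds, and finish when no relevant
# transformations remain.

def run_language_model_program(nodes, query, response):
    """nodes: list of (condition, transform) pairs. Each transform maps a
    draft response to a revised response (e.g., via a guiding prompt)."""
    applied = []
    for condition, transform in nodes:
        if condition(query, response):
            response = transform(response)
            applied.append(transform)
    # The program completes: all relevant transformations have been applied.
    return response, applied

# Illustrative nodes: render comparisons as a table; truncate long drafts.
nodes = [
    (lambda q, r: "compare" in q, lambda r: r + " [rendered as a table]"),
    (lambda q, r: len(r) > 500,   lambda r: r[:500] + "..."),
]
out, applied = run_language_model_program(nodes, "compare laptops", "Draft answer")
```

In a real system each transform would be a prompt sent back to the generative response engine rather than a local string operation.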
- 6 . The method of claim 1 , further comprising: receiving the search query; determining whether the search query can be improved; and revising the search query, when it is determined that the search query can be improved, to become the search query.
- 7 . The method of claim 6 , further comprising: obtaining the search results from a search index; providing the search results and the search query to the distilled generative response engine; and receiving a search response that is responsive to the search query, wherein the search response is generated from the search results and the search index, wherein the search response includes at least one citation to the search results.
- 8 . The method of claim 7 , wherein the search response is generated by the distilled generative response engine.
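Claims 6 through 8 describe revising the query, retrieving results from a search index, and having the distilled engine answer with at least one citation to those results. A hedged sketch with stub components; `maybe_improve`, the dict-backed index, and the lambda engine are all illustrative stand-ins:

```python
# Hypothetical sketch of claims 6-8: optional query revision, retrieval,
# and a cited response from the distilled generative response engine.

def maybe_improve(query):
    """Toy query-revision rule; a real system would use a model here."""
    return query.replace("nyc", "New York City") if "nyc" in query else query

def answer_with_citations(query, index, engine):
    query = maybe_improve(query)
    results = index.get(query, [])        # stand-in for a search index lookup
    response = engine(query, results)
    # Claim 7 requires at least one citation to the search results.
    assert any(f"[{i}]" in response for i in range(len(results)))
    return response

index = {"weather in New York City": ["forecast.example", "news.example"]}
engine = lambda q, rs: "Cloudy today [0]."   # stub distilled engine
reply = answer_with_citations("weather in nyc", index, engine)
```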
- 9 . A computing system comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the computing system to: select sources from search results resulting from a search query; write headers to be used by a generative response engine when generating a final response to the search query; evaluate conditions in a language model program; select at least one guiding prompt associated with a condition among the conditions in the language model program when the condition is relevant to the search query or selected sources; provide, to the generative response engine, a prompt that is made up of: the search query and the sources from the search results that are responsive to the search query, the at least one guiding prompt from the language model program, wherein the at least one guiding prompt is configured to guide the generative response engine to present the final response to the search query including the headers; receive the final response from the generative response engine, wherein the generative response engine was guided to generate the final response by the at least one guiding prompt from the language model program; and perform a language model distillation by training the generative response engine to produce the final response from the prompt without the benefit of the at least one guiding prompt from the language model program, wherein the final response is used as distillation data, wherein the result of the language model distillation is a distilled generative response engine that generates responses to search queries.
- 10 . The computing system of claim 9 , wherein the language model program is organized into a collection of conditional transformations.
- 11 . The computing system of claim 10 , wherein the language model program is configured to iterate through nodes in the collection of conditional transformations to determine if the condition applies, and when the condition applies, apply a respective conditional transformation relevant to the condition.
- 12 . The computing system of claim 11 , wherein the language model program completes when transformations of the relevant conditions have been applied.
- 13 . The computing system of claim 9 , wherein the instructions further configure the computing system to: receive the search query; determine whether the search query can be improved; and revise the search query, when it is determined that the search query can be improved, to become the search query.
- 14 . The computing system of claim 13 , wherein the instructions further configure the computing system to: obtain the search results from a search index; provide the search results and the search query to the distilled generative response engine; and receive a search response that is responsive to the search query, wherein the search response is generated from the search results and the search index, wherein the search response includes at least one citation to the search results.
- 15 . The computing system of claim 14 , wherein the search response is generated by the distilled generative response engine.
- 16 . A non-transitory computer-readable storage medium comprising instructions stored thereon that, when executed by at least one processor, cause the at least one processor to: select sources from search results resulting from a search query; write headers to be used by a generative response engine when generating a final response to the search query; evaluate conditions in a language model program; select at least one guiding prompt associated with a condition among the conditions in the language model program when the condition is relevant to the search query or selected sources; provide, to the generative response engine, a prompt that is made up of: the search query and the sources from the search results that are responsive to the search query, the at least one guiding prompt from the language model program, wherein the at least one guiding prompt is configured to guide the generative response engine to present the final response to the search query including the headers; receive the final response from the generative response engine, wherein the generative response engine was guided to generate the final response by the at least one guiding prompt from the language model program; and perform a language model distillation by training the generative response engine to produce the final response from the prompt without the benefit of the at least one guiding prompt from the language model program, wherein the final response is used as distillation data, wherein the result of the language model distillation is a distilled generative response engine that generates responses to search queries.
- 17 . The computer-readable storage medium of claim 16 , wherein the instructions further configure the at least one processor to: receive the search query; determine whether the search query can be improved; and revise the search query, when it is determined that the search query can be improved, to become the search query.
- 18 . The computer-readable storage medium of claim 17 , wherein the instructions further configure the at least one processor to: obtain the search results from a search index; provide the search results and the search query to the distilled generative response engine; and receive a search response that is responsive to the search query, wherein the search response is generated from the search results and the search index, wherein the search response includes at least one citation to the search results.
- 19 . The computer-readable storage medium of claim 16 , wherein the language model program is organized into a collection of conditional transformations.
- 20 . The computer-readable storage medium of claim 19 , wherein the language model program is configured to iterate through nodes in the collection of conditional transformations to determine if the condition applies, and when the condition applies, apply a respective conditional transformation relevant to the condition.
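The distillation step that closes each independent claim pairs the prompt stripped of guiding prompts with the guided final response as the supervised target. A toy illustration of that training setup; the lookup-table "student" is a stand-in for actually fine-tuning a language model, which would use a real training framework:

```python
# Hypothetical sketch of the distillation step: the teacher's guided final
# responses become targets, and the student is trained on prompts that
# omit the guiding prompts.

def build_distillation_set(examples, teacher):
    """examples: (guided_prompt, plain_prompt) pairs; returns
    (plain_prompt, guided_final_response) training pairs."""
    return [(plain, teacher(guided)) for guided, plain in examples]

class LookupStudent:
    """Toy stand-in for a distilled generative response engine."""
    def __init__(self):
        self.table = {}

    def train(self, dataset):
        for prompt, target in dataset:
            self.table[prompt] = target   # stand-in for gradient updates

    def __call__(self, prompt):
        return self.table.get(prompt, "")

teacher = lambda p: p.upper()             # stub guided teacher engine
examples = [("q1 + guidance", "q1"), ("q2 + guidance", "q2")]
student = LookupStudent()
student.train(build_distillation_set(examples, teacher))
```

After training, the student reproduces the guided output from the unguided prompt alone, which is the property the claims attribute to the distilled generative response engine.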
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. provisional application No. 63/645,386, filed on May 10, 2024, entitled GENERATING A DISTILLED GENERATIVE RESPONSE ENGINE TRAINED ON DISTILLATION DATA GENERATED WITH A LANGUAGE MODEL, which is expressly incorporated by reference herein in its entirety.

BACKGROUND

Generative response engines such as large language models represent a significant milestone in the field of artificial intelligence, revolutionizing computer-based natural language understanding and generation. Generative response engines, powered by advanced deep learning techniques, have demonstrated astonishing capabilities in tasks such as text generation, translation, summarization, and even code generation. Generative response engines can sift through vast amounts of text data, extract context, and provide coherent responses to a wide array of queries.

One task to which generative response engines have recently been applied is to interface with a search engine to summarize search results. This is a powerful use case because it can potentially help a user avoid having to review numerous search results to find the information that the user is ultimately seeking.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

Details of one or more embodiments of the subject matter described in this disclosure are set forth in the accompanying drawings and the description below. However, the accompanying drawings illustrate only some typical embodiments of this disclosure and are therefore not to be considered limiting of its scope. Other features, embodiments, and advantages will become apparent from the description, the drawings, and the claims.

FIG. 1 is a block diagram illustrating an exemplary machine learning platform for implementing various embodiments of this disclosure in accordance with some embodiments of the present technology.

FIG. 2 is a block diagram illustrating an example machine learning platform for interfacing with a search index and generating a distilled generative response engine in accordance with some embodiments of the present technology.

FIG. 3 illustrates an example routine for generating a distilled generative response engine using a language model program to generate distillation data in accordance with some embodiments of the present technology.

FIG. 4 illustrates a conceptual outline of a language model program for using a generative response engine to generate responses to search queries and transformations to the response as a result of prompts provided by the language model program in accordance with some embodiments of the present technology.

FIG. 5A and FIG. 5B illustrate an example of a language model program that is organized into a tree of conditional transformations in accordance with some embodiments of the present technology. FIG. 5B illustrates example prompts from a language model program to cause response transformations in accordance with some embodiments of the present technology.

FIG. 6 is a block diagram illustrating an example system utilizing the distilled generative response engine to generate responses to search queries in accordance with some embodiments of the present technology.

FIG. 7 illustrates an aspect of the subject matter in accordance with one embodiment.

FIG. 8 illustrates an example lifecycle of an ML model in accordance with some embodiments of the present technology.

FIG. 9 illustrates an example of a deep learning neural network according to some embodiments of the present technology.

FIG. 10 shows an example of a system for implementing certain embodiments of the present technology.

DETAILED DESCRIPTION

Generative response engines such as large language models represent a significant milestone in the field of artificial intelligence, revolutionizing computer-based natural language understanding and generation. Generative response engines, powered by advanced deep learning techniques, have demonstrated astonishing capabilities in tasks such as text generation, translation, summarization, and even code generation. However, despite their remarkable linguistic prowess, these generative response engines operate on a foundation of publicly available information and do not possess personal information about individual users.

One task to which generative response engines have recently been applied is to interface with a search engine to summarize search results. This is a powerful use case because it can potentially help a user avoid having to review numerous search results to find the information that the user is ultimately seeking. However, simply utilizing a generative response engine to summarize search results or to extract an answer from search results leaves many problems to be solved before the technology can provide a good user experience to everyday users. One limitation is that base models of generative response engines are not particularly fast when asked to review search results and p