US-12625919-B2 - Generating and processing summaries of search results using a language model

US 12625919 B2

Abstract

Technology is disclosed for programmatically generating, by a language model, a summary of search results based on the corresponding relevance of the search results to an input search query to a search engine. A user inputs a search query into a search engine and the search engine determines and ranks a set of search results based on the relevance of each of the search results. A snippet of information is determined for the search results most relevant to the input search query. The snippets of information are used to generate an input prompt to a language model with an instruction to generate a summary of the snippets of information based on the input search query. The generated summary is provided to the user in response to the search query and/or is cached in order to provide the generated summary in response to similar search queries.
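The flow described in the abstract (rank results, extract snippets from the most relevant ones, build a prompt, call the language model, cache the output) can be sketched as a minimal Python pipeline. All function names, the result dictionary shape, and the in-memory cache are hypothetical illustrations; a production system would call an actual search index and a hosted language model.

```python
# Illustrative sketch of the summarization pipeline described in the abstract.
# Every name here is hypothetical; `llm` stands in for a language-model call.

def rank_results(query, results):
    """Order search results by a precomputed relevance score (highest first)."""
    return sorted(results, key=lambda r: r["relevance"], reverse=True)

def extract_snippets(query, results, top_k=3):
    """Take a snippet from each of the top-k most relevant results."""
    return [r["snippet"] for r in rank_results(query, results)[:top_k]]

def build_prompt(query, snippets):
    """Assemble the instruction, the query, and the snippets into one prompt."""
    joined = "\n".join(f"- {s}" for s in snippets)
    return (f"Summarize the following snippets as an answer to the query.\n"
            f"Query: {query}\nSnippets:\n{joined}")

def summarize(query, results, llm, cache):
    """Return a cached summary if one exists, else generate and cache one."""
    if query in cache:
        return cache[query]
    prompt = build_prompt(query, extract_snippets(query, results))
    summary = llm(prompt)  # stand-in for invoking the language model
    cache[query] = summary
    return summary
```

Note that the cache lookup here is an exact-match lookup for brevity; the abstract describes serving cached summaries for *similar* queries, which would require a similarity measure over queries rather than dictionary key equality.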

Inventors

  • Siyu Zhou
  • Xin Jin
  • Tong Wang
  • Mi Yan
  • Subhojit Som
  • Katherine Gu

Assignees

  • Microsoft Technology Licensing, LLC

Dates

Publication Date
2026-05-12
Application Date
2024-05-03

Claims (17)

  1. A computerized system for preserving computing and network resources for search queries, comprising: at least one processor; and computer memory storing computer-useable instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising: determining a set of search results responsive to a search query; extracting a plurality of snippets from a threshold amount of high relevance search results from the set of search results by: determining a maximum token size of a snippet based on a total number of snippets of the plurality of snippets and processing speed of a large language model; and extracting each snippet of the plurality of snippets within the maximum token size and based on relevance to the search query; determining, using a language model, a high relevance portion of snippets from the plurality of snippets, wherein the language model qualifies the high relevance portion of the plurality of snippets as having above a threshold likelihood of providing an answer to the search query; generating, based on applying the high relevance portion of snippets and the search query to the large language model, a summary of the high relevance portion of snippets; and causing presentation of a user interface in response to the search query by causing rendering of a portion of the user interface providing the set of search results to the search query before rendering of a different portion of the user interface providing a representation of the summary when latency in providing the representation of the summary is above a threshold amount of time.
  2. The computerized system of claim 1, wherein the representation of the summary comprises a list of items and wherein each of the plurality of snippets comprises at least one of extracted text data from a corresponding search result, extracted image data from the corresponding search result, metadata associated with the corresponding search result, and a generative summary based on the corresponding search result.
  3. The computerized system of claim 1, wherein causing presentation of the user interface further comprises: providing an indication that the summary is still loading when causing rendering of the portion of the user interface providing the set of search results to the search query before rendering of the different portion of the user interface providing the representation of the summary.
  4. The computerized system of claim 1, wherein generating the summary further comprises: generating an input prompt for the language model, the input prompt including the high relevance portion of snippets, a corresponding representation of the search query, and an instruction for the language model to generate the summary using the high relevance portion of snippets and the search query; receiving an output of the language model in response to providing the input prompt to the language model; and determining the summary from the output of the language model.
  5. The computerized system of claim 4, wherein generating the summary further comprises: determining a source of a corresponding portion of the summary based on a corresponding one of the high relevance portion of snippets; and wherein causing presentation of the representation of the summary further comprises: providing a citation to a search result corresponding to the corresponding one of the high relevance portion of snippets in the representation of the summary.
  6. The computerized system of claim 1, wherein causing presentation of the user interface further comprises: providing the representation of the summary in a delineated space of the user interface; and providing at least one search result corresponding to one of the plurality of snippets in the delineated space beneath the summary.
  7. The computerized system of claim 1, the operations further comprising: caching the representation of the summary and the search query; and in response to a subsequent search query within a threshold similarity to the search query, causing presentation of the representation of the summary from cache in response to the subsequent search query by causing rendering of the different portion of the user interface providing the representation of the summary from the cache in near real-time with the portion of the user interface providing corresponding search results.
  8. The computerized system of claim 1, the operations further comprising: responsive to a hover action over a corresponding portion of the representation of the summary, causing presentation of a source of the corresponding portion of the representation of the summary.
  9. A computer-implemented method for preserving computing and network resources for search queries, comprising: determining a set of search results responsive to a search query; extracting a plurality of snippets from a threshold amount of high relevance search results from the set of search results by: determining a maximum token size of a snippet based on a total number of snippets of the plurality of snippets and processing speed of a language model; and extracting each snippet of the plurality of snippets within the maximum token size and based on relevance to the search query; determining, using a transformer encoder model trained to qualify extracted snippets with respect to an input search query, a high relevance portion of snippets from the plurality of snippets, wherein the transformer encoder model qualifies the high relevance portion of the plurality of snippets as having above a threshold likelihood of providing an answer to the search query; generating, based on applying the search query and the high relevance portion of snippets of search results to the language model, a summary of the high relevance portion of snippets; causing presentation of a user interface in response to the search query by causing rendering of a portion of the user interface providing the set of search results to the search query before rendering of a different portion of the user interface providing the summary when latency in providing the summary is above a threshold amount of time; caching the summary and the search query; and causing presentation of a representation of the summary in response to a subsequent search query by accessing the summary from cache based on the subsequent search query being within a threshold similarity to the search query and causing rendering of the different portion of the user interface providing the representation of the summary in near real-time with the portion of the user interface providing corresponding search results.
  10. The computer-implemented method of claim 9, wherein each of the plurality of snippets comprises at least one of extracted text data from a corresponding search result, extracted image data from the corresponding search result, metadata associated with the corresponding search result, and a generative summary based on the corresponding search result.
  11. The computer-implemented method of claim 9, wherein causing presentation of the user interface further comprises: providing an indication that the summary is still loading when causing rendering of the portion of the user interface providing the set of search results to the search query before rendering of the different portion of the user interface providing the summary.
  12. The computer-implemented method of claim 9, wherein generating the summary further comprises: determining a source of a corresponding portion of the summary based on a corresponding one of the high relevance portion of snippets; and wherein causing presentation of the representation of the summary further comprises: providing a citation to a search result corresponding to the corresponding one of the high relevance portion of snippets in the representation of the summary.
  13. The computer-implemented method of claim 9, wherein causing presentation of the representation of the summary further comprises: providing the representation of the summary in a delineated space on the user interface; and providing at least one search result corresponding to one of the plurality of snippets in the delineated space beneath the summary.
  14. The computer-implemented method of claim 9, further comprising: updating the summary in the cache periodically by: generating, based on applying the search query and a different plurality of snippets of search results relevant to the search query to the language model, an updated summary.
  15. One or more computer storage media having computer-executable instructions embodied thereon that, when executed by a computing system having at least one processor and at least one memory, cause the at least one processor to perform operations comprising: determining, based on a search query less than a threshold similarity to cached generated summaries, to generate a real-time summary of search results in response to the search query; determining a set of search results responsive to the search query; extracting a plurality of snippets from a threshold amount of high relevance search results from the set of search results by: determining a maximum token size of a snippet based on a total number of snippets of the plurality of snippets and processing speed of a language model; and extracting each snippet of the plurality of snippets within the maximum token size and based on relevance to the search query; determining, using a transformer model, a high relevance portion of snippets from the plurality of snippets, wherein the transformer model qualifies the high relevance portion of the plurality of snippets as having above a threshold likelihood of providing an answer to the search query; generating, based on applying the high relevance portion of snippets and the search query to the language model, the real-time summary of search results; and causing presentation of a representation of the real-time summary of search results in response to the search query by causing rendering of a portion of a user interface providing the set of search results to the search query before rendering of a different portion of the user interface providing the representation of the real-time summary of search results when latency in providing the representation of the real-time summary of search results is above a threshold amount of time.
  16. The one or more computer storage media of claim 15, wherein causing presentation of the user interface further comprises: providing an indication that the real-time summary is still loading when causing rendering of the portion of the user interface providing the set of search results to the search query before rendering of the different portion of the user interface providing the representation of the real-time summary of search results.
  17. The one or more computer storage media of claim 15, wherein generating the real-time summary further comprises: determining a source of a corresponding portion of the real-time summary based on a corresponding one of the high relevance portion of snippets; and wherein causing presentation of the representation of the real-time summary further comprises: providing a citation to a search result corresponding to the corresponding one of the high relevance portion of snippets in the representation of the real-time summary.
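Claims 1, 9, and 15 each recite determining a maximum token size per snippet from the total number of snippets and the processing speed of the model. The claims do not give a formula, so the sketch below is one plausible, purely illustrative reading: derive a total token budget from the model's throughput and a target latency, split it evenly across the snippets, and cap it at an assumed context-window limit. Every parameter name and the division scheme are assumptions, not the claimed method.

```python
def max_snippet_tokens(n_snippets, tokens_per_second, latency_budget_s,
                       context_limit=4096):
    """Illustrative per-snippet token cap (the claims give no formula).

    Assumes the model can consume tokens_per_second * latency_budget_s
    tokens within the latency budget; that total is split evenly across
    the snippets and capped by an assumed context-window limit.
    """
    if n_snippets <= 0:
        raise ValueError("need at least one snippet")
    total_budget = min(int(tokens_per_second * latency_budget_s), context_limit)
    return total_budget // n_snippets

def truncate_snippet(tokens, cap):
    """Keep at most `cap` tokens of a snippet (tokens as a list of strings)."""
    return tokens[:cap]
```

For example, a model that processes 1,000 tokens per second under a 2-second budget, feeding 4 snippets, would yield a 500-token cap per snippet under this scheme.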

Description

BACKGROUND

Search engines play a crucial role in helping users find relevant information on the internet by filtering vast amounts of content so that users can locate relevant resources without sifting through irrelevant web pages. In addition to the uniform resource locator (URL) of a search result's website, existing search engines provide the website's title and a portion of the website that is relevant to the query. Often, however, information relevant to the query is shared across multiple relevant search results. In these instances, the user is required to access each of the multiple relevant search results and read through each one to determine whether the information is indeed relevant to the query.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in isolation as an aid in determining the scope of the claimed subject matter.

Embodiments described in the present disclosure are directed towards technologies for improving electronic search engine technology and enhancing computing services for a user, based on determining generative summaries for search results in response to a search query. In particular, this disclosure provides technologies to programmatically generate a summary of search results by a language model, such as a large language model (LLM), a language model that is fine-tuned to generate a summary of search results, and/or the like, based on the corresponding relevance of the search results to a live search query. In some implementations, the summary is generated by the language model in real time and provided as a live response to a live search query.
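The disclosure repeatedly describes serving a cached summary when a later query is within a threshold similarity to the original query. As a minimal sketch of that idea, the toy cache below uses Jaccard overlap of query tokens as a stand-in similarity measure; the class name, the threshold value, and the similarity function are all hypothetical (a production system might compare query embeddings instead).

```python
class SummaryCache:
    """Toy cache that serves a stored summary for sufficiently similar queries.

    Jaccard overlap of lowercased query tokens stands in for whatever
    similarity measure a real system would use.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self._entries = []  # list of (query, summary) pairs

    @staticmethod
    def _similarity(a, b):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        if not ta or not tb:
            return 0.0
        return len(ta & tb) / len(ta | tb)

    def get(self, query):
        """Return a cached summary for a similar-enough query, else None."""
        for cached_query, summary in self._entries:
            if self._similarity(query, cached_query) >= self.threshold:
                return summary
        return None

    def put(self, query, summary):
        self._entries.append((query, summary))
```

On a cache miss (`get` returns `None`), the system would fall back to generating a fresh summary and storing it with `put`, matching the real-time generation path described in the summary above.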
For example and according to an embodiment, a user inputs a search query into a search engine and the search engine determines and ranks a set of search results (for example, websites, images, videos, documents such as news articles, and/or any search results provided by a search engine) based on the relevance of each of the search results. From a subset of the most relevant search results, a snippet of information is determined and extracted based on the relevance of the extracted information to the input search query. The snippets of information for the subset of the most relevant search results are used to generate an input prompt to a language model with an instruction to generate a summary of the snippets of information based on the input search query. The language model outputs a generated summary based on the input prompt that includes the snippets of information of the search results and the input search query. The generated summary is provided to the user in response to the input search query. In some implementations, the generated summary is included in a webpage of search results that is provided in response to the input search query. Further, when the latency in providing the generated summary is above a threshold amount of time, some embodiments render the portion of the web page providing the generated summary subsequent to rendering the portion of the web page providing the search results, so that the user is more quickly provided at least a partial response to the query. After a generated summary is provided in response to an input search query, the generated summary can be cached in order to provide it in response to similar search queries in the future.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects of the disclosure are described in detail below with reference to the attached drawing figures, wherein:

FIG. 1 is a block diagram of an example operating environment suitable for implementations of the present disclosure;
FIG. 2 is a diagram depicting an example computing architecture suitable for implementing aspects of the present disclosure;
FIGS. 3A-3G illustratively depict example schematic screenshots from a personal computing device showing aspects of an example user interface, in accordance with an embodiment of the present disclosure;
FIGS. 4-5 depict flow diagrams of methods for programmatically generating, by a language model, a summary of search results based on the corresponding relevance of the search results to an input search query, in accordance with an embodiment of the present disclosure;
FIG. 6 is a block diagram of an example computing environment suitable for use in implementing an embodiment of the present disclosure; and
FIG. 7 is a block diagram of an example computing environment suitable for use in implementing an embodiment of the present disclosure.

DETAILED DESCRIPTION

The subject matter of aspects of the present disclosure is described with specificity herein to meet statutory requirements. However, the description