US-20260127204-A1 - RETRIEVAL-AUGMENTED GENERATION AND RELEVANCY ANNOTATION USING GENERATIVE ARTIFICIAL INTELLIGENCE


Abstract

Methods and systems provide content searching and retrieval using generative artificial intelligence (AI) models. The system is configured to receive a user search for content, media or item listings. The system receives a natural language-based input associated with a client device of a user. The system generates a search criterion for the received natural language-based input. The system provides a data set of retrieved content items to one or more large language models that annotate each of the content items in the data set. The system receives a new data set in which each content item includes a relevancy annotation. Based on the relevancy annotations in the new data set, the system determines what additional processing to perform.

Inventors

Andrew Donald Yates

Assignees

DROPBOX, INC.

Dates

Publication Date: 20260507
Application Date: 20251223

Claims (20)

1 . A computer-implemented method comprising: assembling a data set of content items in response to receiving a query from a client device; determining whether the query is relevant to a domain; in response to determining that the query is relevant to the domain, providing a prompt to one or more generative artificial intelligence (AI) models, instructing the one or more generative AI models to generate one or more relevancy annotations for the data set of content items; and providing for display on a graphical user interface of the client device a portion of the data set of content items based, in part, on the one or more relevancy annotations.
2 . The computer-implemented method of claim 1 , further comprising: generating, utilizing the one or more generative AI models, a description providing a reasoning for generating the one or more relevancy annotations for the data set of content items.
3 . The computer-implemented method of claim 1 , further comprising: determining an irrelevance of the query to the domain; and aborting generation of the one or more relevancy annotations for the data set of content items by the one or more generative AI models based on the irrelevance of the query to the domain.
4 . The computer-implemented method of claim 1 , further comprising: determining a match between the query and a historical query from a set of historical queries stored in a cache; bypassing generation of the one or more relevancy annotations based on detecting a historical response to the historical query; and providing for display on the graphical user interface of the client device a pre-computed response based on the historical response.
5 . The computer-implemented method of claim 1 , further comprising: collecting contextual information related to at least the query or one or more features of a user account associated with the query; and adding the contextual information to the data set of content items.
6 . The computer-implemented method of claim 1 , further comprising: utilizing at least one of prompt tuning, assembly, inference optimization or supervised domain task refinement to improve an accuracy of the one or more generative AI models generating the one or more relevancy annotations for the data set of content items.
7 . The computer-implemented method of claim 1 , further comprising: providing for display on a graphical user interface of the client device a ranked subset of content items from the data set of content items based, in part, on the one or more relevancy annotations.
8 . A system comprising: at least one processor; and a non-transitory computer-readable medium storing instructions which, when executed by the at least one processor, cause the system to: receive a query from a client device; assemble a data set of content items in response to the query; determine whether the query is relevant to a domain; in response to determining that the query is relevant to the domain, generate a prompt instructing one or more generative artificial intelligence (AI) models to produce one or more relevancy annotations for at least a subset of content items from the data set of content items; and select, based at least in part on the one or more relevancy annotations, the subset of content items for display on a graphical user interface of the client device.
9 . The system of claim 8 , further storing instructions which, when executed by the at least one processor, cause the system to: rank the subset of content items based on the one or more relevancy annotations.
10 . The system of claim 8 , further storing instructions which, when executed by the at least one processor, cause the system to: determine at least an irrelevance of the query to the domain or cached relevance annotations for the data set of content items; and abort generation of the one or more relevancy annotations for the data set of content items by the one or more generative AI models based on at least the irrelevance of the query to the domain or the cached relevance annotations for the data set of content items.
11 . The system of claim 8 , further storing instructions which, when executed by the at least one processor, cause the system to: receive, via the client device, one or more user interactions with the subset of content items; and generate one or more additional relevancy annotations for the subset of content items based on the one or more user interactions.
12 . The system of claim 8 , further storing instructions which, when executed by the at least one processor, cause the system to: generate search criterion related to the one or more relevancy annotations; apply one or more numeric annotations to the data set of content items according to the search criterion; and generate the one or more relevancy annotations based on the one or more numeric annotations.
13 . The system of claim 8 , further storing instructions which, when executed by the at least one processor, cause the system to: determine a relevancy annotation threshold; and display the subset of the content items based on the subset of content items exceeding the relevancy annotation threshold.
14 . The system of claim 8 , wherein the one or more generative AI models generate the one or more relevancy annotations using a domain-specific ordinal scale.
15 . A non-transitory computer-readable medium storing executable instructions which, when executed by at least one processor, cause the at least one processor to: assemble a data set of content items in response to receiving a query from a client device; based on determining that a query is relevant to a domain, provide an initial prompt to a generative artificial intelligence (AI) model to generate one or more initial relevancy annotations for a subset of content items from the data set of content items; receive one or more user interactions with the subset of content items provided for display on the client device; provide an additional prompt to an additional generative AI model to generate one or more updated relevancy annotations for the data set of content items annotated by the generative AI model; and provide for display on the client device, an updated subset of content items based, in part, on the one or more updated relevancy annotations.
16 . The non-transitory computer-readable medium of claim 15 , further storing instructions which, when executed by the at least one processor, cause the at least one processor to: detect cached relevance annotations for the data set of content items; and abort generation of the one or more initial relevancy annotations for the data set of content items by the generative AI model based on the cached relevance annotations for the data set of content items.
17 . The non-transitory computer-readable medium of claim 15 , wherein the additional generative AI model is selected because a number of content items meeting a domain-specific relevancy threshold in the one or more initial relevancy annotations falls below a predetermined minimum number of content items.
18 . The non-transitory computer-readable medium of claim 15 , further storing instructions which, when executed by the at least one processor, cause the at least one processor to: generate, utilizing the generative AI model, an initial description providing a reasoning for generating the one or more initial relevancy annotations for the data set of content items; and generate, utilizing the additional generative AI model, an additional description providing an additional reasoning for generating the one or more updated relevancy annotations for the data set of content items.
19 . The non-transitory computer-readable medium of claim 15 , further storing instructions which, when executed by the at least one processor, cause the at least one processor to: rank the subset of content items based on the one or more initial relevancy annotations; and re-rank the updated subset of content items based on the one or more updated relevancy annotations.
20 . The non-transitory computer-readable medium of claim 15 , further storing instructions which, when executed by the at least one processor, cause the at least one processor to: collect one or more features of a user account associated with the query; and add the one or more features of the user account to the data set of content items.
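
For illustration only, and not as part of the claims, the control flow recited in claims 1, 3, 4, 7 and 13 might be sketched as follows. Every name here (`respond`, `in_domain`, `annotate`, `cache`, `threshold`) is hypothetical, the annotator stands in for the one or more generative AI models, and two behaviors are assumptions of this sketch rather than anything the claims specify: returning items in retrieval order when annotation is aborted, and treating an item as passing when its score is at or above the threshold.

```python
from typing import Callable, Dict, List

# Hypothetical signatures; neither is defined by the claims.
DomainCheck = Callable[[str], bool]
Annotator = Callable[[str, List[str]], List[int]]  # one ordinal score per item


def respond(
    query: str,
    items: List[str],
    in_domain: DomainCheck,
    annotate: Annotator,
    cache: Dict[str, List[str]],
    threshold: int = 2,
) -> List[str]:
    """Return the content items to display for a query."""
    # Claim 4: a matching historical query bypasses annotation entirely.
    if query in cache:
        return cache[query]

    # Claim 3: abort annotation when the query is irrelevant to the domain.
    # Falling back to retrieval order is an assumption of this sketch.
    if not in_domain(query):
        return list(items)

    # Claim 1: ask the model(s) for one relevancy annotation per item.
    scores = annotate(query, items)
    if len(scores) != len(items):
        raise ValueError("annotator must return one score per item")

    # Claims 7 and 13: keep items at or above the threshold, highest first.
    kept = [(score, item) for score, item in zip(scores, items) if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable, so ties keep retrieval order
    result = [item for _, item in kept]

    # Store the response so a repeat of the same query can skip annotation.
    cache[query] = result
    return result
```

With a stub annotator that scores three items `[1, 3, 2]` and the default threshold of 2, the sketch keeps the second and third items, ranks the second first, and serves a repeat of the same query from the cache without calling the annotator again.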

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 18/939,383, filed on Nov. 6, 2024. The aforementioned application is hereby incorporated by reference in its entirety.

FIELD OF INVENTION

Various embodiments relate generally to analysis of machine learning model operations, and more particularly, to systems and methods for retrieval-augmented generation and relevancy annotation using generative artificial intelligence.

SUMMARY

Methods, systems, and apparatus, including computer programs encoded on computer storage media, relate to methods of content retrieval, ranking and decision-making. The system provides for retrieval-augmented generation of content items and relevancy annotation of the content items.

As described herein, the system is a retrieval-augmented generation (RAG) and ranking system that employs at least one large language model (LLM) to provide contextual relevance annotation for content retrieved from multiple retrieval systems. These relevance annotations are then used by another system, in combination with traditional statistical inference and other control mechanisms, to assemble an optimized presentation of content. RAG ranking leverages the strengths of both LLMs and statistical inference to produce optimal allocations of content in search and recommendation.

A RAG system architecture is described in which a traditional pool of keyword and vector search retrieval systems produces a set of candidate content items. These retrieved content items are input to one or more LLMs, and the LLM generates a response based on that input. However, unlike the current state of the art, the generative output of this system is not the final presentation to end users.
Instead, in some embodiments, the system generates an intermediate output that is an input to another statistical inference and allocation system, which generates the final presentation to end users. Effectively, the “Generation” is the semantic relevance annotation of a page rather than the content presented to end users. These annotations are inputs to statistical inference and allocation systems that generate the end user response, which is typically an ideal allocation of items in response to a user search query or user recommendations.

This system may include previous LLM or expert label responses as live examples to improve the prompt to the LLM and to estimate the mean and variance in relevance judgements for use in downstream allocation systems. The previous LLM responses may be generated asynchronously using more advanced, but slower and more expensive, labeling methods, including expert human annotation.

In some embodiments, the computer-implemented methods and systems provide content searching and retrieval and provide relevancy annotation using generative artificial intelligence (AI) models. The system is configured to receive a user search for content, media or item listings. The system receives a natural language-based input associated with a client device of a user. The system generates a search criterion for the received natural language-based input. The system provides a data set of retrieved content items to one or more large language models that annotate each of the content items in the data set. The system receives a new data set in which each content item includes a relevancy annotation. Based on the relevancy annotations in the new data set, the system determines what additional processing to perform.

The examples and appended claims may serve as a summary of this application.
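
As a minimal sketch of how the mean and variance in relevance judgements might be estimated from previous LLM or expert labels and then passed to a downstream allocation step, consider the following. The function names, the item identifiers, and the label scale are hypothetical. The allocation rule shown, which ranks by mean minus a multiple of the standard deviation, is an assumption chosen only to show how both statistics could be consumed; the disclosure does not specify the allocation method.

```python
from statistics import fmean, variance
from typing import Dict, List, Tuple


def summarize_labels(labels: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each item id to (mean, sample variance) of its prior relevance labels."""
    summary = {}
    for item_id, scores in labels.items():
        mean = fmean(scores)
        # Sample variance needs at least two labels; using 0.0 for a single
        # label is a simplifying assumption of this sketch.
        var = variance(scores) if len(scores) > 1 else 0.0
        summary[item_id] = (mean, var)
    return summary


def allocate(summary: Dict[str, Tuple[float, float]], risk_penalty: float = 1.0) -> List[str]:
    """Order item ids by mean relevance minus a penalty on label disagreement."""

    def score(item_id: str) -> float:
        mean, var = summary[item_id]
        return mean - risk_penalty * var ** 0.5  # hypothetical allocation rule

    return sorted(summary, key=score, reverse=True)
```

Under this rule, an item whose labels agree can outrank an item with a similar mean but widely scattered labels; setting `risk_penalty` to zero reduces the ordering to mean relevance alone.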
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention relates generally to content generation, and more particularly, to systems and methods for providing rich media presentation of recommendations in generative media. The present disclosure will become better understood from the detailed description and the drawings, wherein:

FIG. 1A is a diagram illustrating an exemplary environment in which some embodiments may operate.
FIG. 1B is a diagram illustrating an exemplary computer system that may execute instructions to perform some of the methods herein.
FIGS. 2A-2B are diagrams illustrating an exemplary method according to an embodiment.
FIG. 3 is a diagram illustrating an exemplary method 300 according to an embodiment.
FIG. 4 is a diagram illustrating an exemplary relevancy annotation using one or more LLMs.
FIG. 5 is a flow chart illustrating an exemplary method that may be performed in some embodiments.
FIG. 6 is a diagram illustrating an exemplary computer that may perform processing in some embodiments.

DETAILED DESCRIPTION

In this specification, reference is made in detail to specific embodiments of the invention. Some of the embodiments or their aspects are illustrated in the drawings. For clarity in explanation, the invention has been described with reference to specific embodiments; however, it should be understood that