US-20260127205-A1 - CONTEXTUAL RETRIEVAL FOR MULTI-TENANT RETRIEVAL-AUGMENTED GENERATION (RAG) WITH ADAPTIVE LEARNING
Abstract
Methods, systems, apparatuses, devices, and computer program products are described. A system may support retrieval-augmented generation (RAG) for a large language model (LLM). The system may use adaptive learning to improve the RAG process. For example, the system may implement a context-based embedding function to contextualize the RAG for the specific LLM or a specific tenant or user using the LLM. The context-based embedding function may project document vectors from a generic vector space into a context-based vector space for document retrieval. The system may retrieve a document using the context-based vector space to provide additional contextual information to the LLM to improve the LLM's output. The system may adaptively train the context-based embedding function based on the LLM, user feedback, or both. For example, the system may train the context-based embedding function to improve alignment of document retrieval likelihoods with confidence metrics for the outputs of the LLM.
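The projection-and-retrieval mechanism summarized above can be illustrated with a minimal sketch. All names, dimensions, and the linear form of the context-based embedding function are assumptions for illustration only; the disclosure describes the function abstractly (e.g., as a one-layer artificial neural network), not as any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generic document embeddings in the first (generic) vector space,
# e.g., produced by an off-the-shelf document embedding function.
docs = ["reset password", "billing overview", "export report"]
doc_vecs = rng.normal(size=(len(docs), 8))  # first set of vectors

# Context-based embedding function: sketched here as a single linear
# layer (a one-layer network) projecting into the second vector space.
W = np.eye(8) + 0.1 * rng.normal(size=(8, 8))  # near-identity init

def project(vectors: np.ndarray) -> np.ndarray:
    """Project vectors from the generic space into the context-based space."""
    return vectors @ W

ctx_vecs = project(doc_vecs)  # second set of vectors

def retrieve(query_vec: np.ndarray, k: int = 1):
    """Select documents by proximity (cosine similarity) of the projected
    query to the projected document vectors."""
    q = project(query_vec)
    sims = ctx_vecs @ q / (
        np.linalg.norm(ctx_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [(docs[i], float(sims[i])) for i in top]

# A query embedded near the first document retrieves that document.
query_vec = doc_vecs[0] + rng.normal(scale=0.01, size=8)
print(retrieve(query_vec))
```

Because both the documents and the query pass through the same projection, retraining only `W` re-shapes retrieval for a given tenant or LLM without touching the underlying document embedding function.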
Inventors
- Shiva Kumar Pentyala
- Bin Bi
- Regunathan Radhakrishnan
- Shashank Harinath
- Sitaram Asur
- Claire Cheng
Assignees
- SALESFORCE, INC.
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-11-05
Claims (20)
- 1. A method for context-based retrieval-augmented generation (RAG), comprising: projecting a first set of vectors embedded in a first vector space into a second set of vectors embedded in a second vector space based at least in part on a context-based embedding function, a first vector of the first set of vectors corresponding to a second vector of the second set of vectors and representing a first document of a set of documents, wherein the second vector space is different from the first vector space; retrieving one or more documents of the set of documents based at least in part on a query to a large language model (LLM) and the second set of vectors embedded in the second vector space; inputting, to the LLM, a prompt for the LLM, at least one document of the one or more documents, and at least a portion of the query, wherein the LLM outputs: a result based at least in part on the prompt, the at least one document, and the portion of the query, and a confidence metric associated with the result; and updating the context-based embedding function based at least in part on the at least one document and the confidence metric associated with the result.
- 2. The method of claim 1, further comprising: embedding the set of documents as the first set of vectors in the first vector space based at least in part on a document embedding function.
- 3. The method of claim 2, further comprising: refraining from updating the document embedding function based at least in part on a security parameter of the document embedding function, an owner of the document embedding function, or both.
- 4. The method of claim 2, further comprising: applying the updated context-based embedding function to a second document embedding function different from the document embedding function.
- 5. The method of claim 1, further comprising: converting the query into a search vector for the second vector space; and selecting one or more vectors of the second set of vectors embedded in the second vector space based at least in part on a proximity of the search vector to the one or more vectors, wherein the retrieved one or more documents correspond to the selected one or more vectors.
- 6. The method of claim 1, further comprising: receiving, from a user device, first user feedback indicating an accuracy of the result, wherein the updating the context-based embedding function is further based at least in part on the first user feedback.
- 7. The method of claim 1, further comprising: receiving, from a user device, second user feedback indicating a relevance of the at least one document, wherein the updating the context-based embedding function is further based at least in part on the second user feedback.
- 8. The method of claim 1, further comprising: determining respective retrieval likelihoods for the one or more documents based at least in part on the context-based embedding function, wherein the updating the context-based embedding function is further based at least in part on the respective retrieval likelihoods for the one or more documents and respective confidence metrics for results output based at least in part on the one or more documents.
- 9. The method of claim 1, further comprising: refraining from updating the LLM based at least in part on a security parameter of the LLM, an owner of the LLM, or both.
- 10. The method of claim 1, further comprising: applying the updated context-based embedding function to a second LLM different from the LLM.
- 11. The method of claim 1, wherein the updated context-based embedding function corresponds to a tenant of a multi-tenant database system, the LLM, or both.
- 12. The method of claim 1, wherein the context-based embedding function comprises a one-layer artificial neural network.
- 13. An apparatus for context-based retrieval-augmented generation (RAG), comprising: one or more memories storing processor-executable code; and one or more processors coupled with the one or more memories and individually or collectively operable to execute the code to cause the apparatus to: project a first set of vectors embedded in a first vector space into a second set of vectors embedded in a second vector space based at least in part on a context-based embedding function, a first vector of the first set of vectors corresponding to a second vector of the second set of vectors and representing a first document of a set of documents, wherein the second vector space is different from the first vector space; retrieve one or more documents of the set of documents based at least in part on a query to a large language model (LLM) and the second set of vectors embedded in the second vector space; input, to the LLM, a prompt for the LLM, at least one document of the one or more documents, and at least a portion of the query, wherein the LLM outputs: a result based at least in part on the prompt, the at least one document, and the portion of the query, and a confidence metric associated with the result; and update the context-based embedding function based at least in part on the at least one document and the confidence metric associated with the result.
- 14. The apparatus of claim 13, wherein the one or more processors are individually or collectively further operable to execute the code to cause the apparatus to: embed the set of documents as the first set of vectors in the first vector space based at least in part on a document embedding function.
- 15. The apparatus of claim 14, wherein the one or more processors are individually or collectively further operable to execute the code to cause the apparatus to: refrain from updating the document embedding function based at least in part on a security parameter of the document embedding function, an owner of the document embedding function, or both.
- 16. The apparatus of claim 14, wherein the one or more processors are individually or collectively further operable to execute the code to cause the apparatus to: apply the updated context-based embedding function to a second document embedding function different from the document embedding function.
- 17. The apparatus of claim 13, wherein the one or more processors are individually or collectively further operable to execute the code to cause the apparatus to: convert the query into a search vector for the second vector space; and select one or more vectors of the second set of vectors embedded in the second vector space based at least in part on a proximity of the search vector to the one or more vectors, wherein the retrieved one or more documents correspond to the selected one or more vectors.
- 18. The apparatus of claim 13, wherein the one or more processors are individually or collectively further operable to execute the code to cause the apparatus to: receive, from a user device, first user feedback indicating an accuracy of the result, wherein the updating the context-based embedding function is further based at least in part on the first user feedback.
- 19. The apparatus of claim 13, wherein the one or more processors are individually or collectively further operable to execute the code to cause the apparatus to: receive, from a user device, second user feedback indicating a relevance of the at least one document, wherein the updating the context-based embedding function is further based at least in part on the second user feedback.
- 20. A non-transitory computer-readable medium storing code for context-based retrieval-augmented generation (RAG), the code comprising instructions executable by one or more processors to: project a first set of vectors embedded in a first vector space into a second set of vectors embedded in a second vector space based at least in part on a context-based embedding function, a first vector of the first set of vectors corresponding to a second vector of the second set of vectors and representing a first document of a set of documents, wherein the second vector space is different from the first vector space; retrieve one or more documents of the set of documents based at least in part on a query to a large language model (LLM) and the second set of vectors embedded in the second vector space; input, to the LLM, a prompt for the LLM, at least one document of the one or more documents, and at least a portion of the query, wherein the LLM outputs: a result based at least in part on the prompt, the at least one document, and the portion of the query, and a confidence metric associated with the result; and update the context-based embedding function based at least in part on the at least one document and the confidence metric associated with the result.
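The updating step of claim 1, together with claim 8's alignment of retrieval likelihoods and confidence metrics, might be sketched as follows. This is a hypothetical illustration: the linear projection, softmax retrieval likelihood, squared-error alignment loss, and finite-difference gradient are all assumptions chosen for brevity, not details taken from the claims.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4
doc_vecs = rng.normal(size=(3, dim))  # documents in the first vector space
W = np.eye(dim)                       # context-based embedding function (linear sketch)

def retrieval_likelihoods(query_vec, Wm=None):
    """Softmax over similarities computed in the context-based space."""
    Wm = W if Wm is None else Wm
    scores = (doc_vecs @ Wm) @ (query_vec @ Wm)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def update(query_vec, doc_idx, confidence, lr=0.02):
    """Nudge W so the retrieval likelihood of the document that was used
    as context moves toward the LLM's confidence in the result."""
    global W
    eps = 1e-4

    def loss(Wm):
        p = retrieval_likelihoods(query_vec, Wm)
        return (p[doc_idx] - confidence) ** 2

    # Finite-difference gradient of the alignment error w.r.t. W.
    grad = np.zeros_like(W)
    base = loss(W)
    for i in range(dim):
        for j in range(dim):
            Wp = W.copy()
            Wp[i, j] += eps
            grad[i, j] = (loss(Wp) - base) / eps
    W -= lr * grad

# Example: the LLM answered using document 1 with confidence 0.9, so
# training pulls document 1's retrieval likelihood toward 0.9.
query = doc_vecs[1] + rng.normal(scale=0.1, size=dim)
before = abs(retrieval_likelihoods(query)[1] - 0.9)
for _ in range(30):
    update(query, doc_idx=1, confidence=0.9)
after = abs(retrieval_likelihoods(query)[1] - 0.9)
print(before, "->", after)
```

Note that only the small projection `W` is trained; the document embedding function and the LLM itself are left untouched, consistent with the refraining limitations of claims 3 and 9.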
Description
FIELD OF TECHNOLOGY

The present disclosure relates generally to database systems and data processing, and more specifically to contextual retrieval for multi-tenant retrieval-augmented generation (RAG) with adaptive learning.

BACKGROUND

A cloud platform (i.e., a computing platform for cloud computing) may be employed by multiple users to store, manage, and process data using a shared network of remote servers. Users may develop applications on the cloud platform to handle the storage, management, and processing of data. In some cases, the cloud platform may utilize a multi-tenant database system. Users may access the cloud platform using various user devices (e.g., desktop computers, laptops, smartphones, tablets, or other computing systems, etc.).

In one example, the cloud platform may support customer relationship management (CRM) solutions. This may include support for sales, service, marketing, community, analytics, applications, and the Internet of Things. A user may utilize the cloud platform to help manage contacts of the user. For example, managing contacts of the user may include analyzing data, storing and preparing communications, and tracking opportunities and sales.

Some systems may use retrieval-augmented generation (RAG) to improve generative artificial intelligence (AI) results. For example, RAG may retrieve one or more documents that provide additional context to a large language model (LLM). However, in some cases, a retrieved document may introduce an error into the system (e.g., based on the document being irrelevant or otherwise misleading), and the error may propagate to the results of the LLM based on the LLM using the document as context. Such errors may cause hallucinations at the LLM or otherwise negatively affect the accuracy or effectiveness of the LLM.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example of a system for cloud computing that supports contextual retrieval for multi-tenant retrieval-augmented generation (RAG) with adaptive learning in accordance with aspects of the present disclosure.
FIG. 2 shows an example of a system that supports contextual retrieval for multi-tenant RAG with adaptive learning in accordance with aspects of the present disclosure.
FIG. 3 shows an example of a RAG pipeline that supports contextual retrieval for multi-tenant RAG with adaptive learning in accordance with aspects of the present disclosure.
FIG. 4 shows an example of a context-based training process that supports contextual retrieval for multi-tenant RAG with adaptive learning in accordance with aspects of the present disclosure.
FIG. 5 shows an example of a process flow that supports contextual retrieval for multi-tenant RAG with adaptive learning in accordance with aspects of the present disclosure.
FIG. 6 shows a block diagram of an apparatus that supports contextual retrieval for multi-tenant RAG with adaptive learning in accordance with aspects of the present disclosure.
FIG. 7 shows a block diagram of a RAG manager that supports contextual retrieval for multi-tenant RAG with adaptive learning in accordance with aspects of the present disclosure.
FIG. 8 shows a diagram of a system including a device that supports contextual retrieval for multi-tenant RAG with adaptive learning in accordance with aspects of the present disclosure.
FIGS. 9 and 10 show flowcharts illustrating methods that support contextual retrieval for multi-tenant RAG with adaptive learning in accordance with aspects of the present disclosure.

DETAILED DESCRIPTION

Some systems may use retrieval-augmented generation (RAG) to improve generative artificial intelligence (AI) results. For example, for a specific query to a large language model (LLM) or another AI component, a RAG pipeline may provide additional context to the LLM relevant to the query.
A RAG process may involve a system retrieving one or more documents based on the query and including at least one retrieved document as an additional input to the LLM (e.g., in addition to the query, an LLM prompt, or both). However, in some cases, a retrieved document may introduce an error into the system. For example, the RAG process may retrieve a document that is irrelevant to the LLM or query, that includes misleading or false information, or that otherwise negatively affects a resulting output of the LLM. For example, such a document may cause hallucinations at the LLM or may otherwise lead to an inaccurate result generated by the LLM in response to the query. Training the LLM to account for such errors may involve a significant processing overhead (e.g., based on a quantity of layers, weights, or both at the LLM) or may be unsupported (e.g., if the LLM is an off-the-shelf LLM or is otherwise owned or operated by a different entity). To improve the contextual retrieval of a RAG process, a system may implement a context-based embedding function in the RAG process. The system may adaptively train the context-based embedding function to reduce errors and