US-12619588-B2 - System and methods for retrieval-augmented generation searches of unstructured and structured information
Abstract
A Retrieval-Augmented Generation (“RAG”) framework may include an RAG vector data store with information about vector embeddings. An RAG pre-processing platform may access unstructured data and perform vector embedding to generate a vector embedding for unstructured data to be stored in the RAG vector data store. The RAG pre-processing platform may also access structured data from a knowledge base and create a summary and metadata about the structured data to be stored in the RAG vector data store. An RAG retriever platform may receive a user prompt from a user, perform vector embedding, and retrieve context-relevant information for unstructured and structured data by searching for similar embeddings in the RAG vector data store. An RAG reader platform may combine the context-relevant information with the user prompt and an RAG prompt to create a Large Language Model (“LLM”) prompt. A context-aware response is then output to the user.
Inventors
- Oliver SCHIRMER
- Frank Feinbube
Assignees
- SAP SE
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-06-14
Claims (20)
- 1. A system associated with a Retrieval-Augmented Generation (“RAG”) framework, comprising: an RAG vector data store that contains information about a plurality of vector embeddings; and an RAG pre-processing platform, coupled to the RAG vector data store, including: a computer processor, and a computer memory storing instructions that when executed by the computer processor cause the RAG pre-processing platform to: access unstructured data associated with a first document from a knowledge base, perform vector embedding on the unstructured data to generate a vector embedding for unstructured data, store the vector embedding for unstructured data in the RAG vector data store along with the first document, access structured data associated with a second document from the knowledge base, create a summary and metadata about the structured data, perform vector embedding on the summary and metadata to generate a vector embedding for structured data, and store the vector embedding for structured data in the RAG vector data store along with the summary and metadata.
- 2. The system of claim 1, further comprising: an RAG retriever platform, coupled to the RAG vector data store, to: receive a user prompt from a user, perform vector embedding on the user prompt, and retrieve context-relevant information for unstructured and structured data by searching for similar embeddings in the RAG vector data store; and an RAG reader platform, coupled to the RAG retriever platform, to: combine the context-relevant information with the user prompt and an RAG prompt to create a Large Language Model (“LLM”) prompt, and output a context-aware response to the user prompt via the LLM prompt and an LLM.
- 3. The system of claim 2, wherein prior to performing vector embedding on the user prompt, the system performs pre-processing of the user prompt via at least one of: (i) reduction of specific sections via Natural Language Processing (“NLP”), (ii) filtering of specific sections via NLP, and (iii) an additional LLM prompt.
- 4. The system of claim 2, wherein the RAG retriever platform retrieves the first document directly from the RAG vector data store.
- 5. The system of claim 4, wherein the RAG retriever platform retrieves the second document based on the metadata.
- 6. The system of claim 2, wherein the summary describes content of the second document and includes important keywords.
- 7. The system of claim 2, wherein the RAG pre-processing platform creates the summary and metadata using a summary LLM and a summary prompt.
- 8. The system of claim 7, wherein the RAG pre-processing platform creates the summary and metadata further using a compression prompt.
- 9. The system of claim 2, wherein the metadata includes at least one of: (i) a source path for the second document, (ii) hierarchy information, and (iii) information about documents related to the second document.
- 10. The system of claim 2, wherein the searching finds top-k semantically similar text using at least one of: (i) Cosine similarity, (ii) a dot product, (iii) Euclidean distance, and (iv) any other state-of-the-art similarity search technique.
- 11. The system of claim 2, wherein the RAG pre-processing platform further splits information into chunks of a predetermined length.
- 12. The system of claim 2, wherein the user prompt and context-aware response are provided via an immersive virtual experience.
- 13. The system of claim 1, wherein the summary describes content of the second document, including one or more keywords descriptive of an overall theme of the second document.
- 14. The system of claim 1, wherein the RAG pre-processing platform creates the summary and metadata using a summary LLM and a summary prompt.
- 15. The system of claim 1, wherein the RAG pre-processing platform creates the summary and metadata using a compression prompt.
- 16. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by a computing system, cause the computing system to perform operations associated with a Retrieval-Augmented Generation (“RAG”) framework, comprising: accessing, by an RAG pre-processing platform, unstructured data associated with a first document from a knowledge base; performing, by the RAG pre-processing platform, vector embedding on the unstructured data to generate a vector embedding for unstructured data; storing, by the RAG pre-processing platform, the vector embedding for unstructured data in an RAG vector data store along with the first document, wherein the RAG vector data store contains information about a plurality of vector embeddings; accessing, by the RAG pre-processing platform, structured data associated with a second document from the knowledge base; creating, by the RAG pre-processing platform, a summary and metadata about the structured data; performing, by the RAG pre-processing platform, vector embedding on the summary and metadata to generate a vector embedding for structured data; storing, by the RAG pre-processing platform, the vector embedding for structured data in the RAG vector data store along with the summary and metadata; receiving, by an RAG retriever platform, a user prompt from a user; performing, by the RAG retriever platform, vector embedding on the user prompt; retrieving, by the RAG retriever platform, context-relevant information for unstructured and structured data by searching for similar embeddings in the RAG vector data store; combining, by an RAG reader platform, the context-relevant information with the user prompt to create a Large Language Model (“LLM”) prompt; and outputting, by the RAG reader platform, a context-aware response to the user prompt via the LLM prompt and an LLM.
- 17. The media of claim 16, wherein the RAG retriever platform retrieves the first document directly from the RAG vector data store.
- 18. The media of claim 17, wherein the RAG retriever platform retrieves the second document based on the metadata.
- 19. The media of claim 18, wherein the metadata includes at least one of: (i) a source path of the second document, (ii) hierarchy information, and (iii) information about documents related to the second document.
- 20. The media of claim 17, wherein the RAG retriever platform further splits information into chunks of a predetermined length and the user prompt and context-aware response are provided via an immersive virtual experience.
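The similarity measures recited in claim 10 (cosine similarity, dot product, Euclidean distance) can be illustrated with a minimal top-k search. This is an illustrative sketch only, not part of the claimed system; the function names and the choice of cosine similarity for ranking are assumptions for the example.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def dot_product(a, b):
    # Unnormalized similarity: larger values mean more similar directions/magnitudes.
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Straight-line distance: smaller values mean more similar embeddings.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, embeddings, k=2):
    # Rank stored embeddings by cosine similarity to the query and keep the top k.
    ranked = sorted(embeddings, key=lambda e: cosine_similarity(query, e), reverse=True)
    return ranked[:k]
```

In practice a vector data store would use an approximate-nearest-neighbor index rather than an exhaustive sort, but the ranking criterion is the same.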
Description
BACKGROUND

A Large Language Model (“LLM”) achieves general-purpose language generation and other natural language processing tasks. Based on language models, LLMs acquire these abilities by learning statistical relationships from substantial amounts of text (e.g., from a knowledge base) during a training process. LLMs can be used for generative AI by taking an input text or prompt and predicting future tokens or words using artificial neural networks. In some cases, an LLM may answer user queries in various contexts by cross-referencing knowledge sources. Some drawbacks of the basic LLM approach include presenting false information (or “hallucinations”) and responses containing out-of-date or generic information.

To address these and other issues, Retrieval-Augmented Generation (“RAG”) optimizes the output of an LLM so that it references an authoritative knowledge base outside of the original training data sources. RAG can extend LLM capabilities to specific domains or an organization's internal knowledge base without retraining the model. For example, FIG. 1 is a high-level RAG architecture of a system 100 that includes an LLM 110, a vector search 120, and a vector data store 130. FIG. 2 is a basic RAG method that begins with receiving a user query at S210. In response to the user query, the LLM 110 interprets the query using embedding at S220. A vector search 120 is performed using information in the vector data store 130 at S230. The vector data store 130 might be populated, for example, with information gathered from a knowledge base of enterprise documents (e.g., emails, memos, reports, etc.). The vector search 120 returns relevant context information specific to that enterprise, which is used by the LLM 110 to generate an appropriate response to the user query at S240. In this way, redirecting the LLM 110 to retrieve relevant context information from authoritative, pre-determined knowledge sources gives an organization control over the text output that is generated.
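The basic flow of FIG. 2 (embed the query at S220, search the vector data store at S230, generate a response from the retrieved context at S240) can be sketched in Python. The toy `embed` function, its fixed vocabulary, and the in-memory `store` are illustrative stand-ins only; a real system would use a learned embedding model and a vector database.

```python
import math

def embed(text):
    # Toy bag-of-words embedding over a small fixed vocabulary (a hypothetical
    # stand-in for a real embedding model).
    vocab = ["invoice", "report", "project", "employee", "budget"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Vector data store: (embedding, document) pairs gathered from enterprise documents.
store = [
    (embed("quarterly budget report"), "quarterly budget report"),
    (embed("employee onboarding memo"), "employee onboarding memo"),
]

def retrieve_context(user_query):
    q = embed(user_query)                              # S220: embed the query
    best = max(store, key=lambda e: cosine(q, e[0]))   # S230: vector search
    # S240: the retrieved context would be handed to the LLM with the query.
    return best[1]
```

For a query such as "what is the budget report", the search returns the budget-report document as the relevant context.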
In this way, RAG may provide a cost-effective Artificial Intelligence (“AI”) implementation (because the LLM 110 does not need to be retrained with the new data), and more current information can be included without retraining. RAG has been very successful at determining relevant context information for a user prompt based on information extracted from unstructured data. As used herein, the phrase “unstructured data” may refer to information that does not have a pre-defined data model and is not organized in a pre-defined manner (e.g., documents, emails, chat group conversations, transcripts, etc.). RAG has been less successful, however, at understanding and using structured data, such as a list of employee names and their roles within an organization, a list of projects and sub-projects, etc. The typical RAG embedding process and vector search cannot readily store this type of information within a vector data store and/or utilize the information to determine relevant context information in response to a user query. It would therefore be desirable to provide an RAG framework that supports unstructured and structured data in a secure, automatic, and efficient manner.

SUMMARY

According to some embodiments, methods and systems associated with a Retrieval-Augmented Generation (“RAG”) framework may include an RAG vector data store with information about vector embeddings. An RAG pre-processing platform may access unstructured data and perform vector embedding to generate a vector embedding for unstructured data to be stored in the RAG vector data store. The RAG pre-processing platform may also access structured data from a knowledge base and create a summary and metadata about the structured data to be stored in the RAG vector data store. An RAG retriever platform may receive a user prompt from a user, perform vector embedding, and retrieve context-relevant information for unstructured and structured data by searching for similar embeddings in the RAG vector data store.
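The structured-data pre-processing step (create a summary and metadata for a structured document, embed the summary, and store embedding, summary, and metadata together) can be sketched as follows. This is a minimal illustration: the keyword-based summarizer, the hashing embedding, and all names are hypothetical stand-ins for a summary LLM with a summary prompt and a real embedding model.

```python
import hashlib

def toy_embed(text, dim=8):
    # Hypothetical stand-in for an embedding model: hash each word into a
    # fixed-length vector so the example stays self-contained.
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

def preprocess_structured(rows, source_path, vector_store):
    # Create a summary and metadata for a structured document, embed the
    # summary, and store everything together in the vector data store.
    keywords = sorted({str(v) for row in rows for v in row.values()})
    summary = f"Table of {len(rows)} rows covering: {', '.join(keywords)}"
    metadata = {"source_path": source_path, "row_count": len(rows)}
    vector_store.append({"embedding": toy_embed(summary),
                         "summary": summary,
                         "metadata": metadata})
    return summary, metadata
```

A retriever can then match a user prompt against the summary's embedding and use the stored metadata (e.g., the source path) to fetch the original structured document.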
An RAG reader platform may combine the context-relevant information with the user prompt and an RAG prompt to create an LLM prompt. A context-aware response is then output to the user. Some embodiments comprise: means for receiving, by an RAG retriever platform, a user prompt from a user; means for performing, by the RAG retriever platform, vector embedding on the user prompt; means for retrieving, by the RAG retriever platform, context-relevant information for unstructured and structured data; means for searching, by the RAG retriever platform, for similar embeddings in the RAG vector data store; means for combining, by an RAG reader platform, the context-relevant information with the user prompt and an RAG prompt to create an LLM prompt; and means for outputting, by the RAG reader platform, a context-aware response to the user prompt via the LLM prompt and an LLM. Some technical advantages of some embodiments disclosed herein are improved systems and methods to provide an RAG framework that supports unstructured and structured data in a secure, automatic, and efficient manner.
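The reader-platform combining step described above can be sketched as a simple prompt builder. The exact prompt layout and the function name are illustrative assumptions, not the claimed implementation.

```python
def build_llm_prompt(rag_prompt, context_chunks, user_prompt):
    # Combine the retrieved context-relevant information with an RAG prompt
    # and the user prompt to form the final LLM prompt.
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"{rag_prompt}\n\nContext:\n{context}\n\nQuestion: {user_prompt}"
```

The resulting string would then be sent to the LLM, which generates the context-aware response returned to the user.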