EP-4738145-A1 - METHOD AND SYSTEM FOR RETRIEVAL-AUGMENTED GENERATION
Abstract
An extended retrieval-augmented generation system stores documents in a vector database (VDB), wherein for at least one text chunk of each document, an embedding model (EM) computes a vector embedding (VE), and wherein both are stored in the vector database. Metadata of each document are included in the vector database as well. An application (EA) then receives a query from a user (U). The embedding model (EM) receives the query as input and forms a query embedding (QE). The vector database receives the query embedding as input and identifies documents matching the query embedding. The application retrieves the matching documents and their metadata. Next, the application sends at least one prompt containing the query and the retrieved documents to a large language model (LLM). Later, the application receives a response from the large language model. A metadata score generator computes at least one score for the retrieved metadata. In addition or as an alternative, a metadata summary generator (MSMG) sends at least one prompt containing the retrieved metadata to the large language model or to another large language model. The metadata summary generator then receives a summary of the retrieved metadata. Finally, the application outputs the response together with the at least one score and/or the summary to the user. The system supports the user in quickly assessing the quality and trustworthiness of the LLM response by providing a score and/or summary based on the metadata (e.g. authors, publication date, rating) of the retrieved documents. The system can offer the user different orthogonal scores and/or summaries of the metadata of the retrieved documents, e.g. both the recency of the retrieved documents and the diversity of the author affiliations, as opposed to RAG systems which choose or filter their data sources in advance.
Inventors
- Brandt, Sebastian-Philipp
- Buckley, Mark
- Merk, Stephan
Assignees
- Siemens Aktiengesellschaft
Dates
- Publication Date
- 2026-05-06
- Application Date
- 2024-10-31
Claims (12)
- A computer implemented method for retrieval-augmented generation, wherein the following operations are performed by components, and wherein the components are hardware components and/or software components executed by one or more processors:
  - storing (1), by a vector database (VDB), documents, wherein for at least one text chunk of each document, an embedding model (EM) computes a vector embedding (VE), and wherein both are stored in the vector database (VDB),
  - receiving (2), by an application (EA), a query from a user (U),
  - forming (3), by the embedding model (EM) receiving the query as input, a query embedding (QE),
  - identifying (4), by the vector database (VDB) receiving the query embedding (QE) as input, documents matching the query embedding (QE),
  - retrieving (5), by the application (EA), the matching documents,
  - sending (6), by the application (EA), at least one prompt containing the query and the retrieved documents to a large language model (LLM),
  - receiving (7), by the application (EA), a response from the large language model (LLM), and
  - outputting (11), by the application (EA), the response to the user (U),
  characterized by
  - the storing operation (1) including metadata of each document in the vector database (VDB), and
  - the retrieving operation (5) also retrieving the metadata of the matching documents,
  and by the following operations:
  - computing (8), by a metadata score generator (MSCG), at least one score for the retrieved metadata, wherein the outputting operation (11) also outputs the at least one score to the user (U), and/or
  - sending (9), by a metadata summary generator (MSMG), at least one prompt containing the retrieved metadata to the large language model (LLM) or to another large language model, and
  - receiving (10), by the metadata summary generator (MSMG), a summary of the retrieved metadata, wherein the outputting operation (11) also outputs the summary to the user (U).
- The method of claim 1, - wherein the sending operation (9) includes the query in the at least one prompt.
- The method according to any of the preceding claims, - wherein the storing operation (1) includes the metadata of each document by retrieving existing metadata attached to each document in a data source (DS).
- The method according to any of the preceding claims 1 and 2, - wherein the storing operation (1) includes the metadata of each document by extracting, by a preprocessor (PRP), the metadata from each document.
- The method according to any of the preceding claims 1 and 2, - wherein the storing operation (1) includes the metadata of each document by retrieving the metadata for each document from a metadata store (MDS).
- The method according to any of the preceding claims, - wherein the outputting operation (11) outputs the at least one score and/or the summary next to the response of the large language model (LLM) as text.
- The method according to any of the preceding claims, - wherein the outputting operation (11) outputs the at least one score next to the response of the large language model (LLM) as a number.
- The method according to any of the preceding claims, - wherein the outputting operation (11) outputs the at least one score next to the response of the large language model (LLM) as a graphical symbol.
- A system for retrieval-augmented generation, comprising:
  - a vector database (VDB), configured for storing documents,
  - an embedding model (EM), configured for computing a vector embedding (VE) for at least one text chunk of each document,
  - wherein the vector database (VDB) is configured for storing the at least one text chunk and the at least one corresponding vector embedding (VE) for each document,
  - an application (EA), configured for receiving a query from a user (U),
  - wherein the embedding model (EM) is configured for receiving the query as input and for forming a query embedding (QE),
  - wherein the vector database (VDB) is configured for receiving the query embedding (QE) as input and for identifying documents matching the query embedding (QE),
  - wherein the application (EA) is configured for
    - retrieving the matching documents,
    - sending at least one prompt containing the query and the retrieved documents to a large language model (LLM),
    - receiving a response from the large language model (LLM), and
    - outputting the response to the user (U),
  characterized by
  - the vector database (VDB) being configured for storing metadata of each document, and
  - the application (EA) being configured for retrieving the metadata of the matching documents,
  and by
  - a metadata score generator (MSCG), configured for computing at least one score for the retrieved metadata, wherein the application (EA) is configured for also outputting the at least one score to the user (U), and/or
  - a metadata summary generator (MSMG), configured for
    - sending at least one prompt containing the retrieved metadata to the large language model (LLM) or to another large language model, and
    - receiving a summary of the retrieved metadata,
    wherein the application (EA) is configured for also outputting the summary to the user (U).
- The system of claim 9, - comprising a user interface, configured for receiving input from the application (EA) and outputting - the response, and - the at least one score and/or the summary, to the user (U).
- A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out a method according to one of the method claims.
- A provisioning device for the computer program product according to the preceding claim, wherein the provisioning device stores and/or provides the computer program product.
Description
Technical Field

This application relates to retrieval-augmented generation. Retrieval-augmented generation (RAG) combines intelligent information retrieval with the power of natural language generation by large language models (LLMs), aiming to enhance content creation, e.g. for chatbots or co-pilots, and to improve the quality of generated text. Retrieval-augmented generation leverages pre-existing knowledge and relevant information from large-scale text corpora, such as document databases, to augment the generation process.

Background Art

Traditionally, large language models operate in a closed system, relying solely on the input provided during training. Retrieval-augmented generation goes a step further by incorporating an intelligent retrieval component that searches for and retrieves relevant information from external sources to include it in the generated text. This retrieval process enriches the generated content by providing additional context and up-to-date information.

A retrieval-augmented generation framework consists of two main components: a retrieval model and a generation model. The retrieval model is responsible for querying internal or external data sources and retrieving relevant documents based on a given input prompt, such as a question from a user. It can use a simple keyword search or more advanced techniques such as semantic matching and neural ranking to ensure the retrieval of the most pertinent documents. Examples of retrieved documents are websites, electronic text documents on share points, or service tickets in a service ticket database. Once the retrieval model has obtained the relevant documents, they are passed on to the generation model. The generation model then incorporates these retrieved documents into the content generation process via a prompt template, ensuring that the generated text is contextually accurate, comprehensive, and aligned with the user's intent.
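The semantic-matching retrieval described above can be sketched as follows. This is a minimal Python illustration, assuming a toy bag-of-words embedding as a stand-in for the embedding model (EM); a real system would use a neural encoder and an approximate-nearest-neighbour index in the vector database (VDB). All function names here are illustrative assumptions.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" standing in for a real embedding model (EM);
    # a production system would call a neural encoder instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank stored document chunks by similarity to the query embedding (QE)
    # and return the top-k matches, as the vector database (VDB) would.
    qe = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(qe, embed(d)), reverse=True)
    return ranked[:k]
```

For example, `retrieve("turbine maintenance", corpus)` ranks a maintenance manual above an unrelated document even without exact phrase overlap in the whole text.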
This retrieval-augmented approach helps to alleviate issues such as factual inaccuracies, outdated information, or lack of context that may arise in traditional generation models. There are LLM systems that add inline links to the generated text; this is typically achieved by submitting snippets of the generated text to a classical web search (e.g. Google or Bing) and using, for example, the highest-ranked search result for the link.

Summary of Invention

It is an object of the present invention to identify a problem in the prior art and to find a technical solution for it. The invention is defined in the independent claims. Further advantageous arrangements and embodiments of the invention are set forth in the respective dependent claims.

According to the method for retrieval-augmented generation, the following operations are performed by components, wherein the components are hardware components and/or software components executed by one or more processors:
- storing, by a vector database, documents, wherein for at least one text chunk of each document, an embedding model computes a vector embedding, and wherein both are stored in the vector database,
- receiving, by an application, a query from a user,
- forming, by the embedding model receiving the query as input, a query embedding,
- identifying, by the vector database receiving the query embedding as input, documents matching the query embedding,
- retrieving, by the application, the matching documents,
- sending, by the application, at least one prompt containing the query and the retrieved documents to a large language model,
- receiving, by the application, a response from the large language model, and
- outputting, by the application, the response to the user.
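The sending operation above combines the query and the retrieved documents into a prompt via a template. A minimal sketch in Python follows; the template wording and the function name are illustrative assumptions, not prescribed by the claims.

```python
def build_prompt(query: str, retrieved: list[str]) -> str:
    # Assemble the prompt the application (EA) sends to the LLM: the
    # retrieved chunks provide grounding context, followed by the user query.
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )
```

The numbered context entries also make it easy for the LLM to cite which retrieved document supports which part of its response.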
The method is characterized by
- the storing operation including metadata of each document in the vector database, and
- the retrieving operation also retrieving the metadata of the matching documents,
and by the following operations that are performed by components, wherein the components are hardware components and/or software components executed by one or more processors:
- computing, by a metadata score generator, at least one score for the retrieved metadata, wherein the outputting operation also outputs the at least one score to the user, and/or
- sending, by a metadata summary generator, at least one prompt containing the retrieved metadata to the large language model or to another large language model, and
- receiving, by the metadata summary generator, a summary of the retrieved metadata, wherein the outputting operation also outputs the summary to the user.

The system for retrieval-augmented generation comprises the following components, wherein the components are hardware components and/or software components executed by one or more processors:
- a vector database, configured for storing documents,
- an embedding model, configured for computing a vector embedding for at least one text chunk of each document,
- wherein the vector database is configured for storing the at least one text chunk and the at least one corr
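A minimal sketch of the characterizing metadata score generator (MSCG) follows, assuming two of the orthogonal scores named in the abstract: recency of the retrieved documents and diversity of the author affiliations. The two-year recency window and the metadata field names are illustrative assumptions, not taken from the claims.

```python
from datetime import date

def recency_score(metadata: list[dict], today: date) -> float:
    # Fraction of retrieved documents published within the last two years;
    # the two-year window is an illustrative choice.
    recent = sum(1 for m in metadata if (today - m["published"]).days <= 2 * 365)
    return recent / len(metadata)

def affiliation_diversity(metadata: list[dict]) -> float:
    # Number of distinct author affiliations relative to the number of
    # retrieved documents: 1.0 means every document comes from a different source.
    affiliations = {m["affiliation"] for m in metadata}
    return len(affiliations) / len(metadata)
```

Both scores are computed purely from the retrieved metadata, so they can be shown next to the LLM response as numbers or graphical symbols without an additional LLM call, unlike the metadata summary generator (MSMG).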