US-20260127186-A1 - QUERY RESPONSE SYSTEM IMPLEMENTING A RETRIEVAL-AUGMENT GENERATION ARCHITECTURE

US 20260127186 A1

Abstract

A query is received from a client device. Upon determining that the query includes insufficient details for generating a query response, the query is augmented to generate an augmented query that includes sufficient details for generating the query response. The augmented query is summarized into a summarized query. A subset of documents is determined to be relevant to the query in part by determining an optimal configuration for the query. A query response based on the subset of documents is outputted to the client device.

Inventors

  • Venkatesh K Pappakrishnan
  • Praveen Herur
  • Alok Tongaonkar

Assignees

  • PALO ALTO NETWORKS, INC.

Dates

Publication Date
2026-05-07
Application Date
2025-12-22

Claims (20)

  1. A method, comprising: receiving a query from a client device; upon determining that the query includes insufficient details for generating a query response, augmenting the query to generate an augmented query that includes sufficient details for generating the query response; summarizing the augmented query into a summarized query; determining a subset of documents relevant to the query in part by determining an optimal configuration for the query, wherein determining the optimal configuration for the query includes classifying the summarized query into a first query category of a plurality of known query categories, wherein classifying the summarized query includes: converting the summarized query into an embedding vector; determining, using an embedding model, a stored query that is most similar to the summarized query; and determining a category associated with the stored query that is most similar to the summarized query to be the first query category; and outputting to the client device a query response based on the subset of documents.
  2. The method of claim 1, wherein the query includes insufficient details when a response generator determines that a relevant document cannot be retrieved to answer the query.
  3. The method of claim 1, wherein augmenting the query includes sending to the client device one or more follow-up questions.
  4. The method of claim 1, wherein converting the summarized query into an embedding vector includes utilizing a natural language processor to generate the embedding vector.
  5. The method of claim 1, wherein determining a stored query that is most similar to the summarized query includes determining an embedding vector corresponding to a stored query that is closest to the embedding vector corresponding to the summarized query in an embedding space from a plurality of embedding vectors corresponding to a plurality of stored queries located in a plurality of different positions in the embedding space.
  6. The method of claim 1, wherein determining a category associated with the stored query that is most similar to the summarized query to be the first query category includes computing one or more similarity values between the plurality of embedding vectors corresponding to the plurality of stored queries and the embedding vector corresponding to the summarized query.
  7. The method of claim 1, wherein classifying the summarized query further includes inputting the subset of documents in a context window for a response generator and the query as a prompt for the query response.
  8. The method of claim 1, wherein classifying the summarized query further includes assigning corresponding weights to the subset of documents based on the first query category.
  9. The method of claim 1, wherein determining the subset of documents relevant to the query includes generating a plurality of different document sets for a set of the documents.
  10. The method of claim 9, wherein determining the subset of documents relevant to the query includes providing to a response generator a summarized query and each document set of the plurality of different document sets.
  11. The method of claim 10, wherein determining the subset of documents relevant to the query further includes generating a corresponding score for each document included in the set of the documents.
  12. The method of claim 11, wherein the corresponding score for a first document included in the set of documents increases in response to receiving from the response generator a positive response indicating that the response generator has determined that it can generate the query response utilizing the first document.
  13. The method of claim 12, wherein the response generator has determined that it can generate the query response utilizing the first document by itself.
  14. The method of claim 12, wherein the response generator has determined that it can generate the query response utilizing the first document in conjunction with one or more other documents included in the set of the documents.
  15. A system, comprising: query receiving means for receiving a query from a client device; query augmenting means for, upon determining that the query includes insufficient details for generating a query response, augmenting the query to generate an augmented query that includes sufficient details for generating the query response; query summarizing means for summarizing the augmented query into a summarized query; document subset determining means for determining a subset of documents relevant to the query, at least in part by determining an optimal configuration for the query, wherein determining the optimal configuration includes classifying the summarized query into a first query category of a plurality of known query categories, and wherein the document subset determining means includes: embedding generation means for converting the summarized query into an embedding vector; similarity determination means for determining, using an embedding model, a stored query that is most similar to the summarized query; and category determination means for determining a category associated with the stored query that is most similar to the summarized query to be the first query category; and response output means for outputting to the client device a query response based on the subset of documents; and memory means coupled to the foregoing means and storing instructions executable to perform the functions of the foregoing means.
  16. The system of claim 15, wherein the query augmenting means determines that the query includes insufficient details when a response generator determines that a relevant document cannot be retrieved to answer the query.
  17. The system of claim 15, wherein the query augmenting means augments the query by sending to the client device one or more follow-up questions.
  18. The system of claim 15, wherein the embedding generation means converts the summarized query into the embedding vector by utilizing a natural language processor to generate the embedding vector.
  19. The system of claim 15, wherein the similarity determination means determines the stored query that is most similar to the summarized query by determining an embedding vector corresponding to a stored query that is closest to the embedding vector corresponding to the summarized query in an embedding space from a plurality of embedding vectors corresponding to a plurality of stored queries located in a plurality of different positions in the embedding space.
  20. A computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for: receiving a query from a client device; upon determining that the query includes insufficient details for generating a query response, augmenting the query to generate an augmented query that includes sufficient details for generating the query response; summarizing the augmented query into a summarized query; determining a subset of documents relevant to the query in part by determining an optimal configuration for the query, wherein determining the optimal configuration for the query includes classifying the summarized query into a first query category of a plurality of known query categories, wherein classifying the summarized query includes: converting the summarized query into an embedding vector; determining, using an embedding model, a stored query that is most similar to the summarized query; and determining a category associated with the stored query that is most similar to the summarized query to be the first query category; and outputting to the client device a query response based on the subset of documents.

Description

CROSS REFERENCE TO OTHER APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 18/767,729 entitled QUERY RESPONSE SYSTEM IMPLEMENTING A RETRIEVAL-AUGMENT GENERATION ARCHITECTURE filed Jul. 9, 2024, which is incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

Large Language Models (LLMs) are typically trained on publicly available documents. As a result, they may struggle to answer domain-specific questions if such documents were not included in their training data. Retrieval-Augmented Generation (RAG) is an architecture used for knowledge-based question answering, particularly useful when the required data was not part of the model's training set. RAG can reduce hallucinations in LLM responses, though it does not eliminate them entirely. There are several potential failure points in a RAG-based approach that can impact the reliability of the responses. For example, if irrelevant or conflicting documents are retrieved, the LLM may generate hallucinated responses. Additionally, the absence of relevant documents can also lead to hallucinations in the LLM response.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram illustrating a system to generate a query response in accordance with some embodiments.
FIG. 2 is a flow diagram illustrating a process to generate a query response in accordance with some embodiments.
FIG. 3 is a flow diagram illustrating a process to augment a query in accordance with some embodiments.
FIG. 4 is a flow diagram illustrating a process to determine an optimal configuration for a query in accordance with some embodiments.
FIG. 5 is a flow diagram illustrating a process to determine a set of documents to be used to answer a query in accordance with some embodiments.
DETAILED DESCRIPTION

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
An enhanced RAG architecture to achieve higher accuracy and reliability in RAG-based LLM applications is disclosed herein. The disclosed architecture reduces the hallucinations generated by an LLM in a query response to zero or near-zero, and causes the LLM to generate highly accurate query responses. The disclosed architecture is highly reliable for consistently generating accurate responses. In some embodiments, the disclosed architecture is implemented as a customer chatbot to address common queries using public documents. In some embodiments, the disclosed architecture is implemented for customer support to internally resolve specific issues and questions.

A query is received from a client device at a query response system implementing the enhanced RAG architecture. The query response system includes a query augmentor to collect sufficient details from a user associated with the client device to accurately retrieve documents to answer the query. The query augmentor asks the user associated with the client device one or more follow-up questions until sufficient information is collected.
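The query-classification and document-scoring steps recited in the claims can be illustrated with a minimal sketch. This is not the disclosed implementation: `embed` is a toy character-frequency stand-in for the embedding model, `can_answer` is a keyword-overlap stand-in for asking the response generator whether a document set suffices, and the stored queries and categories are hypothetical examples.

```python
import numpy as np
from itertools import combinations

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: a normalized
    character-frequency vector, so the sketch is self-contained."""
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# Hypothetical stored queries, each labeled with a known query category.
STORED_QUERIES = [
    ("how do I reset my firewall password", "account"),
    ("what ports does the vpn gateway use", "configuration"),
    ("why is my subscription invoice higher this month", "billing"),
]

def classify(summarized_query: str) -> str:
    """Convert the summarized query into an embedding vector, find the
    most similar stored query (cosine similarity), and return that
    stored query's category as the first query category."""
    q = embed(summarized_query)
    sims = [float(q @ embed(text)) for text, _ in STORED_QUERIES]
    return STORED_QUERIES[int(np.argmax(sims))][1]

def can_answer(query: str, docs) -> bool:
    """Toy stand-in for the response generator's yes/no judgment:
    a positive response when enough query words appear in the docs."""
    words = set(query.lower().split())
    text = " ".join(docs).lower()
    return sum(w in text for w in words) > len(words) // 2

def score_documents(summarized_query: str, documents):
    """Generate different document sets, provide each set to the
    (stubbed) response generator, and increase the score of every
    document in a set that yields a positive response."""
    scores = {d: 0 for d in documents}
    for r in range(1, len(documents) + 1):
        for subset in combinations(documents, r):
            if can_answer(summarized_query, subset):
                for d in subset:
                    scores[d] += 1
    return scores
```

Documents that repeatedly appear in sets yielding a positive response accumulate higher scores; the top-scoring documents would then form the subset placed in the context window for the response generator.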