US-20260127226-A1 - METHOD FOR AUTOMATIC GENERATION OF FREQUENTLY ASKED QUESTIONS
Abstract
Methods and systems for generating frequently asked questions (FAQs) are provided, which include defining, by a computer program executed by a computer, a first large language model (LLM) with a user query and a feedback of the user query; refining, by the computer program, the user query to a question set based on the feedback, the question set comprising one or more sentences; defining, by the computer program, a second LLM to generate a first set of question and answer pairs from a source document; defining, by the computer program, a third LLM to generate a content set from the source document based on a rewriting of the source document; selecting, by the computer program, top questions from the content set to be provided to the second LLM; and generating, by the second LLM, a second set of question and answer pairs based on the top questions.
Inventors
- Olusegun Oshin
- Shuaidong PAN
- Milad Zafar Nezhad
- Ale NABIJEE
Assignees
- JPMORGAN CHASE BANK, N.A.
Dates
- Publication Date
- 20260507
- Application Date
- 20241107
Claims (6)
- 1. A method comprising: inputting, by a computer program executed by a computer, a user query set into a first large language model (LLM); refining the user query set into a first question set based on a feedback criterion comprising an existence of a question word in a first question; inputting, by the computer program, a source document to a second LLM, the second LLM configured to generate a content set from the source document, wherein the source document is selected based on similarity to a content of the user query; selecting, by the computer program, a second question set as a subset of the first question set based on a semantic similarity to the content set, wherein the top semantic similarity is determined based on an embedding-wise cosine similarity score; generating, by a third LLM, a set of question and answer pairs based on the second question set provided by the computer program; selecting, by the computer program, a subset of the set of question and answer pairs based on a semantic similarity to the first question set; linking, by the computer program, the subset of the set of question and answer pairs to the source document, wherein the computer program saves, to a database, a link to the source document in the cache; publishing, by the computer program, the subset of the set of question and answer pairs; saving, by the computer program, the source document to a cache; periodically checking, by the computer program, for changes to the source document; providing, by the computer program, the changes to the second LLM to generate a revised content set; providing, by the computer program, the revised content set to the third LLM; generating, by the third LLM, a revised set of question and answer pairs; and deleting and replacing, by the computer program, the subset of the set of question and answer pairs with the revised set of question and answer pairs.
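The claim above selects a question subset by an embedding-wise cosine similarity score to the content set. A minimal sketch of that selection step, assuming each question and content item has already been mapped to an embedding vector (the function names and pure-Python vectors here are illustrative, not from the patent):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_questions(question_vecs, content_vecs, k=5):
    # Score each question by its best cosine similarity to any content embedding,
    # then return the indices of the top-k questions, highest similarity first.
    scores = [max(cosine(q, c) for c in content_vecs) for q in question_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

In practice the embeddings would come from an embedding model; this sketch only shows the similarity-based ranking the claim recites.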
- 2-7. (canceled)
- 8. A computer processing system comprising: a memory configured to store instructions; and a hardware processor operatively coupled to the memory for executing the instructions of a text or call processing program to: input, by a computer program executed by a computer, a user query set into a first large language model (LLM); refine the user query set into a first question set based on a feedback criterion comprising an existence of a question word in the first question; input, by the computer program, a source document to a second LLM, the second LLM configured to generate a content set from the source document, wherein the source document is selected based on similarity to a content of the user query; select, by the computer program, a second question set as a subset of the first question set based on a semantic similarity to the content set; generate, by a third LLM, a set of question and answer pairs based on the second question set provided by the computer program; select, by the computer program, a subset of the set of question and answer pairs based on a semantic similarity to the first question set, wherein the top semantic similarity is determined based on an embedding-wise cosine similarity score; link, by the computer program, the subset of the set of question and answer pairs to the source document, wherein the computer program saves, to a database, a link to the source document in the cache; publish, by the computer program, the subset of the set of question and answer pairs; save, by the computer program, the source document to a cache; periodically check, by the computer program, for changes to the source document; provide, by the computer program, the changes to the second LLM to generate a revised content set; provide, by the computer program, the revised content set to the third LLM; generate, by the third LLM, a revised set of question and answer pairs; and delete and replace, by the computer program, the subset of the set of question and answer pairs with the revised set of question and answer pairs.
- 9-14. (canceled)
- 15. A non-transitory computer readable storage medium, including instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to perform steps comprising: inputting, by a computer program executed by a computer, a user query set into a first large language model (LLM); refining the user query set into a first question set based on a feedback criterion comprising an existence of a question word in the first question; inputting, by the computer program, a source document to a second LLM, the second LLM configured to generate a content set from the source document, wherein the source document is selected based on similarity to a content of the user query; selecting, by the computer program, a second question set as a subset of the first question set based on a semantic similarity to the content set; generating, by a third LLM, a set of question and answer pairs based on the second question set provided by the computer program; selecting, by the computer program, a subset of the set of question and answer pairs based on a semantic similarity to the first question set, wherein the top semantic similarity is determined based on an embedding-wise cosine similarity score; linking, by the computer program, the subset of the set of question and answer pairs to the source document, wherein the computer program saves, to a database, a link to the source document in the cache; publishing, by the computer program, the subset of the set of question and answer pairs; saving, by the computer program, the source document to a cache; periodically checking, by the computer program, for changes to the source document; providing, by the computer program, the changes to the second LLM to generate a revised content set; providing, by the computer program, the revised content set to the third LLM; generating, by the third LLM, a revised set of question and answer pairs; and deleting and replacing, by the computer program, the subset of the set of question and answer pairs with the revised set of question and answer pairs.
- 16-20. (canceled)
Description
BACKGROUND

1. Field of the Invention

Embodiments generally relate to systems and methods for generation of frequently asked questions.

2. Description of the Related Art

The creation of Frequently Asked Questions (FAQs) is labor-intensive, relying heavily on subject matter experts or content creators meticulously reading content sources. It also requires manual extraction of key details from content sources before appropriate questions can be crafted and answers supplied. Current processes are time-consuming and do not scale to large numbers of content sources. As a result, FAQs produced by current methods often quickly become stale as updates occur to the original content. Also, since there is no link between content and FAQs, it is challenging at scale to determine which FAQs require updates when content is modified. These issues make FAQ production still more difficult and time-intensive. There is a need for a scalable, autonomous FAQ production model that leverages large language models to understand, extract, and summarize information from source documents. Further, there is a need for an autonomous FAQ model that can keep FAQs updated and current.
SUMMARY

According to some embodiments, the techniques described herein relate to a method including: defining, by a computer program executed by a computer, a first large language model (LLM) with a user query and a feedback of the user query; refining, by the computer program, the user query to a question set based on the feedback, the question set comprising one or more sentences; defining, by the computer program, a second LLM to generate a first set of question and answer pairs from a source document; defining, by the computer program, a third LLM to generate a content set from the source document based on a rewriting of the source document; selecting, by the computer program, top questions from the content set to be provided to the second LLM; generating, by the second LLM, a second set of question and answer pairs based on the top questions; determining, by the second LLM, answers to the question set based on the source document to generate a third set of question and answer pairs; semantic filtering, by the computer program, of first, second, and third sets of questions from the first, second, and third sets of question and answer pairs, the filtering being based on the user query; linking, by the computer program, the answers of the first, second, and third sets of question and answer pairs to the filtered sets of questions; and publishing, by the computer program, the filtered set of questions and the linked answers.
According to some embodiments, the techniques described herein relate to a method including: inputting, by a computer program executed by a computer, a user query set into a first large language model (LLM); refining the user query set into a first question set based on a criterion of a feedback, the criterion comprising an existence of a question word in the first question; inputting, by the computer program, a source document to a second LLM, the second LLM configured to generate a content set from the source document; selecting, by the computer program, a second question set as a subset of the first question set based on a semantic similarity to the content set; generating, by a third LLM, a set of question and answer pairs based on the second question set provided by the computer program; selecting, by the computer program, a subset of the set of question and answer pairs based on a semantic similarity to the first question set; linking, by the computer program, the subset of the set of question and answer pairs to the source document; and publishing, by the computer program, the subset of the set of question and answer pairs. According to some embodiments, the feedback may include whether an image is included in the FAQ response. According to some embodiments, the source document may be selected based on similarity to an input topic. According to some embodiments, the top questions may be the top five questions of the content set. According to some embodiments, the instructions may comprise saving the source document to a cache, wherein changes to the source document are checked periodically, the changes are provided to the second LLM, and the filtered set of questions and linked answers are regenerated and republished based on the changes. According to some embodiments, the instructions may comprise saving a link to the source document in the cache. According to some embodiments, older filtered sets of questions and linked answers are deleted before republishing.
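Several embodiments above describe caching the source document, periodically checking it for changes, and regenerating and republishing the FAQs when it changes. A minimal sketch of that refresh loop, assuming a content hash is used to detect changes (the in-memory dict cache, the helper names, and the hash-based comparison are illustrative assumptions; the patent does not specify a change-detection mechanism):

```python
import hashlib

def fingerprint(text):
    # Content hash used to detect whether the cached document has changed.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def check_and_refresh(doc_id, current_text, cache, regenerate):
    """Return (changed, qa_pairs). `regenerate` stands in for the LLM calls
    that rebuild the content set and question/answer pairs from the document."""
    entry = cache.get(doc_id)
    digest = fingerprint(current_text)
    if entry is not None and entry["hash"] == digest:
        # Document unchanged since the last check; keep the published FAQs.
        return False, entry["qa_pairs"]
    # Document is new or modified: regenerate, then delete and replace the
    # stale FAQs by overwriting the cache entry that links them to the source.
    qa_pairs = regenerate(current_text)
    cache[doc_id] = {"hash": digest, "text": current_text, "qa_pairs": qa_pairs}
    return True, qa_pairs
```

A scheduler would call `check_and_refresh` on each cached document at a fixed interval, republishing only when `changed` is true.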
Embodiments consistent with the present disclosure include a system including one or more processors and one or more storage devices storing instructions that, when executed by the one or more processors, cause the processors to perform one or more steps of the methods disclosed herein. Embodiments consistent with the present disclosure include a computer processing system.