US-12619649-B1 - Large language model input preprocessing and refinement
Abstract
At least one processor can receive a text input from a user interface (UI) comprising at least a portion of a prompt to a large language model (LLM). The at least one processor can classify the text input as having a negative sentiment classification using a machine learning (ML) model configured to classify inputs according to expected user sentiments in reaction to an LLM response, the ML model being configured to classify the inputs from available classifications including at least a positive sentiment classification and one or more available negative sentiment classifications. In response to the classifying, the at least one processor can prevent input of the prompt to the LLM, determine information to add to the prompt to change the negative sentiment classification to the positive sentiment classification, and cause the UI to display a reply requesting the information to add to the prompt.
Inventors
- Shon MENDELSON
- Gal Elgavish
- Hadar Lackritz
- Kaaleb EDERY
Assignees
- INTUIT INC.
Dates
- Publication Date
- 20260505
- Application Date
- 20241101
Claims (20)
- 1 . A method comprising: receiving, by at least one processor, a first text input from a user interface (UI), the first text input comprising at least a portion of a first prompt to a large language model (LLM); classifying, by the at least one processor, the first text input as having a negative sentiment classification using a machine learning (ML) model configured to classify inputs according to expected user sentiments in reaction to an LLM response, wherein the ML model is configured to classify the inputs from available classifications including at least a positive sentiment classification and a plurality of available negative sentiment classifications, each respective one of the plurality of available negative sentiment classifications defining a respective cause of negative sentiment; in response to the classifying, preventing, by the at least one processor, input of the first prompt to the LLM; determining, by the at least one processor, information to add to the first prompt to change the negative sentiment classification to the positive sentiment classification, the information corresponding to the cause of negative sentiment defined by the negative sentiment classification; and causing, by the at least one processor, the UI to display a reply requesting the information to add to the first prompt.
- 2 . The method of claim 1 , further comprising: receiving, by the at least one processor, a second text input from the UI; classifying, by the at least one processor, the second text input as having the positive sentiment classification using the ML model; and in response to the classifying of the second text input, causing, by the at least one processor, input of a prompt to the LLM.
- 3 . The method of claim 2 , wherein: the second text input is received after causing the UI to display the reply; the second text input includes the information; and the prompt input to the LLM comprises the first prompt and the information.
- 4 . The method of claim 1 , wherein: the plurality of available negative sentiment classifications comprise at least one of: a lack of context classification indicating negative sentiment due to a lack of contextual data available to the LLM for preparing the LLM response; and a phrasing classification indicating poor phrasing in the first prompt.
- 5 . The method of claim 1 , further comprising configuring, by the at least one processor, the ML model to classify the inputs, wherein the configuring comprises: receiving, by the at least one processor, a training data set comprising a plurality of pairs of questions and corresponding user sentiments; and processing, by the at least one processor, the training data set using a pre-trained sentiment analysis ML model, thereby producing a labeled training data set wherein the plurality of pairs are respectively classified as having a positive sentiment classification or a negative sentiment.
- 6 . The method of claim 5 , wherein the configuring further comprises labeling, by the at least one processor, each pair of the labeled training data set having a negative sentiment with a respective one of the plurality of available negative sentiment classifications.
- 7 . The method of claim 5 , wherein: the ML model comprises a pre-trained language model; and the configuring further comprises fine-tuning, by the at least one processor, the ML model on the labeled training data set.
- 8 . A method comprising: receiving, by at least one processor, a training data set comprising a plurality of pairs of questions and corresponding user sentiments; processing, by the at least one processor, the training data set using a pre-trained sentiment analysis machine learning (ML) model, thereby producing a labeled training data set wherein the plurality of pairs are respectively classified as having a positive sentiment classification or a negative sentiment; labeling, by the at least one processor, each pair of the labeled training data set having a negative sentiment with a respective one of a plurality of available negative sentiment classifications, each respective one of the plurality of available negative sentiment classifications defining a respective cause of negative sentiment; fine-tuning, by the at least one processor, a pre-trained ML language model on the labeled training data set; and causing, by the at least one processor, the fine-tuned pre-trained ML language model to be used in production to prevent input of at least one prompt to a large language model (LLM) in response to the fine-tuned pre-trained ML language model processing the at least one prompt and classifying the at least one prompt as having one of the plurality of available negative sentiment classifications, wherein the fine-tuned pre-trained ML language model is configured to classify inputs from available classifications including at least the positive sentiment classification and the plurality of available negative sentiment classifications.
- 9 . The method of claim 8 , wherein using the fine-tuned pre-trained ML language model in production comprises: receiving, by at least one production processor, a first text input from a user interface (UI), the first text input comprising at least a portion of a first prompt to the LLM; classifying, by the at least one production processor, the first text input as having a negative sentiment classification using the fine-tuned pre-trained ML language model; in response to the classifying, preventing, by the at least one production processor, input of the first prompt to the LLM; determining, by the at least one production processor, information to add to the first prompt to change the negative sentiment classification to the positive sentiment classification, the information corresponding to the cause of negative sentiment defined by the negative sentiment classification; and causing, by the at least one production processor, the UI to display a reply requesting the information to add to the first prompt.
- 10 . The method of claim 9 , wherein using the fine-tuned pre-trained ML language model in production further comprises: receiving, by the at least one production processor, a second text input from the UI; classifying, by the at least one production processor, the second text input as having the positive sentiment classification using the fine-tuned pre-trained ML language model; and in response to the classifying of the second text input, causing, by the at least one production processor, input of a prompt to the LLM.
- 11 . The method of claim 10 , wherein: the second text input is received after causing the UI to display the reply; the second text input includes the information; and the prompt input to the LLM comprises the first prompt and the information.
- 12 . The method of claim 9 , wherein: the plurality of available negative sentiment classifications comprise at least one of: a lack of context classification indicating negative sentiment due to a lack of contextual data available to the LLM for preparing the LLM response; and a phrasing classification indicating poor phrasing in the first prompt.
- 13 . The method of claim 8 , further comprising: determining, by the at least one processor, that a probability of a label of at least one of the plurality of pairs is below a threshold level; and removing, by the at least one processor, the at least one of the plurality of pairs from the labeled training data set.
- 14 . A system comprising: at least one processor; at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform processing comprising: receiving a first text input from a user interface (UI), the first text input comprising at least a portion of a first prompt to a large language model (LLM); classifying the first text input as having a negative sentiment classification using a machine learning (ML) model configured to classify inputs according to expected user sentiments in reaction to an LLM response, wherein the ML model is configured to classify the inputs from available classifications including at least a positive sentiment classification and a plurality of available negative sentiment classifications, each respective one of the plurality of available negative sentiment classifications defining a respective cause of negative sentiment; in response to the classifying, preventing input of the first prompt to the LLM; determining information to add to the first prompt to change the negative sentiment classification to the positive sentiment classification, the information corresponding to the cause of negative sentiment defined by the negative sentiment classification; and causing the UI to display a reply requesting the information to add to the first prompt.
- 15 . The system of claim 14 , wherein the processing further comprises: receiving a second text input from the UI; classifying the second text input as having the positive sentiment classification using the ML model; and in response to the classifying of the second text input, causing input of a prompt to the LLM.
- 16 . The system of claim 15 , wherein: the second text input is received after causing the UI to display the reply; the second text input includes the information; and the prompt input to the LLM comprises the first prompt and the information.
- 17 . The system of claim 14 , wherein: the plurality of available negative sentiment classifications comprise at least one of: a lack of context classification indicating negative sentiment due to a lack of contextual data available to the LLM for preparing the LLM response; and a phrasing classification indicating poor phrasing in the first prompt.
- 18 . The system of claim 14 , wherein the processing further comprises configuring the ML model to classify the inputs, wherein the configuring comprises: receiving a training data set comprising a plurality of pairs of questions and corresponding user sentiments; and processing the training data set using a pre-trained sentiment analysis ML model, thereby producing a labeled training data set wherein the plurality of pairs are respectively classified as having a positive sentiment classification or a negative sentiment.
- 19 . The system of claim 18 , wherein the configuring further comprises labeling each pair of the labeled training data set having a negative sentiment with a respective one of the plurality of available negative sentiment classifications.
- 20 . The system of claim 18 , wherein: the ML model comprises a pre-trained language model; and the configuring further comprises fine-tuning the ML model on the labeled training data set.
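The offline configuration recited in claims 5–8 and 13 — weak-labeling question/sentiment pairs with a pre-trained sentiment analysis model, discarding pairs whose label probability falls below a threshold, and annotating each negative pair with a cause of negative sentiment — can be sketched in Python. This is a minimal illustration only: the keyword lexicon, `stub_sentiment_model`, and the cause heuristic are hypothetical stand-ins, not the patent's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical lexicon standing in for a pre-trained sentiment analysis model.
NEGATIVE_CUES = {"unclear", "wrong", "useless", "irrelevant"}


def stub_sentiment_model(user_reaction: str) -> tuple[str, float]:
    """Return (label, probability) for a user's reaction to an LLM response.

    A real system would invoke an actual pre-trained model; this scorer exists
    only so the sketch is runnable.
    """
    words = set(user_reaction.lower().split())
    hits = len(words & NEGATIVE_CUES)
    if hits:
        return "negative", min(0.5 + 0.2 * hits, 0.99)
    return "positive", 0.8


@dataclass
class LabeledPair:
    question: str
    sentiment: str            # "positive" or "negative"
    probability: float        # label confidence from the sentiment model
    cause: str | None = None  # e.g. "lack_of_context" or "phrasing"


def build_labeled_set(pairs, threshold=0.75):
    """Label question/reaction pairs; drop low-confidence labels (claim 13)."""
    labeled = []
    for question, reaction in pairs:
        label, prob = stub_sentiment_model(reaction)
        if prob < threshold:
            continue  # remove pairs whose label probability is below threshold
        labeled.append(LabeledPair(question, label, prob))
    return labeled


def annotate_causes(labeled):
    """Assign each negative pair a cause-of-negative-sentiment class (claim 6).

    The word-count heuristic is purely illustrative; the claims leave the
    annotation method open (it could be manual or model-based).
    """
    for pair in labeled:
        if pair.sentiment == "negative":
            pair.cause = ("phrasing" if len(pair.question.split()) < 4
                          else "lack_of_context")
    return labeled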
Description
BACKGROUND

With the increasing use of chatbots based on large language models (LLMs), and as users' expectations for more accurate and relevant responses continue to rise, users often experience the problem of chatbots generating unsatisfactory or irrelevant responses when the chatbots do not have enough information. Moreover, each time a user solicits a response from a chatbot, tokens must be sent from the requesting computing system to the LLM, the LLM must generate a response, and the user must review the response and request more information if it is unsatisfactory. This is costly in terms of tokens, network use, and chatbot latency.

BRIEF DESCRIPTIONS OF THE DRAWINGS

FIG. 1 shows an example LLM input preprocessing and refinement system according to some embodiments of the disclosure.
FIG. 2 shows an example question classifier configuration process according to some embodiments of the disclosure.
FIG. 3 shows an example LLM input preprocessing and refinement process according to some embodiments of the disclosure.
FIG. 4 shows an example computing device according to some embodiments of the disclosure.

DETAILED DESCRIPTION OF SEVERAL EMBODIMENTS

Systems and methods described herein can provide automatic preprocessing of user inputs to chatbots or other LLM-based interactive systems to improve both the user experience and system performance. To identify and address potential issues in real time, disclosed embodiments can train a question classifier offline using sentiment analysis with natural language processing (NLP)-based classification methods. Upon deployment, this classifier can analyze user inputs and predict when additional context or clarification may be needed to enhance response quality and prevent hallucinations by the LLM. In real time, following the classifier's prediction, the chatbot can prompt the user to rephrase the question or provide more context.
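The runtime gating just described can be sketched as follows. This is a minimal, hypothetical sketch: the classifier stub (standing in for the fine-tuned model), the word-count thresholds, and the clarification reply strings are all illustrative assumptions, and the follow-up combination of the original prompt with the requested information mirrors claims 2–3.

```python
# Hypothetical reply templates, one per cause of negative sentiment.
CLARIFICATION_REPLIES = {
    "lack_of_context": ("Could you share more detail (e.g., which product or "
                        "account you mean) so I can give a useful answer?"),
    "phrasing": "Could you rephrase your question more specifically?",
}


def stub_question_classifier(text: str) -> str:
    """Stand-in for the fine-tuned classifier: returns 'positive' or a
    negative-cause label, here via a crude length heuristic."""
    n = len(text.split())
    if n < 3:
        return "phrasing"
    if n < 6:
        return "lack_of_context"
    return "positive"


def preprocess_prompt(text: str) -> dict:
    """Gate a prompt: forward to the LLM only on a positive classification."""
    label = stub_question_classifier(text)
    if label == "positive":
        return {"action": "send_to_llm", "prompt": text}
    # Negative classification: block the LLM call and ask for the missing info.
    return {"action": "ask_user", "reply": CLARIFICATION_REPLIES[label]}


def refine_prompt(first_prompt: str, added_info: str) -> str:
    """Combine the original prompt with the user's follow-up information
    before it is finally sent to the LLM (per claims 2-3)."""
    return f"{first_prompt}\n\nAdditional context: {added_info}"
```

For example, a short input such as `preprocess_prompt("help")` would be blocked with a rephrasing request, while a detailed question would be forwarded unchanged; after the user supplies the requested detail, `refine_prompt` produces the combined prompt that actually reaches the LLM.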
By taking these actions, the systems and methods described herein can improve the response quality of the chatbot to enhance the overall user experience and reduce unnecessary, costly calls to an LLM. Disclosed embodiments can improve the user experience of chatbot interactions by automatically identifying when additional context or a rephrasing of the user's question is required to enhance the chatbot's response. As a result, the chatbot can deliver more accurate, relevant, and helpful responses. Moreover, LLM usage typically follows a pay-per-token pricing model, and generating tokens, sending them to an LLM, and obtaining a response are also costly in terms of latency, network use, and processing. By classifying user questions before interacting with an LLM, disclosed embodiments can reduce or even effectively eliminate unnecessary prompts sent to the LLM chatbot, reducing the operational costs and latency associated with LLM chatbots.

FIG. 1 shows an example LLM input preprocessing and refinement system 100 according to some embodiments of the disclosure. System 100 may include a variety of hardware, firmware, and/or software components that interact with one another and/or with external components, such as client 10 and/or LLM 20. The components of system 100 can include, for example, chatbot user interface (UI) 110, question classifier 120 (which may include offline processing components such as sentiment analysis model 122, NLP classifier 124, and/or data annotation 126), and/or retrieval augmented generation (RAG) database 130. While not illustrated as such, RAG database 130 may be external to system 100 in some embodiments, and/or LLM 20 may be included within system 100 in some embodiments. These elements are described in greater detail below, but in general, a user of client 10 can interact with chatbot UI 110, including by asking a question.
Question classifier 120 can process the question to determine whether or not it is likely to cause LLM 20 to deliver a meaningful response. If so, question classifier 120 can pass the question as at least part of a query to LLM 20, which can use data from RAG database 130 and/or its own data to provide an answer to the question, which can then be shown in chatbot UI 110. If the question is not likely to yield a meaningful response, question classifier 120 can provide different information for presentation in chatbot UI 110 and avoid contacting LLM 20.

Some components within system 100 may communicate with one another using networks and/or locally. Some components may communicate with external components, such as client 10 and/or LLM 20, through one or more networks (e.g., the Internet, an intranet, and/or one or more networks that provide a cloud environment) and/or by other modes of data transfer. Each component may be implemented by one or more computers (e.g., as described below with respect to FIG. 4). Elements illustrated in FIG. 1 (e.g., system 100 (including chatbot UI 110, question classifier 120 and its components, and/or RAG database 130), client 1