CN-122003680-A - System and method for proactively reducing hallucinations in generative Artificial Intelligence (AI) model responses


Abstract

A method, computer program product, and computing system for processing a prompt for a target generative AI model and a corresponding response generated by the target generative AI model for the prompt. The prompt and the corresponding response from the generative AI model are compared to a plurality of predefined validated prompt-response pairs. In response to determining that at least a threshold similarity exists between the prompt and the corresponding response and a predefined validated prompt-response pair, the corresponding response from the target generative AI model is provided to a source of the prompt.
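The gating step in the abstract can be sketched as follows. This is a minimal, non-authoritative sketch, assuming that the prompt and the response have already been turned into embedding vectors, that "similarity to a validated pair" means cosine similarity, and that a pair matches only when both the prompt side and the response side are similar (the joint score is the smaller of the two). The names `ValidatedPair`, `gate_response`, `DEFAULT_RESPONSE`, the 0.8 threshold, and the fallback switch are all hypothetical. The two fallbacks mirror claims 5 (default response) and 6 (most similar validated response). A brute-force scan stands in for the vector similarity search of claim 4.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical placeholder: the claims mention a "default response" but not its content.
DEFAULT_RESPONSE = "I can't answer that reliably; please contact support."


@dataclass
class ValidatedPair:
    """One predefined validated prompt-response pair: embeddings plus reference text."""
    prompt_vec: Sequence[float]
    response_vec: Sequence[float]
    response_text: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(prompt_vec: Sequence[float], response_vec: Sequence[float],
               pairs: List[ValidatedPair]) -> Tuple[float, Optional[ValidatedPair]]:
    """Find the validated pair that best matches BOTH the prompt and the response.

    Assumption: joint score = min(prompt similarity, response similarity).
    """
    best_score, best_pair = -1.0, None
    for pair in pairs:
        score = min(cosine(prompt_vec, pair.prompt_vec),
                    cosine(response_vec, pair.response_vec))
        if score > best_score:
            best_score, best_pair = score, pair
    return best_score, best_pair


def gate_response(prompt_vec: Sequence[float], response_vec: Sequence[float],
                  response_text: str, pairs: List[ValidatedPair],
                  threshold: float = 0.8, use_nearest_fallback: bool = False) -> str:
    """Release the model's response only if some validated pair meets the threshold."""
    score, pair = best_match(prompt_vec, response_vec, pairs)
    if pair is not None and score >= threshold:
        return response_text        # claim 1: pass the model response to the prompt source
    if use_nearest_fallback and pair is not None:
        return pair.response_text   # claim 6 variant: most similar validated response
    return DEFAULT_RESPONSE         # claim 5 variant: default response


if __name__ == "__main__":
    pairs = [ValidatedPair([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], "Reset it under Settings > Account.")]
    # Toy 3-d vectors stand in for real embeddings.
    print(gate_response([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], "Go to Settings > Account.", pairs))
    print(gate_response([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], "A made-up answer.", pairs))
```

Taking the minimum of the two similarities is one simple way to require that both the prompt and the response resemble the same validated pair; a weighted sum would be an equally plausible choice that the source does not rule out.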

Inventors

P. Seth Lohmann
S. Saha

Assignees

Microsoft Technology Licensing, LLC

Dates

Publication Date: 2026-05-08
Application Date: 2024-10-21
Priority Date: 2023-11-14

Claims (15)

1. A computer-implemented method performed on a computing device, comprising: processing a prompt for a target generative AI model and a corresponding response generated by the target generative AI model for the prompt; comparing the prompt and the corresponding response from the generative AI model with a plurality of predefined validated prompt-response pairs; and providing the corresponding response from the target generative AI model to a source of the prompt in response to determining that at least a threshold similarity exists between the prompt and the corresponding response and a predefined validated prompt-response pair.
2. The computer-implemented method of claim 1, further comprising: in response to providing the corresponding response from the target generative AI model to the source of the prompt, processing feedback regarding the corresponding response; comparing the prompt and the corresponding response from the generative AI model with the plurality of predefined validated prompt-response pairs; and applying the feedback to the target generative AI model in response to determining that at least a threshold similarity exists between the prompt and the corresponding response and a predefined validated prompt-response pair.
3. The computer-implemented method of claim 2, further comprising: preventing negative feedback from being applied to the target generative AI model in response to determining that at least the threshold similarity exists between the prompt and the corresponding response and a predefined validated prompt-response pair from the plurality of predefined validated prompt-response pairs.
4. The computer-implemented method of claim 1, wherein comparing the prompt and the corresponding response to the plurality of predefined validated prompt-response pairs comprises: generating an embedding representing the prompt; performing a vector similarity search for the prompt among the plurality of predefined validated prompt-response pairs using the embedding representing the prompt; identifying a threshold number of most similar prompts from the plurality of predefined validated prompt-response pairs; obtaining the corresponding responses for the threshold number of most similar prompts from the plurality of predefined validated prompt-response pairs; generating an embedding representing each corresponding response to the threshold number of most similar prompts; and performing a vector similarity search for the corresponding responses from the plurality of predefined validated prompt-response pairs using the embedding representing the prompt.
5. The computer-implemented method of claim 1, further comprising: providing a default response from the target generative AI model in response to determining that the prompt and the corresponding response have less than the threshold similarity with each of the plurality of predefined validated prompt-response pairs.
6. The computer-implemented method of claim 1, further comprising: providing a most similar predefined validated response from the plurality of predefined validated prompt-response pairs in response to determining that the prompt and the corresponding response have less than the threshold similarity with each of the plurality of predefined validated prompt-response pairs.
7. The computer-implemented method of claim 1, further comprising generating the plurality of predefined validated prompt-response pairs by: extracting each paragraph of content from a validated document; generating a prompt for each extracted paragraph; and generating a corresponding response to each prompt by processing the prompt and the corresponding extracted paragraph.
8. A computing system, comprising: a memory; and a processor configured to: process feedback regarding a response generated by a target generative AI model for a prompt; and apply positive feedback to the target generative AI model in response to determining that at least a threshold similarity exists between the prompt and the response and a predefined validated prompt-response pair from a plurality of predefined validated prompt-response pairs.
9. The computing system of claim 8, wherein the processor is further configured to: prevent negative feedback from being applied to the target generative AI model in response to determining that at least the threshold similarity exists between the prompt and the response and a predefined validated prompt-response pair from the plurality of predefined validated prompt-response pairs.
10. The computing system of claim 8, wherein the processor is further configured to: process the prompt for the target generative AI model and the response generated by the target generative AI model for the prompt; compare the prompt and the response from the generative AI model with the plurality of predefined validated prompt-response pairs; and provide the response from the target generative AI model to a source of the prompt in response to determining that at least a threshold similarity exists between the prompt and the response and a predefined validated prompt-response pair.
11. The computing system of claim 9, wherein the processor is further configured to: provide a default response from the target generative AI model in response to determining that the similarity between the prompt and the corresponding response and each of the plurality of predefined validated prompt-response pairs is less than the threshold similarity.
12. The computing system of claim 8, wherein comparing the prompt and the corresponding response to the plurality of predefined validated prompt-response pairs comprises: generating an embedding representing the prompt; performing a vector similarity search for the prompt among the plurality of predefined validated prompt-response pairs using the embedding representing the prompt; identifying a threshold number of most similar prompts from the plurality of predefined validated prompt-response pairs; obtaining the corresponding responses for the threshold number of most similar prompts from the plurality of predefined validated prompt-response pairs; generating an embedding representing each corresponding response to the threshold number of most similar prompts; and performing a vector similarity search for the corresponding responses from the plurality of predefined validated prompt-response pairs using the embedding representing the prompt.
13. The computing system of claim 8, wherein the processor is further configured to: prevent positive feedback from being applied to the target generative AI model in response to determining that the similarity between the prompt and the corresponding response and each of the plurality of predefined validated prompt-response pairs is less than the threshold similarity.
14. The computing system of claim 8, wherein the processor is further configured to: apply negative feedback to the target generative AI model in response to determining that the prompt and the corresponding response have less than the threshold similarity with each of the plurality of predefined validated prompt-response pairs.
15. A computer program product residing on a computer readable medium having stored thereon a plurality of instructions that, when executed by a processor, cause the processor to perform operations comprising: processing a prompt for a target generative AI model and a corresponding response generated by the target generative AI model for the prompt; comparing the prompt and the corresponding response from the generative AI model to a plurality of predefined validated prompt-response pairs; providing the corresponding response from the target generative AI model to a source of the prompt in response to determining that at least a threshold similarity exists between the prompt and the corresponding response and a predefined validated prompt-response pair; processing feedback regarding the corresponding response; comparing the prompt and the corresponding response from the generative AI model with the plurality of predefined validated prompt-response pairs; and applying the feedback to the target generative AI model in response to determining that at least a threshold similarity exists between the prompt and the corresponding response and a predefined validated prompt-response pair.
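The two-stage comparison recited in claims 4 and 12 (first narrow the validated pairs by prompt similarity, then compare responses among that shortlist) can be sketched as below. This is a hedged illustration, not the claimed implementation: it uses an exhaustive cosine-similarity scan in place of whatever vector index a real system would use, and `two_stage_search` and `k` (the claims' "threshold number of most similar prompts") are hypothetical names. One further assumption: the translated claim text says the second search uses "the embedding representing the prompt", but this sketch compares the incoming *response* embedding against the shortlisted responses, which is how the response check is read here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_search(prompt_vec: Vector, response_vec: Vector,
                     validated: List[Tuple[Vector, Vector]],
                     k: int = 3) -> List[Tuple[int, float, float]]:
    """Brute-force two-stage search over validated (prompt, response) embeddings.

    Stage 1: rank validated prompts by similarity to the incoming prompt; keep the top k.
    Stage 2: score the stored responses of those k pairs against the incoming
    response embedding (assumption, see lead-in).
    Returns (pair index, prompt similarity, response similarity), best response first.
    """
    by_prompt = sorted(range(len(validated)),
                       key=lambda i: cosine(prompt_vec, validated[i][0]),
                       reverse=True)
    shortlist = by_prompt[:k]
    scored = [(i, cosine(prompt_vec, validated[i][0]), cosine(response_vec, validated[i][1]))
              for i in shortlist]
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy 2-d embeddings: pair 2 has a prompt close to the query and a matching response.
    validated = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0]), ([0.9, 0.1], [0.0, 1.0])]
    for idx, p_sim, r_sim in two_stage_search([1.0, 0.0], [0.0, 1.0], validated, k=2):
        print(idx, round(p_sim, 3), round(r_sim, 3))
```

Shortlisting by prompt first keeps the response comparison restricted to validated answers for related questions, so a response that merely resembles an answer to an unrelated question does not count as a match.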

Description

System and method for proactively reducing hallucinations in generative Artificial Intelligence (AI) model responses

Background

Generative AI models are being used in Customer Support Services (CSS) and other fields to solve various problems and to simplify the processing of data from various sources, improving the efficiency and accuracy of customer support by automating repetitive and simple tasks and providing consistent responses. However, hallucinations in generative AI model responses result in factually incorrect or meaningless information being presented to the user. Despite advances in natural language processing, this challenge persists because of the complexity of language understanding and generation. Additionally, current methods rely on post-processing to correct hallucinations after they have already occurred in the response provided to the user.

Drawings

FIGS. 1A-1B are flowcharts of one implementation of a response verification process;
FIG. 2 is a schematic view of the response verification process of FIGS. 1A-1B generating predefined validated prompt-response pairs;
FIG. 3 is a schematic view of the response verification process of FIGS. 1A-1B verifying a generative AI model response;
FIG. 4 is a schematic view of the response verification process of FIGS. 1A-1B verifying user feedback regarding a generative AI model response; and
FIG. 5 is a schematic view of a computer system and the response verification process coupled to a distributed computing network.

Like reference symbols in the various drawings indicate like elements.

Detailed Description

Implementations of the present disclosure provide a two-point comparator model that improves the accuracy of generative AI model responses by proactively reducing hallucinations. For example, generative AI models, including Natural Language Processing (NLP) models, have advanced in understanding and generating human-like text. However, these generative AI models still face certain challenges, and one such problem is "hallucination."
Hallucination refers to the phenomenon in which a generative AI model generates a response containing fictional or incorrect information without any actual basis in the input data or context. Hallucinations vary in severity, ranging from a minor factual error to an entire paragraph of fabricated content. In some cases, the generative AI model may present hallucinated information confidently, leading the user to believe that the generated text is valid. This phenomenon can be problematic, especially in critical areas where erroneous information can have serious consequences. To combat the hallucination problem, the two-point comparator model of the present disclosure involves a multi-stage process that uses predefined validated prompt-response pairs generated by the generative AI model(s). As will be discussed in more detail below, the first stage validates a generative AI model response by processing it against the predefined validated prompt-response pairs before the response is displayed to the user. In this way, the generative AI model response is validated and hallucinations are filtered out before the response is presented to the prompting user. In the second stage, user feedback is verified by processing it against the predefined validated prompt-response pairs before a Subject Matter Expert (SME) is invoked to verify or correct biased or incorrect user feedback. In this manner, user feedback is filtered through the predefined validated prompt-response pairs before being applied to training or tuning the generative AI model. As will be described in greater detail below, implementations of the present disclosure process a prompt for a target generative AI model and a corresponding response generated by the target generative AI model for the prompt. The prompt and the corresponding response from the generative AI model are compared to a plurality of predefined validated prompt-response pairs.
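The second-stage feedback filter described above (and in claims 3, 8, 9, 13, and 14) reduces to a small decision table: feedback that agrees with the validated pairs is applied, and feedback that contradicts them is withheld. The sketch below is a hedged illustration only. It assumes a single best-match similarity score has already been computed against the validated prompt-response pairs, uses a hypothetical 0.8 threshold, and assumes that withheld feedback is queued for Subject Matter Expert review; the description mentions the SME step but does not spell out the routing. `FeedbackAction` and `route_feedback` are hypothetical names.

```python
from enum import Enum


class FeedbackAction(Enum):
    APPLY = "apply"                 # use the feedback for training or tuning
    HOLD_FOR_SME = "hold_for_sme"   # assumption: withheld feedback goes to SME review


def route_feedback(is_positive: bool, similarity: float,
                   threshold: float = 0.8) -> FeedbackAction:
    """Decide whether user feedback on a response may be applied to the model.

    `similarity` is the best joint similarity between the (prompt, response) and
    the predefined validated prompt-response pairs, computed by any matcher.
    """
    matches_validated = similarity >= threshold
    if matches_validated and is_positive:
        return FeedbackAction.APPLY          # claim 8: positive feedback on a validated answer
    if matches_validated and not is_positive:
        return FeedbackAction.HOLD_FOR_SME   # claims 3 and 9: block negative feedback
    if is_positive:
        return FeedbackAction.HOLD_FOR_SME   # claim 13: block positive feedback on an unvalidated answer
    return FeedbackAction.APPLY              # claim 14: negative feedback on an unvalidated answer


if __name__ == "__main__":
    for positive in (True, False):
        for sim in (0.95, 0.40):
            print(positive, sim, route_feedback(positive, sim).value)
```

In effect, feedback is applied exactly when its polarity agrees with whether the response matched a validated pair, which is what keeps biased or mistaken user ratings out of the training signal.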
In response to determining that at least a threshold similarity exists between the prompt and the corresponding response and a predefined validated prompt-response pair, the corresponding response from the target generative AI model is provided to a source of the prompt. As will be described in greater detail below, implementations of the present disclosure provide a process for automatically and proactively identifying and removing hallucinations generated by a generative AI model, improving the accuracy of the provided responses by removing incorrect or factually erroneous information. This is done by converting each prompt into a vector embedding, converting each corresponding generative AI model response into a vector embedding, and performing a vector similarity calculation between the embedded representations of the prompt and the corresponding response and the predefined validated prompt-response pairs. When the vector similarity between the embedded representation of the prompt and the corresponding response and the predefin