US-12619830-B2 - Optimizing performance of conversational interface applications using example forgetting


Abstract

Methods and apparatuses for optimizing performance of conversational interface applications using example forgetting include a server that retrieves training data comprising utterances each mapped to one or more known intents. The server determines a forgetting count for each utterance and selects utterances from the training data that have a forgetting count above a predetermined threshold. The server identifies whether the predicted intent associated with each utterance is accurate. The server generates updated training data comprising the selected utterances and corresponding predicted intents, and trains conversational interface applications using the updated training data. The server validates performance of the trained conversational interface applications and saves the updated training data.
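As an illustrative sketch only (not the patented implementation), the selection step summarized above might look like the following in Python. The function name `select_utterances`, the dictionary layout, and the choice to return the most-forgotten utterances first are assumptions introduced here for illustration.

```python
from typing import Dict, List


def select_utterances(forgetting_counts: Dict[str, int], threshold: int) -> List[str]:
    """Return utterances whose forgetting count is strictly above `threshold`.

    `forgetting_counts` maps each utterance to its forgetting count
    (hypothetical layout). Results are ordered from the highest count to
    the lowest, with ties broken alphabetically so the output is stable.
    """
    selected = [u for u, count in forgetting_counts.items() if count > threshold]
    return sorted(selected, key=lambda u: (-forgetting_counts[u], u))
```

Calling `select_utterances({"a": 3, "b": 1, "c": 5, "d": 3}, threshold=2)` keeps only the utterances forgotten more than twice and lists the most frequently forgotten one first.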

Inventors

Chen Bi
Ou Li
Yong Zou
Sijing Lv
Bing Cui
Tieyi Guo
Byung Chun

Assignees

FMR LLC

Dates

Publication Date: 20260505
Application Date: 20240429

Claims (20)

1. A computer system for optimizing performance of conversational interface applications using example forgetting, the system comprising a server computing device having a memory that stores computer-executable instructions and a processor that executes the computer-executable instructions to: retrieve a corpus of conversational interface application training data from a data store, the conversational interface application training data comprising a plurality of utterances each mapped to one or more known intents; determine a forgetting count for each of the plurality of utterances, comprising: a) executing an intent prediction model using the plurality of utterances as input to predict an intent for each of the utterances, the predicted intent associated with a confidence score; b) determining an accuracy of the predicted intent for each utterance by comparing the predicted intent to the known intent for the utterance; c) incrementing a forgetting count for each utterance when the determined accuracy of the predicted intent for the utterance is lower than a prior accuracy of the predicted intent for the utterance, and d) repeating steps a)-c) for each utterance using the determined accuracy as the prior accuracy for the next execution of the intent prediction model until the determined accuracy reaches a minimum value; select one or more utterances from the corpus of conversational interface application training data that have a forgetting count above a threshold; identify whether the predicted intent associated with each of the selected utterances is accurate based upon (i) word characteristics of the utterance, (ii) semantic meaning of the utterance, and (iii) the confidence score associated with the predicted intent; generate an updated corpus of conversational interface application training data comprising one or more of the selected utterances and corresponding predicted intents based upon the identified accuracy; train one or more conversational interface applications using the updated corpus of conversational interface application training data; and validate performance of the trained conversational interface applications and store the updated corpus of conversational interface application training data in the data store.
2. The system of claim 1, wherein the intent prediction model comprises a machine learning classification model configured to generate a predicted intent for an input utterance using a supervised learning algorithm.
3. The system of claim 2, wherein the intent prediction model is provided by an external conversational interface application platform.
4. The system of claim 1, wherein determining an accuracy of the predicted intent for each utterance comprises comparing the predicted intent to the known intent using a similarity metric and calculating the accuracy of the predicted intent based upon the similarity metric.
5. The system of claim 4, wherein the accuracy of the predicted intent for each utterance includes a margin of error.
6. The system of claim 4, wherein the accuracy of the predicted intent for each utterance includes a loss function value.
7. The system of claim 1, wherein selecting one or more utterances from the corpus of conversational interface application training data that have a forgetting count above a predetermined threshold value comprises: ranking the plurality of utterances using the forgetting count; and selecting one or more utterances based upon the rank assigned to each utterance.
8. The system of claim 1, wherein training one or more conversational interface applications using the updated corpus of conversational interface application training data comprises performing a regression test to determine performance characteristics of the conversational interface applications.
9. The system of claim 8, wherein validating performance of the trained conversational interface applications comprises comparing the performance characteristics of each conversational service application to historical performance characteristics.
10. The system of claim 1, wherein retrieving a corpus of conversational interface application training data from a data store comprises converting the corpus of conversational interface application training data into a format acceptable as input to the intent prediction model.
11. The system of claim 10, wherein the corpus of conversational interface application training data is partitioned into one or more subgroups for processing by the intent classification model.
12. A computerized method of optimizing performance of conversational interface applications using example forgetting, the system comprising a server computing device having a memory that stores computer-executable instructions and a processor that executes the computer-executable instructions to: retrieve a corpus of conversational interface application training data from a data store, the conversational interface application training data comprising a plurality of utterances each mapped to one or more known intents; determine a forgetting count for each of the plurality of utterances, comprising: a) executing an intent prediction model using the plurality of utterances as input to predict an intent for each of the utterances, the predicted intent associated with a confidence score; b) determining an accuracy of the predicted intent for each utterance by comparing the predicted intent to the known intent for the utterance; c) incrementing a forgetting count for each utterance when the determined accuracy of the predicted intent for the utterance is lower than a prior accuracy of the predicted intent for the utterance, and d) repeating steps a)-c) for each utterance using the determined accuracy as the prior accuracy for the next execution of the intent prediction model; select one or more utterances from the corpus of conversational interface application training data that have a forgetting count above a predetermined threshold value; identify whether the predicted intent associated with each of the selected utterances is accurate based upon (i) word characteristics of the utterance, (ii) semantic meaning of the utterance, and (iii) the confidence score associated with the predicted intent; generate an updated corpus of conversational interface application training data comprising one or more of the selected utterances and corresponding predicted intents based upon the accuracy of the predicted intent; train one or more conversational interface applications using the updated corpus of conversational interface application training data; and validate performance of the trained conversational interface applications and store the updated corpus of conversational interface application training data in the data store.
13. The method of claim 12, wherein the intent prediction model comprises a machine learning classification model configured to generate a predicted intent for an input utterance using a supervised learning algorithm.
14. The method of claim 13, wherein the intent prediction model is provided by an external conversational interface application platform.
15. The method of claim 12, wherein determining an accuracy of the predicted intent for each utterance comprises comparing the predicted intent to the known intent using a similarity metric and calculating the accuracy of the predicted intent based upon the similarity metric.
16. The method of claim 15, wherein the accuracy of the predicted intent for each utterance includes a margin of error.
17. The method of claim 16, wherein the accuracy of the predicted intent for each utterance includes a loss function value.
18. The method of claim 12, wherein selecting one or more utterances from the corpus of conversational interface application training data that have a forgetting count above a predetermined threshold value comprises: ranking the plurality of utterances using the forgetting count; and selecting one or more utterances based upon the rank assigned to each utterance.
19. The method of claim 12, wherein training one or more conversational interface applications using the updated corpus of conversational interface application training data comprises performing a regression test to determine performance characteristics of the conversational interface applications.
20. The method of claim 19, wherein validating performance of the trained conversational interface applications comprises comparing the performance characteristics of each conversational service application to historical performance characteristics.
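
Steps a) through c) of the forgetting-count determination recited in claims 1 and 12 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `accuracy` and `forgetting_counts` are hypothetical, accuracy is simplified to an exact-match score of 1.0 or 0.0 (the claims also contemplate similarity metrics), each element of `prediction_runs` stands in for one execution of an intent prediction model, and the stopping condition of step d) is not modeled because the sketch simply iterates over the runs it is given.

```python
from typing import Dict, Sequence


def accuracy(predicted_intent: str, known_intent: str) -> float:
    """Hypothetical accuracy measure: 1.0 on an exact intent match, else 0.0."""
    return 1.0 if predicted_intent == known_intent else 0.0


def forgetting_counts(
    known_intents: Dict[str, str],
    prediction_runs: Sequence[Dict[str, str]],
) -> Dict[str, int]:
    """Count how often each utterance's accuracy drops between successive runs.

    `known_intents` maps utterance -> known intent.
    `prediction_runs` holds one dict per model execution, mapping
    utterance -> predicted intent (assumed to cover every utterance).
    """
    counts = {utterance: 0 for utterance in known_intents}
    prior: Dict[str, float] = {}  # accuracy observed in the previous run

    for run in prediction_runs:
        for utterance, known in known_intents.items():
            current = accuracy(run[utterance], known)
            # A "forgetting event": accuracy is lower than in the prior run.
            if utterance in prior and current < prior[utterance]:
                counts[utterance] += 1
            # The current accuracy becomes the prior accuracy for the next run.
            prior[utterance] = current

    return counts
```

In this sketch, an utterance that is predicted correctly in one run and incorrectly in the next accumulates one forgetting event, while an utterance that is predicted correctly in every run keeps a count of zero.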

Description

TECHNICAL FIELD

This application relates generally to methods and apparatuses, including computer program products, for optimizing performance of conversational interface applications using example forgetting.

BACKGROUND

A virtual assistant application (also called a chatbot) is a computer software application and/or computing system that communicates with users at client computing devices through an exchange of text messages and/or audio messages during conversations. Virtual assistants are commonly used in different areas of daily life with high efficiency and low costs, such as providing weather forecasts, giving business advice, and responding to queries. Generally, the technology behind a virtual assistant application comprises a Natural Language Understanding (NLU) and/or Natural Language Processing (NLP) algorithm that captures user input messages (also called utterances), parses the messages, and attempts to discern the intent or reason for the user's messages. Certain types of virtual assistants are task-oriented, meaning that the virtual assistant receives a user message, recognizes one or more user intents and/or entities of the message, retrieves information that is related to or otherwise responsive to the message, and generates a response message that is provided to the user. In some cases, virtual assistant applications leverage advanced machine learning technology, such as intent recognition models, in order to comprehend the intent behind a user's message more accurately or efficiently. Generally, an intent recognition model attempts to map a user's message to a particular user intent that is defined in the virtual assistant, where the intent provides the virtual assistant with a starting point from which to respond to the user message.
Although a virtual assistant can be highly useful for business owners (e.g., by reducing or eliminating the need for live customer service staff, more quickly responding to user queries, etc.), the virtual assistant may sometimes be unable to determine the proper user intent, or may determine an incorrect user intent, because, e.g., the chatbot cannot parse or understand a particular message from an end user (a so-called unrecognized or incomprehensible message). For example, when a chatbot does not comprehend a user's message, the chatbot may simply respond with a default message such as “Sorry, I don't understand what you mean by that,” may continually ask the user to repeat the message or state it in a different way, or may provide a response that includes irrelevant or incorrect information. As a result, some end users may stop interacting with virtual assistant systems because the virtual assistant misunderstands their messages and provides unexpected or undesirable responses. Such activity leads to user dissatisfaction with the virtual assistant technology. To avoid these problems, developers try to improve virtual assistant application performance by re-training the underlying intent recognition model so that the model better understands the intent behind the requests/messages originating from end users. As can be appreciated, upon closer inspection, certain messages should actually be mapped to different user intents than they are currently, while other messages should be mapped to new user intents. When correct intent mappings are determined for user utterances, the data can be absorbed into the model training dataset to help re-train and improve the existing intent classifier model.
In some circumstances, the creation and validation of training data (e.g., a corpus of utterances and known intents) for virtual assistant NLP/NLU models is difficult and time-consuming. In addition, certain utterances in the training data may not contribute to improving the accuracy and robustness of the NLP/NLU models because the models may consistently make accurate intent predictions for those utterances.

SUMMARY

Therefore, what is needed are methods and systems for automatically generating useful and relevant training data for NLP/NLU intent prediction models in conversation service applications that can improve the accuracy and responsiveness of those applications. The techniques described herein advantageously analyze existing intent prediction model training data using a framework of example forgetting in order to identify portions of training data that can be used to re-train intent prediction models and lead to meaningful improvement. Additional benefits realized by the methods and systems described herein include understanding the impact of each training utterance on the performance of the intent prediction model, which helps guide the chatbot designer in ways to update training data at the intent level which positively impacts the prediction determinations of the intent prediction model during both training and testing. Furthermore, the technology described herein redu