EP-4523208-B1 - VOICE-BASED CHATBOT POLICY OVERRIDE(S) FOR EXISTING VOICE-BASED CHATBOT(S)
Inventors
- GOLDSHTEIN, Sasha
Dates
- Publication Date
- 20260513
- Application Date
- 20240625
Claims (15)
- A method implemented by one or more processors (514), the method comprising: receiving, from a first-party entity, a voice-based chatbot policy override (262A) for an existing third-party voice-based chatbot (292) that is managed by a third-party entity, the third-party entity being distinct from the first-party entity, and the voice-based chatbot policy override being associated with one or more rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot; and causing the existing third-party voice-based chatbot to engage in a corresponding conversation with a human user via a client device (110) of the human user, wherein causing the existing third-party voice-based chatbot to engage in the corresponding conversation with the human user comprises: receiving audio data (201) that captures a spoken utterance provided by the human user; determining, based on processing the audio data that captures the spoken utterance and based on the one or more rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot, whether to utilize the voice-based chatbot policy override in responding to the spoken utterance or the existing third-party voice-based chatbot in responding to the spoken utterance; and in response to determining to utilize the voice-based chatbot policy override in responding to the spoken utterance: generating, using the voice-based chatbot policy override and in lieu of the third-party voice-based chatbot, and based on processing the audio data that captures the spoken utterance, a voice-based chatbot policy override response (203) that is responsive to the spoken utterance; and causing the voice-based chatbot policy override response that is responsive to the spoken utterance to be audibly rendered for presentation to the human user via one or more speakers of the client device of the human user.
- The method of claim 1, further comprising: while the voice-based chatbot policy override response that is responsive to the spoken utterance is being generated: activating one or more third-party voice-based chatbot components, of the existing third-party voice-based chatbot, that are to be utilized in processing additional audio data that captures an additional spoken utterance provided by the human user, optionally wherein the one or more third-party voice-based chatbot components comprise one or more of: an automatic speech recognition, ASR, component (131); a natural language understanding, NLU, component (132); a fulfillment component (133); or a large language model, LLM, component (135), and further optionally wherein: the existing third-party voice-based chatbot is initially trained by the third-party entity.
- The method of claim 2, further comprising: subsequent to causing the voice-based chatbot policy override response to the spoken utterance to be audibly rendered for presentation to the human user: receiving the additional audio data that captures the additional spoken utterance provided by the human user; generating, using the one or more third-party voice-based chatbot components, of the third-party voice-based chatbot, and in lieu of the voice-based chatbot policy override, and based on processing the additional audio data that captures the additional spoken utterance, a third-party voice-based chatbot response (204) that is responsive to the additional spoken utterance; and causing the third-party voice-based chatbot response that is responsive to the additional spoken utterance to be audibly rendered for presentation to the human user via one or more of the speakers of the client device of the human user.
- The method of any preceding claim, wherein determining whether to utilize the voice-based chatbot policy override in responding to the spoken utterance or the existing third-party voice-based chatbot in responding to the spoken utterance based on processing the audio data that captures the spoken utterance and based on the one or more rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot comprises: processing, using an automatic speech recognition, ASR, model, the audio data that captures the spoken utterance to generate ASR output; processing, using a natural language understanding, NLU, model, the ASR output to generate NLU output; and determining, based on comparing the ASR output and/or the NLU output to the one or more rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot, whether to utilize the voice-based chatbot policy override in responding to the spoken utterance or the existing third-party voice-based chatbot in responding to the spoken utterance, optionally wherein determining to utilize the voice-based chatbot policy override in responding to the spoken utterance comprises: determining that the ASR output and/or the NLU output invokes one or more of the rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot.
- The method of any one of claims 1 to 4, wherein determining whether to utilize the voice-based chatbot policy override in responding to the spoken utterance or the existing third-party voice-based chatbot in responding to the spoken utterance based on processing the audio data that captures the spoken utterance and based on the one or more rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot comprises: processing, using an automatic speech recognition, ASR, model, the audio data that captures the spoken utterance to generate ASR output; processing, using a large language model, LLM, the ASR output to generate LLM output; and determining, based on comparing the ASR output and/or the LLM output to the one or more rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot, whether to utilize the voice-based chatbot policy override in responding to the spoken utterance or the existing third-party voice-based chatbot in responding to the spoken utterance, optionally wherein determining to utilize the voice-based chatbot policy override in responding to the spoken utterance comprises: determining that the ASR output and/or the LLM output invokes one or more of the rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot.
- The method of any preceding claim, wherein the third-party entity specifies the one or more rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot, optionally wherein an additional third-party entity, that is in addition to the third-party entity and the first-party entity, specifies the one or more rules for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot.
- The method of any preceding claim, wherein the voice-based chatbot policy override corresponds to a machine learning, ML, model (130) that is trained based on a plurality of historical conversations and a description of the one or more rules for when to utilize the voice-based chatbot policy override, optionally wherein the voice-based chatbot policy override for the existing third-party voice-based chatbot is proactively provided by the first-party entity and to the third-party entity, and further optionally wherein the method further comprises: prior to receiving the voice-based chatbot policy override for the existing third-party voice-based chatbot that is managed by the third-party entity: transmitting, to the first-party entity, an indication of a need for the voice-based chatbot policy override for the existing third-party voice-based chatbot.
- The method of any preceding claim, further comprising: in response to determining to utilize the third-party voice-based chatbot in responding to the spoken utterance: generating, using the third-party voice-based chatbot and in lieu of the voice-based chatbot policy override, a third-party voice-based chatbot response that is responsive to the spoken utterance; and causing the third-party voice-based chatbot response that is responsive to the spoken utterance to be audibly rendered for presentation to the human user via one or more of the speakers of the client device of the human user, optionally wherein the corresponding conversation is initiated by the human user by placing a telephone call to the third-party entity via the client device of the human user or wherein the corresponding conversation is initiated by the existing third-party voice-based chatbot by placing a telephone call to the human user.
- The method of any preceding claim, wherein the one or more rules include at least a temporal period and/or location constraint for when to utilize the voice-based chatbot policy override in lieu of the third-party voice-based chatbot.
- The method of claim 9, wherein using the voice-based chatbot policy override and in lieu of the third-party voice-based chatbot in generating the voice-based chatbot policy override response that is responsive to the spoken utterance enables the third-party entity to test functionality that has not been deployed by the third-party voice-based chatbot.
- A method implemented by one or more processors (514), the method comprising: generating a voice-based chatbot policy override (262A) for an existing voice-based chatbot (292), the voice-based chatbot policy override being associated with one or more rules for when to utilize the voice-based chatbot policy override in lieu of the existing voice-based chatbot; and causing the existing voice-based chatbot to engage in a corresponding conversation with a human user via a client device (110) of the human user, wherein causing the existing voice-based chatbot to engage in the corresponding conversation with the human user comprises: receiving audio data (201) that captures a spoken utterance provided by the human user; determining, based on processing the audio data that captures the spoken utterance and based on the one or more rules for when to utilize the voice-based chatbot policy override in lieu of the existing voice-based chatbot, whether to utilize the voice-based chatbot policy override in responding to the spoken utterance or the existing voice-based chatbot in responding to the spoken utterance; and in response to determining to utilize the voice-based chatbot policy override in responding to the spoken utterance: generating, using the voice-based chatbot policy override and in lieu of the existing voice-based chatbot, and based on processing the audio data that captures the spoken utterance, a voice-based chatbot policy override response (203) that is responsive to the spoken utterance; and causing the voice-based chatbot policy override response that is responsive to the spoken utterance to be audibly rendered for presentation to the human user via one or more speakers of the client device of the human user.
- The method of claim 11, wherein the existing voice-based chatbot is associated with a first-party entity, and wherein the voice-based chatbot policy override is generated by the first-party entity.
- The method of claim 11, wherein the one or more rules include at least a temporal period and/or location constraint for when to utilize the voice-based chatbot policy override in lieu of the voice-based chatbot.
- A system comprising: one or more hardware processors (514); and memory (525) storing instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform the method of any one of claims 1 to 13.
- Computer instructions, optionally stored on a non-transitory computer-readable storage medium (526), that, when executed by one or more hardware processors (514), cause the one or more hardware processors to perform operations according to the method of any one of claims 1 to 13.
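As an illustrative, non-authoritative sketch (not part of the claims), the routing in claim 1 — deciding, per the override's rules, whether the first-party policy override or the existing third-party chatbot responds to an utterance — can be expressed as follows. All names (`PolicyOverride`, `handle_utterance`, etc.) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names below are hypothetical illustrations, not terms from the patent.
@dataclass
class PolicyOverride:
    # Rules for when to use the override in lieu of the existing chatbot.
    rules: List[Callable[[str], bool]]
    # Response generator used when a rule is invoked.
    respond: Callable[[str], str]

def handle_utterance(transcript: str,
                     override: PolicyOverride,
                     chatbot_respond: Callable[[str], str]) -> str:
    """Route a transcribed utterance either to the first-party policy
    override or to the existing third-party chatbot."""
    if any(rule(transcript) for rule in override.rules):
        # Override response generated in lieu of the third-party chatbot.
        return override.respond(transcript)
    # No rule invoked: fall through to the existing chatbot.
    return chatbot_respond(transcript)

# Example: a first-party override that intercepts refund questions.
refund_override = PolicyOverride(
    rules=[lambda t: "refund" in t.lower()],
    respond=lambda t: "Refunds are processed within 5 business days.",
)
third_party = lambda t: "Thanks for calling. How can I help?"
print(handle_utterance("Can I get a refund?", refund_override, third_party))
```

In a deployed system the transcript would come from an ASR component and the responses would be synthesized to audio; the sketch only shows the rule-gated selection between the two response sources.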
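The decision step of claim 4 — processing audio with an ASR model, processing the ASR output with an NLU model, and comparing the result to the override rules — can be sketched with toy stand-ins for the models (a real system would use trained ASR/NLU models, not keyword lookups; every name here is hypothetical):

```python
# Hypothetical stand-ins for the ASR and NLU models of claim 4.
def asr(audio_bytes: bytes) -> str:
    # Toy "recognizer": the audio is assumed to carry its own transcript.
    return audio_bytes.decode("utf-8")

def nlu(asr_output: str) -> dict:
    # Toy NLU: map a keyword to an intent.
    if "hours" in asr_output.lower():
        return {"intent": "store_hours"}
    return {"intent": "unknown"}

# Rules: the set of NLU intents that invoke the policy override.
OVERRIDE_INTENTS = {"store_hours"}

def use_override(audio_bytes: bytes) -> bool:
    """Compare ASR/NLU output to the override rules (claim 4's decision)."""
    asr_out = asr(audio_bytes)
    nlu_out = nlu(asr_out)
    return nlu_out["intent"] in OVERRIDE_INTENTS

print(use_override(b"what are your holiday hours"))  # True
print(use_override(b"book a table for two"))         # False
```

Claim 5's variant substitutes an LLM for the NLU model but leaves the comparison against the rules unchanged.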
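Claim 9's temporal and/or location constraint can be illustrated with a minimal rule predicate; the window, location identifier, and function name are invented for the example:

```python
import datetime

# Hypothetical rule per claim 9: the override applies only inside a
# temporal window (e.g. a holiday closure) and at a given location.
HOLIDAY_WINDOW = (datetime.datetime(2026, 12, 24),
                  datetime.datetime(2026, 12, 27))
OVERRIDE_LOCATION = "store_42"

def rule_applies(now: datetime.datetime, location: str) -> bool:
    in_window = HOLIDAY_WINDOW[0] <= now <= HOLIDAY_WINDOW[1]
    return in_window and location == OVERRIDE_LOCATION

print(rule_applies(datetime.datetime(2026, 12, 25, 10, 0), "store_42"))  # True
print(rule_applies(datetime.datetime(2026, 6, 1), "store_42"))           # False
```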
Description
Background
Humans may engage in human-to-computer dialogs with interactive software applications referred to as "chatbots," "automated assistants," "intelligent personal assistants," etc. (referred to herein as "chatbots"). As one example, these chatbots may correspond to a machine learning model or a combination of different machine learning models, and may be utilized to perform various tasks on behalf of users. For instance, some of these chatbots can conduct conversations with various humans to perform action(s) on behalf of another human or on behalf of an entity. In some of these instances, the conversations conducted by these chatbots can include voice-based conversations (these chatbots are referred to herein as "voice-based chatbots"), such as conversations conducted locally at a computing device, conducted remotely over multiple computing devices via a telephonic network or other network, or other voice-based scenarios. However, functionality of some of these voice-based chatbots may be limited in various manners. For example, functionality of some of these voice-based chatbots may be limited by pre-defined intent schemas that the voice-based chatbots utilize to perform the action(s). In other words, if a human that is engaged in a given conversation with a given voice-based chatbot provides a spoken utterance that is determined to include an intent not defined by the pre-defined intent schemas, the given voice-based chatbot may fail. Further, to update these voice-based chatbots, existing intent schemas may be modified or new intent schemas may be added. As another example, functionality of some of these voice-based chatbots may be limited by a corpus of examples utilized to train the voice-based chatbots. In other words, if a human that is engaged in a given conversation with a given voice-based chatbot provides a spoken utterance that was not included in the given corpus of examples, the given voice-based chatbot may fail.
Further, to update these voice-based chatbots, existing examples in the corpus may be modified or new examples may be added. However, in both of these examples, there are virtually limitless intent schemas and/or examples that may need to be previously defined to make the voice-based chatbots robust to various nuances of human speech and to mitigate instances of failure. Notably, extensive utilization of computational resources is required to manually define and/or manually refine such intent schemas and/or examples, and to re-train these voice-based chatbots. Further, even if a large quantity of intent schemas and/or examples is defined, a large amount of memory is required to store and/or utilize the large quantity of intent schemas for these voice-based chatbots, and/or to re-train these voice-based chatbots based on the large quantity of examples in the corpus. Accordingly, there is a need in the art for techniques to modify and/or supplement functionality of these voice-based chatbots in a more computationally efficient manner.
US11341335B1 describes a method which includes receiving a user input from a client system associated with a user, determining a task based on the user input and a confidence score associated with the task, generating one or more first dialog acts based on a task policy which specifies dialog acts associated with the task, generating one or more second dialog acts based on an override policy responsive to the confidence score being less than a threshold score, wherein the override policy specifies dialog acts that modify dialog acts specified by the task policy, and sending instructions for presenting a response to the user input to the client system, wherein the response is based on one or more of the first dialog acts or the second dialog acts.
Alex Marin et al.: "Flexible, Rapid Authoring of Goal-Orientated, Multi-Turn Dialogues Using the Task Completion Platform" (24 July 2016) describes authoring tasks in a variety of dialogue styles, ranging from entirely flexible to fully system initiative. This flexibility is enabled by a set of task-level policy override constructs, which augment or constrain the default platform-level policy to achieve the desired system behavior. US 20220293093 A1 describes federated learning of machine learning ("ML") model(s) based on gradient(s) generated at corresponding client devices and a remote system. Processor(s) of the corresponding client devices can process client data generated locally at the corresponding client devices using corresponding on-device ML model(s) to generate corresponding predicted outputs, generate corresponding client gradients based on the corresponding predicted outputs, and transmit the corresponding client gradients to the remote system. US20190340527A1 describes technologies pertaining to creating and/or updating a chatbot. Graphical user interfaces (GUIs) are described that facilitate updating a computer-implemented response model of the chatbot based upon interaction between a developer and features of the