EP-4740163-A2 - MULTI-AGENT COMPUTER SOFTWARE FRAMEWORK FOR A CONVERSATIONAL ARTIFICIAL INTELLIGENCE SYSTEM

Abstract

Methods and systems are presented for providing an artificial intelligence (AI)-based conversation system for facilitating a conversation with users and processing transactions for the users. The AI-based conversation system includes an AI model coupled with different backend modules. Based on an utterance submitted by a user during a chat session, the AI model is configured to generate instructions for a backend module to perform a transaction for the user based on a prompt template. The AI model also communicates the instructions to the backend module using a protocol specified in the prompt template. Upon receiving an output from the backend module, the AI model is configured to generate content for the chat session based on the output, and provide the content to the user.

Inventors

  • LANKA, Soujanya
  • WANG, Guangsen
  • VERMA, Reyha

Assignees

  • PayPal, Inc.

Dates

Publication Date
2026-05-13
Application Date
2024-07-31

Claims (20)

  1. A system comprising: a non-transitory memory; and one or more hardware processors coupled with the non-transitory memory and configured to execute instructions from the non-transitory memory to cause the system to: receive an utterance from a device via a chat interface during a chat session; predict, by an artificial intelligence (AI) model, an intent of a user of the device based on the utterance, wherein the AI model is communicatively coupled with a plurality of backend modules, and wherein each backend module in the plurality of backend modules is configured to perform transactions corresponding to a different transaction type; select, from the plurality of backend modules, a particular backend module for the user during the chat session based on the intent; cause, by the AI model, the particular backend module to perform a transaction for the user based on the utterance and the intent; generate, by the AI model, content for the chat session based on a result from the particular backend module performing the transaction; and transmit the content to the device via the chat interface.
  2. The system of claim 1, wherein executing the instructions further causes the system to: obtain additional information via the chat interface, wherein the particular backend module is caused to perform the transaction for the user further based on the additional information.
  3. The system of claim 2, wherein executing the instructions further causes the system to: generate, by the AI model, a set of requests or questions for the user based on a set of data types or content required by the particular backend module for performing the transaction; and provide the set of requests or questions to the device via the chat interface, wherein the additional information obtained from the user corresponds to the set of requests or questions.
  4. The system of claim 3, wherein executing the instructions further causes the system to: determine the set of data types or content required by the particular backend module for performing the transaction based on a prompt template associated with the particular backend module.
  5. The system of claim 1, wherein executing the instructions further causes the system to: obtain, by the AI model and from the particular backend module, an output based on the performing of the transaction, wherein the content is generated based on the output.
  6. The system of claim 5, wherein the output indicates whether the transaction has been completed, has been denied, or requires additional data to complete.
  7. The system of claim 1, wherein executing the instructions further causes the system to: format the utterance based on a set of policies; and provide the formatted utterance to the AI model, wherein the intent is further predicted based on the formatted utterance.
  8. A method, comprising: receiving, via a chat interface during a chat session between a device and a computer system, a request for performing a transaction corresponding to a first transaction type; selecting, from a plurality of modules, a first module for a user of the device based on the first transaction type; generating, by an artificial intelligence (AI) model, instructions for the first module to perform the transaction based on information extracted from the request; providing, by the AI model, the instructions to the first module; generating, by the AI model, content for the chat session based on an output produced by the first module; and providing, by the AI model, the content to the device via the chat interface as a response to the request.
  9. The method of claim 8, wherein the output comprises data associated with the computer system obtained from a plurality of data sources, and wherein the content comprises a summary of the data.
  10. The method of claim 8, further comprising: retrieving, from a repository, a template corresponding to the first module, wherein the template indicates a communication protocol for communicating with the first module; and providing the template to the AI model, wherein the instructions are generated further based on the template.
  11. The method of claim 8, wherein the instructions comprise an application programming interface (API) call, and wherein the method further comprises: inserting one or more input parameters into the API call based on the extracted information.
  12. The method of claim 8, wherein the content is first content, and wherein the method further comprises: subsequent to providing the first content to the device, receiving a second request from the device via the chat interface during the chat session, wherein the second request is for performing a second transaction corresponding to a second transaction type; selecting, from the plurality of modules, a second module for the user based on the second transaction type; providing, by the AI model, second instructions to the second module; generating, by the AI model, second content for the chat session based on a second output produced by the second module; and providing the second content to the user via the chat interface.
  13. The method of claim 8, wherein the request comprises an utterance, and wherein the method further comprises: processing the utterance based on a set of policies; and providing the processed utterance to the AI model, wherein the instructions are generated further based on the processed utterance.
  14. The method of claim 13, wherein the processing of the utterance comprises: identifying one or more words associated with a particular word type in the utterance; and removing the one or more words from the utterance.
  15. A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: in response to receiving a first utterance from a device via a chat interface during a chat session, determining that the first utterance corresponds to a request for performing a transaction of a particular transaction type; selecting, from a plurality of computer modules, a particular computer module for performing the transaction based on the particular transaction type; generating, by an artificial intelligence (AI) model, instructions that cause the particular computer module to perform the transaction for a user of the device; generating, by the AI model, content for the chat session based on a result from the particular computer module performing the transaction; and transmitting the content to the device via the chat interface.
  16. The non-transitory machine-readable medium of claim 15, wherein the operations further comprise: obtaining additional information via the chat interface, wherein the particular computer module is configured to perform the transaction for the user further based on the additional information.
  17. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise: generating, by the AI model, one or more questions for the user based on data types or content required by the particular computer module for performing the transaction; and providing the one or more questions to the device via the chat interface, wherein the additional information obtained from the user corresponds to the one or more questions.
  18. The non-transitory machine-readable medium of claim 17, wherein the operations further comprise: determining the data types or content required by the particular computer module for performing the transaction based on a prompt template associated with the particular computer module.
  19. The non-transitory machine-readable medium of claim 15, wherein the operations further comprise: detecting that a first language used in the first utterance is incompatible with the AI model; and translating the first utterance from the first language to a second language that is compatible with the AI model.
  20. The non-transitory machine-readable medium of claim 15, wherein the result indicates whether the transaction has been completed, has been denied, or requires additional data to complete.
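As an illustration only, the conversational flow recited in independent claims 1, 8, and 15 (predict an intent from the utterance, select a backend module by transaction type, have the module perform the transaction, and turn its result into chat content) can be sketched as follows. All names here (`predict_intent`, `BACKEND_MODULES`, the sample modules) are hypothetical stand-ins; the patent does not specify any implementation.

```python
# Hypothetical backend modules, one per transaction type (illustrative only).
def send_payment(params):
    return {"status": "completed",
            "detail": f"Sent ${params['amount']} to {params['recipient']}"}

def check_balance(params):
    return {"status": "completed", "detail": "Balance: $250.00"}

# Registry mapping a predicted intent to the backend module that handles it.
BACKEND_MODULES = {
    "send_payment": send_payment,
    "check_balance": check_balance,
}

def predict_intent(utterance: str) -> str:
    # Stand-in for the AI model's intent prediction (claim 1); a real system
    # would use a trained model rather than keyword matching.
    return "send_payment" if "send" in utterance.lower() else "check_balance"

def handle_utterance(utterance: str, params: dict) -> str:
    intent = predict_intent(utterance)   # predict intent from the utterance
    module = BACKEND_MODULES[intent]     # select a backend module by intent
    result = module(params)              # cause the module to perform the transaction
    # Generate chat content from the module's result and return it to the user.
    return f"{result['detail']} (status: {result['status']})"
```

Claims 2-4 and 16-18 extend this loop with follow-up questions when the selected module needs more data than the utterance contains; in the sketch above, that would amount to checking `params` against the module's required inputs before calling it.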

Description

MULTI-AGENT COMPUTER SOFTWARE FRAMEWORK FOR A CONVERSATIONAL ARTIFICIAL INTELLIGENCE SYSTEM

BACKGROUND

[0001] The present specification generally relates to computer-based automated interactive services, and more specifically, to a framework for providing a conversational artificial intelligence system configurable to interact with users and to perform various transactions for the users according to various embodiments of the disclosure.

Related Art

[0002] Service providers typically provide a platform for interacting with their users. The platform can be implemented as a website, a mobile application, or a phone service, through which the users may access data and/or services offered by the service provider. While these platforms can be interactive in nature (e.g., the content of the platform can be changed based on different user interactions, etc.), they are fixed and bound by their structures. In other words, users have to navigate through the platform to obtain the desired data and/or services. When the data and/or the service desired by a user is “hidden” (e.g., requiring multiple navigation steps that are not intuitive, etc.), it may be difficult for the user to access the data and/or the service purely based on manual navigation of the platform.

[0003] In the past, service providers have often dedicated one or more information pages, such as a “Frequently Asked Questions (FAQ)” page, within their platforms for assisting users in accessing data and/or services that are in high demand. The information pages may include predefined questions, such as “how to change my password,” and pre-populated answers to those questions. However, given that the questions were pre-generated, a user who is looking for data and/or services is still required to navigate through the information pages to find a question that matches the data and/or services that the user desires. If the desired data and/or services do not match any of the questions on the information pages, the user will have to manually navigate the platform or contact a human agent of the service provider. Furthermore, the information pages also create an additional burden for the service provider, as the answers to the pre-generated questions need to be reviewed and/or modified whenever the platform, the data, and/or the services offered by the service provider are updated. Thus, there is a need for an advanced framework for providing data and/or services to users in a natural and intuitive way.

BRIEF DESCRIPTION OF THE FIGURES

[0004] FIG. 1 is a block diagram illustrating an electronic transaction system according to an embodiment of the present disclosure;

[0005] FIG. 2 is a block diagram illustrating a conversation module according to an embodiment of the present disclosure;

[0006] FIG. 3 illustrates an example data flow for processing utterances according to an embodiment of the present disclosure;

[0007] FIG. 4 illustrates an example data flow for using an artificial intelligence model to process transactions according to an embodiment of the present disclosure;

[0008] FIG. 5 is a block diagram of an evaluation module according to an embodiment of the present disclosure;

[0009] FIG. 6 is a block diagram of a caching module according to an embodiment of the present disclosure;

[00010] FIG. 7 is a flowchart showing a process of facilitating a conversation between an artificial intelligence model and a user according to an embodiment of the present disclosure;

[00011] FIG. 8 is a flowchart showing a process of using an artificial intelligence model to instruct various software modules to process different types of transactions according to an embodiment of the present disclosure;

[00012] FIG. 9 is a flowchart showing a process of evaluating the quality of a conversation system according to an embodiment of the present disclosure;

[00013] FIG. 10 illustrates an example neural network that can be used to implement a machine learning model according to an embodiment of the present disclosure; and

[00014] FIG. 11 is a block diagram of a system for implementing a device according to an embodiment of the present disclosure.

[00015] Embodiments of the present disclosure and their advantages are best understood by referring to the detailed description that follows. It should be appreciated that like reference numerals are used to identify like elements illustrated in one or more of the figures, wherein showings therein are for purposes of illustrating embodiments of the present disclosure and not for purposes of limiting the same.

DETAILED DESCRIPTION

[00016] The present disclosure describes methods and systems for providing a computer framework that uses one or more artificial intelligence (AI) models to interact with, and provide services to, users. As used herein, an AI model is a computer-based model that can be configured and trained to provide natural conversation services for users (e.g., automatically interpreting input utt
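The framework described above also has the AI model communicate with each backend module through a prompt template that specifies a protocol and required inputs (cf. claims 10, 11, and the abstract). A minimal sketch of that idea follows; the template fields, endpoint, and `build_api_call` helper are hypothetical assumptions for illustration, not details from the patent.

```python
import json

# Hypothetical prompt template for one backend module. Per claims 10-11, the
# template names the communication protocol (here, a REST-style API call) and
# the input parameters the module requires; all field values are invented.
DISPUTE_TEMPLATE = {
    "module": "dispute_module",
    "protocol": {"method": "POST", "endpoint": "/v1/disputes"},
    "required_params": ["transaction_id", "reason"],
}

def build_api_call(template: dict, extracted: dict) -> dict:
    """Insert information extracted from the utterance into the template's
    API call (claim 11), or ask for whatever is still missing (claims 3, 17)."""
    missing = [p for p in template["required_params"] if p not in extracted]
    if missing:
        # Not enough data yet: generate follow-up questions for the user.
        return {"ask_user": [f"Please provide your {p.replace('_', ' ')}."
                             for p in missing]}
    body = {p: extracted[p] for p in template["required_params"]}
    return {**template["protocol"], "body": json.dumps(body)}
```

Under this sketch, the AI model would retrieve the template for the selected module from a repository, fill it from the extracted utterance information, and either dispatch the resulting call or return the follow-up questions to the chat session.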