EP-4740152-A2 - SYSTEM AND METHOD FOR IMPLEMENTING AN ADVISORY ASSISTANT TO A GENERATIVE ARTIFICIAL INTELLIGENCE TOOL

EP 4740152 A2

Abstract

The invention relates to computer-implemented systems and methods that implement an innovative generative AI service based on proprietary expertise and industry knowledge. The generative AI service provides unique autonomous features, such as combining separate and distinct LLM responses and prompts to create unique results. Other autonomous features may include the ability to handle scaling and auto-deployment of models and to reroute requests autonomously so that user load is balanced across the entire globally distributed generative AI infrastructure estate. The generative AI service may further deploy new production instances of models on demand, according to predefined system criteria as well as by explicit user request, based on projected demand increases and/or the need for a specific instance for further model fine-tuning.
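The abstract's central autonomous feature, combining separate and distinct LLM responses into a unique result, can be illustrated with a minimal sketch. This is not the patented implementation; the function names, the synthesis-prompt wording, and the idea of using one model as a synthesizer are assumptions for illustration only, with each model represented as a plain callable.

```python
from concurrent.futures import ThreadPoolExecutor


def combine_llm_responses(prompt, models, synthesizer):
    """Hypothetical sketch: fan a prompt out to several independent LLMs,
    then ask a synthesizer model to merge the separate responses into a
    single combined result."""
    # Query each model concurrently; each entry in `models` is a callable
    # that takes a prompt string and returns a response string.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        responses = list(pool.map(lambda m: m(prompt), models))
    # Build a synthesis prompt that presents every candidate answer.
    merged_prompt = "Combine the following draft answers into one response:\n"
    merged_prompt += "\n".join(
        f"--- Draft {i + 1} ---\n{r}" for i, r in enumerate(responses)
    )
    return synthesizer(merged_prompt)
```

In practice each callable would wrap a REST call to a hosted model endpoint; stub callables are enough to show the fan-out/merge shape.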

Inventors

  • MCCLUNG, Joshua
  • ATKINS, Ryan
  • BILAL, Mohammad Mirwais
  • HARDY, Bryan
  • KELLER, David

Assignees

  • KPMG LLP

Dates

Publication Date
2026-05-13
Application Date
2024-08-08

Claims (20)

  1. A computer-implemented system for implementing a middleware platform that provides a series of generative AI services, the system comprising: a user interface that is configured to receive one or more requests from a user, via a communication network; a Digital Matrix Application Gateway that is configured to receive the one or more requests and route the one or more requests to a set of API Compute Resources, wherein the set of API Compute Resources comprises one or more API Apps and one or more API App Functions, wherein each of the set of API Compute Resources is configured to make calls to one of a plurality of APIs, wherein the set of API Compute Resources interacts with a plurality of different large language models (LLMs) that collectively generate a response to the one or more requests; a data storage component that reads and writes configuration and operational data associated with the API Compute Resources; an insights analytics processing component that is configured to receive application telemetry data from one or more API Compute Resources; and a log analytics component that stores log data from the insights component.
  2. The system of claim 1, wherein the response represents a combination of LLM responses from the plurality of LLMs.
  3. The system of claim 1, wherein the plurality of LLMs have access to a proprietary knowledgebase.
  4. The system of claim 1, wherein user load balancing is applied across the plurality of different LLMs on an entire globally distributed generative AI infrastructure.
  5. The system of claim 1, wherein the user interface comprises a generative AI chat interface.
  6. The system of claim 1, wherein the user interface comprises a cognitive search interface.
  7. The system of claim 1, wherein each of the plurality of LLMs is independently run and secured in a virtual container.
  8. The system of claim 1, wherein a copy of an LLM from the plurality of LLMs is created to respond in a predetermined way through a training process to create a tuned version of the LLM.
  9. The system of claim 1, wherein the log data is globally and centrally managed to determine model usage and performance metrics across the plurality of different LLMs.
  10. The system of claim 1, wherein the plurality of LLMs are selected based on model optimization.
  11. A computer-implemented method for implementing a middleware platform that provides a series of generative AI services, the method comprising the steps of: receiving, via a user interface, one or more requests from a user, via a communication network; receiving, via a Digital Matrix Application Gateway, the one or more requests and routing the one or more requests to a set of API Compute Resources, wherein the set of API Compute Resources comprises one or more API Apps and one or more API App Functions, wherein each of the set of API Compute Resources is configured to make calls to one of a plurality of APIs, wherein the set of API Compute Resources interacts with a plurality of different large language models (LLMs) that collectively generate a response to the one or more requests; reading and writing, via a data storage component, configuration and operational data associated with the API Compute Resources; receiving, via an insights analytics processing component, application telemetry data from one or more API Compute Resources; storing, via a log analytics component, log data from the insights component; and transmitting, via the user interface, the response.
  12. The method of claim 11, wherein the response represents a combination of LLM responses from the plurality of LLMs.
  13. The method of claim 11, wherein the plurality of LLMs have access to a proprietary knowledgebase.
  14. The method of claim 11, wherein user load balancing is applied across the plurality of different LLMs on an entire globally distributed generative AI infrastructure.
  15. The method of claim 11, wherein the user interface comprises a generative AI chat interface.
  16. The method of claim 11, wherein the user interface comprises a cognitive search interface.
  17. The method of claim 11, wherein each of the plurality of LLMs is independently run and secured in a virtual container.
  18. The method of claim 11, wherein a copy of an LLM from the plurality of LLMs is created to respond in a predetermined way through a training process to create a tuned version of the LLM.
  19. The method of claim 11, wherein the log data is globally and centrally managed to determine model usage and performance metrics across the plurality of different LLMs.
  20. The method of claim 11, wherein the plurality of LLMs are selected based on model optimization.
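The gateway behavior the claims describe, routing each request across model deployments so that user load stays balanced, and adding new production instances on demand, can be sketched minimally. The class name, the capacity threshold, and the least-loaded routing policy are illustrative assumptions, not the claimed implementation.

```python
import heapq


class GatewayRouter:
    """Hypothetical sketch of a gateway that routes each request to the
    least-loaded model deployment and auto-deploys a new instance when
    every existing deployment is at capacity."""

    def __init__(self, capacity_per_instance=100):
        self.capacity = capacity_per_instance
        # Min-heap of (current_load, deployment_id), so the least-loaded
        # deployment is always at the top.
        self.deployments = [(0, "deployment-0")]

    def route(self, request_id):
        load, dep = heapq.heappop(self.deployments)
        if load >= self.capacity:
            # Even the least-loaded deployment is full: scale out by
            # spinning up a fresh instance for this request.
            heapq.heappush(self.deployments, (load, dep))
            dep, load = f"deployment-{len(self.deployments)}", 0
        heapq.heappush(self.deployments, (load + 1, dep))
        return dep
```

A real gateway would also decay load counts as requests complete and route by region; the heap simply makes the "least-loaded wins, scale out when full" policy concrete.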

Description

SYSTEM AND METHOD FOR IMPLEMENTING AN ADVISORY ASSISTANT TO A GENERATIVE ARTIFICIAL INTELLIGENCE TOOL

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The application claims priority to U.S. Provisional Application 63/531,388 (Attorney Docket No. 055089.0000113), filed August 8, 2023, the contents of which are incorporated by reference herein in their entirety.

FIELD OF THE INVENTION

[0002] The present invention relates to systems and methods for implementing a middleware platform that provides a series of generative artificial intelligence (Gen AI) services, including an advisory assistant to a generative artificial intelligence tool, and combines responses from multiple language models with proprietary data for enhanced results and outputs.

BACKGROUND

[0003] ChatGPT (Chat Generative Pre-trained Transformer) is a state-of-the-art language model developed by OpenAI using advanced machine learning algorithms to generate human-like text based on input it receives.

[0004] Microsoft Azure OpenAI Service provides REST API access to OpenAI's language models, including the GPT-4, GPT-3, Codex and Embeddings model series. It represents an opportunity for organizations looking to improve efficiency, reduce costs, and enhance customer satisfaction. By leveraging artificial intelligence technology, language processing tools enable organizations to automate a wide range of tasks that previously required human intervention.

[0005] In particular, language processing tools can assist organizations to improve customer service and support. With the ability to process and understand natural language inputs, language processing tools can provide quick and accurate responses to customer inquiries, reducing response times and increasing customer satisfaction.

[0006] In addition to customer service, language processing tools can be used for a variety of other tasks, including content creation, data analysis, business process automation, etc.
For example, language processing tools can help organizations complete these tasks more efficiently and accurately, freeing up valuable time and resources for other strategic initiatives.

[0007] As technology advances, new language models are introduced and current ones continue to improve. Businesses are required to effectively leverage technology in a consistent and efficient manner to gain a significant competitive advantage while protecting proprietary and sensitive data. Current systems are unable to support new language models at the speed at which they are introduced to the market. At best, current solutions onboard one language model at a time. This leads to missed opportunities and an inability to leverage the newest features and updated technologies in a timely and efficient manner.

[0008] It would be desirable, therefore, to have a system and method that could overcome the foregoing disadvantages of known systems.

SUMMARY

[0009] According to one embodiment, the invention relates to a computer-implemented system for implementing a middleware platform that provides a series of generative AI services.
The system comprises: a user interface that is configured to receive one or more requests from a user, via a communication network; a Digital Matrix Application Gateway that is configured to receive the one or more requests and route the one or more requests to a set of API Compute Resources, wherein the set of API Compute Resources comprises one or more API Apps and one or more API App Functions, wherein each of the set of API Compute Resources is configured to make calls to one of a plurality of APIs, wherein the set of API Compute Resources interacts with a plurality of different large language models (LLMs) that collectively generate a response to the one or more requests; a data storage component that reads and writes configuration and operational data associated with the API Compute Resources; an insights analytics processing component that is configured to receive application telemetry data from one or more API Compute Resources; and a log analytics component that stores log data from the insights component.

[0010] According to another embodiment, the invention relates to a computer-implemented method for implementing a middleware platform that provides a series of generative AI services. The method comprises the steps of: receiving, via a user interface, one or more requests from a user, via a communication network; receiving, via a Digital Matrix Application Gateway, the one or more requests and routing the one or more requests to a set of API Compute Resources, wherein the set of API Compute Resources comprises one or more API Apps and one or more API App Functions, wherein each of the set of API Compute Resources is configured to make calls to one of a plurality of APIs, wherein the set of API Compute Resources interacts with a plurality of different large language models (LLMs) that collectively generate a response to the one or more requests; reading and writing, via a d