
US-20260127428-A1 - SYSTEMS AND METHODS FOR GENERATING COMPLEX RESPONSES TO USER INTERFACE QUERIES USING MULTI-AUTONOMOUS MODEL ARCHITECTURES

US 20260127428 A1

Abstract

Systems and methods for a multi-autonomous model architecture. For example, an autonomous model operates independently within the architecture to perform tasks on behalf of itself or another system. These models possess the ability to make decisions and act without intervention, based on their programming and the information they perceive from their environment, thereby increasing autonomy, adaptability, and/or perception. More specifically, the system uses a multi-autonomous model architecture that comprises an understanding model (e.g., tasked with generating a contextual representation of inputted information), a planning model (e.g., tasked with generating an action graph for generating a complex solution), and an evaluation model (e.g., tasked with independently validating the action graph prior to recommending to a user).
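The three-stage flow described in the abstract can be sketched as follows. This is a minimal illustration only: the function names, the `ActionGraph` structure, and the toy model behaviors are invented stand-ins, not the patent's implementation.

```python
from dataclasses import dataclass


@dataclass
class ActionGraph:
    """Toy action graph: nodes, edges, and a validation flag (illustrative only)."""
    nodes: list
    edges: list
    validated: bool = False


def understanding_model(query: str) -> str:
    # First autonomous model: derive an action graph objective
    # (a contextual representation of the user query).
    return f"objective: {query}"


def planning_model(objective: str) -> ActionGraph:
    # Second autonomous model: generate an action graph for the objective.
    return ActionGraph(nodes=["initial", "final"], edges=[("initial", "final")])


def evaluation_model(graph: ActionGraph) -> ActionGraph:
    # Third autonomous model: independently validate the action graph
    # before a response is recommended to the user.
    graph.validated = True
    return graph


def generate_response(query: str) -> str:
    # The models feed each other autonomously: understanding -> planning -> evaluation.
    graph = evaluation_model(planning_model(understanding_model(query)))
    return f"response derived from a validated {len(graph.nodes)}-node action graph"
```

The point of the sketch is the chaining: each model's output is autonomously passed as the next model's input (compare claim 20), and only a validated graph reaches the user-facing response.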

Inventors

  • Shixiong Zhang
  • Sambit Sahu
  • Anirban Das
  • Kartik Balasubramaniam
  • Vivek Nayak
  • Milind Naphade
  • Premkumar Natarajan

Assignees

  • CAPITAL ONE SERVICES, LLC

Dates

Publication Date
2026-05-07
Application Date
2025-03-14

Claims (20)

  1. One or more non-transitory, computer-readable media comprising instructions that, when executed by one or more processors, cause operations comprising: processing, using a first large language model, a first user query to generate a first action graph objective, wherein the first action graph objective is processed to generate a first action graph, and wherein the first action graph is processed to generate a first validated action graph; and generating a first response to the first user query based on the first validated action graph.
  2. The one or more non-transitory, computer-readable media of claim 1, wherein training the first autonomous model using the first large language model to determine the contextual information in order to determine the action graph objective outputs for the inputted user queries further comprises: generating a first set of training data for the contextual information comprising labeled action graph objective outputs for labeled contextual information; retrieving the first large language model, wherein the first large language model is pre-trained to process the inputted user queries to determine the labeled contextual information; and re-training the first large language model, using the first set of training data, to generate the labeled action graph objective outputs for the labeled contextual information.
  3. The one or more non-transitory, computer-readable media of claim 1, wherein training the second autonomous model using the second large language model to determine the action graph outputs for the inputted action graph objectives further comprises: generating a second set of training data comprising labeled action graph objectives and labeled action graph outputs; retrieving the second large language model, wherein the second large language model is pre-trained to process the inputted user queries to determine the labeled action graph objectives; and re-training the second large language model, using the second set of training data, to generate the labeled action graph outputs for the labeled action graph objectives.
  4. The one or more non-transitory, computer-readable media of claim 1, wherein training the third autonomous model using the third large language model to validate the inputted action graphs further comprises: generating a third set of training data comprising labeled action graph characteristics and labeled action graph outputs; retrieving the third large language model, wherein the third large language model is pre-trained to generate code script to modify action graph characteristics; and re-training the third large language model, using the third set of training data, to generate code script modifications that result in the labeled action graph outputs by modifying the labeled action graph characteristics.
  5. The one or more non-transitory, computer-readable media of claim 1, wherein training the second autonomous model using the second large language model to determine the action graph outputs for the inputted action graph objectives further comprises: generating training data for the second autonomous model based on the action graph objective outputs outputted by the first autonomous model; and training the second autonomous model, using the training data, to generate inputs for the third autonomous model.
  6. A system for generating complex responses to user interface queries using multi-autonomous model architectures, the system comprising: one or more processors; and one or more non-transitory, computer-readable media comprising instructions that, when executed by the one or more processors, cause operations comprising: processing, using a first model, a plurality of user queries to generate respective action graph objectives, wherein the respective action graph objectives are processed to generate respective action graphs, and wherein the respective action graphs are processed to generate respective validated action graphs; generating respective responses to the plurality of user queries based on the respective validated action graphs; iteratively receiving, by the first model, respective feedback based on the respective responses; and iteratively updating the first model based on the respective feedback.
  7. A method for generating complex responses to user interface queries using a multi-autonomous model architecture, the method comprising: receiving, at a user interface, a first user query; processing the first user query with a first autonomous model, of a multi-autonomous model architecture, to generate a first action graph objective, wherein the first autonomous model is trained using a first large language model to determine contextual information in order to determine action graph objective outputs for inputted user queries; processing the first action graph objective with a second autonomous model to generate a first action graph, wherein the second autonomous model is trained using a second large language model to determine action graph outputs for inputted action graph objectives; processing the first action graph with a third autonomous model to generate a first validated action graph, wherein the third autonomous model is trained using a third large language model to validate inputted action graphs; and generating for display, in the user interface, a first response to the first user query based on the first validated action graph.
  8. The method of claim 7, wherein training the first autonomous model using the first large language model to determine the contextual information in order to determine the action graph objective outputs for the inputted user queries further comprises: generating a first set of training data for the contextual information comprising labeled action graph objective outputs for labeled contextual information; retrieving the first large language model, wherein the first large language model is pre-trained to process the inputted user queries to determine the labeled contextual information; and re-training the first large language model, using the first set of training data, to generate the labeled action graph objective outputs for the labeled contextual information.
  9. The method of claim 7, wherein training the second autonomous model using the second large language model to determine the action graph outputs for the inputted action graph objectives further comprises: generating a second set of training data comprising labeled action graph objectives and labeled action graph outputs; retrieving the second large language model, wherein the second large language model is pre-trained to process the inputted user queries to determine the labeled action graph objectives; and re-training the second large language model, using the second set of training data, to generate the labeled action graph outputs for the labeled action graph objectives.
  10. The method of claim 7, wherein training the third autonomous model using the third large language model to validate the inputted action graphs further comprises: generating a third set of training data comprising labeled action graph characteristics and labeled action graph outputs; retrieving the third large language model, wherein the third large language model is pre-trained to generate code script to modify action graph characteristics; and re-training the third large language model, using the third set of training data, to generate code script modifications that result in the labeled action graph outputs by modifying the labeled action graph characteristics.
  11. The method of claim 7, wherein training the second autonomous model using the second large language model to determine the action graph outputs for the inputted action graph objectives further comprises: generating training data for the second autonomous model based on the action graph objective outputs outputted by the first autonomous model; and training the second autonomous model, using the training data, to generate inputs for the third autonomous model.
  12. The method of claim 7, wherein training the third autonomous model using the third large language model to validate the inputted action graphs further comprises: generating training data for the third autonomous model based on the action graph outputs outputted by the second autonomous model; and training the third autonomous model, using the training data, to validate inputs to the third autonomous model.
  13. The method of claim 7, wherein processing the first action graph with the third autonomous model to generate the first validated action graph further comprises: determining a first update to the first action graph required to generate the first validated action graph; processing the first update using a large language model to generate a second code script corresponding to the first update; and updating a first code script with the second code script.
  14. The method of claim 7, wherein processing the first action graph with the third autonomous model to generate the first validated action graph further comprises: generating a sandbox session for the first action graph; retrieving user profile data from a user profile for a user; retrieving state characteristics from one or more servers; populating the sandbox session with the user profile data and the state characteristics; and testing the first action graph in the sandbox session to determine whether the first action graph results in a requested final state.
  15. The method of claim 7, wherein the first action graph comprises a plurality of nodes and a plurality of edges from an initial state to a requested final state, and wherein generating the first action graph comprises: determining a plurality of pathways through the plurality of nodes using the plurality of edges, wherein each of the plurality of pathways comprises a respective route from the initial state to the requested final state; determining, based on the first user query, a first criterion; comparing the first criterion to a first route, wherein the first route corresponds to a first pathway of the plurality of pathways; and selecting the first pathway from the plurality of pathways based on comparing the first criterion to the first route.
  16. The method of claim 7, wherein the first action graph comprises a plurality of nodes and a plurality of edges from an initial state to a requested final state, and wherein generating the first action graph comprises: determining a first pathway through the plurality of nodes using the plurality of edges; determining a second pathway through the plurality of nodes using the plurality of edges; determining a comparison criterion based on the first user query; and comparing the first pathway to the second pathway based on the comparison criterion.
  17. The method of claim 7, wherein the first action graph comprises a plurality of nodes and a plurality of edges from an initial state to a requested final state, and wherein generating the first action graph comprises: determining pairs of the plurality of nodes to connect using the plurality of edges; and determining weights for the pairs based on state characteristics.
  18. The method of claim 7, wherein the first action graph comprises a plurality of nodes and a plurality of edges from an initial state to a requested final state, and wherein generating the first action graph comprises: determining a number of the plurality of nodes based on the first user query; and determining a number of edges to connect the plurality of nodes based on the first user query.
  19. The method of claim 7, further comprising: initiating a first device session between a first mobile device and one or more servers comprising the multi-autonomous model architecture; determining a user corresponding to the first mobile device; and retrieving a user profile corresponding to the user.
  20. The method of claim 7, wherein the multi-autonomous model architecture comprises a plurality of autonomous models, and wherein processing inputs by the multi-autonomous model architecture comprises: receiving one or more outputs from one or more of the plurality of autonomous models; and autonomously inputting the one or more outputs into the one or more of the plurality of autonomous models.
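The pathway selection recited in claims 15-17 (multiple routes from an initial state to a requested final state, weighted edges, and selection against a query-derived criterion) can be illustrated with a standard shortest-path search over a small action graph. The node names, weights, and the choice of total edge weight as the criterion are hypothetical, used only to make the idea concrete:

```python
import heapq


def select_pathway(edges, initial, final):
    """Return (cost, route) for the lowest-weight pathway through the graph.

    edges: dict mapping node -> list of (neighbor, weight) pairs, where the
    weights stand in for the state characteristics of claim 17.
    Uses Dijkstra's algorithm over the weighted action graph.
    """
    queue = [(0, initial, [initial])]  # (accumulated cost, node, route so far)
    seen = set()
    while queue:
        cost, node, route = heapq.heappop(queue)
        if node == final:
            return cost, route
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in edges.get(node, []):
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, route + [neighbor]))
    return None  # no pathway reaches the requested final state


# Two candidate pathways from "initial" to "final" (hypothetical states):
#   initial -> fetch_profile -> final                    cost 4 + 3 = 7
#   initial -> verify_identity -> fetch_profile -> final cost 1 + 1 + 3 = 5
graph = {
    "initial": [("verify_identity", 1), ("fetch_profile", 4)],
    "verify_identity": [("fetch_profile", 1)],
    "fetch_profile": [("final", 3)],
}
```

Here the comparison criterion is simply the summed edge weight; in the claims, the criterion is derived from the first user query, so any route metric the query implies (cost, latency, number of steps) could play the same role.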

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims the benefit of priority of U.S. Provisional Ser. No. 63/717,232, filed Nov. 6, 2024. The content of the foregoing application is incorporated herein in its entirety by reference.

BACKGROUND

Chatbots are typically implemented using a combination of natural language processing (NLP), machine learning, and sometimes rule-based systems. The implementation begins with defining the chatbot's purpose and the scope of interactions it will handle. NLP techniques are used to understand and interpret user inputs, converting the natural language into structured data that the chatbot can process. Machine learning models, particularly those involving deep learning, are trained on large datasets to improve the chatbot's ability to recognize patterns and provide relevant responses. These models enable the chatbot to understand context, manage dialogues, and learn from interactions over time. In some cases, rule-based systems are used to handle specific queries or follow predefined scripts, particularly for simpler or more structured interactions. The chatbot's architecture often includes integration with messaging platforms, databases, and APIs to access necessary information and provide dynamic responses. Additionally, developers focus on user experience design to ensure that the interactions are intuitive and engaging. Continuous monitoring and updating are crucial to maintain the chatbot's effectiveness and to adapt to new user needs or changes in language patterns.

Dealing with complex problems is technically challenging for chatbots due to several factors related to the intricacies of human language and cognition. Human language is highly context-dependent, nuanced, and often ambiguous, making it difficult for chatbots to accurately interpret and respond to complex queries.
Understanding the context requires not just parsing words but also grasping the intent, sentiment, and sometimes even cultural or situational subtleties, which can be beyond the capabilities of many NLP models.

SUMMARY

Further exacerbating these technical issues, complex problems often involve complex solutions, which may include multiple steps, dependencies, and/or a need for deep domain-specific knowledge. This requires chatbots to have advanced reasoning abilities, extensive and up-to-date knowledge bases, and/or the capacity to manage multi-turn conversations effectively. Maintaining coherence and relevance throughout an extended dialogue, while also handling interruptions or changes in topic, adds another layer of difficulty. Current artificial intelligence models struggle to prepare these complex solutions, even with contextual relevance.

For example, artificial intelligence, including, but not limited to, machine learning, deep learning, etc. (referred to collectively herein as artificial intelligence models, machine learning models, or simply models), refers to a wide-ranging branch of computer science concerned with building smart machines capable of performing tasks that typically require human intelligence. Key benefits of artificial intelligence are its ability to process data, find underlying patterns, and/or perform real-time determinations. However, despite these benefits and despite the wide-ranging number of potential applications, practical implementations of artificial intelligence have been hindered by several technical problems. First, artificial intelligence may rely on large amounts of high-quality data. The process for obtaining this data and ensuring it is high-quality can be complex and time-consuming. Second, any data that is obtained may need to be categorized and labeled accurately, which can be a difficult, time-consuming, and manual task.
Finally, results based on artificial intelligence can be difficult to review, as the process by which the results are made may be unknown or obscured. This obscurity can create hurdles for identifying errors in the results, as well as for improving the models providing the results. These technical problems may present an inherent problem with attempting to use an artificial intelligence-based solution in preparing complex solutions.

Systems and methods are described herein for novel uses of and/or improvements to artificial intelligence applications in order to generate complex technical responses and solutions. As one example, systems and methods are described herein for a multi-autonomous model architecture. For example, an autonomous model operates independently within the architecture to perform tasks on behalf of itself or another system. These models possess the ability to make decisions and act without intervention, based on their programming and the information they perceive from their environment, thereby increasing autonomy, adaptability, and/or perception. More specifically, the system uses a multi-autonomous model architecture that comprises an understanding model (e.g., tasked with generating a contextual representation of inputted information), a planning model (e.g., tasked with generating an action graph for generating a complex solution), and an evaluation model (e.g., tasked with independently validating the action graph prior to recommending to a user).