
KR-20260068110-A - Execution of an execution plan with a digital assistant using a large language model


Abstract

Techniques are disclosed herein for executing an execution plan for a digital assistant that uses generative artificial intelligence (genAI). A first genAI model may generate a list of executable actions based on an utterance provided by a user. An execution plan that includes the executable actions may then be generated. The execution plan may be executed by performing an iterative process for each of the executable actions. The iterative process may include identifying an action type, invoking one or more states, and executing, by the one or more states, the executable action using an asset to obtain an output. A second prompt may be generated based on the output obtained from executing each of the one or more executable actions. A second genAI model may generate a response to the utterance based on the second prompt.
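The flow summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: both genAI models are replaced by plain callables, the planner is assumed to return a comma-separated list of action names, and the names `run_assistant`, `ExecutableAction`, `ExecutionPlan`, `action_types`, and `executors` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a generative AI model: maps a prompt to generated text.
Model = Callable[[str], str]

@dataclass
class ExecutableAction:
    name: str          # illustrative action name, e.g. "get_balance"
    action_type: str   # illustrative action type, e.g. "api"

@dataclass
class ExecutionPlan:
    actions: List[ExecutableAction] = field(default_factory=list)

def run_assistant(
    utterance: str,
    planner: Model,                                           # first genAI model (assumed)
    responder: Model,                                         # second genAI model (assumed)
    action_types: Dict[str, str],                             # action name -> action type
    executors: Dict[str, Callable[[ExecutableAction], str]],  # action type -> executor
) -> str:
    # First prompt contains the user's utterance; the planner lists action names.
    first_prompt = f"List the actions needed to answer: {utterance}"
    names = [n.strip() for n in planner(first_prompt).split(",") if n.strip()]

    # Build the execution plan from the listed actions.
    plan = ExecutionPlan([ExecutableAction(n, action_types[n]) for n in names])

    # Iterative process: identify each action's type, dispatch it, collect its output.
    outputs = []
    for action in plan.actions:
        execute = executors[action.action_type]
        outputs.append(f"{action.name}: {execute(action)}")

    # Second prompt is built from the collected outputs; the responder writes the reply.
    second_prompt = f"User asked: {utterance}\nResults:\n" + "\n".join(outputs)
    return responder(second_prompt)

# Usage with stub models: the responder simply echoes the last result line.
reply = run_assistant(
    "What is my balance?",
    planner=lambda prompt: "get_balance",
    responder=lambda prompt: prompt.splitlines()[-1],
    action_types={"get_balance": "api"},
    executors={"api": lambda action: "$42"},
)
```

Passing the models and executors in as callables keeps the control flow (plan, iterate, second prompt, respond) independent of any particular model or asset, which is the part the abstract actually describes.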

Inventors

  • Xu, Xin
  • Hettige, Bhagya Gayathri
  • Gadde, Srinivasa Phani Kumar
  • Dharmasiri, Yakupitiyage Don Thanuja Samodhye
  • Sridharan, Vanshika
  • Vishnoi, Vishal
  • Johnson, Mark Edward

Assignees

  • Oracle International Corporation

Dates

Publication Date
20260513
Application Date
20240912
Priority Date
20240905

Claims (20)

  1. A computer-implemented method, comprising: generating, by a first generative artificial intelligence model, a list including one or more executable actions based on a first prompt containing a natural language utterance provided by a user; generating an execution plan including the one or more executable actions; executing the execution plan, wherein executing the execution plan comprises performing an iterative process for each of the one or more executable actions, the iterative process comprising: identifying an action type for the executable action, invoking one or more states configured to execute the action type, and executing, by the one or more states, the executable action using an asset to obtain an output; generating a second prompt based on the output obtained from executing each of the one or more executable actions; and generating, by a second generative artificial intelligence model, a response to the natural language utterance based on the second prompt.
  2. The computer-implemented method of claim 1, wherein generating the execution plan comprises performing an evaluation of the one or more executable actions, the evaluation comprising evaluating the one or more execution plans based on one or more ongoing conversation paths initiated by the user and any currently active execution plans, and wherein generating the execution plan further comprises: (i) when the evaluation determines that the natural language utterance is part of an ongoing conversation path, incorporating the one or more executable actions into a currently active execution plan associated with the ongoing conversation path, the currently active execution plan including an ordered list of the one or more executable actions and one or more previous actions; or (ii) when the evaluation determines that the natural language utterance is not part of an ongoing conversation path, generating a new execution plan including an ordered list of the one or more executable actions.
  3. The computer-implemented method of claim 1, wherein the iterative process further comprises: determining whether one or more parameters are available for the executable action; when the one or more parameters are available, invoking the one or more states and executing the executable action based on the one or more parameters; and when one or more parameters for the executable action are not available, obtaining the one or more unavailable parameters, then invoking the one or more states and executing the executable action based on the one or more parameters.
  4. The computer-implemented method of claim 3, wherein obtaining the one or more parameters comprises generating a natural language request to the user to obtain the one or more parameters for the executable action, and receiving a response from the user that includes the one or more parameters.
  5. The computer-implemented method of any one of claims 1 to 4, wherein invoking the one or more states configured to execute the action type comprises invoking a first state to identify that the executable action has not yet been executed to generate a response, and invoking a second state to determine whether one or more parameters are available for the executable action, wherein executing the executable action using the asset to obtain the output comprises invoking a third state to generate the output, and wherein the first state, the second state, and the third state are different from one another.
  6. The computer-implemented method of any one of claims 1 to 4, wherein generating the list comprises selecting the one or more executable actions from a list of candidate agent actions determined using a semantic index, and wherein generating the execution plan further comprises: identifying, based at least in part on metadata associated with candidate agent actions in the list of candidate agent actions, the one or more executable actions that provide information or knowledge for generating the response to the natural language utterance; and generating a structured output for the execution plan by generating an ordered list of the one or more executable actions and a set of dependencies between the one or more executable actions.
  7. The computer-implemented method of claim 6, wherein the iterative process further comprises determining, based on the set of dependencies between the one or more executable actions, that one or more dependencies exist between the executable action and at least one other executable action of the one or more executable actions, and wherein the executable action is executed sequentially according to the one or more dependencies determined to exist between the executable action and the at least one other executable action.
  8. A system, comprising: one or more processors; and one or more computer-readable media storing instructions that, when executed by the one or more processors, cause the system to perform operations comprising: generating, by a first generative artificial intelligence model, a list including one or more executable actions based on a first prompt containing a natural language utterance provided by a user; generating an execution plan including the one or more executable actions; executing the execution plan, wherein executing the execution plan comprises performing an iterative process for each of the one or more executable actions, the iterative process comprising: identifying an action type for the executable action, invoking one or more states configured to execute the action type, and executing, by the one or more states, the executable action using an asset to obtain an output; generating a second prompt based on the output obtained from executing each of the one or more executable actions; and generating, by a second generative artificial intelligence model, a response to the natural language utterance based on the second prompt.
  9. The system of claim 8, wherein generating the execution plan comprises performing an evaluation of the one or more executable actions, the evaluation comprising evaluating the one or more execution plans based on one or more ongoing conversation paths initiated by the user and any currently active execution plans, and wherein generating the execution plan further comprises: (i) when the evaluation determines that the natural language utterance is part of an ongoing conversation path, incorporating the one or more executable actions into a currently active execution plan associated with the ongoing conversation path, the currently active execution plan including an ordered list of the one or more executable actions and one or more previous actions; or (ii) when the evaluation determines that the natural language utterance is not part of an ongoing conversation path, generating a new execution plan including an ordered list of the one or more executable actions.
  10. The system of claim 8, wherein the iterative process further comprises: determining whether one or more parameters are available for the executable action; when the one or more parameters are available, invoking the one or more states and executing the executable action based on the one or more parameters; and when one or more parameters for the executable action are not available, obtaining the one or more unavailable parameters, then invoking the one or more states and executing the executable action based on the one or more parameters.
  11. The system of claim 10, wherein obtaining the one or more parameters comprises: generating a natural language request to the user to obtain the one or more parameters for the executable action; and receiving a response from the user that includes the one or more parameters.
  12. The system of any one of claims 8 to 11, wherein invoking the one or more states configured to execute the action type comprises invoking a first state to identify that the executable action has not yet been executed to generate a response, and invoking a second state to determine whether one or more parameters are available for the executable action, wherein executing the executable action using the asset to obtain the output comprises invoking a third state to generate the output, and wherein the first state, the second state, and the third state are different from one another.
  13. The system of any one of claims 8 to 11, wherein generating the list comprises selecting the one or more executable actions from a list of candidate agent actions determined using a semantic index, and wherein generating the execution plan further comprises: identifying, based at least in part on metadata associated with candidate agent actions in the list of candidate agent actions, the one or more executable actions that provide information or knowledge for generating the response to the natural language utterance; and generating a structured output for the execution plan by generating an ordered list of the one or more executable actions and a set of dependencies between the one or more executable actions.
  14. The system of claim 13, wherein the iterative process further comprises determining, based on the set of dependencies between the one or more executable actions, that one or more dependencies exist between the executable action and at least one other executable action of the one or more executable actions, and wherein the executable action can be executed sequentially according to the one or more dependencies determined to exist between the executable action and the at least one other executable action.
  15. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: generating, by a first generative artificial intelligence model, a list including one or more executable actions based on a first prompt containing a natural language utterance provided by a user; generating an execution plan including the one or more executable actions; executing the execution plan, wherein executing the execution plan comprises performing an iterative process for each of the one or more executable actions, the iterative process comprising: identifying an action type for the executable action, invoking one or more states configured to execute the action type, and executing, by the one or more states, the executable action using an asset to obtain an output; generating a second prompt based on the output obtained from executing each of the one or more executable actions; and generating, by a second generative artificial intelligence model, a response to the natural language utterance based on the second prompt.
  16. The one or more non-transitory computer-readable media of claim 15, wherein generating the execution plan comprises performing an evaluation of the one or more executable actions, the evaluation comprising evaluating the one or more execution plans based on one or more ongoing conversation paths initiated by the user and any currently active execution plans, and wherein generating the execution plan further comprises: (i) when the evaluation determines that the natural language utterance is part of an ongoing conversation path, incorporating the one or more executable actions into a currently active execution plan associated with the ongoing conversation path, the currently active execution plan including an ordered list of the one or more executable actions and one or more previous actions; or (ii) when the evaluation determines that the natural language utterance is not part of an ongoing conversation path, generating a new execution plan including an ordered list of the one or more executable actions.
  17. The one or more non-transitory computer-readable media of claim 15, wherein the iterative process further comprises: determining whether one or more parameters are available for the executable action; when the one or more parameters are available, invoking the one or more states and executing the executable action based on the one or more parameters; and when one or more parameters for the executable action are not available, obtaining the one or more unavailable parameters, then invoking the one or more states and executing the executable action based on the one or more parameters, wherein obtaining the one or more parameters comprises generating a natural language request to the user to obtain the one or more parameters for the executable action, and receiving a response from the user that includes the one or more parameters.
  18. The one or more non-transitory computer-readable media of any one of claims 15 to 17, wherein invoking the one or more states configured to execute the action type comprises invoking a first state to identify that the executable action has not yet been executed to generate a response, and invoking a second state to determine whether one or more parameters are available for the executable action, wherein executing the executable action using the asset to obtain the output comprises invoking a third state to generate the output, and wherein the first state, the second state, and the third state are different from one another.
  19. The one or more non-transitory computer-readable media of any one of claims 15 to 17, wherein generating the list comprises selecting the one or more executable actions from a list of candidate agent actions determined using a semantic index, and wherein generating the execution plan further comprises: identifying, based at least in part on metadata associated with candidate agent actions in the list of candidate agent actions, the one or more executable actions that provide information or knowledge for generating the response to the natural language utterance; and generating a structured output for the execution plan by generating an ordered list of the one or more executable actions and a set of dependencies between the one or more executable actions.
  20. The one or more non-transitory computer-readable media of claim 19, wherein the iterative process further comprises determining, based on the set of dependencies between the one or more executable actions, that one or more dependencies exist between the executable action and at least one other executable action of the one or more executable actions, and wherein the executable action can be executed sequentially according to the one or more dependencies determined to exist between the executable action and the at least one other executable action.
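Claims 6, 7, 13, 14, 19, and 20 describe a structured plan output made of an ordered list of actions plus a set of dependencies, with dependent actions executed sequentially. One way to derive such an order is a topological sort. The source does not name an algorithm, so the use of Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) is an assumption, as are the helper names `order_actions` and `execute_plan` and the example action names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set

def order_actions(actions: List[str], depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order actions so that each one comes after every action it depends on."""
    graph = {name: depends_on.get(name, set()) for name in actions}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())

def execute_plan(actions: List[str],
                 depends_on: Dict[str, Set[str]],
                 run: Callable[[str, Dict[str, str]], str]) -> Dict[str, str]:
    """Run actions one at a time in dependency order, passing each its dependencies' outputs."""
    results: Dict[str, str] = {}
    for name in order_actions(actions, depends_on):
        # Every dependency precedes `name` in the order, so its output already exists.
        inputs = {dep: results[dep] for dep in depends_on.get(name, set())}
        results[name] = run(name, inputs)
    return results

# Hypothetical plan: the summary needs the orders, which need the customer record.
actions = ["send_summary", "get_orders", "get_customer"]
depends_on = {"send_summary": {"get_orders"}, "get_orders": {"get_customer"}}
order = order_actions(actions, depends_on)
results = execute_plan(actions, depends_on,
                       lambda name, inputs: f"{name}({','.join(sorted(inputs))})")
```

Here `order` is `["get_customer", "get_orders", "send_summary"]` even though the planner listed the actions in a different sequence. Actions with no dependency between them could in principle run in any order; this sketch simply executes everything sequentially.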

Description

Execution of an execution plan with a digital assistant using a large language model

Cross-reference to related applications

This application claims the benefit of and priority under 35 U.S.C. 119(e) to U.S. Provisional Application No. 63/583,028, filed September 15, 2023, and U.S. Non-Provisional Application No. 18/825,573, filed September 5, 2024, the entire contents of each of which are incorporated herein by reference for all purposes.

Technical field

The present disclosure relates generally to digital assistants, and more particularly, but not necessarily exclusively, to digital assistants and techniques for executing execution plans to generate responses to utterances using large language models.

Artificial intelligence (AI) has a wide range of applications and has made notable advances, particularly in the field of digital assistants, or chatbots. Initially, many users sought immediate responses through instant messaging or chat platforms. Organizations that recognized the potential for engagement used these platforms to interact with entities, such as end users, through real-time conversations. However, staffing live communication channels with human service agents proved to impose a significant cost on organizations. In response to this challenge, digital assistants or chatbots, also known as bots, emerged as a way to simulate conversations with entities, particularly over the Internet. Bots enabled entities to communicate with users through messaging apps they already used, or through other applications with messaging capabilities. Traditional chatbots initially relied on predefined skill or intent models, which required entities to communicate within a fixed set of keywords or commands. Unfortunately, this approach limited a bot's ability to engage intelligently and contextually in live conversation, hindering natural communication.
Entities were constrained to specific commands that the bot could understand, which often made it difficult to convey intent effectively. The landscape has since changed significantly with the integration of large language models (LLMs) into digital assistants, or chatbots. LLMs are deep learning algorithms capable of performing a variety of natural language processing (NLP) tasks. They use neural networks known as transformers, which learn the patterns and structures of natural language and enable more nuanced, context-aware conversations across a range of domains and purposes. This evolution marks a significant shift from the rigid, keyword-based interactions of traditional chatbots to a more adaptive and intuitive communication experience, improving the overall ability of digital assistants or chatbots to understand and respond to user queries.

In various embodiments, a computer-implemented method may be used to generate a response to an utterance using a digital assistant. The method may include generating, by a first generative artificial intelligence model, a list containing one or more executable actions based on a first prompt that contains a natural language utterance provided by a user. The method may include generating an execution plan containing the one or more executable actions. The method may include executing the execution plan, which may include performing an iterative process for each of the executable actions. The iterative process may include (i) identifying an action type for the executable action, (ii) invoking one or more states configured to execute the action type, and (iii) executing, by the one or more states, the executable action using an asset to obtain an output. The method may include generating a second prompt based on the output obtained from executing each of the one or more executable actions.
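The per-action iterative process, combined with the three distinct states of claim 5 and the parameter handling of claims 3 and 4, might look like the following Python sketch. The mapping of each state to a function, the representation of an asset as a callable keyed by action type, and the names `first_state`, `second_state`, `third_state`, `execute_action`, and `ask_user` are illustrative assumptions, not details taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumption: an asset maps a dict of parameters to an output string.
Asset = Callable[[Dict[str, str]], str]

@dataclass
class ExecutableAction:
    name: str
    action_type: str
    required_params: List[str]
    params: Dict[str, str] = field(default_factory=dict)
    output: Optional[str] = None  # None means "not yet executed"

def first_state(action: ExecutableAction) -> bool:
    """Return True if the action has not yet been executed."""
    return action.output is None

def second_state(action: ExecutableAction, ask_user: Callable[[str], str]) -> None:
    """Make every required parameter available, asking the user for any that are missing."""
    for name in action.required_params:
        if name not in action.params:
            action.params[name] = ask_user(f"Which {name} should I use for {action.name}?")

def third_state(action: ExecutableAction, assets: Dict[str, Asset]) -> str:
    """Execute the action with the asset registered for its type and store the output."""
    action.output = assets[action.action_type](action.params)
    return action.output

def execute_action(action: ExecutableAction, assets: Dict[str, Asset],
                   ask_user: Callable[[str], str]) -> Optional[str]:
    if first_state(action):
        second_state(action, ask_user)
        third_state(action, assets)
    return action.output

# Usage: a simulated user supplies the missing "account" parameter once.
questions: List[str] = []

def ask_user(question: str) -> str:
    questions.append(question)  # record the natural language request
    return "savings"            # simulated user reply

assets: Dict[str, Asset] = {"api": lambda p: f"{p['account']} balance: 100"}
action = ExecutableAction("check_balance", "api", ["account"])
first = execute_action(action, assets, ask_user)
again = execute_action(action, assets, ask_user)  # already executed: no new question
```

Keeping the three states as separate functions mirrors the requirement that they be distinct, and storing the output on the action lets the first state short-circuit repeated execution.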
The method may include generating, by a second generative artificial intelligence model, a response to the natural language utterance based on the second prompt. In some embodiments, generating the execution plan may include performing an evaluation of the one or more executable actions. Additionally or alternatively, the evaluation may include evaluating the one or more execution plans based on one or more ongoing conversation paths initiated by the user and any currently active execution plans. Additionally or alternatively, generating the execution plan may include (i) incorporating the one or more executable actions into a currently active execution plan associated with the ongoing conversation path when the evaluation determines that the natural language utterance is part of an ongoing conversation path, wherein the currently active execution plan includes the one or more executable actions and an ordered list of one or more previous actions, or (ii) generating a new execution plan including an ordered list of one or mor