US-12626064-B2 - Meta-reflection techniques for learning instructions for language agents using past self-reflections
Abstract
A data processing system implements accessing a datastore of training data using a model training unit to obtain a first training sample, the first training sample comprising a first natural language utterance, first ground truth information, the first natural language utterance requesting that content be generated by a language model, the first ground truth information providing a first example of first expected output of the language model in response to the first natural language utterance; constructing a first prompt based on the first natural language utterance using a prompt construction unit; providing, using the prompt construction unit, the first prompt to the language model as an input to cause the language model to generate a first output; analyzing the first output and the first ground truth information using the model training unit to determine whether the first output is erroneous; constructing, using the prompt construction unit, a second prompt that instructs the language model to generate a first self-reflection response that indicates why the language model generated the first output; providing the second prompt as an input to the language model to cause the language model to generate the first self-reflection response; constructing, using the prompt construction unit, a third prompt that includes the first self-reflection response, the third prompt instructing the language model to generate prompt improvement instructions to be included in subsequently constructed prompts for the language model to assist the language model in generating a correct response to the subsequently constructed prompts; providing the third prompt to the language model to cause the language model to generate the prompt improvement instructions; and including the prompt improvement instructions in the subsequently constructed prompts generated using the prompt construction unit.
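The training-time pipeline recited in the abstract (prompt the model, check the output against ground truth, elicit a self-reflection, then distill prompt-improvement instructions for later prompts) can be sketched in Python. All names and prompt strings below are hypothetical illustrations rather than the patented implementation; `call_model` stands in for whatever language-model API is in use.

```python
# Hypothetical sketch of the meta-reflection training loop from the abstract.
# `call_model` is a placeholder for any language-model API; the prompt
# wording is illustrative, not taken from the patent.

def train_improvement_instructions(call_model, training_samples, instructions=""):
    """Derive reusable prompt-improvement instructions from self-reflections.

    training_samples: iterable of (utterance, ground_truth) pairs.
    """
    for utterance, ground_truth in training_samples:
        # First prompt: the utterance, prefixed with any instructions so far.
        output = call_model(f"{instructions}\n{utterance}".strip())

        # Compare the output with the ground truth to detect errors.
        erroneous = output.strip() != ground_truth.strip()

        # Second prompt: ask the model why it generated this output.
        reflection = call_model(
            f"Your answer to '{utterance}' was "
            f"{'erroneous' if erroneous else 'correct'}: {output}\n"
            "Explain why you generated this output."
        )

        # Third prompt: turn the self-reflection into instructions to be
        # included in subsequently constructed prompts.
        instructions = call_model(
            f"Given this self-reflection:\n{reflection}\n"
            "Write instructions to include in future prompts so that similar "
            "requests are answered correctly."
        )
    return instructions
```

A real implementation would determine whether the output "is erroneous" with something more robust than exact string comparison, and would accumulate positive or negative instructions along the lines of claims 4 and 5.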
Inventors
- Gustavo Araujo Soares
- Sumit Gulwani
- Shashank KIRTANIA
- Sherry SHI
- Arjun RADHAKRISHNA
- Ananya SINGHA
- Priyanshu Gupta
Assignees
- MICROSOFT TECHNOLOGY LICENSING, LLC
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2024-04-23
Claims (20)
- 1 . A data processing system comprising: a processor; and a memory storing executable instructions that, when executed, cause the processor alone or in combination with other processors to perform operations of: accessing a datastore of training data using a model training unit to obtain a first training sample, the first training sample comprising a first natural language utterance, first ground truth information, the first natural language utterance requesting that content be generated by a language model, the first ground truth information providing a first example of first expected output of the language model in response to the first natural language utterance; constructing a first prompt based on the first natural language utterance using a prompt construction unit; providing, using the prompt construction unit, the first prompt to the language model as an input to cause the language model to generate a first output; analyzing the first output and the first ground truth information using the model training unit to determine whether the first output is erroneous; constructing, using the prompt construction unit, a second prompt that instructs the language model to generate a first self-reflection response that indicates why the language model generated the first output; providing the second prompt as an input to the language model to cause the language model to generate the first self-reflection response; constructing, using the prompt construction unit, a third prompt that includes the first self-reflection response, the third prompt instructing the language model to generate prompt improvement instructions to be included in subsequently constructed prompts for the language model to assist the language model in generating a correct response to the subsequently constructed prompts; providing the third prompt to the language model to cause the language model to generate the prompt improvement instructions; and including the prompt improvement instructions in the subsequently constructed prompts generated using the prompt construction unit.
- 2 . The data processing system of claim 1 , wherein the memory further includes instructions configured to cause the processor alone or in combination with other processors to perform operations of: obtaining a second natural language utterance, the second natural language utterance being input via a user interface of an application; constructing, using the prompt construction unit, a fourth prompt based on the second natural language utterance and the prompt improvement instructions to cause the language model to generate a second output; providing, using the prompt construction unit, the fourth prompt to the language model as an input to cause the language model to generate the second output; and providing the second output to the application to cause the application to present the second output on the user interface of the application.
- 3 . The data processing system of claim 2 , wherein the memory further includes instructions configured to cause the processor alone or in combination with other processors to perform operations of: receiving feedback from a user of the application via the user interface of the application in response to the second output, the feedback indicating that the second output was erroneous; constructing, using the prompt construction unit, a fifth prompt that indicates that the second output was erroneous and instructs the language model to generate a second self-reflection response that indicates why the language model generated the second output; providing, using the prompt construction unit, the fifth prompt as an input to the language model to cause the language model to generate the second self-reflection response; constructing, using the prompt construction unit, a sixth prompt that includes the second self-reflection response and the prompt improvement instructions, the sixth prompt instructing the language model to update the prompt improvement instructions based on the second self-reflection response; and providing the sixth prompt to the language model to cause the language model to update the prompt improvement instructions.
- 4 . The data processing system of claim 1 , wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further includes an operation of instructing the language model to include positive instructions for responding to the subsequently constructed prompts responsive to determining that the first output is not erroneous, the positive instructions providing context to reinforce correct inferences in response to the subsequently constructed prompts.
- 5 . The data processing system of claim 1 , wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further includes an operation of instructing the language model to include negative instructions for responding to the subsequently constructed prompts responsive to determining that the first output is erroneous, the negative instructions providing context to the language model for correctly responding to the subsequently constructed prompts by avoiding incorrect inferences included in the first output.
- 6 . The data processing system of claim 1 , wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further comprises instructions configured to cause the processor alone or in combination with other processors to perform operations of: analyzing the first natural language utterance to determine a subject-matter domain associated with the first natural language utterance, wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further comprises generating domain-specific prompt instructions to be included in subsequently constructed prompts for utterances associated with the subject-matter domain of the first natural language utterance.
- 7 . The data processing system of claim 6 , wherein analyzing the first natural language utterance to determine the subject-matter domain further comprises analyzing the first natural language utterance using a domain determination model trained to analyze a textual input and to output a subject-matter domain associated with the textual input.
- 8 . The data processing system of claim 1 , wherein the language model is implemented using a Generative Pre-trained Transformer (GPT) model.
- 9 . A data processing system comprising: a processor; and a memory storing executable instructions that, when executed, cause the processor alone or in combination with other processors to perform operations of: receiving a first natural language utterance, the first natural language utterance being input by a user via a user interface of an application; constructing a first prompt based on the first natural language utterance using a prompt construction unit; providing, using the prompt construction unit, the first prompt to a language model as an input to cause the language model to generate a first output based on the first natural language utterance; providing the first output to the application to cause the application to present the first output on the user interface of the application; receiving feedback from the user of the application via the user interface in response to the first output, the feedback indicating that the first output was erroneous; constructing, using the prompt construction unit, a second prompt that indicates that the first output was erroneous and instructs the language model to generate a first self-reflection response that indicates why the language model generated the first output; providing, using the prompt construction unit, the second prompt as an input to the language model to cause the language model to generate the first self-reflection response; constructing, using the prompt construction unit, a third prompt that includes the first self-reflection response, the third prompt instructing the language model to generate prompt improvement instructions based on the first self-reflection response; providing the third prompt to the language model to cause the language model to generate the prompt improvement instructions; and including the prompt improvement instructions in subsequently constructed prompts generated using the prompt construction unit.
- 10 . The data processing system of claim 9 , wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further includes an operation of instructing the language model to include positive instructions for responding to the subsequently constructed prompts responsive to determining that the first output is not erroneous, the positive instructions providing context to reinforce correct inferences in response to the subsequently constructed prompts.
- 11 . The data processing system of claim 9 , wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further includes an operation of instructing the language model to include negative instructions for responding to the subsequently constructed prompts responsive to determining that the first output is erroneous, the negative instructions providing context to the language model for correctly responding to the subsequently constructed prompts by avoiding incorrect inferences included in the first output.
- 12 . The data processing system of claim 9 , wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further comprises instructions configured to cause the processor alone or in combination with other processors to perform operations of: analyzing the first natural language utterance to determine a subject-matter domain associated with the first natural language utterance, wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further comprises generating domain-specific prompt instructions to be included in subsequently constructed prompts for utterances associated with the subject-matter domain of the first natural language utterance.
- 13 . The data processing system of claim 12 , wherein analyzing the first natural language utterance to determine the subject-matter domain further comprises analyzing the first natural language utterance using a domain determination model trained to analyze a textual input and to output a subject-matter domain associated with the textual input.
- 14 . The data processing system of claim 9 , wherein the language model is implemented using a Generative Pre-trained Transformer (GPT) model.
- 15 . A method implemented in a data processing system for improving predictions by a language model, the method comprising: accessing a datastore of training data using a model training unit to obtain a first training sample, the first training sample comprising a first natural language utterance, first ground truth information, the first natural language utterance requesting that content be generated by the language model, the first ground truth information providing a first example of first expected output of the language model in response to the first natural language utterance; constructing a first prompt based on the first natural language utterance using a prompt construction unit; providing, using the prompt construction unit, the first prompt to the language model as an input to cause the language model to generate a first output; analyzing the first output and the first ground truth information using the model training unit to determine whether the first output is erroneous; constructing, using the prompt construction unit, a second prompt that instructs the language model to generate a first self-reflection response that indicates why the language model generated the first output; providing the second prompt as an input to the language model to cause the language model to generate the first self-reflection response; constructing, using the prompt construction unit, a third prompt that includes the first self-reflection response, the third prompt instructing the language model to generate prompt improvement instructions to be included in subsequently constructed prompts for the language model to assist the language model in generating a correct response to the subsequently constructed prompts; providing the third prompt to the language model to cause the language model to generate the prompt improvement instructions; and including the prompt improvement instructions in the subsequently constructed prompts generated using the prompt construction unit.
- 16 . The method of claim 15 , further comprising: obtaining a second natural language utterance via a user interface of an application; constructing, using the prompt construction unit, a fourth prompt based on the second natural language utterance and the prompt improvement instructions to cause the language model to generate a second output; providing, using the prompt construction unit, the fourth prompt to the language model as an input to cause the language model to generate the second output; and providing the second output to the application to cause the application to present the second output on the user interface of the application.
- 17 . The method of claim 16 , further comprising: receiving feedback from a user of the application in response to the second output, the feedback indicating that the second output was erroneous; constructing, using the prompt construction unit, a fifth prompt that indicates that the second output was erroneous and instructs the language model to generate a second self-reflection response that indicates why the language model generated the second output; providing, using the prompt construction unit, the fifth prompt as an input to the language model to cause the language model to generate the second self-reflection response; constructing, using the prompt construction unit, a sixth prompt that includes the second self-reflection response and the prompt improvement instructions, the sixth prompt instructing the language model to update the prompt improvement instructions based on the second self-reflection response; and providing the sixth prompt to the language model to cause the language model to update the prompt improvement instructions.
- 18 . The method of claim 15 , wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further includes an operation of instructing the language model to include positive instructions for responding to the subsequently constructed prompts responsive to determining that the first output is not erroneous, the positive instructions providing context to reinforce correct inferences in response to the subsequently constructed prompts.
- 19 . The method of claim 15 , wherein constructing the third prompt instructing the language model to generate prompt improvement instructions further includes an operation of instructing the language model to include negative instructions for responding to the subsequently constructed prompts responsive to determining that the first output is erroneous, the negative instructions providing context to the language model for correctly responding to the subsequently constructed prompts by avoiding incorrect inferences included in the first output.
- 20 . The method of claim 15 , wherein the language model is implemented using a Generative Pre-trained Transformer (GPT) model.
Description
BACKGROUND Despite the popularity of Large Language Models (LLMs), creating specific prompts for LLMs to perform particular tasks remains challenging. Users often engage in multiple conversational turns with an LLM-based agent to accomplish their intended task. Consequently, significant computing resources and user time can be consumed attempting to reach convergence between the LLM output and the user's desired outcome. Hence, there is a need for improved systems and methods that more efficiently assist the user and the LLM in performing the user's intended task. SUMMARY An example data processing system according to the disclosure includes a processor and a memory storing executable instructions. The instructions when executed cause the processor alone or in combination with other processors to perform operations including receiving feedback from a user of the application via the user interface of the application in response to the second output, the feedback indicating that the second output was erroneous; constructing, using the prompt construction unit, a fifth prompt that indicates that the second output was erroneous and instructs the language model to generate a second self-reflection response that indicates why the language model generated the second output; providing, using the prompt construction unit, the fifth prompt as an input to the language model to cause the language model to generate the second self-reflection response; constructing, using the prompt construction unit, a sixth prompt that includes the second self-reflection response and the prompt improvement instructions, the sixth prompt instructing the language model to update the prompt improvement instructions based on the second self-reflection response; and providing the sixth prompt to the language model to cause the language model to update the prompt improvement instructions.
An example method implemented in a data processing system includes receiving feedback from a user of the application via the user interface of the application in response to the second output, the feedback indicating that the second output was erroneous; constructing, using the prompt construction unit, a fifth prompt that indicates that the second output was erroneous and instructs the language model to generate a second self-reflection response that indicates why the language model generated the second output; providing, using the prompt construction unit, the fifth prompt as an input to the language model to cause the language model to generate the second self-reflection response; constructing, using the prompt construction unit, a sixth prompt that includes the second self-reflection response and the prompt improvement instructions, the sixth prompt instructing the language model to update the prompt improvement instructions based on the second self-reflection response; and providing the sixth prompt to the language model to cause the language model to update the prompt improvement instructions. An example data processing system according to the disclosure includes a processor and a memory storing executable instructions.
The instructions when executed cause the processor alone or in combination with other processors to perform operations including receiving a first natural language utterance, the first natural language utterance being input by a user via a user interface of an application; constructing a first prompt based on the first natural language utterance using a prompt construction unit; providing, using the prompt construction unit, the first prompt to a language model as an input to cause the language model to generate a first output based on the first natural language utterance; providing the first output to the application to cause the application to present the first output on the user interface of the application; receiving feedback from the user of the application via the user interface in response to the first output, the feedback indicating that the first output was erroneous; constructing, using the prompt construction unit, a second prompt that indicates that the first output was erroneous and instructs the language model to generate a first self-reflection response that indicates why the language model generated the first output; providing, using the prompt construction unit, the second prompt as an input to the language model to cause the language model to generate the first self-reflection response; constructing, using the prompt construction unit, a third prompt that includes the first self-reflection response, the third prompt instructing the language model to generate prompt improvement instructions based on the first self-reflection response; providing the third prompt to the language model to cause the language model to generate the prompt improvement instructions; and including the prompt improvement instructions in subsequently constructed prompts generated using the prompt construction unit. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
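The runtime loop described in the Summary can likewise be sketched: prompts carry the learned improvement instructions, and user feedback marking an output as erroneous triggers a fresh self-reflection followed by an instruction update. As above, every name and prompt string is a hypothetical illustration, with `call_model` standing in for the language-model API.

```python
# Hypothetical sketch of the deployment-time loop from the Summary: prompts
# include the learned improvement instructions, and user feedback that an
# output was erroneous triggers a new self-reflection and an instruction
# update. `call_model` is a placeholder for any language-model API.

def answer_with_instructions(call_model, utterance, instructions):
    """Build a prompt that includes the learned instructions, then answer."""
    prompt = f"{instructions}\n{utterance}".strip()
    return call_model(prompt)

def update_on_feedback(call_model, utterance, output, instructions):
    """On erroneous feedback: elicit a self-reflection, then revise the
    improvement instructions based on it."""
    # Ask the model to explain why it generated the erroneous output.
    reflection = call_model(
        f"Your answer to '{utterance}' was erroneous: {output}\n"
        "Explain why you generated this output."
    )
    # Ask the model to update the instructions using that self-reflection.
    return call_model(
        f"Current instructions:\n{instructions}\n"
        f"Self-reflection:\n{reflection}\n"
        "Update the instructions based on this self-reflection."
    )
```

The updated instructions returned by `update_on_feedback` would then be passed to `answer_with_instructions` for subsequently constructed prompts.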