CN-121986342-A - Collaboration between language models


Abstract

A system may be configured for collaboration between language model (LM) agents. An agent may be, for example, a computer system, or a software component executing on a computer system, that can accept text and/or natural language input, process the input and perform functions using an LM, and respond via text and/or natural language output. An agent may act as a mediator that interacts with a user, identifies tasks requested by the user, and delegates one or more subtasks to another agent or other resource. An agent may also act as a delegate that handles tasks or subtasks delegated by the mediator. The agents may communicate with each other using a combination of structured and unstructured language, e.g., one or more parameters together with natural language messages.

Inventors

F. Torok
I. A. Saiyawala
F. J. Delamat

Assignees

Amazon Technologies, Inc. (亚马逊科技公司)

Dates

Publication Date: 2026-05-05
Application Date: 2025-05-19
Priority Date: 2024-06-28

Claims (15)

1. A computer-implemented method, comprising: receiving, by a first computer system corresponding to a first language model (LM) agent, first input data; generating first LM output data using the first input data and a first LM corresponding to the first LM agent, the first LM output data representing a natural language request to delegate a first task to a second LM agent different from the first LM agent and an indication that the natural language request is from the first LM agent; transmitting the first LM output data to a second computer system corresponding to the second LM agent; receiving first data from the second computer system in response to the first LM output data; generating second LM output data using the first data and the first LM, the second LM output data representing a response to the first input data; and sending the second LM output data to a first system component.
2. The computer-implemented method of claim 1, further comprising: receiving second data representing natural language instructions for how the first LM agent handles tasks, the second data indicating: a first instruction for the first LM agent to determine whether the second LM agent is more capable of handling a task, and a second instruction to delegate the task to the second LM agent in response to determining that the second LM agent is more capable of handling the task; and determining a first LM prompt using the first input data and the second data, wherein generating the first LM output data includes processing the first LM prompt using the first LM.
3. The computer-implemented method of claim 1 or 2, further comprising: receiving second data representing an identifier corresponding to the second LM agent and a natural language description of capabilities corresponding to the second LM agent; and determining a first LM prompt using the first input data and the second data, wherein generating the first LM output data includes processing the first LM prompt using the first LM.
4. The computer-implemented method of claim 1, 2, or 3, further comprising: receiving second data representing an identifier corresponding to a software component and a natural language description of capabilities corresponding to the software component; and determining a first LM prompt using the first input data and the second data, wherein generating the first LM output data includes processing the first LM prompt using the first LM.
5. The computer-implemented method of claim 1, 2, 3, or 4, further comprising: transmitting, to the second LM agent, second data representing natural language instructions for how the second LM agent handles tasks, the second data indicating: a first instruction for the second LM agent to determine whether it is capable of handling a task indicated in a message from another LM agent, a second instruction to generate a response to the other LM agent by processing the message using a second LM corresponding to the second LM agent, in response to determining that the second LM agent is capable of handling the task, and a third instruction to generate a second response to the other LM agent indicating that the second LM agent cannot handle the task, in response to determining that the second LM agent cannot handle the task.
6. The computer-implemented method of claim 1, 2, 3, 4, or 5, further comprising: transmitting, to the second LM agent, prior to generating the first LM output data, second data representing a natural language request for a description of capabilities corresponding to the second LM agent; receiving third data representing a natural language description of capabilities corresponding to the second LM agent; and determining a first LM prompt using the first input data and the third data, wherein generating the first LM output data includes processing the first LM prompt using the first LM.
7. The computer-implemented method of claim 1, 2, 3, 4, 5, or 6, further comprising: receiving second input data; generating, using the first LM, third LM output data representing a second task to be delegated to the second LM agent upon detection of an event; determining second data corresponding to the second task using the third LM output data; detecting an occurrence of the event; and transmitting the second data to the second LM agent in response to detecting the occurrence, the second LM agent performing the second task in response to receiving the second data.
8. The computer-implemented method of claim 1, 2, 3, 4, 5, 6, or 7, further comprising: receiving second data representing: a first message format corresponding to a message from a user, the first message format including a first portion indicating that the message is from a user and a second portion representing natural language user input; a second message format corresponding to messages from other LM agents, the second message format including a third portion identifying another LM agent and a fourth portion representing natural language generated by the other LM agent; and a third message format corresponding to a delegation request to be sent to other LM agents, the third message format including a fifth portion identifying a delegate LM agent and a sixth portion representing a natural language message to the delegate LM agent.
9. A first computer system, comprising: at least one processor; and at least one memory including instructions that, when executed by the at least one processor, cause the first computer system to: receive first input data, the first computer system corresponding to a first language model (LM) agent; generate first LM output data using the first input data and a first LM corresponding to the first LM agent, the first LM output data representing a natural language request to delegate a first task to a second LM agent different from the first LM agent and an indication that the natural language request is from the first LM agent; transmit the first LM output data to a second computer system corresponding to the second LM agent; receive first data from the second computer system in response to the first LM output data; generate second LM output data using the first data and the first LM, the second LM output data representing a response to the first input data; and send the second LM output data to a first system component.
10. The first computer system of claim 9, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the first computer system to: receive second data representing natural language instructions for how the first LM agent handles tasks, the second data indicating: a first instruction for the first LM agent to determine whether the second LM agent is more capable of handling a task, and a second instruction to delegate the task to the second LM agent in response to determining that the second LM agent is more capable of handling the task; and determine a first LM prompt using the first input data and the second data, wherein generating the first LM output data includes processing the first LM prompt using the first LM.
11. The first computer system of claim 9 or 10, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the first computer system to: receive second data representing an identifier corresponding to the second LM agent and a natural language description of capabilities corresponding to the second LM agent; and determine a first LM prompt using the first input data and the second data, wherein generating the first LM output data includes processing the first LM prompt using the first LM.
12. The first computer system of claim 9, 10, or 11, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the first computer system to: receive second data representing an identifier corresponding to a software component and a natural language description of capabilities corresponding to the software component; and determine a first LM prompt using the first input data and the second data, wherein generating the first LM output data includes processing the first LM prompt using the first LM.
13. The first computer system of claim 9, 10, 11, or 12, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the first computer system to: transmit, to the second LM agent, second data representing natural language instructions for how the second LM agent handles tasks, the second data indicating: a first instruction for the second LM agent to determine whether it is capable of handling a task indicated in a message from another LM agent, a second instruction to generate a response to the other LM agent by processing the message using a second LM corresponding to the second LM agent, in response to determining that the second LM agent is capable of handling the task, and a third instruction to generate a second response to the other LM agent indicating that the second LM agent cannot handle the task, in response to determining that the second LM agent cannot handle the task.
14. The first computer system of claim 9, 10, 11, 12, or 13, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the first computer system to: transmit, to the second LM agent, prior to generating the first LM output data, second data representing a natural language request for a description of capabilities corresponding to the second LM agent; receive third data representing a natural language description of capabilities corresponding to the second LM agent; and determine a first LM prompt using the first input data and the third data, wherein generating the first LM output data includes processing the first LM prompt using the first LM.
15. The first computer system of claim 9, 10, 11, 12, 13, or 14, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the first computer system to: receive second input data; generate, using the first LM, third LM output data representing a second task to be delegated to the second LM agent upon detection of an event; determine second data corresponding to the second task using the third LM output data; detect an occurrence of the event; and transmit the second data to the second LM agent in response to detecting the occurrence, the second LM agent performing the second task in response to receiving the second data.
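
For illustration only, the delegation flow recited in the claims (a first LM agent producing a natural language delegation request that identifies its sender, a second LM agent answering or declining, and the first LM agent composing the final response) might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class names, the bracketed message syntax, the `DELEGATE <id>: <request>` output convention, and the stub language models are all hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for a language model: maps a prompt string to output text.
LM = Callable[[str], str]


@dataclass
class AgentMessage:
    """Structured portions (sender, recipient) plus a natural language portion."""
    sender: str     # "user" or an agent identifier
    recipient: str  # identifier of the agent being addressed
    text: str       # natural language content

    def render(self) -> str:
        # Illustrative line format only; the claims do not fix a syntax.
        return f"[from:{self.sender}][to:{self.recipient}] {self.text}"


class DelegateAgent:
    """Second LM agent: handles a delegated task or declines it."""

    def __init__(self, agent_id: str, lm: LM, can_handle: Callable[[str], bool]):
        self.agent_id = agent_id
        self.lm = lm
        self.can_handle = can_handle  # assumed capability check

    def handle(self, msg: AgentMessage) -> AgentMessage:
        if self.can_handle(msg.text):
            reply = self.lm(msg.render())
        else:
            reply = "I cannot handle this task."
        return AgentMessage(self.agent_id, msg.sender, reply)


class DelegatingAgent:
    """First LM agent: talks to the user and may delegate to known agents."""

    def __init__(self, agent_id: str, lm: LM,
                 delegates: Dict[str, DelegateAgent],
                 capabilities: Dict[str, str]):
        self.agent_id = agent_id
        self.lm = lm
        self.delegates = delegates
        self.capabilities = capabilities  # agent id -> natural language description

    def respond(self, user_input: str) -> str:
        user_msg = AgentMessage("user", self.agent_id, user_input)
        known = "\n".join(f"{i}: {d}" for i, d in self.capabilities.items())
        prompt = f"Known agents:\n{known}\n{user_msg.render()}"
        first_output = self.lm(prompt)
        if not first_output.startswith("DELEGATE "):
            return first_output  # the LM answered directly
        header, _, request = first_output.partition(":")
        delegate_id = header[len("DELEGATE "):].strip()
        # The request carries the indication that it comes from this agent.
        request_msg = AgentMessage(self.agent_id, delegate_id, request.strip())
        reply = self.delegates[delegate_id].handle(request_msg)
        # Second LM pass turns the delegate's reply into the user-facing response.
        return self.lm(f"{prompt}\n{request_msg.render()}\n{reply.render()}")


# Deterministic stubs so the sketch runs without a real model.
def stub_delegator_lm(prompt: str) -> str:
    last_line = prompt.splitlines()[-1]
    if last_line.startswith("[from:weather-agent]"):
        return "Here is the forecast: " + last_line.split("] ", 1)[1]
    return "DELEGATE weather-agent: What is the weather in Seattle today?"


def stub_weather_lm(prompt: str) -> str:
    return "Sunny, 20 C."


weather = DelegateAgent("weather-agent", stub_weather_lm,
                        lambda text: "weather" in text.lower())
assistant = DelegatingAgent("assistant", stub_delegator_lm,
                            {"weather-agent": weather},
                            {"weather-agent": "Answers weather questions."})
print(assistant.respond("Do I need an umbrella in Seattle?"))
```

In this sketch the structured portion of each message is the sender and recipient identifiers, and the unstructured portion is the free text; a real system would presumably rely on the LM itself, guided by natural language instructions, to decide whether to delegate rather than on a fixed string prefix.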

Description

Collaboration between language models

Cross Reference to Related Applications

The present application claims priority from U.S. patent application Ser. No. 18/759,176, filed on June 28, 2024 and entitled "COOPERATION BETWEEN LANGUAGE MODELS". The present application also claims priority from U.S. patent application Ser. No. 18/759,147, filed on June 28, 2024 and entitled "COOPERATION BETWEEN LANGUAGE MODELS". The contents of the above applications are expressly incorporated herein by reference in their entirety.

Background

Natural language processing systems have evolved to the point where humans can interact with computing devices using their voice and natural language text inputs. Such systems employ computing techniques to identify words spoken and written by a human user based on received input data of various qualities. Speech recognition combined with natural language understanding processing techniques enables speech-based user control of a computing device to perform tasks based on spoken input by a user. Such processing may be used by computers, hand-held devices, telephone computer systems, self-service terminals, and a wide variety of other devices to improve human-machine interaction.

Drawings

For a more complete understanding of the present disclosure, reference is now made to the following descriptions taken in conjunction with the accompanying drawings.

Fig. 1A is a conceptual diagram illustrating the operation of a mediator Language Model (LM) agent in a multi-agent system according to an embodiment of the present disclosure.
Fig. 1B is a conceptual diagram illustrating the operation of a delegate LM agent in a multi-agent system according to an embodiment of the present disclosure.
Fig. 2A is a conceptual diagram illustrating an example operation of an LM and LM orchestrator of the system according to an embodiment of the present disclosure.
Fig. 2B is a flowchart illustrating an example operation of the LM and LM orchestrator of the system according to an embodiment of the present disclosure.
Fig. 3 is a conceptual diagram illustrating components of a natural language processing system according to an embodiment of the present disclosure.
Fig. 4 is a conceptual diagram illustrating in further detail LM components of a natural language processing system according to an embodiment of the present disclosure.
Fig. 5A illustrates example operations of prompt generation in an LM system according to an embodiment of the present disclosure.
Fig. 5B illustrates example operations of prompt generation in a multimodal LM system performing speech recognition according to an embodiment of the present disclosure.
Fig. 5C illustrates example operations of prompt generation in a multimodal LM system performing speech synthesis according to an embodiment of the present disclosure.
Fig. 5D illustrates example operations of a prompt generation component in a multimodal LM system that performs voice-to-voice functions according to an embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating an example method of configuring and using mediator agents in a multi-agent system according to an embodiment of the present disclosure.
Fig. 7 is a block diagram conceptually illustrating example components of an apparatus according to an embodiment of the present disclosure.
Fig. 8 is a block diagram conceptually illustrating example components of a system according to an embodiment of the present disclosure.
Fig. 9 illustrates an example of a network for use with an overall system according to an embodiment of the present disclosure.

Detailed Description

Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics that involves processing user command input in the form of natural human language (e.g., English, Chinese, etc.). Such natural language commands may be provided in audio, text, image, or other formats. Natural language processing may involve several different specific processing techniques, such as those discussed below.

Automatic Speech Recognition (ASR) is a field of computer science, artificial intelligence, and linguistics that involves transforming audio data associated with speech into a token or other textual representation of the speech. Similarly, Natural Language Understanding (NLU) is a field of computer science, artificial intelligence, and linguistics that involves enabling a computer to derive meaning from natural language input, such as spoken input. ASR and NLU are typically used together as part of the language processing component of a system. Speech Synthesis Generation (SSG), sometimes referred to as text-to-speech or TTS, is a field of computer science that involves converting text data and/or other data into audio data that is synthesized to resemble human speech. Natural Language Generation (NLG) is an artificial intelligence field that involves automatically transforming data into natural language (e.g., English) conten