CN-121981150-A - Multi-agent training method, system, device, electronic equipment, storage medium and program product


Abstract

Embodiments of the present disclosure provide a multi-agent training method, system, apparatus, electronic device, storage medium, and program product. In the scheme provided by the embodiments of this specification, a first agent performs a first round of training using a first batch of main samples, and the first sub-sample set generated for each first main sample during this training is stored in a database. A second agent acquires the first sub-sample sets from the database and caches them. When the number of cached first sub-sample sets meets a requirement, the second agent starts its first round of training. In the second round of training, the first agent interacts with the second agent that has completed the first round of training.
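The flow in the abstract is essentially a producer/consumer hand-off gated by a count. The Python sketch below illustrates one possible reading under stated assumptions: the database is modelled as an in-memory dict keyed by main-sample id, fetching moves (removes) entries from it, sub-samples are placeholder strings, and actual model training is replaced by a version bump. The names `first_agent_round`, `SecondAgent`, `fetch`, `try_train`, and `first_agent_may_start` are hypothetical and do not come from the patent; the count rule follows claim 2 and the version gate follows claims 5 and 6.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-in for the shared database: main-sample id -> first sub-sample set.
Database = Dict[str, List[str]]


def first_agent_round(main_samples: List[str], subtasks_per_sample: int) -> Database:
    """Placeholder first-agent round: each main sample yields one sub-sample set.

    In the patent each sub-sample comes from one first/second-agent interaction
    on one subtask; here sub-samples are just labelled strings (assumption).
    """
    return {m: [f"{m}/sub{k}" for k in range(subtasks_per_sample)] for m in main_samples}


@dataclass
class SecondAgent:
    version: int = 0
    cache: Database = field(default_factory=dict)

    def fetch(self, db: Database) -> None:
        # Assumption: acquiring a set moves it from the database into the local cache.
        for main_id in list(db):
            self.cache[main_id] = db.pop(main_id)

    def try_train(self, num_main_samples: int) -> bool:
        # Requirement as in claim 2: one cached set per first main sample.
        if len(self.cache) != num_main_samples:
            return False
        # A real implementation would compute a loss and update model parameters here.
        self.cache.clear()
        self.version += 1  # as in claim 5: bump the version after the round
        return True


def first_agent_may_start(second: SecondAgent, required_version: int) -> bool:
    # As in claim 6: the first agent's next round waits for the required version.
    return second.version >= required_version


if __name__ == "__main__":
    db: Database = {}
    second = SecondAgent()
    batch = ["q1", "q2", "q3"]
    produced = first_agent_round(batch, subtasks_per_sample=2)

    db["q1"] = produced["q1"]  # only one set has been written so far
    second.fetch(db)
    print(second.try_train(len(batch)))  # False: 1 of 3 sets cached

    db.update({k: v for k, v in produced.items() if k != "q1"})
    second.fetch(db)
    print(second.try_train(len(batch)))  # True: all 3 sets cached
    print(first_agent_may_start(second, required_version=1))  # True
```

Gating on a count rather than on wall-clock time means the second agent never trains on a partial batch, which is what keeps the two agents' paces aligned in this reading.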

Inventors

YIN JIAJUN
LI JI
GAO YANAN
CHEN YEFEI
YE ZHILING
YUE BIN
GU JINJIE
LIU JUNWEI
WEI PENG
LIU JINGNAN
CHEN ZHE
WANG YUAN
LIAO XINHAO
YU AILING
XIAO HANSONG
ZHOU HUALEI
GUO CHUNXIAO

Assignees

Alipay (Hangzhou) Digital Service Technology Co., Ltd. (支付宝(杭州)数字服务技术有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2026-04-07

Claims (19)

1. A multi-agent training method, wherein the multi-agent comprises a first agent and a second agent, the method comprising: acquiring a first batch of main samples, wherein the first batch of main samples comprises at least one first main sample; performing, by the first agent, a first round of training using the first batch of main samples, wherein in the training, each first main sample correspondingly generates one first sub-sample set containing at least one first sub-sample; acquiring, by the second agent, first sub-sample sets from a database and caching the acquired first sub-sample sets; and when the number of cached first sub-sample sets meets a requirement, performing, by the second agent, a first round of training using the cached first sub-sample sets; wherein, in a second round of training, the first agent interacts with the second agent that has completed the first round of training.
2. The method of claim 1, wherein the requirement is met when the number of cached first sub-sample sets equals the number of first main samples.
3. The method of claim 1, wherein the second agent performing the first round of training using the cached first sub-sample sets comprises: if any of the cached first sub-sample sets contains fewer first sub-samples than a threshold, expanding each such first sub-sample set so that its number of first sub-samples reaches the threshold; and if none of the cached first sub-sample sets contains fewer first sub-samples than the threshold, performing the first round of training using the cached first sub-sample sets.
4. The method of claim 3, wherein expanding a first sub-sample set whose number of first sub-samples is less than the threshold comprises: copying some or all of the first sub-samples in the first sub-sample set; or copying some or all of the first sub-samples in the first sub-sample set and processing the resulting copies to add perturbation information.
5. The method of any one of claims 1 to 4, wherein the second agent is configured with a version number, the method further comprising: updating the version number of the second agent after the second agent completes the first round of training.
6. The method of claim 5, further comprising: obtaining the version number of the second agent before the first agent starts the second round of training; if the version number meets a requirement, starting, by the first agent, the second round of training; and if the version number does not meet the requirement, waiting until the version number of the second agent meets the requirement before starting the second round of training.
7. The method of claim 1, wherein the second agent performing the first round of training using the cached first sub-sample sets comprises: configuring the same weight for the first sub-samples within the same first sub-sample set; and, when performing the first round of training on the cached first sub-sample sets, calculating a loss using a weighted loss function and updating parameters of a second model to be trained of the second agent based on the calculated loss.
8. The method of claim 6, wherein each first sub-sample in the same first sub-sample set is configured with a weight of 1/n, where n is the number of first sub-samples in that first sub-sample set.
9. The method of claim 1, wherein in the training, the first agent performs the following steps for each first main sample of the first batch of main samples: generating, by the first agent, at least one subtask for the first main sample; and for each subtask, performing, by the first agent, one round of interaction with the second agent and generating one first sub-sample after the round of interaction.
10. A multi-agent training method, wherein the multi-agent comprises a first agent and a second agent, and the multi-agent training method comprises an asynchronous training method and a synchronous training method that are performed alternately, wherein: the synchronous training method comprises the steps of the multi-agent training method according to any one of claims 1 to 9; and the asynchronous training method comprises: obtaining a second batch of main samples, wherein the second batch of main samples comprises at least one second main sample; performing, by the first agent, one round of training using the second batch of main samples, wherein in the training, each second main sample correspondingly generates one second sub-sample set containing at least one second sub-sample; performing, by the second agent, multiple rounds of training according to its training pace using a second batch of sub-samples, wherein the second batch of sub-samples comprises the second sub-sample sets respectively corresponding to the at least one second main sample; and starting the synchronous training method after the first agent has performed a first set number of rounds of training or the second agent has performed a second set number of rounds of training.
11. A multi-agent training system, comprising a first agent, a second agent, a controller, a database, and a tool server, wherein: the first agent and the second agent implement, with the aid of the controller, the multi-agent training method of any one of claims 1 to 10; the controller is configured to transmit information to each of the first agent and the second agent when the first agent interacts with the second agent, and is further configured to interact with the second agent to call a tool as indicated by the second agent and to feed the execution result of the tool back to the second agent; the database is configured to store the first sub-sample sets generated during training of the first agent; and the tool server is provided with at least one tool for the second agent to call.
12. A multi-agent training device, wherein the multi-agent comprises a first agent and a second agent, the multi-agent training device comprising: an acquisition module configured to acquire a first batch of main samples, wherein the first batch of main samples comprises at least one first main sample; a first triggering module configured to trigger the first agent to perform a first round of training using the first batch of main samples, wherein in the training, each first main sample correspondingly generates one first sub-sample set containing at least one first sub-sample; and a second triggering module configured to trigger the second agent to acquire first sub-sample sets from a database and cache the acquired first sub-sample sets and, when the number of cached first sub-sample sets meets a requirement, to perform a first round of training using the cached first sub-sample sets; wherein, in a second round of training, the first agent interacts with the second agent that has completed the first round of training.
13. A multi-agent training device, wherein the multi-agent comprises a first agent and a second agent, and the multi-agent training device comprises an asynchronous training module and a synchronous training module that work alternately, wherein: the synchronous training module is configured to trigger the first agent and the second agent to perform the multi-agent training method according to any one of claims 1 to 9; and the asynchronous training module is configured to trigger the first agent to perform one round of training using a second batch of main samples, wherein in the training, each second main sample correspondingly generates one second sub-sample set containing at least one second sub-sample; to trigger the second agent to perform multiple rounds of training according to its training pace using a second batch of sub-samples, wherein the second batch of sub-samples comprises the second sub-sample sets respectively corresponding to the at least one second main sample; and to trigger the synchronous training module to work after the first agent has performed a first set number of rounds of training or the second agent has performed a second set number of rounds of training.
14. A multi-agent training method, wherein the multi-agent comprises a first agent and a second agent, and the method is applied to the first agent, the method comprising: acquiring a first batch of main samples, wherein the first batch of main samples comprises at least one first main sample; performing a first round of training using the first batch of main samples, wherein in the training, each first main sample correspondingly generates one first sub-sample set containing at least one first sub-sample; and storing the first sub-sample sets generated in the training in a database, so that the second agent can acquire and cache them; wherein in the training, the first agent performs the following steps for each first main sample of the first batch of main samples: generating at least one subtask for the first main sample; and for each subtask, performing one round of interaction with the second agent through a controller and generating one first sub-sample after the round of interaction.
15. A multi-agent training method, wherein the multi-agent comprises a first agent and a second agent, and the method is applied to the second agent, the method comprising: acquiring, from a database, first sub-sample sets generated while the first agent performs a first round of training using a first batch of main samples; caching the acquired first sub-sample sets; and when the number of cached first sub-sample sets meets a requirement, performing a first round of training using the cached first sub-sample sets.
16. An electronic device on which a first agent is deployed, the electronic device comprising a memory and a processor, wherein the memory is configured to store executable instructions, and the processor implements the steps of the multi-agent training method of claim 14 by executing the executable instructions.
17. An electronic device on which a second agent is deployed, the electronic device comprising a memory and a processor, wherein the memory is configured to store executable instructions, and the processor is configured to implement the steps of the multi-agent training method of claim 15 by executing the executable instructions.
18. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, perform the steps of the multi-agent training method of claim 14 or 15.
19. A computer program product, comprising a computer program or instructions which, when executed by a processor, cause the processor to perform the steps of the multi-agent training method of claim 14 or 15.
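Claims 3 and 4, 7 and 8, and 10 describe three small pieces of logic: padding a short sub-sample set up to a threshold by copying (optionally with perturbation), weighting each sub-sample in a set of n by 1/n in the loss, and switching from asynchronous to synchronous training once either agent reaches its set round count. The Python sketch below shows one possible reading. Representing sub-samples as float vectors, using Gaussian noise as the perturbation, copying members in cyclic order, and all function names are assumptions for illustration, not details given in the claims.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical representation: a sub-sample is a small feature vector.
SubSample = List[float]


def expand_to_threshold(
    sub_set: Sequence[SubSample],
    threshold: int,
    perturb: Optional[Callable[[SubSample], SubSample]] = None,
) -> List[SubSample]:
    """Pad a sub-sample set to `threshold` items by copying its members (claims 3-4).

    Sets that already have at least `threshold` items are returned unchanged (as copies).
    Members are copied cyclically; `perturb`, if given, is applied to each new copy only.
    """
    if not sub_set:
        raise ValueError("cannot expand an empty sub-sample set")
    expanded = [list(s) for s in sub_set]
    i = 0
    while len(expanded) < threshold:
        copy = list(sub_set[i % len(sub_set)])
        expanded.append(perturb(copy) if perturb is not None else copy)
        i += 1
    return expanded


def gaussian_perturb(sigma: float, seed: int = 0) -> Callable[[SubSample], SubSample]:
    """Assumed perturbation: add independent Gaussian noise to every feature."""
    rng = random.Random(seed)
    return lambda v: [x + rng.gauss(0.0, sigma) for x in v]


def weighted_loss(per_set_losses: Sequence[Sequence[float]]) -> float:
    """Sum per-sub-sample losses, weighting each sub-sample in a set of n by 1/n (claims 7-8)."""
    total = 0.0
    for losses in per_set_losses:
        if not losses:
            continue  # an empty set contributes nothing
        w = 1.0 / len(losses)
        total += sum(w * loss for loss in losses)
    return total


def async_phase_done(first_rounds: int, second_rounds: int,
                     first_limit: int, second_limit: int) -> bool:
    """Claim 10: switch to synchronous training once either agent hits its set round count."""
    return first_rounds >= first_limit or second_rounds >= second_limit


if __name__ == "__main__":
    print(expand_to_threshold([[1.0], [2.0]], 5))  # [[1.0], [2.0], [1.0], [2.0], [1.0]]
    print(weighted_loss([[1.0, 3.0], [4.0]]))      # 6.0
```

Because the 1/n weights within each set sum to 1, every main sample contributes equally to the loss no matter how many subtasks (and therefore sub-samples) it produced; in the example, the two-sub-sample set and the one-sub-sample set each carry a total weight of 1.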

Description

Multi-agent training method, system, device, electronic equipment, storage medium and program product

Technical Field

The present disclosure relates to the field of computer technologies, and in particular to a multi-agent training method, system, apparatus, electronic device, storage medium, and program product.

Background

Deep Research is one of the core challenges in the current AI agent (artificial intelligence agent) field: it requires an agent to complete long-context, multi-step, complex reasoning tasks in a real network environment. To address such problems, the industry has proposed multi-agent collaboration frameworks that split the task between a main agent (Main Agent), responsible for task planning and complex reasoning, and a sub-agent (Sub Agent), responsible for specific retrieval and tool execution. However, this multi-agent architecture poses a key training challenge: in the related art, the training paces of the two agents are difficult to synchronize, which results in poor training of both agents. A scheme that supports collaborative training of multiple agents while keeping their training synchronized is therefore needed.

Disclosure of Invention

Various embodiments of the present specification provide a multi-agent training method, system, apparatus, electronic device, storage medium, and program product. A first embodiment of the present specification provides a multi-agent training method, in which the multi-agent comprises a first agent and a second agent.
The multi-agent training method comprises the following steps: acquiring a first batch of main samples, wherein the first batch of main samples comprises at least one first main sample; the first agent performs a first round of training using the first batch of main samples, wherein in the training, each first main sample correspondingly generates one first sub-sample set containing at least one first sub-sample; the second agent acquires first sub-sample sets from a database and caches the acquired first sub-sample sets; when the number of cached first sub-sample sets meets a requirement, the second agent performs a first round of training using the cached first sub-sample sets; and in a second round of training, the first agent interacts with the second agent that has completed the first round of training.

A second embodiment of the present specification provides a multi-agent training method. The multi-agent training method comprises an asynchronous training method and a synchronous training method that are performed alternately, wherein the synchronous training method comprises the steps of the multi-agent training method provided by the first embodiment, and the asynchronous training method comprises: obtaining a second batch of main samples, wherein the second batch of main samples comprises at least one second main sample; the first agent performs one round of training using the second batch of main samples, wherein in the training, each second main sample correspondingly generates one second sub-sample set containing at least one second sub-sample; the second agent performs multiple rounds of training according to its training pace using a second batch of sub-samples, wherein the second batch of sub-samples comprises the second sub-sample sets respectively corresponding to the at least one second main sample; and after the first agent has performed a first set number of rounds of training or the second agent has performed a second set number of rounds of training, the synchronous training method is started.

A third embodiment of the present specification provides a multi-agent training system. The multi-agent training system comprises a first agent, a second agent, a controller, a database, and a tool server, wherein: the first agent and the second agent implement, with the aid of the controller, the multi-agent training method provided in the first or second embodiment; the controller is configured to transmit information to each of the first agent and the second agent when the first agent interacts with the second agent, and is further configured to interact with the second agent to call a tool as indicated by the second agent and to feed the execution result of the tool back to the second agent; the database is configured to store the first sub-sample sets generated during training of the first agent; and the tool server is provided with at least one tool for the second agent to call.

A fourth embodiment of the present specification provides a multi-agent training device. The multi-agent comprises a first agent and a second agent, and the multi-agent training device comprises: an acquisition module configured to acquire a first batch of main samples, wherein the first batch of main samples comprises at least one first main sample; a first triggering module configured to trigger the first agent to perform a first round of training using the first batch of main samples, wherein in the training, one