CN-121989250-A - Robust multi-robot operation method and system based on closed-loop multi-agent framework, and robot


Abstract

The invention discloses a robust multi-robot operation method, system and robot based on a closed-loop multi-agent framework. The system comprises three agents, for planning, operation and verification, which cooperate to form a closed-loop planning-execution-verification workflow. The planning agent performs task decomposition and heterogeneous allocation based on language instructions and scene information. The operation agent converts abstract subtasks into parameterized 6-DoF action primitives through visual parsing and cascaded perception tools, and its built-in dual-layer memory mechanism improves efficiency. The verification agent visually verifies execution results and triggers hierarchical error recovery. The invention departs from the traditional open-loop execution mode and addresses problems in the prior art such as poor robustness, insufficient multi-robot cooperation capability, and the disconnect between semantic planning and physical execution. It significantly improves the task success rate and fault tolerance of multi-robot systems in unstructured environments, can be widely applied to heterogeneous multi-robot cooperation scenarios such as home service, industrial manufacturing, and logistics and warehousing, and has strong industrial practicability.

Inventors

ZHENG WEISHI
HE YIXIANG
WEI LAN
CEN HAOMING
JIANG JIANJIAN
LI ZHUOHAO
YANG YIHAN

Assignees

Sun Yat-sen University (中山大学)

Dates

Publication Date: 2026-05-08
Application Date: 2026-03-20

Claims (10)

1. A robust multi-robot operation method based on a closed-loop multi-agent framework, characterized by comprising the following steps: a planning agent receives a user's natural language instruction and a coarse-grained scene description containing the types, positions and approximate orientations of objects in the robots' workspace, decomposes the overall task, using a large language model, into subtasks in the form of a directed acyclic graph with logical dependencies, completes heterogeneous allocation of the subtasks in combination with a robot capability database, generates a task schedule with parallel marks, and distributes the subtasks to an operation agent; after receiving a subtask, the operation agent generates parameterized 6-DoF physical action primitives through a visual perception and parsing process and executes them; a verification agent acquires images of the environment before and after the operation, visually verifies the execution result through comparison by a multimodal large model, and outputs a binary success-or-failure judgment; and the subsequent process is executed according to the judgment: on success, the workflow advances to the next subtask; on failure, a local or global recovery cycle is triggered according to the error type and the corresponding subtask is re-executed after correction, until all subtasks are successfully verified, thereby forming a closed-loop planning-execution-verification workflow.
2. The robust multi-robot operation method based on a closed-loop multi-agent framework of claim 1, wherein the decomposition of the overall task specifically comprises: the planning agent analyzes the logical dependencies among the subtasks using the large language model, identifies task chains that must be executed sequentially and task groups that can be executed in parallel, and adds parallel marks to the parallel subtasks, thereby realizing parallel task scheduling across multiple robots.
3. The robust multi-robot operation method based on a closed-loop multi-agent framework of claim 1, wherein, when the natural language instruction is ambiguous or the coarse-grained scene description lacks critical information, the method further comprises the following steps: the planning agent generates a further-perception mark, which triggers the visual perception module of the operation agent to acquire and feed back a fine-grained object list and the state of the environment, and the planning agent corrects and refines the task directed acyclic graph based on the feedback.
4. The robust multi-robot operation method based on a closed-loop multi-agent framework of claim 1, wherein the visual perception and parsing process of the operation agent specifically comprises: performing operation parsing based on a vision-language model, assigning the objects involved in the operation the roles of active object and passive object, extracting the semantic parts of the objects and determining whether dual-arm cooperative operation is needed; and calling a cascaded visual perception tool to obtain 3D spatial anchor points of the objects as target coordinates for motion planning.
5. The robust multi-robot operation method based on a closed-loop multi-agent framework of claim 4, wherein the cascaded visual perception tool is called as follows: first, an object bounding box is obtained by a VLM detector and input into a SAM model to generate a high-precision pixel-level mask; next, functional key points are located on the 2D image by a visual prompting tool, and, for dual-arm cooperative operation, a group of key points symmetric about the object center is located; finally, the 2D key points are back-projected into 3D space using the depth map from a depth camera to obtain the 3D spatial anchor points.
6. The robust multi-robot operation method based on a closed-loop multi-agent framework of claim 4, wherein the operation agent generates the parameterized 6-DoF physical action primitives as follows: based on the 3D spatial anchor points and the parsed operation semantics, the vision-language model selects and combines corresponding skills from an action primitive library; a grasping operation calls a grasp generation model to generate candidate 6-DoF grasp poses, which are optimized in combination with the semantic key points; an orientation adjustment operation calls a rotation perception tool to predict the rotation axis, direction and angle; and the resulting parameters are filled into the action primitives to complete the parameterization.
7. The robust multi-robot operation method based on a closed-loop multi-agent framework of claim 1, wherein the operation agent has a built-in dual-layer memory mechanism comprising short-term memory and long-term memory; the short-term memory records the states of actions already executed in the current task sequence and is used to complete the action execution logic; the long-term memory is an experience pool storing task signature-action template pairs, the task signature format being "<action> active object passive object"; when a new task signature matches a historical one, the action template is called directly to generate the action primitives.
8. The robust multi-robot operation method based on a closed-loop multi-agent framework of claim 1, wherein the error types are divided into recoverable execution errors and physically infeasible global errors; a recoverable execution error triggers a local recovery cycle, in which the verification agent sends a correction instruction to the operation agent and the operation agent autonomously generates a local corrective action sequence; a physically infeasible global error triggers a global recovery cycle, in which the verification agent reports the failure context to the planning agent and the planning agent re-plans and modifies the task directed acyclic graph.
9. A robust multi-robot operating system based on a closed-loop multi-agent framework, characterized in that it applies the robust multi-robot operation method based on a closed-loop multi-agent framework of any one of claims 1-8 and comprises a planning agent module, an operation agent module and a verification agent module; the planning agent module is used for the planning agent to receive a user's natural language instruction and a coarse-grained scene description containing the types, positions and approximate orientations of objects in the robots' workspace, decompose the overall task, using a large language model, into subtasks in the form of a directed acyclic graph with logical dependencies, complete heterogeneous allocation of the subtasks in combination with a robot capability database, generate a task schedule with parallel marks, and distribute the subtasks to the operation agent; the operation agent module is used for the operation agent, after receiving a subtask, to generate parameterized 6-DoF physical action primitives through a visual perception and parsing process and execute them; the verification agent module is used for the verification agent to acquire images of the environment before and after the operation, visually verify the execution result through comparison by a multimodal large model, and output a binary success-or-failure judgment, the subsequent process being executed according to the binary judgment: on success, the workflow advances to the next subtask; on failure, a local or global recovery cycle is triggered according to the error type and the corresponding subtask is re-executed after correction, until all subtasks are successfully verified, thereby forming a closed-loop planning-execution-verification workflow.
10. A robot, characterized in that the robot comprises: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores computer program instructions executable by the at least one processor to enable the at least one processor to perform the robust multi-robot operation method based on a closed-loop multi-agent framework of any one of claims 1-8.
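The planning-execution-verification loop of claims 1 and 8 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `run_closed_loop`, `Verdict` and `ErrorType`, the four callbacks and the `max_attempts` safety cap are all hypothetical, and in a real system the callbacks would be backed by the operation, verification and planning agents rather than plain functions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class ErrorType(Enum):
    RECOVERABLE = "recoverable"  # e.g. a grasp position deviation
    INFEASIBLE = "infeasible"    # e.g. a target outside the workspace


@dataclass
class Verdict:
    success: bool                       # binary judgment from verification
    error: Optional[ErrorType] = None   # only meaningful on failure


def run_closed_loop(
    subtasks: List[str],
    execute: Callable[[str], None],
    verify: Callable[[str], Verdict],
    correct: Callable[[str], None],
    replan: Callable[[List[str]], List[str]],
    max_attempts: int = 5,  # assumption: a cap so the sketch always terminates
) -> List[Tuple[str, bool]]:
    """Run subtasks in order until each is verified; return (subtask, success) log."""
    queue = list(subtasks)
    log: List[Tuple[str, bool]] = []
    index = 0
    attempts = 0
    while index < len(queue):
        task = queue[index]
        execute(task)
        verdict = verify(task)
        log.append((task, verdict.success))
        if verdict.success:
            index += 1      # advance to the next subtask
            attempts = 0
            continue
        attempts += 1
        if attempts >= max_attempts:
            raise RuntimeError(f"giving up on subtask {task!r}")
        if verdict.error is ErrorType.RECOVERABLE:
            correct(task)   # local recovery: retry the same subtask after correction
        else:
            # global recovery: the planner replaces the remaining subtasks
            queue[index:] = replan(queue[index:])
    return log
```

The sketch flattens the task graph into a sequential queue for brevity; the claimed method schedules a directed acyclic graph with parallel marks.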
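The parallel marking described in claim 2 amounts to grouping the subtasks of a directed acyclic graph into batches whose dependencies are already satisfied. The sketch below shows one possible way to do this with Python's standard `graphlib` module; the function name `parallel_schedule`, the dictionary layout of the schedule and the example subtask names are assumptions made for illustration, and the patent does not state which scheduling algorithm is used.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_schedule(dependencies: Dict[str, Set[str]]) -> List[dict]:
    """Group subtasks into batches; a batch with several subtasks gets a parallel mark.

    `dependencies` maps each subtask to the set of subtasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    schedule = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all subtasks whose dependencies are done
        schedule.append({"subtasks": ready, "parallel": len(ready) > 1})
        sorter.done(*ready)
    return schedule


# Hypothetical example: two independent picks followed by a dependent stack.
example = parallel_schedule({
    "pick_cup": set(),
    "pick_plate": set(),
    "stack_cup_on_plate": {"pick_cup", "pick_plate"},
})
```

Here the two pick subtasks land in one batch marked parallel, and the stacking subtask follows in a sequential batch.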
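The final step of claim 5, back-projecting a 2D key point into 3D space using the depth map, can be illustrated with the standard pinhole camera model. This is a sketch under assumptions: the patent does not specify the camera model, and the function name `back_project` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` (focal lengths and principal point, in pixels) are introduced here for illustration only.

```python
from typing import Tuple


def back_project(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with depth (metres) into camera-frame 3D coordinates.

    Assumes an undistorted pinhole camera: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    if depth <= 0:
        raise ValueError("depth must be positive; the pixel may have no valid reading")
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

The result is expressed in the camera frame; a further extrinsic transform, not shown, would be needed to express the anchor point in a robot's base frame.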

Description

Robust multi-robot operation method and system based on closed-loop multi-agent framework, and robot

Technical Field

The invention belongs to the technical field of machine learning, and particularly relates to a robust multi-robot operation method and system based on a closed-loop multi-agent framework, and to a robot.

Background

With the rapid development of robotics, multi-robot systems show great application potential in complex scenarios such as home service, industrial manufacturing, and logistics and warehousing, owing to their strong parallel processing capability, high system redundancy and good scalability. To accomplish long-horizon, complex operation tasks, a multi-robot system needs not only precise low-level physical operation capability but also high-level semantic understanding and logical reasoning capability, so that a user's natural language instruction can be accurately grounded in physical action execution. In recent years, the introduction of large language models (LLMs) has provided a new technical approach to robot operation: their strong semantic reasoning and planning capability allows abstract natural language instructions to be decomposed into robot-executable subtasks, raising the intelligence level of robot systems.
However, existing robot systems based on large language models still face significant technical bottlenecks in grounding high-level semantic reasoning in robust physical multi-agent execution. The specific defects are as follows:

Poor robustness of the open-loop execution mode. Existing multi-robot planning frameworks generally adopt open-loop execution: low-level physical operations are treated as idealized primitives, and an action is assumed to succeed once issued, ignoring execution uncertainties that are common in real physical environments, such as grasp slippage, changes in contact state and external disturbances. Because there is no dynamic verification and feedback mechanism during operation, local low-level execution errors cannot be perceived by the high-level planner, errors easily cascade, and the cooperation of the whole system eventually fails, which greatly reduces the reliability and task success rate of multi-robot systems in practical applications.

Limited scalability of single-robot strategies. Existing single-robot operation methods have good physical grounding capability within a single workspace, but lack the spatio-temporal coordination mechanism required for multi-robot cooperation, cannot handle long-horizon tasks spanning multiple workspaces, and struggle to meet the parallelism and redundancy requirements of complex scenarios, which limits the range of operation tasks and the execution efficiency.

Missing error recovery mechanism and low fault tolerance. When an execution error occurs, the prior art lacks a fine-grained hierarchical recovery strategy and cannot distinguish recoverable execution errors (such as a grasp position deviation) from unrecoverable feasibility errors (such as a target lying outside the workspace).
Minor execution errors can therefore interrupt or reset the whole task flow, and, without flexible means that combine local self-correction with global re-planning, task execution efficiency is low.

Poor adaptability to unstructured instructions. Traditional multi-robot task and motion planning methods rely on rigid predefined domain rules (such as PDDL) and geometric constraints; they have difficulty understanding and generalizing to open-ended natural language instructions and cannot flexibly adapt to semantic changes in unstructured environments, resulting in a poor human-computer interaction experience and high system deployment cost.

In summary, the prior art has an obvious gap between high-level semantic planning and low-level physical execution; in multi-robot cooperation scenarios in particular, it lacks a closed-loop control framework that can perceive environmental feedback in real time, actively verify execution results and perform adaptive error recovery. There is therefore a need for a robust multi-robot operating system that integrates semantic planning, physical grounding and result verification and overcomes the above drawbacks of the prior art.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art and provides a robust multi-robot operation method, system and robot based on a closed-loop multi-agent framework, which realize accurate and robust grounding of high-level semantic instructions in low-level physical execution by constr