CN-121280877-B - Visual algorithm self-training method based on multi-agent cooperative optimization
Abstract
The invention discloses a visual algorithm self-training method based on multi-agent cooperative optimization, comprising the following steps: constructing a multi-agent system architecture comprising a user interaction layer, an intelligent scheduling layer, an A2A protocol communication layer and a professional agent cluster layer; analyzing the user's task intent and generating an execution plan at the user interaction layer; having the scheduling agent call the professional agents to complete data processing, model construction, training, testing and deployment; coordinating the task flow through a standardized communication mechanism; supporting task execution with an MCP tool set, a knowledge base module and a memory system; automatically executing a rescheduling operation when a task fails; and finally outputting the self-training result. The invention significantly improves development efficiency, adaptability and the level of intelligence, and is suitable for computer vision tasks such as industrial inspection, intelligent security and autonomous driving.
Inventors
- ZHU WENJUAN
- TAN YINGYING
- FENG PENGFEI
- TIAN XIAOCHUN
- TAO TAO
- WANG YISHI
- WANG YA
- ZHANG YUJIE
- WANG XIAOHUI
- XIONG ZIHANG
- LIANG QING
- WANG CHEN
- TAO RENYOU
- ZHU KAI
- LIANG ZIJUN
- XIAO BIN
- ZHOU GUOHAO
- CHEN LIANG
Assignees
- 安徽合擎智能机器人有限公司
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2025-12-08
Claims (7)
- 1. A visual algorithm self-training method based on multi-agent cooperative optimization, characterized by comprising the following steps: S1, constructing a multi-agent system architecture, wherein the multi-agent system comprises a user interaction layer, an intelligent scheduling layer, an A2A protocol communication layer and a professional agent cluster layer; S2, receiving image or video data uploaded by a user at the user interaction layer, and analyzing and generating the user's task intent through a user interface integrating an LLM model; S3, the intelligent scheduling layer receives the user's task intent, builds professional capability context information through the scheduling agent, completes task understanding and task decomposition with the integrated LLM model, and generates execution steps and dependency relationships to form an execution plan; S4, based on the execution plan, the scheduling agent calls the professional agents to execute the corresponding subtasks; S5, establishing a standardized communication mechanism at the A2A protocol communication layer to complete asynchronous calls, state updates, skill discovery and error handling among the agents; S6, at the professional agent cluster layer, setting professional divisions of labor through a role definition module, executing subtask understanding and collaboration planning through the LLM model, completing data processing and training through the MCP tool set, providing domain knowledge support through the knowledge base module, and maintaining task context and interaction records through the memory system; S7, during task execution, if any execution step fails, the scheduling agent executes a rescheduling operation.
The A2A protocol communication layer specifically comprises: establishing a standardized communication mechanism in the A2A protocol communication layer to support message exchange and instruction transmission between the scheduling agent and the professional agents, wherein the professional agents comprise a data labeling agent, a data analyst agent, an algorithm engineer agent, an algorithm training engineer agent, a test engineer agent, a deployment engineer agent and an algorithm application engineer agent; defining a structure format based on the A2A protocol, and constructing an agent call request structure whose fields comprise a task number, a target agent identifier, a subtask index, a call mode and call parameters; after an agent receives the request structure, generating a corresponding response structure whose fields comprise a response state code, response data and exception information; the synchronous call is suitable for real-time feedback of task states, the asynchronous call is suitable for non-blocking task scheduling, the streaming call is suitable for task-progress tracking feedback, and the batch call is suitable for unified initiation and centralized return of multiple subtasks; recording call log information during protocol communication, wherein the log content comprises a call timestamp, the agent interaction path, a message body hash value and state code information; based on an exception handling mechanism, identifying error field content in the response structure, and if a call exception is identified, triggering the error retry module and updating the task state record and error handling log; the A2A protocol communication layer supports the scheduling agent in executing skill discovery operations, querying agent registration information and capability items through protocol broadcast requests, and updating the set of callable agents based on the capability matching result.
The professional agent cluster layer specifically comprises: configuring a role definition module, an LLM model, an MCP tool set, a knowledge base module and a memory system in the professional agent cluster layer; setting, through the role definition module, the responsibility scope, interaction interface and capability labels of the data labeling agent, data analyst agent, algorithm engineer agent, algorithm training engineer agent, test engineer agent, deployment engineer agent and algorithm application engineer agent; the data labeling agent, after receiving a scheduling instruction, calls a data labeling tool to complete target labeling and label structure generation for the image or video data; the data analyst agent performs feature extraction, distribution analysis and data quality assessment on the structured label data, and outputs an analysis result structure; the algorithm engineer agent sets the model structure, initializes parameters and generates the training flow; the algorithm training engineer agent calls the algorithm component configuration template, executes the model training process through the MCP tool set, collects training logs and performance indicators, and outputs a training report; the test engineer agent, based on the trained model, completes accuracy verification, stability testing and anomaly analysis tasks through a test component, and outputs a test result; executing subtask understanding and collaboration plan generation through the LLM model, and outputting subtask targets, processing requirements and interaction sequence information; the MCP tool set is used to call stage tools to execute data cleaning, label generation, feature analysis, model construction, training and tuning, test verification and deployment release operations; providing task-related visual-domain knowledge and tool usage reference documents through the knowledge base module; recording task numbers, execution steps, interaction logs and state feedback information through the memory system, and maintaining multi-round task execution context and professional agent response history.
During task execution, if any execution step fails, the scheduling agent performs a rescheduling operation, specifically: the scheduling agent monitors the response results, communication call states and data content returned by each stage of the MCP tool set of the professional agents, identifies task execution failures, and records the failure node position, failure type information and failure time index; querying the task dependency relationship table based on the failure type information, marking the node states of the affected subtasks, and updating the execution validity flags of the associated paths in the task execution map; invoking the memory system, extracting the historical input parameters, response contents and context structure information corresponding to the failed subtask, and constructing a subtask context information structure; based on the recovered information structure and the LLM model output, regenerating the subtask target, call sequence and resource demand parameters, and correcting the subtask dependency path in the execution plan; querying the agent state table and professional capability context information, screening the set of callable professional agents, and selecting target agent resources; reconstructing the call request structure, updating the task number, subtask index, call mode and call parameter fields, and reissuing the execution instruction for the failed task; registering the rescheduling state information of the subtask in the task state record table, and marking the task stage state as in a retry flow; recording the whole-process information of the rescheduling operation and constructing a failure scheduling log entry, wherein the whole-process information comprises the failed task number, failure node position, failure reason classification, rescheduling time and target agent identifier; if the rescheduling operation is executed successfully, the scheduling agent updates the task state to normally completed; if the number of rescheduling attempts exceeds a preset threshold, the scheduling agent stops the execution flow, generates an abnormal termination report and marks the task state as execution failure.
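The call request/response structures and the retry-threshold behaviour described in claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: all field names, the status-code convention (0 = success), and the example agent are assumptions.

```python
# Hypothetical sketch of the A2A call request/response structures of claim 1;
# field names and status-code convention are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Any, Optional
import hashlib, json, time

@dataclass
class A2ACallRequest:
    task_id: str            # task number
    target_agent: str       # target agent identifier
    subtask_index: int      # subtask index
    call_mode: str          # "sync" | "async" | "stream" | "batch"
    params: dict = field(default_factory=dict)  # call parameters

@dataclass
class A2ACallResponse:
    status_code: int        # response state code (0 = success, assumed)
    data: Any = None        # response data
    error: Optional[str] = None  # exception information

def call_with_retry(agent, request, max_retries=3):
    """Dispatch a request; on a call exception, retry up to the preset
    threshold, then mark the task as an execution failure."""
    log = []
    for _attempt in range(1, max_retries + 1):
        response = agent(request)
        log.append({                        # call log information
            "timestamp": time.time(),
            "path": f"scheduler->{request.target_agent}",
            "body_hash": hashlib.sha256(
                json.dumps(request.params, sort_keys=True).encode()).hexdigest(),
            "status": response.status_code,
        })
        if response.status_code == 0:
            return response, log
    return A2ACallResponse(status_code=1, error="execution failed"), log

# Example: a stand-in agent that fails once, then succeeds.
_calls = {"n": 0}
def flaky_agent(req):
    _calls["n"] += 1
    if _calls["n"] < 2:
        return A2ACallResponse(status_code=1, error="timeout")
    return A2ACallResponse(status_code=0, data={"labels": 42})

req = A2ACallRequest("T-001", "data_labeling_agent", 0, "sync", {"dataset": "imgs"})
resp, call_log = call_with_retry(flaky_agent, req)
```

The retry wrapper records one log entry per attempt (timestamp, interaction path, message body hash, state code), mirroring the log fields enumerated in the claim.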
- 2. The visual algorithm self-training method based on multi-agent cooperative optimization according to claim 1, wherein step S2 specifically comprises: receiving image data or video data uploaded by a user at the user interaction layer, and performing format identification and input registration on the received data; extracting the task description text associated with the input data through a user interface integrating the LLM model; executing task intent recognition based on the task description text to generate task intent information; and submitting the structured task request content to the intelligent scheduling layer, and recording the task number, input data index and task submission time.
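The intake of step S2 can be illustrated as below. The supported-format table, field names, and the intent text being passed in directly (rather than extracted by an LLM, as in the claim) are all assumptions for the sketch.

```python
# Illustrative sketch of the step-S2 intake: register uploaded data and submit
# a structured task request. Formats and field names are assumptions.
import os
import time

SUPPORTED = {".jpg": "image", ".png": "image", ".mp4": "video", ".avi": "video"}

def register_input(path):
    """Format identification and input registration for an uploaded file."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED:
        raise ValueError(f"unsupported format: {ext}")
    return {"data_index": path, "media_type": SUPPORTED[ext]}

def build_task_request(path, description, task_id):
    """Bundle the registered input and task-intent text into the structured
    content submitted to the intelligent scheduling layer."""
    entry = register_input(path)
    return {
        "task_id": task_id,
        "intent": description,     # in the patent, recognized via the LLM
        "input": entry,
        "submitted_at": time.time(),   # task submission time
    }

request = build_task_request("factory_line.mp4", "detect surface defects", "T-002")
```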
- 3. The visual algorithm self-training method based on multi-agent cooperative optimization according to claim 1, wherein the intelligent scheduling layer specifically comprises: the scheduling agent receives the structured task request content, establishes a task number index table, and registers the task priority, task type and input data source; the scheduling agent summarizes historical task records, professional agent role configurations and knowledge base capability entries, and generates professional capability context information; executing task understanding on the structured task request content through the integrated LLM model, and extracting the task target, processing object and output form information; performing task decomposition through the integrated LLM model on the basis of the task understanding, generating a subtask sequence, a dependency relationship table and a resource demand list, and forming an initial draft of the execution plan; and the scheduling agent performs structure verification and resource matching on the initial draft of the execution plan, marks the executable state, and generates the execution plan.
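The plan draft of claim 3, with its subtask sequence, dependency table, and structure verification before the executable state is marked, can be sketched as follows. The subtask names and the dependency graph are illustrative assumptions.

```python
# Minimal sketch of the claim-3 execution-plan draft and its structure
# verification; subtask names and dependencies are assumptions.

SUBTASKS = ["label", "analyze", "build_model", "train", "test", "deploy"]
DEPENDS = {               # dependency relationship table
    "analyze": ["label"],
    "build_model": ["analyze"],
    "train": ["build_model"],
    "test": ["train"],
    "deploy": ["test"],
}

def verify_plan(subtasks, depends):
    """Structure verification: every dependency must name a known subtask
    and precede its dependant in the sequence (a simple order check)."""
    position = {s: i for i, s in enumerate(subtasks)}
    for task, deps in depends.items():
        for d in deps:
            if d not in position or position[d] >= position[task]:
                return False
    return True

plan = {"subtasks": SUBTASKS, "depends": DEPENDS,
        "executable": verify_plan(SUBTASKS, DEPENDS)}
```

A plan whose dependency points forward in the sequence (e.g. a task depending on a later one) fails verification and would not be marked executable.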
- 4. The visual algorithm self-training method based on multi-agent cooperative optimization according to claim 1, wherein the LLM model specifically comprises: the LLM model serves as an inference engine that receives the structured task request content; executing task understanding based on the structured task request content, and identifying the task target, processing object and output requirement from the task description text to form a semantic understanding result; based on the semantic understanding result, performing intelligent task decomposition to generate a subtask sequence, a dependency relationship table and a resource demand list; executing agent call strategy analysis according to the subtask sequence, identifying the required professional capabilities, selecting a matched agent combination, and constructing a multi-agent collaboration flow; and executing dynamic task scheduling adjustment based on agent states and resource availability, updating the task allocation scheme, and providing optimization feedback information.
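The capability-matching step of claim 4 (identifying required professional capabilities and selecting a matched agent combination) can be sketched as a set-overlap query against an agent registry. The registry contents and capability labels here are illustrative assumptions, not taken from the patent.

```python
# Hedged sketch of claim-4 capability matching: select agents whose capability
# labels cover the abilities a subtask requires. Registry is an assumption.

AGENT_REGISTRY = {
    "data_labeling_agent": {"labeling"},
    "data_analyst_agent":  {"feature_extraction", "quality_assessment"},
    "algorithm_engineer":  {"model_design"},
    "training_engineer":   {"training", "tuning"},
    "test_engineer":       {"validation"},
    "deployment_engineer": {"deployment"},
}

def match_agents(required, registry=AGENT_REGISTRY):
    """Return agents whose capability items intersect the required set
    (the capability-matching result used to build the collaboration flow)."""
    return sorted(a for a, caps in registry.items() if caps & required)

combo = match_agents({"training", "validation"})
```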
- 5. The visual algorithm self-training method based on multi-agent collaborative optimization according to claim 1, wherein the MCP tool set specifically comprises a data processing tool component, a model building tool component, a training optimization tool component, a test verification tool component and a deployment publishing tool component; the data processing tool component comprises a data cleaning tool, a format conversion tool and an anomaly detection tool, receives original image or video data, and outputs a standard-format data structure; the model building tool component comprises a model template library, a parameter configuration interface and a structure visualization module, receives model configuration parameters input by the algorithm engineer agent, and outputs an initial model definition structure; the training optimization tool component comprises a training task manager, a log collector and a performance evaluation module, receives the initial model and training data, and outputs a trained model structure and a performance report; the test verification tool component comprises a test case generator, a stability verification module and a result analysis module, receives the trained model and test data, and outputs test index results; the deployment publishing tool component comprises a deployment configuration interface, a service packaging module and an interface publishing manager, receives the model that passed testing together with the deployment requirements, and outputs a deployment package structure and an API service document; each tool component interacts with the professional agents through a unified interface protocol, supports concurrent calls and task state reporting, and assists the scheduling agent in executing task state tracking and error handling operations through execution log and error information feedback.
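The unified interface protocol of claim 5, with every tool component exposing the same call entry point and feeding execution logs back for state tracking, can be sketched as a base class. The class names, payload shape, and the cleaning rule are illustrative assumptions.

```python
# Sketch of the claim-5 unified interface: each MCP tool component exposes the
# same call signature and reports an execution log. Behaviour is a stub.

class ToolComponent:
    def __init__(self, name):
        self.name = name
        self.log = []        # execution log fed back to the scheduling agent

    def call(self, payload):
        """Uniform entry point; concrete components implement _run."""
        try:
            result = self._run(payload)
            self.log.append({"tool": self.name, "status": "ok"})
            return {"status": "ok", "result": result}
        except Exception as exc:
            self.log.append({"tool": self.name, "status": "error", "msg": str(exc)})
            return {"status": "error", "error": str(exc)}

class DataCleaningTool(ToolComponent):
    def _run(self, payload):
        # drop records with missing labels (a stand-in for real cleaning)
        return [r for r in payload["records"] if r.get("label") is not None]

cleaner = DataCleaningTool("data_cleaning")
out = cleaner.call({"records": [{"label": "cat"}, {"label": None}, {"label": "dog"}]})
```

Because every component shares the `call` contract, the scheduling agent can track task state and route error feedback without knowing which stage tool it invoked.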
- 6. The visual algorithm self-training method based on multi-agent cooperative optimization according to claim 1, wherein the knowledge base module specifically comprises: storing technical specifications, industry standards and best-practice content related to visual algorithm self-training, and constructing structured knowledge items, a document knowledge set, configuration templates and a case library; establishing a semantic index structure, constructing vector representations for technical terms, task elements and tool parameters, and supporting fuzzy matching, semantic retrieval and relevance ranking; based on call requests submitted by the scheduling agent and the professional agents, the knowledge base module provides the domain knowledge content, algorithm theory information and tool usage reference documents required by the task; supporting knowledge update operations at runtime, collecting new technical documents, training experience and tuning strategy information during task execution, and generating versioned knowledge items; setting a knowledge item metadata structure whose fields comprise a knowledge number, source type, applicable task type, version number and access authority level, and recording update time and maintenance records; providing a domain context information input interface for the LLM model and the professional agents, converting retrieval results into structured knowledge segments, and feeding them into the task reasoning process; supporting multi-format knowledge content management, including text documents, configuration templates, operation examples and fault cases, and restricting the read and submission rights of different roles through an access control mechanism; recording knowledge retrieval logs, access records and call history information, whose fields comprise the task number, retrieval keywords, returned entry index and call timestamp, and constructing knowledge recommendation content based on the history records.
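The knowledge item metadata structure and the role-based access check of claim 6 can be sketched as below. The field names follow the claim; the numeric clearance-level convention and example values are assumptions.

```python
# Sketch of the claim-6 knowledge item metadata and an access-control check;
# fields follow the claim, level semantics are an assumption.
from dataclasses import dataclass, field
import time

@dataclass
class KnowledgeItem:
    knowledge_id: str    # knowledge number
    source_type: str     # e.g. "tech_doc", "tuning_strategy", "fault_case"
    task_types: list     # applicable task types
    version: int         # version number
    access_level: int    # access authority level (higher = more restricted)
    updated_at: float = field(default_factory=time.time)  # update time

def can_read(item, role_level):
    """Access control: a role may read items at or below its clearance."""
    return role_level >= item.access_level

item = KnowledgeItem("K-100", "tuning_strategy", ["detection"],
                     version=2, access_level=2)
```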
- 7. The visual algorithm self-training method based on multi-agent cooperative optimization according to claim 1, wherein, based on the task execution and multi-agent system response process, completing the self-training process and outputting a result specifically comprises: the scheduling agent sequentially issues task start call requests according to the execution plan, and records the task path and response state; the professional agents execute the subtasks, generate structured result data, and feed it back to the scheduling agent through the A2A protocol communication layer; the memory system records the task number, task phases, call history and state information; the knowledge base module provides domain knowledge support during execution and supplies structured knowledge segments to the professional agents and the LLM model; the LLM model outputs subtask understanding results and collaboration plan information at key stages, and assists in generating the final output content; the scheduling agent aggregates the output of each stage, generates the final result data and execution report of the self-training process, and registers the task completion state.
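The claim-7 flow (the scheduling agent walking the plan, recording state in the memory system, and aggregating stage outputs into a final report) can be condensed into a loop. All agent bodies and the plan contents here are illustrative stand-ins, not the patent's implementation.

```python
# Compact sketch of the claim-7 execution loop; agents are stubbed out.

def run_self_training(plan, agents):
    """Walk the execution plan, call each professional agent, record state
    in a memory list, and aggregate outputs into the final report."""
    memory = []                      # memory system: stages and states
    outputs = {}
    for stage in plan:
        result = agents[stage]()     # structured result fed back via A2A
        memory.append({"stage": stage, "state": "done"})
        outputs[stage] = result
    return {"task_state": "completed", "stages": memory, "outputs": outputs}

# Stand-in agents for three stages of an assumed plan.
agents = {
    "train":  lambda: {"accuracy": 0.93},
    "test":   lambda: {"passed": True},
    "deploy": lambda: {"endpoint": "/v1/detect"},
}
report = run_self_training(["train", "test", "deploy"], agents)
```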
Description
Visual algorithm self-training method based on multi-agent cooperative optimization Technical Field The invention relates to the technical field of artificial intelligence, in particular to a visual algorithm self-training method based on multi-agent cooperative optimization. Background Traditional development and application of small visual models requires specialized manual division of labor, and the workflow is highly rigid and lacks elasticity. Specifically, the process starts with data labeling personnel manually labeling a large amount of visual data; an algorithm engineer then researches and designs a dedicated visual network model; an AI trainer is responsible for adjusting hyperparameters and training strategies; a deployment engineer embeds the trained model into the actual hardware environment; and finally a visual application engineer integrates the visual model with specific business logic using tool libraries such as OpenCV. This mode involves complicated links and long chains, depends heavily on the experience of experts in each field, has a long development cycle and high labor cost, and struggles to adapt quickly to continuously changing application requirements. To improve efficiency, the prior art typically uses a predefined workflow to concatenate the links. However, such workflows are static and rigid in nature, with execution logic, parameters and paths all fixed during the development phase. When new data distributions, task targets or unforeseen scenes appear, the system lacks dynamic semantic understanding and autonomous decision-making capability, and a developer must redesign the flow, modify code and adjust parameters; true adaptive optimization therefore cannot be realized, and robustness and flexibility are seriously insufficient.
The traditional workflow repeats the same static process at each execution and cannot learn and optimize itself from actual execution conditions; it therefore also lacks the ability to accumulate and exploit historical experience, performs poorly in complex and changeable application scenes, is costly to maintain, and cannot be quickly adapted to change. With the rise of large language model (Large Language Models, LLMs) technology, the powerful semantic understanding, task planning and code generation capabilities of large language models provide the core power for constructing a new generation of adaptive systems. In recent years, large language models supporting tool calls have matured, making it possible to dynamically schedule external functional modules through natural language instructions. On this basis, technical standards and frameworks such as the Model Context Protocol (MCP) and Agent-to-Agent protocols (A2A) have been further developed, laying a solid foundation for constructing multi-agent systems capable of complex cooperation. This marks a paradigm shift from static workflows to dynamic agent cooperation, and provides a brand-new technical path for thoroughly breaking the limitations of traditional automated flows. While the application of large language models and multi-agent systems to general automated processes has become an important trend, their deep application to the end-to-end self-training process of small visual models remains an underexplored area.
The existing solutions still mechanically connect the original steps in series with a workflow, or only use a large language model to complete local links (such as data augmentation or parameter recommendation), and fail to fundamentally construct a closed-loop system capable of autonomously understanding tasks, dynamically planning training strategies, cooperatively executing multiple agents and continuously optimizing based on feedback. Therefore, to meet the growing demands of enterprises for automated and intelligent development tools, a self-training method that fully exploits the synergistic advantages of large-language-model-driven multi-agent cooperation is urgently needed to thoroughly replace the rigid traditional workflow and realize efficient, adaptive and unattended automated training and continuous optimization of small visual models. Therefore, how to provide a visual algorithm self-training method based on multi-agent cooperative optimization is a problem that needs to be solved by those skilled in the art. Disclosure of Invention The invention provides a visual algorithm self-training method based on multi-agent cooperative optimization, which fully integrates an LLM model, a standardized communication mechanism established at the A2A protocol communication layer, and an MCP tool set, and realizes intelligent cooperative processing of the whole process of visual algorithm task analysis, task decomposition, subtask allocation, tool calling and model construction through