CN-121979680-A - Language model reasoning resource scheduling method and system based on multi-agent collaboration

CN 121979680 A

Abstract

The invention relates to a language model reasoning resource scheduling method and system based on multi-agent collaboration. By constructing a multi-agent system with a clear division of roles and a special-purpose knowledge base, the invention simulates multi-perspective iterative thinking in complex reasoning tasks and overcomes the limited perspective and decision bias of a single model in long-sequence, multi-step reasoning. Combining the general capability of a large language model with a task-specific knowledge base improves the specialization and accuracy of the reasoning result. In addition, multi-round dynamic collaboration is combined with a cognitive-load-aware convergence mechanism, a dynamic limit on the number of tokens, and a low-confidence branch pruning strategy, which addresses the resource waste, redundant computation, and unstable convergence caused by cognitive overload during collaboration. As a result, the utilization efficiency of computing resources is improved, peak occupancy is reduced, and overall reasoning latency is shortened while reasoning quality is maintained, providing reliable technical support for efficient and stable deployment of large language models in complex tasks.

Inventors

MAI MIAO
LI SHENGCAI
LI DUNHONG
LUO XIAOLONG

Assignees

广东南方智媒科技有限公司

Dates

Publication Date: 20260505
Application Date: 20260126

Claims (10)

1. A language model reasoning resource scheduling method based on multi-agent collaboration, characterized by comprising the following steps: receiving a task input to be processed, and, after a main control agent performs preliminary semantic analysis on the task input, decomposing the task and distributing it to a plurality of auxiliary agents; controlling the plurality of auxiliary agents to perform multiple rounds of dynamic collaborative reasoning from different reasoning function perspectives, based on their respective system prompts and a special-purpose knowledge base; after each round of reasoning ends, calculating the cognitive load index of each auxiliary agent in real time, and dynamically adjusting a convergence judgment threshold according to the cognitive load index; and, when the dynamically adjusted convergence condition is met or a preset maximum number of reasoning rounds is reached, summarizing, by the main control agent, the intermediate results of the multi-round reasoning and the change trajectory of the quantitative indices, executing a final aggregation decision, and outputting an optimized reasoning result sequence.
2. The language model reasoning resource scheduling method based on multi-agent collaboration of claim 1, further comprising an initialization configuration step performed before the method is executed, the initialization configuration step comprising: configuring a main control agent and a plurality of auxiliary agents, wherein the main control agent is responsible for overall task decomposition, resource coordination, and final result aggregation, and the plurality of auxiliary agents respectively correspond to different reasoning function perspectives; constructing a system prompt for the main control agent and for each auxiliary agent, wherein each system prompt comprises at least a function positioning, a mandatory structured output format constraint, a prohibition on fabricating facts, verifiable intermediate representation rules that must be cited, and a self-reflection mechanism; collecting historical reasoning logs and constructing a special-purpose knowledge base, wherein the historical reasoning logs are structured, vectorized with a text embedding model, and stored in a vector database; and establishing a standardized interface with a large language model reasoning engine through a model context protocol, the interface supporting exact retrieval of intermediate representation vectors, semantic similarity retrieval, and hybrid retrieval with configurable weights.
3. The language model reasoning resource scheduling method based on multi-agent collaboration of claim 2, wherein the multi-round dynamic collaborative reasoning comprises: distributing, by the main control agent, the decomposed subtasks and their related contexts to each auxiliary agent; generating, by each auxiliary agent, its own intermediate reasoning result based on its system prompt and the special-purpose knowledge base; after every auxiliary agent completes the current round of reasoning, calculating in sequence the cognitive load index, the group reasoning divergence, the information entropy, and the weighted average confidence for the agents; adjusting the divergence threshold, the entropy threshold, and the confidence threshold in real time according to the cognitive load index calculated in the current round; and, when a preset strategy switching condition is met, switching to a fast aggregation mode in which the main control agent executes a weighted aggregation mechanism based on function weights and confidence weights to determine a staged reasoning conclusion, and otherwise continuing multi-agent deep collaborative reasoning into the next round.
4. The language model reasoning resource scheduling method based on multi-agent collaboration of claim 3, wherein the cognitive load index is calculated according to a formula [formula and symbols missing from the source text] whose quantities are: the current CPU or memory occupancy of the agent; the resource upper limit preset by the system; the current response delay; the historical average response delay; the decision complexity score, determined by predefined quantization rules based on the number of prompt tokens, the depth of the output structure, and the complexity of branch judgments; the maximum decision complexity; the current interaction frequency; the historical average interaction frequency; and empirical coefficients determined in advance by regression fitting on historical collaboration log data.
5. The language model reasoning resource scheduling method based on multi-agent collaboration of claim 4, wherein dynamically adjusting the convergence judgment threshold according to the cognitive load index comprises: dynamically adjusting, according to the cognitive load index (CLI) calculated in the current round, a plurality of thresholds used to judge whether the group reasoning opinions have converged, the thresholds comprising a divergence threshold, an entropy threshold, and a confidence threshold, wherein the divergence threshold and the entropy threshold are raised as the cognitive load index increases, so as to control the tolerance for reasoning uncertainty and avoid premature cut-off, and the confidence threshold is lowered as the cognitive load index increases, so as to adaptively accelerate the reasoning process and reduce ineffective rounds; and comparing the dynamically adjusted thresholds with the actual divergence, entropy, and weighted average confidence calculated in the current round, respectively, to judge whether the current reasoning round meets the convergence condition.
6. The language model reasoning resource scheduling method based on multi-agent collaboration of claim 1, further comprising: dynamically limiting, according to the cognitive load index (CLI), the maximum number of tokens that each auxiliary agent may generate in the current round of reasoning; and, when the group information entropy exceeds the dynamically adjusted entropy threshold, triggering the main control agent to forcibly prune low-confidence reasoning branches, thereby reducing ineffective reasoning paths.
7. The language model reasoning resource scheduling method based on multi-agent collaboration of claim 6, wherein outputting the optimized reasoning result sequence comprises: summarizing, by the main control agent, the intermediate results of each round of reasoning, the change trajectory of the quantitative indices, and the final convergence state; performing final selection or fusion of the candidate result sequences output by the multiple agents using a predefined weighted aggregation model; and outputting the final, resource-optimized reasoning result sequence and recording the resource consumption indices of the corresponding reasoning path.
8. A language model reasoning resource scheduling system based on multi-agent collaboration, characterized by comprising a task decomposition module, a multi-agent collaborative reasoning module, a threshold adaptation module, and a decision output module, wherein: the task decomposition module is configured to receive a task input to be processed and, after a main control agent performs preliminary semantic analysis on the task input, to decompose the task and distribute it to a plurality of auxiliary agents; the multi-agent collaborative reasoning module is configured to control the plurality of auxiliary agents to perform multiple rounds of dynamic collaborative reasoning from different reasoning function perspectives, based on their respective system prompts and a special-purpose knowledge base; the threshold adaptation module is configured to calculate the cognitive load index of each auxiliary agent in real time after each round of reasoning ends, and to dynamically adjust a convergence judgment threshold according to the cognitive load index; and the decision output module is configured so that, when the dynamically adjusted convergence condition is met or a preset maximum number of reasoning rounds is reached, the main control agent summarizes the intermediate results of the multi-round reasoning and the change trajectory of the quantitative indices, executes a final aggregation decision, and outputs an optimized reasoning result sequence.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor, when executing the program, implements the steps of the language model reasoning resource scheduling method based on multi-agent collaboration of any one of claims 1 to 7.
10. A storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the language model reasoning resource scheduling method based on multi-agent collaboration of any one of claims 1 to 7.
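
Illustrative sketch (not part of the claims). The Python below shows one way the cognitive load index, the threshold adjustment, and the token limit recited in claims 4 to 6 could fit together. The source text does not preserve the formula of claim 4, so the weighted sum of ratios used here is an assumption, as are the coefficient values, the linear adjustment rule with its `gain` parameter, the comparison directions in `has_converged`, the linear token rule, and every class, function, and field name.

```python
from dataclasses import dataclass


@dataclass
class AgentSample:
    """Per-round measurements for one auxiliary agent (hypothetical field names)."""
    resource_usage: float    # current CPU or memory occupancy
    resource_limit: float    # resource upper limit preset by the system
    latency: float           # current response delay
    avg_latency: float       # historical average response delay
    complexity: float        # decision complexity score
    max_complexity: float    # maximum decision complexity
    interactions: float      # current interaction frequency
    avg_interactions: float  # historical average interaction frequency


@dataclass
class Thresholds:
    divergence: float
    entropy: float
    confidence: float


def cognitive_load_index(s: AgentSample, coeffs=(0.4, 0.3, 0.2, 0.1)) -> float:
    """ASSUMED form: weighted sum of the four ratios, clamped to [0, 1].

    The coefficients stand in for the regression-fitted empirical
    coefficients; the values here are placeholders, not fitted data.
    """
    ratios = (
        s.resource_usage / s.resource_limit,
        s.latency / s.avg_latency,
        s.complexity / s.max_complexity,
        s.interactions / s.avg_interactions,
    )
    cli = sum(c * r for c, r in zip(coeffs, ratios))
    return max(0.0, min(1.0, cli))


def adjust_thresholds(base: Thresholds, cli: float, gain: float = 0.5) -> Thresholds:
    """Raise the divergence and entropy thresholds and lower the confidence
    threshold as CLI grows (direction per claim 5; linear rule is assumed)."""
    return Thresholds(
        divergence=base.divergence * (1 + gain * cli),
        entropy=base.entropy * (1 + gain * cli),
        confidence=base.confidence * (1 - gain * cli),
    )


def has_converged(divergence: float, entropy: float, confidence: float,
                  t: Thresholds) -> bool:
    """ASSUMED comparison directions: low divergence and entropy, high confidence."""
    return (divergence <= t.divergence
            and entropy <= t.entropy
            and confidence >= t.confidence)


def token_budget(cli: float, max_tokens: int = 1024, min_tokens: int = 128) -> int:
    """ASSUMED rule: shrink the per-round token cap linearly as CLI grows."""
    return max(min_tokens, int(max_tokens * (1 - cli)))


if __name__ == "__main__":
    sample = AgentSample(0.6, 1.0, 1.2, 1.0, 5.0, 10.0, 8.0, 10.0)
    cli = cognitive_load_index(sample)
    adjusted = adjust_thresholds(Thresholds(0.30, 1.0, 0.80), cli)
    print(cli, adjusted, token_budget(cli))
    print(has_converged(0.35, 1.2, 0.6, adjusted))
```

With these placeholder values, a round whose divergence is 0.35 fails the base divergence threshold of 0.30 but passes the load-adjusted one, which illustrates the intent stated in claim 5: under high load the system tolerates more residual disagreement and stops earlier instead of spending further rounds.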

Description

Language model reasoning resource scheduling method and system based on multi-agent collaboration

Technical Field

The invention belongs to the technical field of artificial intelligence, and particularly relates to a language model reasoning resource scheduling method and system based on multi-agent collaboration.

Background

In complex reasoning tasks such as multi-step mathematical problem solving, logical reasoning, and long-sequence planning, large language models (LLMs) have demonstrated powerful natural language understanding and generation capabilities. However, the conventional single-LLM architecture still faces significant challenges in multi-step, multi-perspective reasoning. First, a single perspective and knowledge blind spots lead to biased or logically incomplete reasoning paths. Second, the autoregressive generation mechanism tends to produce redundant tokens and hallucinations, which wastes computing resources. Third, a dynamic adaptive mechanism is lacking, so cognitive overload readily occurs as task complexity increases, causing degraded reasoning paths, unstable convergence, and excessive peak resource occupancy. Fourth, reasoning latency is long: particularly in long-context or multi-round iteration scenarios, end-to-end latency increases markedly, which limits practical deployment efficiency. In recent years, some studies have introduced methods such as chain-of-thought prompting, self-consistency sampling, and external tool augmentation, which improve the reasoning capability of a single model to some extent. These schemes still struggle to fully overcome the inherent limitations above, and efficient, stable multi-step reasoning remains difficult to achieve, particularly in resource-constrained or high-concurrency environments.
Therefore, a technical scheme is needed that simulates multi-role collaboration, is supported by special-purpose knowledge, and achieves dynamic resource scheduling and convergence control through cognitive load awareness, so as to significantly reduce redundant computation, optimize peak resource occupancy, shorten overall reasoning latency, and improve the deployment feasibility and system stability of large language models in complex tasks while maintaining reasoning quality.

Disclosure of Invention

The invention aims to provide a language model reasoning resource scheduling method and system based on multi-agent collaboration, to solve the technical problems of excessive resource peaks, redundant token generation, prolonged reasoning latency, and insufficient convergence stability caused by unbalanced cognitive load among agents in existing large language model reasoning driven by multi-agent collaboration.

To achieve one of the above objects, an embodiment of the present invention provides a language model reasoning resource scheduling method based on multi-agent collaboration, the method comprising: receiving a task input to be processed, and, after a main control agent performs preliminary semantic analysis on the task input, decomposing the task and distributing it to a plurality of auxiliary agents; controlling the plurality of auxiliary agents to perform multiple rounds of dynamic collaborative reasoning from different reasoning function perspectives, based on their respective system prompts and a special-purpose knowledge base; after each round of reasoning ends, calculating the cognitive load index of each auxiliary agent in real time, and dynamically adjusting a convergence judgment threshold according to the cognitive load index; and, when the dynamically adjusted convergence condition is met or a preset maximum number of reasoning rounds is reached, summarizing, by the main control agent, the intermediate results of the multi-round reasoning and the change trajectory of the quantitative indices, executing a final aggregation decision, and outputting an optimized reasoning result sequence.

As a further improvement of an embodiment of the present invention, the method further comprises an initialization configuration step performed before the method is executed, the initialization configuration step comprising: configuring a main control agent and a plurality of auxiliary agents, wherein the main control agent is responsible for overall task decomposition, resource coordination, and final result aggregation, and the plurality of auxiliary agents respectively correspond to different reasoning function perspectives; constructing a system prompt for the main control agent and for each auxiliary agent, wherein each system prompt comprises at least a function positioning, a mandatory structured output format constraint, a prohibition on fabricating facts, verifiable intermediate representation rules that must be cited, and a self-reflection mechanism; Collectin