
CN-121998092-A - Large-model agent plan multiplexing system, method and equipment based on core task intention analysis

CN121998092A

Abstract

The invention discloses a large-model agent plan multiplexing system based on core task intention analysis. The system comprises a core task intention analysis module, a matching and triggering module, an execution plan making module, an action executor, a plan asynchronous collection module, a plan template construction module and a plan template database. The core task intention analysis module converts the user's input prompt into a core task intention keyword; the matching and triggering module uses this keyword to search a plan cache that stores keyword-and-plan-template pairing information; the execution plan making module produces an executable plan scheme under either a cache-hit or a cache-miss scenario; the action executor parses the plan and executes it step by step to generate the end-user answer; the plan asynchronous collection module screens effective iterative plan data and stores it in structured form; and the plan template construction module generates standardized plan templates from the plans cached in the database. The invention significantly improves cache matching precision and cache reuse efficiency.
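
For orientation, the flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation; all names (PlanCache, extract_intention, render_with_slm, plan_with_llm, execute_plan) are hypothetical placeholders for the modules described in the abstract.

# Minimal sketch of the cache-keyed plan-reuse flow (hypothetical names).
from typing import Callable, Optional


class PlanCache:
    """Maps a core task intention keyword to a stored plan template."""

    def __init__(self) -> None:
        self._templates: dict[str, str] = {}

    def lookup(self, keyword: str) -> Optional[str]:
        return self._templates.get(keyword)

    def store(self, keyword: str, template: str) -> None:
        self._templates[keyword] = template


def answer_user(prompt: str,
                cache: PlanCache,
                extract_intention: Callable[[str], str],
                render_with_slm: Callable[[str, str], str],
                plan_with_llm: Callable[[str], str],
                execute_plan: Callable[[str], str]) -> str:
    keyword = extract_intention(prompt)      # core task intention analysis module
    template = cache.lookup(keyword)         # matching and triggering module
    if template is not None:                 # cache hit: cheap SLM rendering
        plan = render_with_slm(template, prompt)
    else:                                    # cache miss: full LLM planning
        plan = plan_with_llm(prompt)
    return execute_plan(plan)                # action executor produces the answer

The point of the design is that the expensive plan_with_llm path is taken only on a cache miss, while repeated tasks that map to the same intention keyword reuse the cached template.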

Inventors

  • DU YUXUAN
  • LI WENXIN
  • WANG YUBO

Assignees

  • 天津大学

Dates

Publication Date
2026-05-08
Application Date
2026-01-27

Claims (10)

  1. A large-model agent plan multiplexing system based on core task intention analysis, characterized by comprising a core task intention analysis module, a matching and triggering module, an execution plan making module, an action executor, a plan asynchronous collection module, a plan template construction module and a plan template database; wherein the core task intention analysis module is used for converting a user prompt into a standardized, unique core task intention keyword; the matching and triggering module is used for searching, with the core task intention keyword, a plan cache storing keyword-and-plan-template pairing information, and for triggering the workflow corresponding to the cache-hit or cache-miss scenario; the execution plan making module is used for integrating the corresponding system prompt, tool call information and user input prompt into a complete context according to whether the cache is hit: in the cache-hit scenario, the context is input to a small language planning model, which renders the plan template and supplements its parameters to form a personalized, directly executable plan scheme; in the cache-miss scenario, the context is input to a large language planning model, which generates a new execution plan scheme by reasoning over the context; the action executor is used for parsing the plan and executing it step by step to generate the end-user answer, and during execution each step's tool return result updates the context in real time and is fed back to the large language model to further iterate and optimize the plan; the plan asynchronous collection module is used for continuously capturing, through a low-intrusion data capture method that does not interfere with the normal LLM reasoning flow, the iterative plans output in each round by the large language model, screening effective iterative plan data and storing it in structured form, associating all plans corresponding to the same user prompt before storing them in the plan template database, synchronously recording the associated prompt and core task intention keyword, and providing basic data support for subsequent plan template generation; the plan template construction module is used for extracting the general core logic from the plans cached in the plan template database to generate a standardized plan template, and finally storing it in the plan template database as a key-value pair of core task intention and plan template.
  2. The large-model agent plan multiplexing system based on core task intention analysis according to claim 1, wherein the plan template construction module comprises a template verification module and a template dynamic update mechanism module, and the template verification module comprises a template format verification module and a rendering-validity-and-result-consistency dual verification module; the template format verification module is used for verifying the structural integrity, syntactic standardization and placeholder definition of the plan template, ensuring that the template meets the preset format standard; the rendering-validity-and-result-consistency dual verification module is used for having the SLM sequentially render the plan template against all user input prompts corresponding to the core task intention keyword, verifying whether the rendered template can be parsed normally and output an executable structured execution plan, and comparing the user answer produced by executing the SLM-rendered plan template with the cached user answer produced by executing the plan output by LLM reasoning; the template is formally stored in the plan template database only when the similarity between the two answers reaches a preset threshold and the rendered plan template is executable; the template dynamic update mechanism module is used for automatically starting a template iteration process whenever the number of newly added effective plans corresponding to the same core task intention again reaches a preset threshold, upgrading the existing template by fusing the optimization logic of the newly added plans.
  3. The large-model agent plan multiplexing system based on core task intention analysis according to claim 1, wherein the matching and triggering module comprises a classification aggregation module; the classification aggregation module aggregates multiple rounds of effective plans by core task intention classification, providing a standardized data set for the subsequent plan template construction module, and when the number of effective plans associated with the same core task intention keyword reaches a preset threshold, the matching and triggering module triggers the template generation flow.
  4. The large-model agent plan multiplexing system based on core task intention analysis of claim 1, wherein the plan template construction module extracts generic reasoning logic and tool call frameworks from the effective iterative plan data collected by the plan asynchronous collection module, replaces dynamically changing content with standardized placeholders, and ensures through parameter filling that templates adapt to the different presentations of similar tasks.
  5. The large-model agent plan multiplexing system based on core task intention analysis according to claim 1, wherein an LLM-based semantic classifier and an error-checking screening module are built into the plan asynchronous collection module; the semantic classifier takes as input the reflective reasoning the LLM generates about tool return results and, through semantic feature extraction and intention judgment, classifies plan steps into two types, execution-valid and execution-invalid; the error-checking screening module judges the correctness of the current round's core plan data according to the classification result and screens it as follows: if the classification result is execution-valid, that is, the LLM confirms that the logic of the current step is correct and the execution result meets expectations, the round's core plan data is retained; if the classification result is execution-invalid, that is, the LLM's reflection identifies wrong tool call parameters or contradictory execution logic, the round's plan data is discarded directly.
  6. A large-model agent plan multiplexing method based on core task intention analysis, using the large-model agent plan multiplexing system based on core task intention analysis according to claim 1, characterized in that the method comprises the following steps: Step 1, the core task intention analysis module combines the natural language prompt input by the user with a preset system prompt to obtain the core task intention keyword; Step 2, the matching and triggering module searches the plan cache stored in the plan template database with the core task intention keyword and judges whether a keyword matching the core task intention keyword, together with its plan template, exists; Step 3A (cache hit), the execution plan making module integrates the rendered system prompt, the retrieved plan template, the tool calling method corresponding to the template and the user prompt, generates a complete context and inputs it to the small language model; Step 3B (cache miss), the execution plan making module integrates the rendered system prompt, all available tool calling information and the user input prompt, generates a complete context and inputs it to the large language model; after the large language model generates a new execution plan, the plan is parsed by the action executor and executed step by step; during the iterative optimization of the plan, the plan asynchronous collection module screens effective iterative plan data and stores it in structured form; the plan template construction module generates a standardized plan template from the plans cached in the plan template database and stores it in the plan template database for subsequent reuse.
  7. The large-model agent plan multiplexing method based on core task intention analysis according to claim 6, wherein the plan template construction module is provided with a template verification module and a template dynamic update mechanism module, and the template verification module is provided with a template format verification module and a rendering-validity-and-result-consistency dual verification module; the template format verification module is used for verifying the structural integrity, syntactic standardization and placeholder definition of the plan template, ensuring that the template meets the preset format standard; the rendering-validity-and-result-consistency dual verification module is used for having the SLM sequentially render the plan template against all user input prompts corresponding to the core task intention keyword, verifying whether the rendered template can be parsed normally and output an executable structured execution plan, and comparing the user answer produced by executing the SLM-rendered plan template with the cached user answer produced by executing the plan output by LLM reasoning; the template is formally stored in the plan template database only when the similarity between the two answers reaches a preset threshold and the rendered plan template is executable; the template dynamic update mechanism module is used for automatically starting a template iteration process whenever the number of newly added effective plans corresponding to the same core task intention again reaches a preset threshold, fusing the optimization logic of the newly added plans to upgrade and iterate the existing template; the template format verification module and the rendering-validity-and-result-consistency dual verification module are applied in turn to carry out the following multi-level correctness verification (an illustrative sketch follows the claims): in the first stage, the template format verification module verifies the structural integrity, syntactic standardization and placeholder definition of the template one by one, ensuring that the template conforms to the preset JSON format standard and laying the foundation for subsequent SLM instantiation rendering and action execution; in the second stage, the rendering-validity-and-result-consistency dual verification module calls the same SLM used in the cache-hit scenario, sequentially renders the template against all user input prompts corresponding to the core task intention keyword in the plan template database, verifies whether the rendered template can be parsed normally and output an executable structured execution plan, and then compares the plan result generated by SLM rendering with the cached native LLM reasoning output; the template is stored in the plan template database only when the similarity between the two reaches a preset similarity threshold, and if the threshold is not reached, the template generation quantity threshold is dynamically calibrated according to the number of unqualified samples, improving the generalization capability and adaptation precision of subsequent templates by accumulating more samples; the plan template construction module adopts a quantized threshold control strategy: when the number of newly added effective plans corresponding to the same core task intention reaches the preset template generation quantity threshold, a template iteration process is automatically started, and the optimization logic of the newly added plans is fused to upgrade and iterate the existing template, so that the template dynamically adapts to task scene changes and tool calling rule updates.
  8. The large-model agent plan multiplexing method based on core task intention analysis according to claim 7, wherein the template generation quantity threshold is set by an algorithm that adjusts dynamically according to task scene difficulty: task scene difficulty is quantized and graded, and the template generation quantity threshold is automatically increased or decreased as the task scene difficulty level rises or falls.
  9. The large-model agent plan multiplexing method based on core task intention analysis as recited in claim 7, wherein step 3B includes the following sub-steps: Step 3B-1, define the inputs as the user query, the core task intention keyword, the system prompt and the tool set, and define the output as the end-user answer; Step 3B-2, initialize the large-model reasoning context, the plan cache and the final user answer, with iteration step number k = 0; Step 3B-3, the user inputs a prompt, and k = k + 1; Step 3B-4, the execution plan making module integrates the rendered system prompt, all available tool calling information and the prompt input by the user to generate a complete context; Step 3B-5, the context is input to the large language model, which invokes LLM reasoning to generate a new execution plan; Step 3B-6, the action executor parses the plan and executes it step by step; Step 3B-7, the LLM judges whether the end-user answer has been generated: if not, step 3B-8 is executed; if so, step 3B-9 is executed; Step 3B-8, the action executor feeds the results returned by each step's tools back to the LLM, and these returned results update the context in real time; Step 3B-9, the plan asynchronous collection module acquires the large model's reasoning, the tool names, the tool inputs and the returned tool results; Step 3B-10, the plan template construction module generates a standardized plan template based on the plans cached in the plan template database.
  10. An apparatus for the large-model agent plan multiplexing method based on core task intention analysis, comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program and, when the computer program is executed, implement the steps of the large-model agent plan multiplexing method based on core task intention analysis as claimed in any one of claims 1 to 9.
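
As a rough illustration of the dual verification described in claims 2 and 7 (the sketch referenced there), the code below assumes a JSON template with a top-level "steps" field and hypothetical helpers render_with_slm, execute_plan and similarity; it shows the two-stage check in principle, not the patented implementation.

# Sketch of the two-stage template verification (hypothetical helpers).
import json
from typing import Callable


def verify_template(template: str,
                    cached_cases: list[tuple[str, str]],  # (user prompt, cached LLM answer)
                    render_with_slm: Callable[[str, str], str],
                    execute_plan: Callable[[str], str],
                    similarity: Callable[[str, str], float],
                    threshold: float = 0.9) -> bool:
    # Stage 1: format verification - structural integrity, syntax, placeholders.
    try:
        parsed = json.loads(template)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict) or "steps" not in parsed:  # assumed minimal schema
        return False

    # Stage 2: rendering validity and result consistency for every cached prompt.
    for user_prompt, cached_answer in cached_cases:
        plan = render_with_slm(template, user_prompt)
        if not plan:                                  # rendered plan must be executable
            return False
        answer = execute_plan(plan)
        if similarity(answer, cached_answer) < threshold:
            return False
    return True                                       # only now is the template stored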

Description

Large-model agent plan multiplexing system, method and equipment based on core task intention analysis

Technical Field

The invention relates to agent technology, and in particular to a large-model agent plan multiplexing system, method and equipment based on core task intention analysis.

Background

With the rapid development of artificial intelligence technology, generative large language models (LLMs) have attracted attention for their excellent capabilities in complex natural language processing tasks, and agent technology built around an LLM shows strong application potential in fields such as personal assistance, intelligent interaction and data processing. However, although LLMs rely on massive pre-training knowledge and possess excellent natural language understanding and logical reasoning capabilities, the model's knowledge is frozen once the pre-training stage ends, and real-time dynamic data in rapidly evolving information scenarios is difficult to capture, which restricts the full use of its reasoning performance. Therefore, the current mainstream agent architecture generally equips the LLM with tool calling capability, overcoming the deficiencies of a static knowledge system by extending the model's perception and interaction boundaries through tools. The workflow of a current agent can be decomposed into four core iteration steps that connect round by round into a closed loop. The first step is the initialization stage: the agent receives the task target input by the user, takes it as the initial context, passes it to the LLM and triggers the first round of reasoning. The second step is the planning stage: the LLM performs logical reasoning based on the current context (including the task target, historical interaction records and the like), clarifies the sub-goals of the current task and the action types to be executed, and generates a structured action instruction. The third step is the execution stage: the agent carries out the action instruction output by the LLM and, if tool interaction is involved (such as calling a plan template database), completes the data interaction with the corresponding tool entity. The fourth step is the observation and context updating stage: the agent collects the feedback results of the action execution (such as tool return data and environment state changes), integrates the observed information with the current reasoning and action content, updates the context based on the observations, and judges the task progress; if the task target has been completed, the loop terminates and the result is fed back to the user, otherwise the next round of the planning stage is triggered to continue the loop. Although this workflow shows excellent problem-solving capability in complex task scenarios, it is limited by its execution complexity: it needs to interact frequently with external tools at the end side and with local environments, so the cloud-side inference cost generated during operation is quite high.
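
The four-stage loop just described can be illustrated with the short sketch below; the helper names (llm_reason, run_action) are hypothetical and the sketch is not taken from the invention itself.

# Sketch of the generic agent closed loop: initialize, plan, execute, observe.
from typing import Callable


def agent_loop(task: str,
               llm_reason: Callable[[str], dict],
               run_action: Callable[[dict], str],
               max_rounds: int = 10) -> str:
    context = task                                     # stage 1: initialization
    for _ in range(max_rounds):
        instruction = llm_reason(context)              # stage 2: planning via LLM reasoning
        if instruction.get("done"):                    # task target reached
            return instruction["answer"]
        observation = run_action(instruction)          # stage 3: execution (tool interaction)
        context += f"\n{instruction}\n{observation}"   # stage 4: observe and update context
    return "task not completed within the round limit"
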
In particular, the cost pressure of this mechanism concentrates in the planning stage, which commonly employs test-time compute techniques such as chain-of-thought reasoning to guarantee inference accuracy, and such techniques require multiple complete LLM inference passes. High-frequency, expensive LLM calls not only directly generate huge economic cost but also indirectly aggravate service latency through continuous end-cloud data transmission (interaction data, reasoning instructions) and cloud computing occupation. To reduce computational consumption and cost, cache optimization is a key technical path for reducing the overhead of high-frequency LLM calls. According to differences in the cached object and the optimization logic, existing cache optimization work falls into two implementation modes: intermediate-state reuse and final-response reuse. The key logic of schemes based on intermediate-state reuse, such as SGLang, RAGCache and PromptCache, is to cache the key-value pairs (KV pairs) generated in the prefill stage of LLM reasoning and reuse them across requests through a prefix matching mechanism, thereby avoiding repeated computation of redundant KV pairs and reducing the computational overhead and GPU memory occupation of the reasoning process. However, this caching mechanism is difficult to fully adapt to the actual demands of an agent: first, the prefix matching mechanism requires that the prefix texts of earlier and later queries be strictly consistent, and in agent applications, real-time da
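
The prefix-matching reuse mentioned above can be sketched roughly as follows; the flat dictionary keyed by exact token prefixes is a deliberate simplification and does not reflect how SGLang, RAGCache or PromptCache are actually implemented.

# Sketch of strict-prefix KV reuse (simplified, hypothetical structure).
from typing import Optional


class PrefixKVCache:
    """Maps an exact token prefix to its precomputed prefill (KV) state."""

    def __init__(self) -> None:
        self._store: dict[tuple[int, ...], object] = {}

    def put(self, tokens: list[int], kv_state: object) -> None:
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens: list[int]) -> tuple[int, Optional[object]]:
        """Return the length of the longest cached prefix and its KV state."""
        for end in range(len(tokens), 0, -1):        # try the longest prefix first
            state = self._store.get(tuple(tokens[:end]))
            if state is not None:
                return end, state                    # only this prefix skips recomputation
        return 0, None                               # no reuse: a full prefill is required

Because only an exactly matching prefix can be reused, any variation early in the context, as is typical in agent workflows, forces a full recomputation; this is the limitation the paragraph above points to.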