CN-122019102-A - Multi-mode intelligent agent scheduling method and system

Abstract

The invention relates to the technical field of artificial intelligence agents, and in particular to a multi-modal intelligent agent scheduling method and system. In response to receiving a multi-modal task, the method performs cross-modal feature encoding on the task and outputs a semantic complexity vector. The semantic complexity vector is input into a preset resource demand prediction algorithm to obtain a resource demand prediction result. The service level protocol carried by the multi-modal task is then parsed, and the dynamic priority of the task is calculated from the current load state of the system, the service level protocol and the resource demand prediction result. Scheduling instructions for agents of different modalities are generated according to the dynamic priority, and finally the corresponding modal agents are triggered to execute reasoning tasks according to the scheduling instructions.

Inventors

DING ZIJIAN
DING JUNWEI
CHEN DEPIN

Assignees

钛动科技股份有限公司

Dates

Publication Date: 20260512
Application Date: 20260206

Claims (10)

1. A multi-modal intelligent agent scheduling method, comprising: S10, in response to receiving a multi-modal task, performing cross-modal feature encoding on the multi-modal task and outputting a semantic complexity vector; S20, parsing a service level protocol carried by the multi-modal task, calculating a dynamic priority of the multi-modal task according to the current load state of the system, the service level protocol and the resource demand prediction result, and generating scheduling instructions for agents of different modalities according to the dynamic priority; S30, triggering the corresponding modal agent to execute a reasoning task according to the scheduling instructions; wherein, during execution of any of steps S10 to S30, the method further comprises: transferring context tokens through cross-agent shared memory to maintain semantic consistency, and monitoring modal health in real time to perform fusing or self-healing operations.
2. The multi-modal intelligent agent scheduling method according to claim 1, wherein in step S10, performing cross-modal feature encoding on the multi-modal task and outputting a semantic complexity vector specifically comprises: extracting feature vectors of the video, audio and text modalities respectively using preset modal encoders, and concatenating and fusing the feature vectors into a unified fused feature; and mapping the fused feature to a multi-dimensional complexity vector using a pre-trained weight matrix.
3. The multi-modal intelligent agent scheduling method of claim 1, wherein the current load state of the system includes at least CPU occupancy, GPU occupancy, and queue occupancy.
4. The multi-modal intelligent agent scheduling method according to claim 3, wherein step S20 comprises: parsing the service level protocol to obtain the service criticality, maximum tolerable delay and minimum accuracy requirement of the multi-modal task; calculating a global load index from the CPU occupancy, the GPU occupancy and the queue occupancy; inputting the service criticality, the maximum tolerable delay, the minimum accuracy requirement and the global load index into a preset priority algorithm to calculate the dynamic priority; and when the global load index is higher than a load threshold and the dynamic priority is higher than a priority execution threshold, generating a preemptive scheduling instruction so that the multi-modal task is executed preferentially.
5. The multi-modal intelligent agent scheduling method of claim 4, wherein the priority algorithm is specifically: [formula not reproduced in the source text] wherein the symbols of the formula denote, in order: the task priority; the service criticality; the maximum tolerable delay; the time the task has already spent waiting or being processed; the global load index; the first weight coefficient; the second weight coefficient; and the third weight coefficient; and exp() denotes the exponential function.
6. The multi-modal intelligent agent scheduling method according to claim 1, wherein transferring the context token through cross-agent shared memory specifically comprises: generating a corresponding context token at the multi-modal task receiving stage, the context token comprising a semantic embedding vector, a time to live and a dependent task set; when the modal agent triggered in step S30 starts executing, reading the context token from the shared memory and calculating the similarity between the current task features and the semantic embedding vector to extract the associated context; and after the agent completes reasoning, performing a weighted update of the semantic embedding vector in the context token using the current output result, and writing the updated token back to the shared memory.
7. The multi-modal intelligent agent scheduling method according to claim 6, wherein maintaining semantic consistency specifically comprises: calculating a semantic drift degree between the initial semantic embedding vector of the context token and the semantic embedding vector after the weighted update; and when the semantic drift degree exceeds a preset safety threshold, triggering a correction mechanism that performs callback correction on the current semantic embedding vector by introducing a correction factor.
8. The multi-modal intelligent agent scheduling method of claim 1, wherein monitoring the modal health in real time to perform fusing specifically comprises: computing, for each modal agent, a weighted sum of its reasoning success rate, delay deviation and reasoning error rate to obtain the modal health of that modal agent; and if the modal health of any modal agent is lower than a fusing threshold, triggering fusing to cut off the traffic input to that modal agent.
9. The multi-modal intelligent agent scheduling method according to claim 1, wherein the self-healing operation specifically comprises: if the service criticality is lower than a criticality threshold, invoking a preset lightweight model to take over the traffic input of the fused modal agent; otherwise, switching to a manual channel to take over.
10. A multi-modal intelligent agent scheduling system comprising a processor and a memory, said memory storing computer program instructions that when executed by said processor implement the multi-modal intelligent agent scheduling method of any one of claims 1-9.
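The health monitoring, fusing (circuit breaking) and self-healing logic of claims 8 and 9 can be illustrated with a minimal Python sketch. The claims only state that the modal health is a weighted sum of three metrics and that thresholds exist; the weight values and signs, the threshold values, the normalization of each metric to the range 0..1, and all names (`AgentStats`, `modal_health`, `route`) are assumptions made here for illustration and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class AgentStats:
    success_rate: float     # fraction of successful reasoning calls, 0..1
    delay_deviation: float  # assumed: deviation from target latency, normalized to 0..1
    error_rate: float       # fraction of failed reasoning calls, 0..1


# Assumed weights: the claims only say "weighted sum"; the signs and
# magnitudes below are illustrative choices, not values from the patent.
WEIGHTS = (1.0, -0.5, -0.5)
FUSING_THRESHOLD = 0.6       # assumed value
CRITICALITY_THRESHOLD = 0.8  # assumed value


def modal_health(stats: AgentStats, weights=WEIGHTS) -> float:
    """Weighted sum of success rate, delay deviation and error rate."""
    w_ok, w_delay, w_err = weights
    return (w_ok * stats.success_rate
            + w_delay * stats.delay_deviation
            + w_err * stats.error_rate)


def route(stats: AgentStats, service_criticality: float) -> str:
    """Decide where traffic intended for one modal agent should go."""
    if modal_health(stats) >= FUSING_THRESHOLD:
        return "agent"  # healthy: keep sending traffic to the agent
    # Health is below the fusing threshold: traffic to the agent is cut off,
    # and the self-healing step picks a fallback based on service criticality.
    if service_criticality < CRITICALITY_THRESHOLD:
        return "lightweight_model"
    return "manual_channel"


healthy = AgentStats(success_rate=1.0, delay_deviation=0.0, error_rate=0.0)
degraded = AgentStats(success_rate=0.5, delay_deviation=0.5, error_rate=0.5)

print(route(healthy, 0.9))   # agent
print(route(degraded, 0.5))  # lightweight_model
print(route(degraded, 0.9))  # manual_channel
```

In this sketch the fallback is chosen per request, so a low-criticality task is served by the lightweight model while a high-criticality task for the same fused agent goes to the manual channel.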

Description

Multi-mode intelligent agent scheduling method and system

Technical Field

The invention relates to the technical field of artificial intelligence agents. More particularly, the invention relates to a multi-modal intelligent agent scheduling method and system.

Background

With the explosive growth of AIGC (AI-generated content) and multi-modal AI applications, the intelligent agents deployed by enterprises have been upgraded from a single modality to multi-modal collaboration frameworks that integrate vision, hearing, semantics and the like, supporting application fields such as video generation, CT image analysis and image-text customer service.
Existing multi-modal agent methods mainly adopt the following schemes in these fields. First, for resource allocation, existing schemes mainly pre-allocate fixed computing resources (such as GPU slices) to tasks of different modalities for executing reasoning tasks. This approach cannot sense dynamic changes in the tasks: text service latency spikes by 300% when new requests arrive in a burst, and when 100 "1-second speech-to-text" tasks and one "4K medical image segmentation" task arrive simultaneously, the average system latency of the NVIDIA Triton Inference Server rises from 200 ms to 4.7 s. Contention for resources between modalities therefore degrades service quality.
Second, regarding the task execution strategy, a multi-modal task can be split into several relatively independent subtasks, and the cross-modal context association is lost. For example, in an intelligent customer service scenario, a user uploads a "fault photo with an invoice"; although the image recognition agent has already recognized that the equipment model is "A", the text agent cannot obtain that context and still asks the customer to enter the equipment model, which greatly reduces the task completion rate and efficiency and gives the user a poor service experience.
Third, regarding the emergency strategy for agent timeouts, existing schemes generally construct an independent processing pipeline for each modality, so cascading failure easily occurs when a single modal agent fails. For example, in 2024 a bank customer service system suffered blocking of other cross-modal task queues because of a timeout in the speech recognition agent, causing a 15-minute system outage and economic losses exceeding 23 million.
In summary, conventional agent service methods suffer from the technical problems of low quality and poor reliability.

Disclosure of Invention

The invention discloses a multi-modal intelligent agent scheduling method and system to solve the technical problems of low quality and poor reliability.
In a first aspect, the invention discloses a multi-modal intelligent agent scheduling method, which comprises the following steps:
S10, in response to receiving a multi-modal task, performing cross-modal feature encoding on the multi-modal task and outputting a semantic complexity vector;
S20, parsing a service level protocol carried by the multi-modal task, calculating the dynamic priority of the multi-modal task according to the current load state of the system, the service level protocol and a resource demand prediction result, and generating scheduling instructions for agents of different modalities according to the dynamic priority;
S30, triggering the corresponding modal agent to execute a reasoning task according to the scheduling instructions;
wherein, during execution of any of steps S10 to S30, the method further includes: transferring context tokens through cross-agent shared memory to maintain semantic consistency, and monitoring modal health in real time to perform fusing or self-healing operations.
Preferably, in step S10, performing cross-modal feature encoding on the multi-modal task and outputting a semantic complexity vector specifically includes: extracting feature vectors of the video, audio and text modalities respectively using preset modal encoders, and concatenating and fusing the feature vectors into a unified fused feature; and mapping the fused feature to a multi-dimensional complexity vector using a pre-trained weight matrix.
Preferably, the current load state of the system includes at least a CPU occupancy rate, a GPU occupancy rate and a queue occupancy rate.
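The cross-modal feature encoding of step S10 (concatenating per-modality feature vectors, then mapping the fused feature through a weight matrix) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the encoder outputs and the weight matrix are hard-coded hypothetical values standing in for real modal encoders and pre-trained weights, the fusion is plain concatenation in a fixed modality order, and the function names `fuse_features` and `complexity_vector` are not from the patent.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse_features(features: Dict[str, Sequence[float]],
                  order: Sequence[str] = ("video", "audio", "text")) -> Vector:
    """Concatenate per-modality feature vectors in a fixed modality order."""
    fused: Vector = []
    for modality in order:
        fused.extend(features[modality])
    return fused


def complexity_vector(fused: Sequence[float],
                      weights: Sequence[Sequence[float]]) -> Vector:
    """Map the fused feature to a complexity vector (one dot product per weight row)."""
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("each weight row must match the fused feature length")
    return [sum(w * x for w, x in zip(row, fused)) for row in weights]


# Hypothetical encoder outputs (stand-ins for real video/audio/text encoders).
features = {"video": [1.0, 2.0], "audio": [3.0], "text": [4.0, 5.0]}

# Hypothetical "pre-trained" weights: 2 complexity dimensions x 5 fused features.
W = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]

fused = fuse_features(features)
vec = complexity_vector(fused, W)
print(fused)  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(vec)    # [1.0, 7.5]
```

Each row of the weight matrix yields one dimension of the complexity vector, so the number of rows sets how many complexity dimensions the downstream resource demand prediction receives.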
Preferably, step S20 includes: parsing the service level protocol to obtain the service criticality, maximum tolerable delay and minimum accuracy requirement of the multi-modal task; calculating a global load index from the CPU occupancy rate, the GPU occupancy rate and the queue occupancy rate; placing the service criticality, the maximum tolerable delay, the minimum accuracy requirement and the global load index into a pres