CN-122021918-A - Reliability-based agent reasoning method, system and storage medium

Abstract

The invention discloses a reliability-based agent reasoning method, system and storage medium. The method comprises the following steps: S1, processing text and image inputs with a multimodal agent to generate a plurality of reasoning paths, each comprising a clue acquisition stage and a clue integration stage; S2, evaluating the reliability of the clue acquisition stage and the clue integration stage of each reasoning path; S3, screening the reasoning paths based on the reliability of the two stages and filtering out unreliable reasoning paths; S4, performing reliability-based weighted voting on the screened reasoning paths and outputting the reasoning answer.

Inventors

PENG XI
LI HAOBIN
Yang Mouxing
YANG YUTONG

Assignees

Sichuan University (四川大学)

Dates

Publication Date: 20260512
Application Date: 20260210

Claims (5)

1. A reliability-based agent reasoning method, system and storage medium, wherein the method comprises the following steps: S1, processing text and image inputs with a multimodal agent to generate a plurality of reasoning paths, wherein each reasoning path comprises a clue acquisition stage and a clue integration stage; S2, evaluating the reliability of the clue acquisition stage and the clue integration stage of each reasoning path; S3, screening the reasoning paths based on the reliability of the clue acquisition stage and the clue integration stage, and filtering out unreliable reasoning paths; S4, performing reliability-based weighted voting on the filtered reasoning paths and outputting the reasoning answer; wherein step S2 specifically comprises: S21, calculating the entropy of each token in the reasoning path; S22, calculating the reliability of the clue acquisition stage and of the clue integration stage, respectively, from their high-entropy tokens; and step S3 specifically comprises: S31, determining screening thresholds based on the reliability of the clue acquisition stage and the clue integration stage; S32, adaptively filtering out unreliable reasoning paths according to the thresholds.
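Step S21 computes the entropy of each token in a reasoning path. The claims do not state which distribution the entropy is taken over; a common choice, assumed in the minimal sketch below, is the Shannon entropy of the model's next-token probability distribution at each generation step, H_t = -Σ_v p_t(v) log p_t(v). The function name `token_entropies` and its input format (one probability vector per generated token) are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def token_entropies(step_probs: Sequence[Sequence[float]]) -> list[float]:
    """Shannon entropy (in nats) of each generated token's predictive distribution.

    ASSUMPTION: step_probs[t] is the full next-token probability vector the
    model produced when emitting token t; the source does not specify the
    distribution or the log base.
    """
    entropies = []
    for probs in step_probs:
        # Zero-probability entries are skipped, i.e. 0 * log 0 is treated as 0.
        h = -sum(p * math.log(p) for p in probs if p > 0.0)
        entropies.append(h)
    return entropies
```

A token emitted with a one-hot distribution has entropy 0, while a token chosen between two equally likely options has entropy log 2, so larger values mark the points where the agent was least certain.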
2. The reliability-based agent reasoning method, system and storage medium according to claim 1, wherein in step S22 the reliability of the clue acquisition stage and of the clue integration stage is calculated by a formula whose terms denote, in order: the clue acquisition stage; the clue integration stage; the reliability of the clue acquisition stage; the reliability of the clue integration stage; the number of high-entropy tokens; the indices of the tokens with the highest entropy in the clue acquisition stage; the indices of the tokens with the highest entropy in the clue integration stage; and the entropy of an individual token.
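Claim 2 defines each stage's reliability in terms of the number of high-entropy tokens, the indices of the highest-entropy tokens in that stage, and the per-token entropies, but the formula itself is not reproduced in the text. The sketch below assumes one plausible form: the reliability of a stage is the negative mean entropy of its k highest-entropy tokens, so a stage whose most uncertain tokens are less uncertain scores as more reliable. The name `stage_reliability` and this sign convention are assumptions, not the patented formula.

```python
from __future__ import annotations

import heapq
from typing import Sequence


def stage_reliability(entropies: Sequence[float], k: int) -> float:
    """Hypothetical reliability of one stage (clue acquisition or clue integration).

    ASSUMPTION: reliability = -(mean entropy of the k highest-entropy tokens).
    Values closer to zero mean the stage's most uncertain tokens were less uncertain.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not entropies:
        raise ValueError("stage has no tokens")
    # Use at most as many tokens as the stage actually contains.
    top = heapq.nlargest(min(k, len(entropies)), entropies)
    return -sum(top) / len(top)
```

For example, a clue acquisition stage with token entropies `[0.1, 2.0, 0.3, 1.0]` and `k = 2` averages only the two most uncertain tokens (2.0 and 1.0). Focusing on the top-k tokens rather than the whole stage keeps a long run of easy, low-entropy tokens from masking a few critical uncertain ones.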
3. The reliability-based agent reasoning method, system and storage medium according to claim 1, wherein in step S31 the screening thresholds of the clue acquisition stage and the clue integration stage are calculated by a formula whose terms denote, in order: the reasoning path; the clue acquisition stage; the clue integration stage; the screening threshold of the clue acquisition stage; the screening threshold of the clue integration stage; the reliability of the clue acquisition stage; the reliability of the clue integration stage; the percentile at which, with the reliabilities sorted in ascending order, the reliability value is taken as the screening threshold; and the set of reasoning paths used to estimate the screening thresholds.
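Step S31 sorts the reliabilities of a set of reasoning paths in ascending order and takes the value at a given percentile as each stage's screening threshold. A minimal sketch, assuming a nearest-rank percentile (the claim does not say how percentiles are interpolated) and a hypothetical `percentile` parameter in [0, 100] shared by both stages:

```python
from __future__ import annotations

import math
from typing import Sequence


def percentile_threshold(reliabilities: Sequence[float], percentile: float) -> float:
    """Reliability value at the given percentile of the ascending-sorted list.

    ASSUMPTION: nearest-rank percentile; the source does not fix the method.
    """
    if not reliabilities:
        raise ValueError("need at least one reliability value")
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be in [0, 100]")
    ordered = sorted(reliabilities)  # "from small to large"
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return ordered[rank - 1]


def stage_thresholds(
    acq_reliabilities: Sequence[float],
    int_reliabilities: Sequence[float],
    percentile: float,
) -> tuple[float, float]:
    """Separate thresholds for the clue acquisition and clue integration stages."""
    return (
        percentile_threshold(acq_reliabilities, percentile),
        percentile_threshold(int_reliabilities, percentile),
    )
```

Because the threshold is derived from the observed reliabilities rather than fixed in advance, it adapts to how uncertain the agent is on a given task, which matches the "adaptive" filtering described in S32.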
4. The reliability-based agent reasoning method, system and storage medium according to claim 1, wherein in step S32 unreliable reasoning paths are filtered out by a formula whose terms denote, in order: the filtered set of reasoning paths; the reasoning path; the clue acquisition stage; the clue integration stage; the screening threshold of the clue acquisition stage; the screening threshold of the clue integration stage; the reliability of the clue acquisition stage; and the reliability of the clue integration stage.
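Claim 4 screens each path against both stage thresholds. Assuming a path is kept only when both its clue acquisition reliability and its clue integration reliability reach their respective thresholds (the exact comparison, including whether it is strict, is not reproduced in the text), S32 can be sketched as follows; `ScoredPath` and `filter_paths` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPath:
    """Hypothetical container: a reasoning path's answer and per-stage reliabilities."""
    answer: str
    acq_reliability: float  # reliability of the clue acquisition stage
    int_reliability: float  # reliability of the clue integration stage


def filter_paths(
    paths: list[ScoredPath], acq_threshold: float, int_threshold: float
) -> list[ScoredPath]:
    """Keep paths whose two stages both reach their thresholds.

    ASSUMPTION: joint condition with >=; a path failing either stage is discarded.
    """
    return [
        p
        for p in paths
        if p.acq_reliability >= acq_threshold and p.int_reliability >= int_threshold
    ]
```

Requiring both stages to pass reflects the background observation that an error in either clue acquisition or clue integration alone can make the final answer unreliable.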
5. The reliability-based agent reasoning method, system and storage medium according to claim 1, wherein in step S4 reliability-based weighted voting is performed on the filtered reasoning paths and the voted reasoning answer is output, the weighted voting being given by a formula whose terms denote, in order: the reasoning path; the clue acquisition stage; the clue integration stage; the filtered set of reasoning paths; the answer obtained by weighted voting; a candidate answer; the confidence weight of the reasoning path; an indicator function that equals 1 if and only if its condition is satisfied; the answer of the reasoning path; the reliability of the clue acquisition stage; the reliability of the clue integration stage; the maximum of two values; the exponential function; and the absolute value function.
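Claim 5 builds a per-path confidence weight from an exponential, an absolute value and the maximum of two values, then, for each candidate answer, sums the weights of the paths whose answer equals that candidate and outputs the candidate with the largest total. The exact weight formula is not reproduced in the text, so the sketch below assumes w = exp(-max(|R_acq|, |R_int|)); with negative-entropy reliabilities, this lets the less reliable stage determine the weight. Treat `path_weight` and `weighted_vote` as hypothetical stand-ins, not the patented formula.

```python
from __future__ import annotations

import math
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def path_weight(acq_reliability: float, int_reliability: float) -> float:
    """Hypothetical confidence weight: exp(-max(|R_acq|, |R_int|))."""
    return math.exp(-max(abs(acq_reliability), abs(int_reliability)))


def weighted_vote(paths: Iterable[Tuple[str, float, float]]) -> Optional[str]:
    """Return the candidate answer with the largest summed weight.

    Each path is (answer, acq_reliability, int_reliability) and is assumed to
    have already survived filtering (S32). The indicator 1[answer == candidate]
    is realized by adding each path's weight only to its own answer's total.
    """
    totals: dict[str, float] = defaultdict(float)
    for answer, r_acq, r_int in paths:
        totals[answer] += path_weight(r_acq, r_int)
    if not totals:
        return None  # no path survived filtering
    return max(totals, key=totals.get)
```

For example, with paths `("A", -0.1, -0.1)`, `("B", -2.0, -0.1)` and `("B", -0.1, -2.5)`, plain majority voting would pick `"B"`, but each `"B"` path has one low-reliability stage, so the single reliable `"A"` path carries more weight and `"A"` is returned.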

Description

Reliability-based agent reasoning method, system and storage medium

Technical Field

The invention relates to the field of multimodal agent reasoning, and in particular to a reliability-based agent reasoning method, system and storage medium.

Background

With the development of artificial intelligence technology, multimodal agents have been widely used in visual question answering. The multimodal chain of thought has become one of the main ways in which multimodal agents solve complex visual question-answering tasks, and it can effectively improve their reasoning ability. Specifically, the multimodal agent first locates the visual information relevant to the task, then generates a textual tool-call instruction to obtain visual clues, uses the collected visual clues as intermediate steps of the chain of thought, and finally generates a textual chain of thought that integrates the clues and infers the answer; this interleaved image-text reasoning process is the multimodal chain of thought. In practice, however, because the perception and understanding abilities of multimodal agents are limited, errors often occur in the clue acquisition and clue integration processes of the multimodal chain of thought; these errors accumulate over the subsequent reasoning and ultimately make the output answer unreliable. To address unreliable agent answers, recent work generates multiple reasoning paths and selects the final answer by simple voting over their results.
In short, existing methods generally assume that all reasoning answers are equally reliable and treat the answer that occurs most often as correct, so the final answer is decided by majority voting ("the minority yields to the majority"). However, these methods overlook that an erroneous clue acquisition process or an erroneous clue integration process in the multimodal chain of thought can each make a reasoning answer unreliable; such unreliable answers enter the final vote directly and seriously reduce the reasoning accuracy of the multimodal agent. In summary, multimodal agent reasoning suffers from error accumulation caused by erroneous clue acquisition and erroneous clue integration.

Disclosure of Invention

In view of the above shortcomings of the prior art, the invention provides a reliability-based agent reasoning method, system and storage medium that address error accumulation in multimodal agent reasoning tasks and improve the reasoning accuracy of the multimodal agent.
To achieve the above objective, the invention adopts the following technical solution: a reliability-based agent reasoning method, system and storage medium, the method comprising the following steps: S1, processing text and image inputs with a multimodal agent to generate a plurality of reasoning paths, wherein each reasoning path comprises a clue acquisition stage and a clue integration stage; S2, evaluating the reliability of the clue acquisition stage and the clue integration stage of each reasoning path; S3, screening the reasoning paths based on the reliability of the clue acquisition stage and the clue integration stage, and filtering out unreliable reasoning paths; and S4, performing reliability-based weighted voting on the filtered reasoning paths and outputting the reasoning answer. By dividing each reasoning path of the multimodal agent into a clue acquisition stage and a clue integration stage and evaluating and screening the reliability of both stages, the scheme prevents errors from accumulating through the reasoning process, and the reliability-based weighted voting mechanism significantly improves the reasoning accuracy of the multimodal agent.
Further, step S1 includes the following sub-steps: given an input text question and its corresponding image, the multimodal agent first generates a plurality of reasoning paths, each comprising a clue acquisition stage and a clue integration stage. Specifically, for a given reasoning path, the multimodal agent first generates a clue acquisition stage in text form and then follows the tool-call instruction in that stage to obtain visual clues from the image, where the terms of the corresponding formula denote the visual clue of the reasoning path, the visual clue retrieval tool, and the clue acquisition stage.
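The structure of one reasoning path in S1 (a textual clue acquisition stage, a tool call that retrieves visual clues from the image, then a textual clue integration stage from which the answer is extracted) can be sketched as a small data type plus a sampling loop. The agent and the visual clue retrieval tool are passed in as plain callables because the source names no particular model or tool; `ReasoningPath`, `sample_paths` and all callable signatures are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class ReasoningPath:
    acquisition: str   # clue acquisition stage (text containing the tool-call instruction)
    visual_clue: Any   # clue returned by the visual clue retrieval tool
    integration: str   # clue integration stage (text)
    answer: str        # answer extracted from the clue integration stage


def sample_paths(
    question: str,
    image: Any,
    n: int,
    acquire: Callable[[str, Any], str],                     # hypothetical agent call: writes the acquisition stage
    retrieve: Callable[[Any, str], Any],                    # hypothetical tool: executes the tool call on the image
    integrate: Callable[[str, str, Any], Tuple[str, str]],  # hypothetical agent call: returns (integration text, answer)
) -> list[ReasoningPath]:
    """Generate n independent reasoning paths for one question-image pair."""
    paths = []
    for _ in range(n):
        acq = acquire(question, image)
        clue = retrieve(image, acq)
        text, answer = integrate(question, acq, clue)
        paths.append(ReasoningPath(acq, clue, text, answer))
    return paths
```

Keeping the two stages as separate fields is what later allows S2 to score clue acquisition and clue integration independently instead of assigning one score to the whole path.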
After the visual clues are obtained, the multimodal agent generates a clue integration stage in text form and extracts an answer from it, where the terms of the corresponding formula denote: the answer of the reasoning path; as a function o