
CN-120745849-B - Large-model multi-type tool collaborative reasoning method, system and equipment

CN 120745849 B

Abstract

The invention discloses a large-model multi-type tool collaborative reasoning method, system and equipment, relating to the technical fields of artificial intelligence and natural language processing. The method comprises: for the original reasoning path, calculating the entropy value of the current reasoning node after each call of a reasoning tool; determining the entropy change amount from that entropy value; and identifying high-token-entropy nodes by comparing the entropy change amount against a set threshold. When the next reasoning-tool call occurs at a high-token-entropy node, a reasoning tool different from the one on the original path is called at random and the reasoning process continues, generating a new reasoning path; the original reasoning path and all generated new reasoning paths are then taken as a training set, and the large model is trained with a reinforcement-learning algorithm. The token-entropy-based adaptive exploration and reinforcement-learning strategy thus designed makes the large model's reasoning-tool selection strategy more intelligent under uncertainty and improves the reasoning accuracy of the final answer.
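The entropy bookkeeping summarized above can be sketched in a few lines. This is an illustrative reading only: the exact formula for the entropy change amount is omitted from the source text, so the combination used in `is_high_entropy_node`, like every function name below, is an assumption.

```python
import math

def token_entropy(probs):
    """Shannon entropy of one token's probability distribution over the dictionary."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def node_entropy(token_dists, k):
    """Entropy of a reasoning node: the mean entropy of the first k token
    distributions generated after a reasoning-tool call, per the abstract."""
    window = token_dists[:k]
    return sum(token_entropy(d) for d in window) / len(window)

def is_high_entropy_node(h_curr, h_prev, h_init, threshold):
    """One plausible definition of the entropy change amount: the current node's
    entropy rise relative to both the previous and the initial node, averaged.
    The patented formula is not reproduced in the source, so this is a guess."""
    delta = ((h_curr - h_prev) + (h_curr - h_init)) / 2.0
    return delta >= threshold
```

A node flagged by `is_high_entropy_node` would then trigger the random alternative tool call that spawns a new reasoning path.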

Inventors

  • SONG GANG
  • LIU JINTONG
  • ZHENG WEIBO
  • LU KUAN
  • DING YIFAN
  • HAN JIAYI
  • DU HONGWEI
  • LU PENGCHENG
  • ZHANG SHILIANG
  • LIU JIQIAO

Assignees

  • Inspur Genersoft Co., Ltd. (浪潮通用软件有限公司)

Dates

Publication Date
2026-05-05
Application Date
2025-09-05

Claims (6)

  1. A large-model multi-type tool collaborative reasoning method, comprising: for the original reasoning path generated by the large model for a natural-language query problem, calculating the mean entropy of the first K token fragments of the reasoning node generated by the large model after each call of a reasoning tool, as the entropy value of the current reasoning node; determining the entropy change amount of the current reasoning node from the entropy values of the current reasoning node, the previous reasoning node and the initial reasoning node, and determining high-token-entropy nodes by comparing the entropy change amount with a set entropy-change threshold; when the next reasoning-tool call occurs at a high-token-entropy node, randomly calling a reasoning tool different from that of the original reasoning path and continuing the reasoning process to generate a new reasoning path, and continuing to compute the entropy value of each reasoning node on the new path until no high-token-entropy node remains; taking the original reasoning path and all generated new reasoning paths as a training set and training the large model based on a reinforcement-learning algorithm; and generating a final answer for the problem to be processed with the trained large model. For each reasoning path generated by the large model for a problem, at the t-th reasoning-tool call stage, the mean entropy H_t of the first K token fragments of the reasoning node generated by the large model is taken as the entropy value of the current reasoning node; considering the entropy H_t of the current node relative to the entropy H_{t-1} of the previous node and the entropy H_0 of the initial node, the entropy change amount ΔH_t of the current node is defined by a formula combining these values [formula omitted in source], wherein H_t is the entropy value of the current reasoning node, H_{t-1} that of the previous reasoning node, H_0 that of the initial reasoning node, and |V| is the dictionary size. Comparing the entropy change amount with the set entropy-change threshold comprises: if the entropy change amount of the current reasoning node is smaller than the entropy-change threshold, the multi-type tool-calling strategy of the original reasoning path is maintained; if it is greater than or equal to the threshold, the current reasoning node is judged to be a high-token-entropy node. Training the large model based on the reinforcement-learning algorithm is modeled as [formula omitted in source], wherein E denotes the expectation, G is the number of groups, o_i is a result generated for the query, clip limits the range of variation, ε is the clip-range parameter, Â_i is the group relative advantage, D_KL denotes a KL regularization term, r_i is the importance-sampling ratio, and θ denotes the learnable parameters of the model. For a problem q, the multi-type-tool collaborative reasoning process with the large model is modeled as [formula omitted in source], wherein one factor represents the reasoning process of multi-type tool calls: T_c is the number of tokens in the thinking chain c, c_t is the token at position t, c_<t denotes all tokens before position t, I denotes the model instruction introducing the reasoning tool set, and F_<t denotes the feedback of all historical reasoning-tool calls before position t; the other factor represents the answer-generation process: T_a is the number of tokens of the answer a, a_t is the model generation result at position t, and a_<t denotes the model's previously generated results.
  2. The large-model multi-type tool collaborative reasoning method of claim 1, wherein the reasoning tool set includes local corpus retrieval tools, web online retrieval tools, and code tools.
  3. The large-model multi-type tool collaborative reasoning method of claim 1, wherein, during reasoning, the thinking-process content is filled in the < think > identifier; the operation of calling a multi-type tool is filled in the < tool > identifier, which contains the reasoning-tool name and the reasoning-tool parameters; the returned result of the multi-type tool call is filled in the < tool_result > identifier as context information for subsequent reasoning steps; and reasoning ends when the set maximum number of reasoning-tool calls is reached or the large model ends reasoning autonomously, at which point the reasoning result is filled in the < answer > identifier as the final answer.
  4. The large-model multi-type tool collaborative reasoning method of claim 1, wherein the entropy value of each token fragment is H_i = -Σ_{j=1}^{|V|} p_{i,j} log p_{i,j}, wherein |V| is the dictionary size, p_{i,j} is the probability of the j-th dictionary token at position i, and the probabilities are produced by the large model with learnable parameters θ.
  5. A large-model multi-type tool collaborative reasoning system, comprising: an entropy calculation module configured to calculate, for the original reasoning path generated by a large model for a natural-language query problem, the mean entropy of the first K token fragments of the reasoning node generated by the large model after each call of a reasoning tool, as the entropy value of the current reasoning node; an entropy-change judging module configured to determine the entropy change amount of the current reasoning node from the entropy values of the current reasoning node, the previous reasoning node and the initial reasoning node, and to determine high-token-entropy nodes by comparing the entropy change amount with a set entropy-change threshold; an adaptive exploration module configured to randomly call a reasoning tool different from that of the original reasoning path when the next reasoning-tool call occurs at a high-token-entropy node, continue the reasoning process to generate a new reasoning path, and continue computing the entropy value of each reasoning node on the new path until no high-token-entropy node remains; a training module configured to train the large model based on a reinforcement-learning algorithm, taking the original reasoning path and all generated new reasoning paths as a training set; and a reasoning module configured to generate a final answer for the problem to be processed with the trained large model. For each reasoning path generated by the large model for a problem, at the t-th reasoning-tool call stage, the mean entropy H_t of the first K token fragments of the reasoning node generated by the large model is taken as the entropy value of the current reasoning node; considering the entropy H_t of the current node relative to the entropy H_{t-1} of the previous node and the entropy H_0 of the initial node, the entropy change amount ΔH_t of the current node is defined by a formula combining these values [formula omitted in source], wherein H_t is the entropy value of the current reasoning node, H_{t-1} that of the previous reasoning node, H_0 that of the initial reasoning node, and |V| is the dictionary size. Comparing the entropy change amount with the set entropy-change threshold comprises: if the entropy change amount of the current reasoning node is smaller than the entropy-change threshold, the multi-type tool-calling strategy of the original reasoning path is maintained; if it is greater than or equal to the threshold, the current reasoning node is judged to be a high-token-entropy node. Training the large model based on the reinforcement-learning algorithm is modeled as [formula omitted in source], wherein E denotes the expectation, G is the number of groups, o_i is a result generated for the query, clip limits the range of variation, ε is the clip-range parameter, Â_i is the group relative advantage, D_KL denotes a KL regularization term, r_i is the importance-sampling ratio, and θ denotes the learnable parameters of the model. For a problem q, the multi-type-tool collaborative reasoning process with the large model is modeled as [formula omitted in source], wherein one factor represents the reasoning process of multi-type tool calls: T_c is the number of tokens in the thinking chain c, c_t is the token at position t, c_<t denotes all tokens before position t, I denotes the model instruction introducing the reasoning tool set, and F_<t denotes the feedback of all historical reasoning-tool calls before position t; the other factor represents the answer-generation process: T_a is the number of tokens of the answer a, a_t is the model generation result at position t, and a_<t denotes the model's previously generated results.
  6. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, the computer instructions, when executed by the processor, performing the method of any one of claims 1-4.
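The tag protocol of claim 3 (< think >, < tool >, < tool_result >, < answer >) can be sketched as a minimal driver loop. Everything below is an illustrative assumption rather than the patented implementation: the function names, the call syntax inside the < tool > tag, and the call budget are all invented for the sketch, and the model and tool backends are stubbed.

```python
import re

MAX_TOOL_CALLS = 5  # the "maximum setting number" of claim 3 (value assumed)

def run_reasoning(model_step, tools, question):
    """Drive one tag-structured reasoning episode.

    model_step(context) returns the model's next generated segment;
    tools maps reasoning-tool names to callables. The loop stops when the
    model emits an <answer> tag or the tool-call budget is exhausted."""
    context = question
    for _ in range(MAX_TOOL_CALLS):
        segment = model_step(context)
        context += segment
        answer = re.search(r"<answer>(.*?)</answer>", segment, re.S)
        if answer:
            return answer.group(1).strip()
        call = re.search(r"<tool>(\w+)\((.*?)\)</tool>", segment, re.S)
        if call:
            name, arg = call.group(1), call.group(2)
            result = tools[name](arg)
            # Tool feedback becomes context for the following reasoning step.
            context += f"<tool_result>{result}</tool_result>"
    return None  # budget exhausted without an <answer>
```

A stub model that first issues a tool call and then answers once it sees a < tool_result > in its context would drive this loop to return the text inside the < answer > tag.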

Description

Large-model multi-type tool collaborative reasoning method, system and equipment

Technical Field

The invention relates to the technical fields of artificial intelligence and natural language processing, and in particular to a large-model multi-type tool collaborative reasoning method, system and equipment.

Background

At present, to solve open complex reasoning tasks, large language models (LLMs) are deeply integrated with external reasoning tools. When a large language model calls external reasoning tools to execute a complex reasoning task, its reasoning path is single and its decisions are rigid; in particular, it is prone to selecting sub-optimal paths at decision nodes. Agent reinforcement learning has become a key training paradigm for dynamic interaction between large language models and their environment. Mainstream methods mostly adopt reasoning-path-based optimization algorithms such as GRPO (Group Relative Policy Optimization), sampling the reasoning paths of the LLM's tool calls and rewarding them according to the final result. This paradigm has a significant limitation, however: it treats reasoning as a whole and ignores the large model's fine-grained decision at each reasoning-tool call step. Studies have shown that each instance of tool feedback introduces a high degree of uncertainty, expressed as high token entropy. Because it focuses excessively on comparing complete paths, the traditional method explores these key nodes insufficiently and struggles to form diversified reasoning-tool use strategies, limiting the optimization potential of the model's reasoning paths and the robustness of its final solutions.
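For reference, GRPO is published in the open literature, and its standard objective can be written as follows. This is the generic published form, shown only to fix notation; the exact variant used by the invention is given by a formula that does not survive in this text:

$$
\mathcal{J}_{\mathrm{GRPO}}(\theta)=\mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\Big(\min\big(r_i(\theta)\hat{A}_i,\ \operatorname{clip}(r_i(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_i\big)-\beta\, D_{\mathrm{KL}}\big(\pi_\theta\,\|\,\pi_{\mathrm{ref}}\big)\Big)\right]
$$

where $r_i(\theta)=\pi_\theta(o_i\mid q)\,/\,\pi_{\theta_{\mathrm{old}}}(o_i\mid q)$ is the importance-sampling ratio for the $i$-th of $G$ sampled results $o_i$, $\epsilon$ is the clip-range parameter, and $\hat{A}_i=\bigl(R_i-\operatorname{mean}(R_1,\dots,R_G)\bigr)/\operatorname{std}(R_1,\dots,R_G)$ is the group relative advantage computed from the sampled results' rewards.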
Disclosure of Invention

To solve the above problems, the invention provides a large-model multi-type tool collaborative reasoning method, system and equipment, and designs a token-entropy-based adaptive exploration and reinforcement-learning strategy, thereby making the large model's reasoning-tool selection strategy more intelligent under uncertainty and improving the reasoning accuracy of the final answer. To achieve the above purpose, the invention adopts the following technical scheme.

In a first aspect, the invention provides a large-model multi-type tool collaborative reasoning method, including: for the original reasoning path generated by the large model for the problem, calculating the mean entropy of the first K token fragments of the reasoning node generated by the large model after each call of a reasoning tool, as the entropy value of the current reasoning node; determining the entropy change amount of the current reasoning node from the entropy values of the current reasoning node, the previous reasoning node and the initial reasoning node, and determining high-token-entropy nodes by comparing the entropy change amount with a set entropy-change threshold; when the next reasoning-tool call occurs at a high-token-entropy node, randomly calling a reasoning tool different from that of the original reasoning path and continuing the reasoning process to generate a new reasoning path, and continuing to compute the entropy value of each reasoning node on the new path until no high-token-entropy node remains; taking the original reasoning path and all generated new reasoning paths as a training set and training the large model based on a reinforcement-learning algorithm; and generating a final answer for the problem to be processed with the trained large model.

As an alternative embodiment, for a problem q, the multi-type-tool collaborative reasoning process with the large model is modeled as [formula omitted in source], wherein one factor represents the reasoning process of multi-type tool calls: T_c is the number of tokens in the thinking chain c, c_t is the token at position t, c_<t denotes all tokens before position t, I denotes the model instruction introducing the reasoning tool set, and F_<t denotes the feedback of all historical reasoning-tool calls before position t; the other factor represents the answer-generation process: T_a is the number of tokens of the answer a, a_t is the model generation result at position t, and a_<t denotes the model's previously generated results. As an alternative embodiment, the reasoning tool set includes local corpus retrieval tools, web online retrieval tools, and code tools. As an alternative implementation, during reasoning, the thinking-process content is filled in the < think > identifier, the operation of calling th