CN-122021894-A - Inference processing method, inference processing device, computer equipment and storage medium

Abstract

The present application relates to an inference processing method, an inference processing apparatus, a computer device, a storage medium and a computer program product. The method comprises: obtaining an inference request and performing inference based on the inference request; obtaining thinking content generated during the inference process; compressing the thinking content to obtain compressed thinking content when the thinking content meets a preset condition for triggering compression; and continuing inference based on the compressed thinking content to obtain an inference result for the inference request. The method can improve inference accuracy.

Inventors

CUI KAIYUAN

Assignees

Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技（深圳）有限公司)

Dates

Publication Date: 20260512
Application Date: 20260121

Claims (16)

1. A reasoning processing method, the method comprising: acquiring a reasoning request, and performing reasoning based on the reasoning request; acquiring thinking content generated in the reasoning process, and compressing the thinking content when the thinking content meets a preset condition for triggering compression, to obtain compressed thinking content; and continuing to perform reasoning based on the compressed thinking content to obtain a reasoning result of the reasoning request.
2. The method according to claim 1, wherein compressing the thinking content when the thinking content meets a preset condition for triggering compression, to obtain compressed thinking content, comprises: determining a content length of the thinking content; and when the content length exceeds a thinking length threshold, compressing the thinking content to obtain the compressed thinking content.
3. The method according to claim 2, wherein the method further comprises: determining a length proportion and a context processing length for the reasoning request; determining a first length threshold according to the length proportion and the context processing length; and determining the thinking length threshold based on the first length threshold and a preset second length threshold.
4. The method according to claim 1, wherein compressing the thinking content when the thinking content meets a preset condition for triggering compression, to obtain compressed thinking content, comprises: when the thinking content comprises a target type mark that triggers compression, compressing the thinking content to obtain the compressed thinking content.
5. The method according to claim 1, wherein compressing the thinking content to obtain compressed thinking content comprises: acquiring compression prompt information for guiding execution of compression; and generating the compressed thinking content according to the compression prompt information and the thinking content.
6. The method according to claim 5, wherein acquiring the compression prompt information for guiding execution of compression comprises: performing recognition analysis on the thinking content to obtain an analysis result; and generating the compression prompt information based on the analysis result and a preset compression parameter.
7. The method according to claim 1, wherein compressing the thinking content when the thinking content meets a preset condition for triggering compression, to obtain compressed thinking content, comprises: obtaining compression configuration information, wherein the compression configuration information comprises a preset compression parameter and the preset condition for triggering compression; and when the thinking content meets the preset condition for triggering compression, compressing the thinking content according to the preset compression parameter to obtain the compressed thinking content.
8. The method according to claim 1, wherein continuing to perform reasoning based on the compressed thinking content to obtain a reasoning result of the reasoning request comprises: acquiring verification prompt information for guiding execution of verification; verifying the compressed thinking content based on the verification prompt information to obtain a verification result; and when the verification result indicates that verification is passed, continuing to perform reasoning based on the compressed thinking content until the reasoning is finished, to obtain the reasoning result for the reasoning request.
9. The method according to claim 1, wherein continuing to perform reasoning based on the compressed thinking content to obtain a reasoning result of the reasoning request comprises: continuing to perform reasoning based on the compressed thinking content, and returning to the step of acquiring the thinking content generated in the reasoning process until the reasoning is finished, to obtain the reasoning result of the reasoning request.
10. The method according to claim 1, wherein the method further comprises: in response to a thinking intervention operation, acquiring thinking intervention information corresponding to the thinking intervention operation; and performing reasoning based on the thinking intervention information and the reasoning request.
11. The method according to claim 1, wherein the method further comprises: when the reasoning for the reasoning request is finished, compressing the thinking content of the finished reasoning to obtain target thinking content; and generating a reasoning result for the reasoning request based on the target thinking content.
12. The method according to any one of claims 1 to 11, wherein the reasoning processing method is implemented by a reasoning model, and the training of the reasoning model comprises: acquiring sample data, wherein the sample data comprises at least one of first sample data and second sample data, the first sample data carries a target type mark, and the second sample data comprises pre-compression thinking content and compressed thinking content obtained by compressing the pre-compression thinking content; and performing at least two stages of training based on the sample data to obtain the reasoning model; wherein, when the training of a current stage is not the training of the final stage, the target model obtained in the current stage is taken as the model to be trained in the next stage and the training of the next stage is performed, until the target model trained in the final stage is obtained, and the reasoning model is obtained according to the target model trained in the final stage.
13. A reasoning processing apparatus, characterized in that the apparatus comprises: a reasoning triggering module, configured to acquire a reasoning request and perform reasoning based on the reasoning request; a compression processing module, configured to acquire the thinking content generated in the reasoning process, and to compress the thinking content to obtain compressed thinking content when the thinking content meets a preset condition for triggering compression; and a reasoning result obtaining module, configured to continue to perform reasoning based on the compressed thinking content to obtain a reasoning result of the reasoning request.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 12 when the computer program is executed.
15. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 12.
16. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 12.
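
The flow recited in claims 1, 2, 4 and 9 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function names, the `<compress>` mark string, the use of character count as the content length, and the toy step and compression functions are all assumptions made for illustration. In practice the `step` and `compress` callables would be backed by a reasoning model.

```python
from typing import Callable, Tuple

# Hypothetical "target type mark"; the source does not specify its form.
COMPRESS_MARK = "<compress>"


def should_compress(thinking: str, length_threshold: int) -> bool:
    """Assumed preset condition: the content length exceeds the threshold
    (claim 2) or the content contains the target type mark (claim 4)."""
    return len(thinking) > length_threshold or COMPRESS_MARK in thinking


def reason_with_compression(
    request: str,
    step: Callable[[str, str], Tuple[str, bool]],
    compress: Callable[[str], str],
    length_threshold: int,
    max_steps: int = 64,
) -> Tuple[str, int]:
    """Generate thinking content step by step, compress it whenever the
    condition is met, and continue from the compressed content (claim 9).
    Returns the final thinking content and the number of compressions."""
    thinking = ""
    compressions = 0
    for _ in range(max_steps):
        new_text, finished = step(request, thinking)
        thinking += new_text
        if finished:
            break
        if should_compress(thinking, length_threshold):
            thinking = compress(thinking)
            compressions += 1
    return thinking, compressions


def make_toy_step(total_steps: int) -> Callable[[str, str], Tuple[str, bool]]:
    """Toy stand-in for one model reasoning step: emits a fixed-format chunk
    and reports completion after `total_steps` calls."""
    state = {"n": 0}

    def step(request: str, thinking: str) -> Tuple[str, bool]:
        state["n"] += 1
        return f"[thought {state['n']} about {request}] ", state["n"] >= total_steps

    return step


def toy_compress(thinking: str) -> str:
    """Toy stand-in for model-driven compression: keeps only the last
    20 characters behind a summary tag."""
    return "[summary] " + thinking[-20:]
```

With `make_toy_step(5)` and a threshold of 40 characters, the loop compresses whenever the accumulated content grows past the threshold and then keeps reasoning from the shortened content, which is the behaviour the claims describe at a high level.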

Description

Inference processing method, inference processing device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technology, and in particular to an inference processing method, an inference processing apparatus, a computer device, a storage medium, and a computer program product.

Background

With the development of computer technology, large language models have made remarkable progress in natural language processing and show particularly strong capabilities in reasoning tasks. Mechanisms such as Chain-of-Thought allow the model to carry out multiple rounds of intermediate reasoning step by step before generating a final answer, simulating the human thinking process to improve answer quality. This "deep thinking" mode significantly enhances the model's ability to understand complex problems, enabling it to handle tasks with longer logic chains.

However, during the reasoning process, conclusions that look reasonable but deviate from the facts may be drawn from erroneous premises or local information, which limits the accuracy of the final answer.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an inference processing method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the accuracy of inference.

In a first aspect, the present application provides a reasoning processing method.
The method comprises the following steps: acquiring a reasoning request, and performing reasoning based on the reasoning request; acquiring the thinking content generated in the reasoning process, and compressing the thinking content when the thinking content meets a preset condition for triggering compression, to obtain compressed thinking content; and continuing to perform reasoning based on the compressed thinking content to obtain a reasoning result of the reasoning request.

In a second aspect, the application further provides a reasoning processing device. The device comprises: a reasoning triggering module, configured to acquire the reasoning request and perform reasoning based on the reasoning request; a compression processing module, configured to acquire the thinking content generated in the reasoning process, and to compress the thinking content to obtain compressed thinking content when the thinking content meets the preset condition for triggering compression; and a reasoning result obtaining module, configured to continue to perform reasoning based on the compressed thinking content to obtain a reasoning result of the reasoning request.

In some embodiments, the compression processing module is further configured to determine a content length of the thinking content, and to compress the thinking content when the content length exceeds a thinking length threshold, thereby obtaining compressed thinking content.

In some embodiments, the reasoning processing apparatus further comprises a threshold determination module for determining a length proportion and a context processing length for the reasoning request, determining a first length threshold according to the length proportion and the context processing length, and determining the thinking length threshold based on the first length threshold and a preset second length threshold.
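
One possible reading of the threshold determination described above is sketched below in Python. The source only states that the first length threshold is determined "according to" the length proportion (length scale) and the context processing length, and that the thinking length threshold is "based on" the first and second thresholds. Multiplying the proportion by the context length and taking the smaller of the two thresholds are assumptions made here for illustration, as are the function names.

```python
def first_length_threshold(length_proportion: float, context_length: int) -> int:
    """Assumed rule: the first threshold is a fraction of the context
    processing length, truncated to an integer."""
    if not 0.0 < length_proportion <= 1.0:
        raise ValueError("length_proportion must be in (0, 1]")
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    return int(length_proportion * context_length)


def thinking_length_threshold(
    length_proportion: float,
    context_length: int,
    second_length_threshold: int,
) -> int:
    """Assumed rule: the thinking length threshold is the smaller of the
    proportional first threshold and the preset second threshold."""
    first = first_length_threshold(length_proportion, context_length)
    return min(first, second_length_threshold)
```

Under these assumptions, a proportion of 0.5 with a context processing length of 32768 gives a first threshold of 16384, so a preset second threshold of 8192 would be the effective limit. Other combinations, such as taking the larger value, are equally compatible with the wording of the source.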
In some embodiments, the compression processing module is further configured to compress the thinking content when the thinking content includes a target type mark that triggers compression, so as to obtain compressed thinking content.

In some embodiments, the compression processing module is further configured to obtain compression prompt information for guiding execution of compression, and to generate compressed thinking content according to the compression prompt information and the thinking content.

In some embodiments, the compression processing module is further configured to perform recognition analysis on the thinking content to obtain an analysis result, and to generate the compression prompt information based on the analysis result and a preset compression parameter.

In some embodiments, the compression processing module is further configured to obtain compression configuration information, where the compression configuration information includes a preset compression parameter and a preset condition for triggering compression, and, when the thinking content meets the preset condition for triggering compression, to compress the thinking content according to the preset compression parameter to obtain compressed thinking content.

In some embodiments, the reasoning result obtaining module is further configured to obtain verification prompt information for guiding execution of verification, verify the compressed thinking content based on the verification prompt information to obtain a verification result, and continue to perform reasoning based on the compressed thinking content until the reasoning is completed when the verification result is that the