CN-121660095-B - Adaptive buffer management method, system, device, medium and program product for streaming response

CN121660095B

Abstract

The invention discloses an adaptive buffer management method, system, device, medium and program product for streaming response. The method comprises: determining a minimum buffer granularity parameter according to a user request; acquiring incremental content fragments generated based on the user request; classifying the incremental content fragments into reasoning fragments and text fragments, writing the reasoning fragments into a reasoning buffer and the text fragments into a text buffer; for the reasoning buffer, if the accumulated length of reasoning fragments in the reasoning buffer is greater than or equal to the minimum buffer granularity parameter, outputting the reasoning fragments in the reasoning buffer and emptying the reasoning buffer; and for the text buffer, when the text fragments in the text buffer first reach or exceed the minimum buffer granularity parameter, outputting the semantically complete residual reasoning fragments in the reasoning buffer as a batch, then forming a final answer text from the text fragments in the text buffer and outputting it. The invention effectively improves the user's perceived continuity of the reasoning chain, thereby reducing fragmented output of reasoning content.

Inventors

  • Cai Meijie
  • Deng Chengjing

Assignees

  • 北京点富科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2025-12-18

Claims (9)

  1. A method for adaptive buffer management of streaming responses, the method comprising: determining a minimum buffer granularity parameter according to a user request; acquiring incremental content fragments generated based on the user request; classifying the incremental content fragments into reasoning fragments and text fragments, writing the reasoning fragments into a reasoning buffer, and writing the text fragments into a text buffer; for the reasoning buffer, if the accumulated length of reasoning fragments in the reasoning buffer is greater than or equal to the minimum buffer granularity parameter, outputting the reasoning fragments in the reasoning buffer and emptying the reasoning buffer; and for the text buffer, when the text fragments in the text buffer first reach or exceed the minimum buffer granularity parameter, outputting the semantically complete residual reasoning fragments in the reasoning buffer as a batch, forming a final answer text from the text fragments in the text buffer, and outputting the final answer text; wherein outputting the semantically complete residual reasoning fragments in the reasoning buffer as a batch, forming the final answer text from the text fragments in the text buffer, and outputting the final answer text comprises: querying the reasoning buffer for residual reasoning fragments; if residual reasoning fragments exist in the reasoning buffer, forming a complete reasoning tail block from the residual reasoning fragments and outputting it; triggering and outputting a thinking-completion event; reading the residual text fragments in the text buffer; forming a text tail block from the residual text fragments; and generating and outputting a final text block based on the text tail block.
  2. The adaptive buffer management method for streaming responses according to claim 1, wherein determining the minimum buffer granularity parameter according to the user request comprises: acquiring a user intention from the user request; determining a task type based on the user intention; and dynamically adjusting the minimum buffer granularity parameter according to the task type.
  3. The adaptive buffer management method for streaming responses according to claim 1, wherein outputting the reasoning fragments in the reasoning buffer comprises: combining the reasoning fragments in the reasoning buffer into a readable reasoning semantic block for output; if a reasoning fragment is a key exception sentence, outputting it immediately; if a reasoning fragment is an unclosed sentence, withholding output until a complete closed sentence has been accumulated; and when a preset number of consecutive summarizing reasoning fragments and/or predictive reasoning fragments occur, terminating the reasoning stage and entering the text stage.
  4. The adaptive buffer management method for streaming responses of claim 1, further comprising: if the user actively terminates the request or the system terminates abnormally, collating and outputting the contents of the reasoning buffer and the text buffer respectively, and emptying the reasoning buffer and the text buffer.
  5. The adaptive buffer management method for streaming responses of claim 1, further comprising: performing link tracing on the output of the reasoning buffer and the text buffer, and aggregating the output reasoning blocks and text blocks in time order for archiving and storage.
  6. An adaptive buffer management system for streaming responses, characterized by comprising a configuration management module, an LLM text stream generator, a buffer manager, a reasoning buffer, a text buffer, an event processor and a tracing module; wherein: the configuration management module is configured to configure the minimum buffer granularity parameter; the LLM text stream generator is configured to generate incremental content fragments based on a user request; the buffer manager is configured to classify the incremental content fragments into reasoning fragments and text fragments, write the reasoning fragments into the reasoning buffer, and write the text fragments into the text buffer; the reasoning buffer is configured to store the reasoning fragments, and if the accumulated length of reasoning fragments in the reasoning buffer is greater than or equal to the minimum buffer granularity parameter, the reasoning fragments in the reasoning buffer are output and the reasoning buffer is emptied; the text buffer is configured to store the text fragments, and when the text fragments in the text buffer first reach or exceed the minimum buffer granularity parameter, the semantically complete residual reasoning fragments in the reasoning buffer are output as a batch, and a final answer text is formed from the text fragments in the text buffer and output; wherein outputting the semantically complete residual reasoning fragments as a batch and forming and outputting the final answer text comprises: querying the reasoning buffer for residual reasoning fragments; if residual reasoning fragments exist in the reasoning buffer, forming a complete reasoning tail block from the residual reasoning fragments and outputting it; triggering and outputting a thinking-completion event; reading the residual text fragments in the text buffer; forming a text tail block from the residual text fragments; and generating and outputting a final text block based on the text tail block; the event processor is configured to trigger and output the thinking-completion event after the complete reasoning tail block is formed and output; and the tracing module is configured to perform link tracing on the output of the reasoning buffer and the text buffer, and to aggregate the output reasoning blocks and text blocks in time order for archiving and storage.
  7. An adaptive buffer management device for streaming responses, wherein the device comprises a processor and a memory; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the steps of the adaptive buffer management method for streaming responses according to any one of claims 1 to 5.
  8. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the adaptive buffer management method for streaming responses according to any one of claims 1 to 5.
  9. A computer program product comprising computer program instructions which, when executed by a processor, implement the steps of the adaptive buffer management method for streaming responses according to any one of claims 1 to 5.
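
For illustration only (not part of the claims), the dual-buffer flushing logic of claim 1 can be sketched as the following minimal Python example. All class and method names are hypothetical, and the thinking-completion event is represented here by a plain marker string rather than any concrete event mechanism:

```python
class DualBufferManager:
    """Illustrative sketch of the claim-1 dual-buffer flushing logic."""

    def __init__(self, min_granularity: int):
        # Minimum buffer granularity parameter, determined per user request.
        self.min_granularity = min_granularity
        self.reasoning_buf: list = []
        self.text_buf: list = []
        self.thinking_done = False
        self.output: list = []  # blocks emitted to the client, in order

    @staticmethod
    def _length(buf) -> int:
        return sum(len(s) for s in buf)

    def feed(self, fragment: str, is_reasoning: bool) -> None:
        """Classify one incremental content fragment and apply the flush rules."""
        if is_reasoning:
            self.reasoning_buf.append(fragment)
            # Reasoning buffer: flush whenever accumulated length reaches
            # the minimum granularity, then empty the buffer.
            if self._length(self.reasoning_buf) >= self.min_granularity:
                self.output.append("".join(self.reasoning_buf))
                self.reasoning_buf.clear()
        else:
            self.text_buf.append(fragment)
            # Text buffer: the FIRST time text reaches the granularity,
            # flush any residual reasoning as a tail block, emit a
            # thinking-completion event, then output the answer text.
            if not self.thinking_done and self._length(self.text_buf) >= self.min_granularity:
                if self.reasoning_buf:
                    self.output.append("".join(self.reasoning_buf))  # reasoning tail block
                    self.reasoning_buf.clear()
                self.output.append("<thinking_complete>")  # stage event (marker here)
                self.thinking_done = True
                self.output.append("".join(self.text_buf))
                self.text_buf.clear()
```

For example, with a granularity of 5, feeding reasoning fragments "abc", "de", "fg" and then a text fragment "Hello" yields the flushed block "abcde", the tail block "fg", the stage event, and the answer "Hello", in that order.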

Description

Adaptive buffer management method, system, device, medium and program product for streaming response

Technical Field

The present invention relates to the field of large model technologies, and in particular to a method, a system, a device, a medium and a program product for adaptive buffer management of streaming responses.

Background

With the wide application of multi-modal large models in conversation, decision aid, content generation and the like, user expectations have shifted from "waiting for the final answer" to acquiring the thinking process and staged content in real time. However, conventional streaming output schemes have the following drawbacks:

1. Most prior art can distinguish "reasoning" and "text answer" as different chunk types, but still adopts purely token/character-length-driven incremental flushing and does not recognize the semantic boundaries of the reasoning chain (hypothesis forming, verification, convergence). The reasoning chain is therefore cut off at points of logical turning, unclosed syntax or transient self-correction; the user must repeatedly backtrack and splice the fragmented thinking, and an unconverged transient hypothesis may be mistaken for a stable conclusion, reducing readability and decision accuracy.

2. Existing "reasoning chunks" and "answer chunks" are separated by type, but their output remains interleaved in time and lacks explicit stage events (start/finish) and a centralized flush mechanism for residual reasoning tail blocks. Reasoning residue is delayed or interleaved with the first answer text, so the user cannot judge whether the system is still reasoning or has entered the finalizing stage. This easily leads to premature adoption of insufficiently verified intermediate conclusions or neglect of the final integration stage, affecting task execution rhythm and risk control and weakening the integrity of subsequent tracing and auditing.

3. The reasoning process and the final executable/trusted reply lack hierarchical buffering and explicit stage marking; exploratory thinking and deterministic conclusions are interleaved, making it difficult for users to distinguish "still constructing hypotheses" from "finished conclusions", possibly executing unverified hypotheses prematurely or ignoring critical conclusions, affecting decision cadence and risk control.

4. A fixed buffer strategy cannot balance the real-time performance and information integrity of different tasks; for example, the semantic block granularity of image analysis, long text generation, video recommendation and similar steps differs markedly.

5. User-initiated discontinuation (stream termination) often occurs while residual thinking content has not yet been consolidated and output, causing context loss or tracing difficulties.

In addition, prior art schemes only provide a Boolean switch for whether thinking is displayed, and lack the combined capability of intelligent supplementary flushing of residual thinking-stage content, logical hierarchical buffering of text and reasoning traces, and dynamic minimum granularity control.
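
As a contrast to the fixed buffer strategy criticized in drawback 4, a task-type-driven granularity lookup can be sketched as follows. The task types and character counts here are illustrative assumptions chosen to echo the examples in the Background, not values from the patent:

```python
# Hypothetical mapping from task type to minimum buffer granularity (in characters).
# Smaller values favor real-time feedback; larger values favor semantic completeness.
TASK_GRANULARITY = {
    "image_analysis": 32,        # short analysis steps, near-real-time feedback
    "long_text_generation": 256, # larger semantic blocks, fewer fragment cuts
    "video_recommendation": 64,
}
DEFAULT_GRANULARITY = 128  # fallback for unrecognized task types

def min_granularity_for(task_type: str) -> int:
    """Dynamically select the minimum buffer granularity for a task type."""
    return TASK_GRANULARITY.get(task_type, DEFAULT_GRANULARITY)
```

In the method of claim 2, such a lookup would sit behind the intent-recognition step: the user request is mapped to a task type, and the task type selects the granularity.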
Therefore, it is necessary to develop a buffer management algorithm capable of adaptive adjustment and of selectively outputting reasoning content, so as to solve the problems in the prior art.

Disclosure of Invention

In view of the above, embodiments of the present invention provide a method, a system, a device, a medium and a program product for adaptive buffer management of streaming responses, which at least partially solve the problems in the prior art. Other features and advantages of the invention will be apparent from the following detailed description, or may be learned by practice of the invention. To achieve the above object, embodiments of the present invention provide the following technical solutions. According to a first aspect of an embodiment of the present invention, there is provided a method for adaptive buffer management of streaming responses, the method comprising: determining a minimum buffer granularity parameter according to a user request; acquiring incremental content fragments generated based on the user request; classifying the incremental content fragments into reasoning fragments and text fragments, writing the reasoning fragments into a reasoning buffer, and writing the text fragments into a text buffer; for the reasoning buffer, if the accumulated length of reasoning fragments in the reasoning buffer is greater than or equal to the minimum buffer granularity parameter, outputting the reasoning fragments in the reasoning buffer and emptying the reasoning buffer; and for the text buffer, when the text fragments in the text buffer first reach or exceed the minimum buffer granularity parameter, outputting the semantically complete residual reasoning fragments in