CN-122019577-A - Storage and query system and method based on intelligent agent long text tool output


Abstract

The invention belongs to the technical field of artificial intelligence and discloses a storage and query system and method based on the output of an agent's long-text tools. The system comprises a warehousing module, which receives an original response and normalizes it; a segmentation and indexing module, which cuts the normalized text into retrievable segments containing text data according to a preset strategy and generates corresponding index entries and metadata for each retrievable segment; an NL2SQL module, which generates an SQL query from user input; and an execution aggregation and summary module, which executes the SQL query, obtains the text data of the corresponding retrievable segments by matching index entries or metadata, aggregates that text data, and calls a large model to generate a natural language answer from the aggregated text data. This avoids the context overload caused by feeding the full-length text into the model, reduces computational cost, avoids the model's input-length limit, and improves the operating efficiency and applicability of the system.

Inventors

ZHANG KUN
WU MIN
ZHOU YONGCHUAN
LI DONGLIN
PENG JIANG

Assignees

Chongqing Digital Resources Group Co., Ltd. (重庆数字资源集团有限公司)

Dates

Publication Date: 20260512
Application Date: 20251222

Claims (10)

1. A storage and query system based on the output of an agent's long-text tool, the system comprising: a warehousing module, configured to receive an original response from the agent's long-text tool and normalize the original response to obtain normalized text; a segmentation and indexing module, configured to cut the normalized text into retrievable segments containing text data according to a preset strategy and to generate corresponding index entries and metadata for each retrievable segment; a data storage module, configured to store the retrievable segments, index entries, and metadata; an NL2SQL module, configured to generate an SQL query from user input; and an execution aggregation and summary module, configured to execute the SQL query, obtain the text data of the corresponding retrievable segments by matching index entries or metadata, aggregate the text data, and call a large model to generate a natural language answer from the aggregated text data.
2. The storage and query system according to claim 1, further comprising: an SQL verification and sandbox module, configured to verify the SQL query and send the verified SQL query to the execution aggregation and summary module.
3. The system according to claim 2, wherein verification of the SQL query comprises at least one of static verification, cost estimation, and permission verification.
4. The system according to claim 1, wherein normalizing the original response comprises at least one of character normalization, denoising, text fingerprinting, deduplication, and metadata extraction.
5. The system according to claim 1, wherein the preset strategy of the segmentation and indexing module comprises at least one of cutting by semantic paragraph, length threshold constraint, topic boundary detection, and special retention rules.
6. The system according to claim 1, wherein the index entries comprise vector embeddings, full-text index entries, and inverted index entries.
7. The system according to claim 1, wherein the data storage module uses a Milvus vector database to store the index entries and metadata and a PostgreSQL database to store the retrievable segments, and supports vector similarity retrieval, exact retrieval, Boolean retrieval, and fuzzy retrieval.
8. A storage and query method based on the output of an agent's long-text tool, applied to the storage and query system according to any one of claims 1 to 7, characterized in that the method comprises: receiving an original response from the agent's long-text tool and normalizing the original response to obtain normalized text; cutting the normalized text into retrievable segments containing text data according to a preset strategy, and generating corresponding index entries and metadata for each retrievable segment; storing the retrievable segments, index entries, and metadata; generating an SQL query from user input; and executing the SQL query, obtaining the text data of the corresponding retrievable segments by matching index entries or metadata, aggregating the text data, and calling a large model to generate a natural language answer from the aggregated text data.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the storage and query method according to claim 8.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the storage and query method according to claim 8.
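Claims 4 and 5 list normalization operations and cutting strategies but do not fix a concrete algorithm. As a minimal sketch, assuming NFKC character normalization, tag stripping as the denoising step, a SHA-256 fingerprint for exact-duplicate removal, and paragraph packing under a length threshold as the cutting strategy (all choices made for this illustration, not taken from the patent), normalization and segmentation could look like this. The `Segment` type, its metadata fields, and the `max_chars` parameter are hypothetical.

```python
import hashlib
import re
import unicodedata
from dataclasses import dataclass, field

# Illustrative sketch only: the concrete normalization steps and cutting rule are assumptions.


@dataclass
class Segment:
    """A retrievable segment plus hypothetical metadata generated for it."""
    seg_id: int
    text: str
    metadata: dict = field(default_factory=dict)


def normalize(raw: str) -> str:
    """Character normalization and light denoising of a raw tool response."""
    text = unicodedata.normalize("NFKC", raw)   # unify full-width/half-width forms
    text = re.sub(r"<[^>]+>", " ", text)        # strip HTML-like tags as noise
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)      # collapse excessive blank lines
    return text.strip()


def fingerprint(text: str) -> str:
    """Exact-duplicate fingerprint: SHA-256 of whitespace-collapsed, lowercased text."""
    canonical = " ".join(text.split()).lower()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def segment(text: str, max_chars: int = 200) -> list[Segment]:
    """Cut on paragraph boundaries, packing paragraphs up to max_chars per segment."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    seen: set[str] = set()
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        fp = fingerprint(para)
        if fp in seen:                          # deduplicate repeated paragraphs
            continue
        seen.add(fp)
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)              # threshold reached: close this segment
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    # Minimal metadata extraction: length and fingerprint of each segment.
    return [
        Segment(i, c, {"chars": len(c), "fingerprint": fingerprint(c)})
        for i, c in enumerate(chunks)
    ]
```

Semantic-paragraph or topic-boundary cutting, also named in claim 5, would replace the simple packing loop. A single paragraph longer than `max_chars` is kept whole here rather than split.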

Description

Storage and query system and method based on intelligent agent long text tool output

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a storage and query system and method based on the output of an agent's long-text tools.

Background

With the rapid iteration of large language model technology, agents built around large language models have been widely applied in fields such as office automation, enterprise data analysis, intelligent customer service, and scientific literature review. One of the core advantages of an agent is that it can flexibly call external tools (such as specialized database query tools, vertical-domain search engines, industry data APIs, and document retrieval tools) to make up for the shortcomings of the model's intrinsic knowledge in timeliness, domain expertise, and data coverage, and thus output accurate results of real reference value. In practice, however, the results returned by such external tools often exhibit the characteristic of "long text, high information density", which poses significant challenges for efficient processing by the agent. For example, the output of an in-depth industry analysis tool usually covers multi-dimensional content such as market size, competitor dynamics, and user preferences, and often runs to tens of thousands of characters; the result returned by a batch database query tool may contain hundreds of data records with associated dimension descriptions, and can even exceed hundreds of thousands of characters.
These long-text tool outputs are key sources of information for the agent to complete complex tasks, but existing processing approaches have obvious defects:

Direct input to the model: the entire content returned by the tool is provided to the model as context, but this makes the context excessively long, incurs large computational overhead, and may even exceed the model's input limit.

Summary compression: the returned result is simply summarized, but detail is easily lost, which affects the accuracy and reliability of the agent.

Therefore, how to process long-text tool output effectively within an agent framework, so that the complete information is preserved while the parts required by a given question can be extracted efficiently, is a technical problem to be solved.

Disclosure of Invention

In view of the defects of the prior art, the present application provides a storage and query system and method based on the output of an agent's long-text tools, so as to solve the technical problem that existing approaches cannot both preserve the complete information of long-text tool outputs and efficiently extract the parts required by a question.
To achieve the object of the present application, in a first aspect, the present application provides a storage and query system based on the output of an agent's long-text tool, the system comprising: a warehousing module, configured to receive the original response of the agent's long-text tool and normalize the original response to obtain normalized text; a segmentation and indexing module, configured to cut the normalized text into retrievable segments containing text data according to a preset strategy and to generate corresponding index entries and metadata for each retrievable segment; a data storage module, configured to store the retrievable segments, index entries, and metadata; an NL2SQL module, configured to generate an SQL query from user input; and an execution aggregation and summary module, configured to execute the SQL query, obtain the text data corresponding to the retrievable segments by matching index entries or metadata, aggregate the text data, and call the large model to generate a natural language answer from the aggregated text data.

Further, the system comprises an SQL verification and sandbox module, configured to verify the SQL query and send the verified SQL query to the execution aggregation and summary module.
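The query side described above can be sketched end to end with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions: SQLite stands in for the PostgreSQL segment store of claim 7, the SQL string is taken as the (hypothetical) output of the NL2SQL module, static verification is reduced to a single-SELECT check, permission verification uses SQLite's authorizer callback with a hypothetical `ALLOWED_TABLES` whitelist, cost estimation is replaced by a simple row cap, and the large-model call is omitted, so the function returns the prompt that would be sent. The table layout and all function names are illustrative.

```python
import sqlite3

# Hypothetical permission whitelist for this sketch; a real system would derive it per user.
ALLOWED_TABLES = {"segments"}


def build_store(rows: list[tuple[str, str, str]]) -> sqlite3.Connection:
    """Store (tool, topic, text) segments. SQLite is a stand-in for the PostgreSQL store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments (id INTEGER PRIMARY KEY, tool TEXT, topic TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO segments (tool, topic, text) VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def _read_only_authorizer(action, arg1, arg2, db_name, trigger):
    """Permission verification: allow SELECT and column reads on whitelisted tables only."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY  # writes, DDL, PRAGMA, function calls, other tables


def sandbox(conn: sqlite3.Connection) -> None:
    """Install the authorizer so every later statement on conn is checked."""
    conn.set_authorizer(_read_only_authorizer)


def validate_sql(sql: str) -> str:
    """Static verification: exactly one complete SELECT statement."""
    stmt = sql.strip().rstrip(";").rstrip()
    if ";" in stmt:  # conservative: also rejects ';' inside string literals
        raise ValueError("multiple statements are not allowed")
    if not stmt.lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    if not sqlite3.complete_statement(stmt + ";"):
        raise ValueError("incomplete SQL statement")
    return stmt


def execute_and_aggregate(conn: sqlite3.Connection, sql: str, question: str,
                          max_rows: int = 50) -> str:
    """Run the verified query, aggregate matched segment text, and build an LLM prompt."""
    stmt = validate_sql(sql)
    rows = conn.execute(stmt).fetchmany(max_rows)  # row cap as a simple resource guard
    context = "\n\n".join(str(row[0]) for row in rows)
    # The prompt would be passed to a large model; no model call is made here.
    return f"Question: {question}\n\nAnswer using only these excerpts:\n{context}"
```

Installing the authorizer after the data is loaded means the sandbox applies to every statement the query side runs, so even a query that passes the string checks cannot read tables outside the whitelist or modify data.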
The SQL verification and sandbox module adds a verification step after the NL2SQL module generates the SQL query and before the execution aggregation and summary module executes it. Without this step, unverified SQL would be executed directly, which can create data security risks (such as malicious operations and unauthorized access) and allow invalid queries to consume system resources. With it, illegal and unsafe SQL is filtered out, the security of the data in the data storage module is ensured, invalid queries are prevented from occupying system resources, and the reliability and resource efficiency of query execution are improved. Further, the verif