CN-121997377-A - LLM-oriented Text-to-SQL intention query method and system

Abstract

The invention discloses an LLM-oriented Text-to-SQL intent query method and system. The method receives a natural language query; drives a large language model, through a prompt template with built-in security constraint rules, to perform intent parsing and task decomposition on the query and to output atomic tasks carrying security labels; for each atomic task, combines the task description with the acquired database schema information to form a prompt that drives the large language model to generate candidate SQL; sequentially applies multi-layer verification to the candidate SQL, consisting of a schema consistency check, an operation whitelist check, and a semantic security check; executes the SQL marked as safe to execute to obtain the actual query result; captures the intermediate reasoning result of the large language model during generation; and optimizes the large language model according to a comparison between the actual query result, taken as ground truth, and the intermediate reasoning result. The invention significantly improves the practicability, reliability, and auditability of intelligent database querying.

Inventors

  • Wen Feiyang
  • LIU JINSHUO

Assignees

  • Wuhan University

Dates

Publication Date
20260508
Application Date
20260410

Claims (10)

  1. An LLM-oriented Text-to-SQL intent query method, characterized by comprising the following steps: receiving a natural language query, driving a large language model, through a prompt template with built-in security constraint rules, to perform intent parsing and task decomposition on the natural language query, and outputting atomic tasks with security labels, wherein the atomic tasks with security labels are expressed in a structured form, and each task comprises a unique identifier, a task description, a list of expected access tables, and a security attribute object; for each atomic task, acquiring database schema information of the expected access tables from a pre-constructed knowledge base, combining the task description and the database schema information to form a prompt, and driving the large language model to generate candidate SQL; sequentially performing multi-layer verification on the generated candidate SQL, consisting of a schema consistency check, an operation whitelist check, and a semantic security check, based on the security label of the atomic task and the security constraint rules, and marking the candidate SQL as safe to execute when all layers of verification pass; and executing the SQL marked as safe to execute to obtain an actual query result, capturing an intermediate reasoning result of the large language model during generation, comparing the actual query result, taken as ground truth, with the intermediate reasoning result, and optimizing the large language model according to the comparison result.
  2. The LLM-oriented Text-to-SQL intent query method of claim 1, wherein the security constraint rules comprise an operation whitelist for controlling permission to perform specific SQL operations on the database, a table-level access control rule for defining the access permissions of different roles or users to specific database tables, and a resource-level protection rule for limiting the resource consumption of query operations.
  3. The LLM-oriented Text-to-SQL intent query method of claim 2, wherein driving the large language model, through the prompt template with built-in security constraint rules, to perform intent parsing and task decomposition on the natural language query and outputting the atomic tasks with security labels comprises: performing intent parsing on the natural language query, and decomposing the natural language query into independently executable atomic tasks according to the intent parsing result; and adding a corresponding security label to each atomic task according to how the role information of the current user matches the table-level access control rule, and outputting the atomic tasks with security labels, wherein the atomic tasks are represented in a structured form.
  4. The LLM-oriented Text-to-SQL intent query method of claim 1, wherein the database schema information comprises table names, field names, data types, field comments, primary and foreign key constraints, index information, and associations between tables.
  5. The LLM-oriented Text-to-SQL intent query method of claim 2, wherein sequentially performing the multi-layer verification of the schema consistency check, the operation whitelist check, and the semantic security check on the generated candidate SQL, based on the security label of the atomic task and the security constraint rules, comprises: checking whether the table names and field names referenced in the SQL exist in the database schema and whether the data types match, the schema consistency check passing if all of these conditions are met; checking whether the operation type of the SQL is in a preset operation whitelist while also detecting whether the SQL contains dangerous syntax, the operation whitelist check passing if the operation type is in the operation whitelist and the SQL contains no dangerous syntax, wherein the dangerous syntax comprises data definition language operations, DELETE or UPDATE operations without a WHERE condition, and attempts to bypass checks through comment symbols; and checking whether the SQL complies with the security label of the atomic task, specifically whether a table outside the list of expected access tables is accessed, whether field-level filtering requirements are met, and whether the constraints of the resource-level protection rule are met, the semantic security check passing if the SQL complies.
  6. The LLM-oriented Text-to-SQL intent query method of claim 1, wherein comparing the actual query result, taken as ground truth, with the intermediate reasoning result and optimizing the large language model according to the comparison result comprises: comparing the ground truth with the intermediate reasoning result, and identifying the model's cognitive deviations in field mapping and logic understanding to obtain deviation data; forming an alignment data set from the deviation data; and using the alignment data set to periodically calibrate and optimize the large language model by supervised fine-tuning or reinforcement learning.
  7. The LLM-oriented Text-to-SQL intent query method of claim 1, further comprising: recording a structured audit log covering the process from the user query to the return of the result, wherein the audit log is represented as a JSON structure and comprises the natural language query, the atomic task list, the candidate SQL, the verification result of each layer, the execution result, and user feedback data.
  8. An LLM-oriented Text-to-SQL intent query system, comprising: a secure intent parsing module, configured to receive a natural language query, drive a large language model, through a prompt template with built-in security constraint rules, to perform intent parsing and task decomposition on the natural language query, and output atomic tasks with security labels, wherein the atomic tasks with security labels are represented in a structured form, and each task comprises a unique identifier, a task description, a list of expected access tables, and a security attribute object; a secure SQL compilation module, with multi-layer verification, configured to acquire, for each atomic task, database schema information of the expected access tables from a pre-constructed knowledge base, combine the task description and the database schema information to form a prompt, and drive the large language model to generate candidate SQL; and a cognition enhancement feedback module, configured to execute the SQL marked as safe to execute to obtain an actual query result, capture an intermediate reasoning result of the large language model during generation, compare the actual query result, taken as ground truth, with the intermediate reasoning result, and optimize the large language model according to the comparison result.
  9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the LLM-oriented Text-to-SQL intent query method of any one of claims 1 to 7.
  10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the LLM-oriented Text-to-SQL intent query method of any one of claims 1 to 7.
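
The structured atomic task of claim 1 and the three sequential checks of claim 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `AtomicTask`, `SCHEMA`, `ALLOWED_OPS`, `max_rows`, and the regex-based table extraction are all assumptions made for illustration, the schema layer here checks table names only (not field names or data types), and a real system would use a proper SQL parser.

```python
import re
from dataclasses import dataclass, field

@dataclass
class AtomicTask:
    """Hypothetical structured task: identifier, description, expected tables, security attributes."""
    task_id: str
    description: str
    expected_tables: list[str]
    security: dict = field(default_factory=dict)  # assumed shape, e.g. {"max_rows": 100}

# Assumed schema knowledge base: table name -> column names.
SCHEMA = {"orders": {"id", "customer_id", "amount"}, "customers": {"id", "name"}}
ALLOWED_OPS = {"SELECT"}  # assumed operation whitelist
# Comment markers, stacked statements, and DDL keywords are treated as dangerous syntax.
DANGEROUS = re.compile(r"--|/\*|;\s*\S|\b(?:DROP|ALTER|CREATE|TRUNCATE)\b", re.IGNORECASE)
TABLE_REF = re.compile(r"\b(?:FROM|JOIN|UPDATE|INTO)\s+([A-Za-z_]\w*)", re.IGNORECASE)

def referenced_tables(sql: str) -> set:
    """Rough table extraction; a stand-in for real SQL parsing."""
    return {name.lower() for name in TABLE_REF.findall(sql)}

def check_schema(sql: str) -> bool:
    """Layer 1: every referenced table must exist in the schema."""
    tables = referenced_tables(sql)
    return bool(tables) and tables.issubset(SCHEMA)

def check_whitelist(sql: str) -> bool:
    """Layer 2: operation type must be whitelisted and no dangerous syntax present."""
    parts = sql.strip().split(None, 1)
    if not parts:
        return False
    op = parts[0].upper()
    if op not in ALLOWED_OPS or DANGEROUS.search(sql):
        return False
    # DELETE/UPDATE without WHERE is rejected even if the whitelist were widened.
    if op in {"DELETE", "UPDATE"} and not re.search(r"\bWHERE\b", sql, re.IGNORECASE):
        return False
    return True

def check_semantic(sql: str, task: AtomicTask) -> bool:
    """Layer 3: stay within the task's expected tables and its resource limit."""
    allowed = {t.lower() for t in task.expected_tables}
    if not referenced_tables(sql).issubset(allowed):
        return False
    max_rows = task.security.get("max_rows")
    if max_rows is not None:
        limit = re.search(r"\bLIMIT\s+(\d+)", sql, re.IGNORECASE)
        if limit is None or int(limit.group(1)) > max_rows:
            return False
    return True

def verify(sql: str, task: AtomicTask):
    """Run the layers in order; return (passed, name of the first failing layer or None)."""
    layers = [
        ("schema", lambda: check_schema(sql)),
        ("whitelist", lambda: check_whitelist(sql)),
        ("semantic", lambda: check_semantic(sql, task)),
    ]
    for name, check in layers:
        if not check():
            return False, name
    return True, None
```

In this sketch a query is marked safe to execute only when `verify` returns `(True, None)`; stopping at the first failing layer mirrors the sequential order recited in the claims and gives the audit log a single, unambiguous rejection reason.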

Description

LLM-oriented Text-to-SQL intent query method and system

Technical Field

The invention relates to the intersection of artificial intelligence and database technology, and in particular to an LLM-oriented Text-to-SQL intent query method and system.

Background

Natural-language database query (Text-to-SQL) technology is a key bridge between human intent and structured data, and its development is closely tied to the evolution of artificial intelligence. Early methods relied mainly on rule templates and grammar analysis, mapping natural language keywords to SQL components through predefined rules. These methods are highly interpretable but generalize poorly and struggle with complex and diverse user expressions. With the rise of machine learning, statistical learning methods were introduced that learn the mapping from language to SQL by training models on labeled data. They are limited by labeling cost and depth of semantic understanding, and adapt poorly to complex queries and unseen database schemas.

In recent years, deep learning and pre-trained language models have driven a paradigm shift in Text-to-SQL technology. In particular, methods based on large language models (LLMs), with their strong language understanding and generation capabilities, handle semantic ambiguity, context dependence, and complex logic in natural language better than earlier approaches. This has markedly improved overall task performance and made it possible for non-technical users to interact with databases in natural language. Despite this progress, existing LLM-driven Text-to-SQL systems still suffer from security risks and insufficient credibility in industrial scenarios that demand high security and high trustworthiness.

Disclosure of Invention

To address the security risks and insufficient credibility of the prior art, the invention provides an LLM-oriented Text-to-SQL intent query method and system. It aims to build a closed-loop system comprising secure intent parsing, secure query compilation, cognition enhancement based on real feedback, and full-link audit traceability, so that natural language querying based on large language models can be applied with high credibility in database interaction.

The core of the invention is a dual-channel processing and enhancement framework based on proactive security and cognitive evolution.

The first channel is a secure execution channel. By introducing security constraints and a task decomposition mechanism at the intent parsing stage, this channel converts user intent into atomic tasks marked with security levels, and generates and compiles SQL within a secure query framework formed by database schema awareness and an operation whitelist. A multi-layer defense consisting of schema verification, operation whitelist verification, and semantic security verification ensures that query operations execute within compliance boundaries. This addresses the passive security control of traditional methods and their susceptibility to privilege overreach or illegal operations.

The second channel is a cognition enhancement and audit channel. By comparing the real result returned by the database with the intermediate expectation of the large language model during generation (such as its chain of thought), this channel automatically identifies and captures the model's cognitive deviations in field mapping and logic understanding, forms feedback signals to construct an alignment data set, and drives continuous calibration and optimization of the model. This mechanism upgrades traditional one-off execution error correction to a system-level capability for continuous evolution, mitigating data hallucination at its source.

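The comparison step of the second channel can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as implemented by the inventors: the `Expectation` record (the columns the model said it would return and whether it expected any rows), the deviation categories, and the output record layout are all hypothetical, and extracting such an expectation from a chain of thought is assumed to happen upstream.

```python
from dataclasses import dataclass

@dataclass
class Expectation:
    """Hypothetical summary of the model's intermediate reasoning about the result."""
    columns: list[str]  # fields the model said the query would return
    non_empty: bool     # whether the model expected at least one row

def find_deviations(expected: Expectation, result_columns: list, rows: list) -> list:
    """Compare the ground-truth result with the expectation and list the deviations."""
    deviations = []
    missing = [c for c in expected.columns if c not in result_columns]
    if missing:
        # The model believed a field existed in the result that is not there.
        deviations.append({"kind": "field_mapping", "missing_columns": missing})
    if expected.non_empty and not rows:
        # The model's logic predicted matching rows, but the database returned none.
        deviations.append({"kind": "logic", "detail": "expected rows, got none"})
    return deviations

def alignment_record(question: str, sql: str, expected: Expectation,
                     result_columns: list, rows: list):
    """Return one alignment-data-set entry, or None when expectation and result agree."""
    deviations = find_deviations(expected, result_columns, rows)
    if not deviations:
        return None
    return {"question": question, "sql": sql, "deviations": deviations}
```

Records produced this way could be accumulated into the alignment data set and later used for supervised fine-tuning or reinforcement learning; how the deviations are turned into training targets is outside the scope of this sketch.
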
Meanwhile, this channel records a structured audit log throughout the whole process, from intent parsing to result return. This provides end-to-end traceability and trustworthiness evaluation of the operation chain and meets the strict requirement of high-security fields for process compliance auditing.

Through the cooperation of the two channels, the invention achieves a systematic leap from single error correction to continuous evolution, from passive verification to active defense, and from operation recording to compliance auditing, and significantly improves the practicability, reliability, and auditability of Text-to-SQL systems in scenarios with high security requirements.

To achieve the above object, a first aspect of the invention provides an LLM-oriented Text-to-SQL intent query method, comprising: receiving a natural language query, driving a large language model, through a prompt template with built-in security constraint rules, to perform intent parsing and task decomposition on the natural language query, and outputting atomic tasks with security labels, wherein th