
KR-20260064451-A - Method, apparatus and computer program for generating SQL for natural language queries


Abstract

The present invention relates to a method, apparatus, and computer program for generating SQL for a natural language query. The method for generating SQL for a natural language query according to the present invention comprises the steps of: receiving a natural language query from a user terminal; transmitting the natural language query, table information, and table column information to an LLM; receiving SQL generated in response to the natural language query from the LLM; and transmitting the SQL to the user terminal.

Inventors

  • 김형찬
  • 박준희

Assignees

  • 삼성에스디에스 주식회사

Dates

Publication Date: 2026-05-07
Application Date: 2025-04-11
Priority Date: 2024-10-31

Claims (19)

  1. A method of generating SQL for a natural language query, the method comprising: receiving a natural language query from a user terminal; transmitting the natural language query, table information, and table column information to an LLM; receiving, from the LLM, SQL generated in response to the natural language query; and transmitting the SQL to the user terminal.
  2. The method of claim 1, wherein the natural language query is an SQL request prompt expressed in natural language.
  3. The method of claim 1, wherein the table information is schema information for a table related to the natural language query and comprises at least one of a table name, a table column name, primary key (PK) information, and foreign key (FK) information.
  4. The method of claim 1, wherein the table column information is schema information for a table related to the natural language query and is data of a table column related to the natural language query.
  5. The method of claim 1, wherein transmitting the natural language query, the table information, and the table column information to the LLM comprises: receiving a first table list from a database; transmitting the natural language query and the first table list to the LLM; receiving a second table list from the LLM; requesting, from a vector DB, the table information and the table column information for each table included in the second table list; receiving the table information and the table column information from the vector DB; and transmitting the natural language query, the table information, and the table column information to the LLM.
  6. The method of claim 5, wherein the database is a relational database (RDB) that stores data in the form of tables and manages the data through relationships between the tables.
  7. The method of claim 5, wherein the first table list is a list of all tables.
  8. The method of claim 7, wherein the second table list is a list of those tables in the first table list that are related to the natural language query.
  9. The method of claim 5, wherein the vector DB stores the table information and the table column information by embedding them.
  10. A computer program stored on a medium, in combination with hardware, for executing the method of generating SQL for a natural language query of any one of claims 1 to 9.
  11. A device comprising a processor and generating SQL for a natural language query, wherein the processor executes operations comprising: receiving a natural language query from a user terminal; transmitting the natural language query, table information, and table column information to an LLM; receiving, from the LLM, SQL generated in response to the natural language query; and transmitting the SQL to the user terminal.
  12. The device of claim 11, wherein the natural language query is an SQL request prompt expressed in natural language.
  13. The device of claim 11, wherein the table information is schema information for a table related to the natural language query and comprises at least one of a table name, a table column name, primary key (PK) information, and foreign key (FK) information.
  14. The device of claim 11, wherein the table column information is schema information for a table related to the natural language query and is data of a table column related to the natural language query.
  15. The device of claim 11, wherein transmitting the natural language query, the table information, and the table column information to the LLM comprises: receiving a first table list from a database; transmitting the natural language query and the first table list to the LLM; receiving a second table list from the LLM; requesting, from a vector DB, the table information and the table column information for each table included in the second table list; receiving the table information and the table column information from the vector DB; and transmitting the natural language query, the table information, and the table column information to the LLM.
  16. The device of claim 15, wherein the database is a relational database (RDB) that stores data in the form of tables and manages the data through relationships between the tables.
  17. The device of claim 15, wherein the first table list is a list of all tables.
  18. The device of claim 17, wherein the second table list is a list of those tables in the first table list that are related to the natural language query.
  19. The device of claim 15, wherein the vector DB stores the table information and the table column information by embedding them.
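
The two-stage flow recited in claims 5 and 15 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `TableInfo`, `SchemaStore`, `render_schema`, `generate_sql`, the prompt wording, and the canned `fake_llm` are all hypothetical names introduced here. The store is a plain dictionary standing in for the vector DB (no embeddings are computed), and the shape of `column_info` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TableInfo:
    """Schema information for one table: name, columns, PK and FK (cf. claim 3)."""
    name: str
    columns: List[str]
    primary_key: List[str] = field(default_factory=list)
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # column -> "table.column"
    column_info: Dict[str, str] = field(default_factory=dict)   # column -> note (assumed shape)


class SchemaStore:
    """Hypothetical stand-in for the vector DB: a plain dict lookup, no embeddings."""

    def __init__(self, tables: List[TableInfo]) -> None:
        self._by_name = {t.name: t for t in tables}

    def fetch(self, names: List[str]) -> List[TableInfo]:
        return [self._by_name[n] for n in names if n in self._by_name]


def render_schema(tables: List[TableInfo]) -> str:
    """Render table and column information as plain text for the prompt."""
    lines: List[str] = []
    for t in tables:
        lines.append(f"TABLE {t.name} ({', '.join(t.columns)})")
        if t.primary_key:
            lines.append(f"  PK: {', '.join(t.primary_key)}")
        for col, ref in t.foreign_keys.items():
            lines.append(f"  FK: {col} -> {ref}")
        for col, note in t.column_info.items():
            lines.append(f"  COLUMN {col}: {note}")
    return "\n".join(lines)


def generate_sql(query: str, first_table_list: List[str],
                 store: SchemaStore, llm: Callable[[str], str]) -> str:
    # Stage 1: send the query and the full table list; get back the related tables.
    reply = llm(
        "Select the tables needed to answer the question. "
        "Reply with a comma-separated list of table names only.\n"
        f"Question: {query}\nTables: {', '.join(first_table_list)}"
    )
    known = set(first_table_list)
    # Keep only names that really exist, in case the model invents a table.
    second_table_list = [n.strip() for n in reply.split(",") if n.strip() in known]

    # Stage 2: fetch schema details only for the selected tables, then request SQL.
    schema = render_schema(store.fetch(second_table_list))
    return llm(
        "Write one SQL statement that answers the question, using only this schema.\n"
        f"{schema}\nQuestion: {query}"
    )


prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    """Canned replies so the sketch runs without a real model."""
    prompts.append(prompt)
    if prompt.startswith("Select the tables"):
        return "orders, customers, no_such_table"
    return ("SELECT c.name, SUM(o.amount) FROM orders o "
            "JOIN customers c ON o.customer_id = c.id GROUP BY c.name;")


store = SchemaStore([
    TableInfo("customers", ["id", "name"], primary_key=["id"]),
    TableInfo("orders", ["id", "customer_id", "amount"], primary_key=["id"],
              foreign_keys={"customer_id": "customers.id"}),
    TableInfo("products", ["id", "title"], primary_key=["id"]),
])
sql = generate_sql("Total order amount per customer",
                   ["customers", "orders", "products"], store, fake_llm)
print(sql)
```

In this sketch the second prompt carries schema text only for the tables the model selected, so its size depends on the query rather than on the total number of tables in the database. Filtering the model's reply against the first table list is a design choice added here, not something the claims recite.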

Description

Method, apparatus and computer program for generating SQL for natural language queries

The present invention relates to a method, apparatus, and computer program for generating SQL (Structured Query Language) from natural language queries. More specifically, it relates to a method, apparatus, and computer program for generating SQL for natural language queries using a general-purpose LLM that has not undergone separate additional training (such as fine-tuning). Even more specifically, it relates to a method, apparatus, and computer program for improving the response accuracy of a general-purpose LLM when generating SQL for natural language queries (Text-to-SQL).

Recently, various services that generate code using general-purpose LLMs have emerged to enhance development productivity. Among these, systems are being introduced that respond to user queries by converting them into SQL or by displaying the execution results of the corresponding SQL. However, compared with the generation of development code (e.g., Java, C, C++, Python), SQL generation is prone to hallucination, because the correct tables, columns, and join conditions cannot be derived from the user query alone. Furthermore, while SQL generated solely from a user query can serve as a reference for users unfamiliar with SQL, it may not work in systems that must execute the SQL and return actual results.

To address this issue, schema information for the related tables (such as table names, column names, primary keys, and foreign keys) is sometimes appended to the prompt sent to the LLM along with the user query. However, even in this case, if the database contains hundreds or thousands of tables, the approach is difficult to apply in practice, because the number of tokens the LLM can process is limited and not all table schemas can be included in the prompt.
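
The token-limit problem described above can be illustrated with a rough size estimate. The following is a sketch under loudly stated assumptions: the figure of 4 characters per token is a crude heuristic rather than a real tokenizer, the 8,000-token limit is a hypothetical value not taken from the source, and the schemas are synthetic.

```python
from typing import Dict, List

CHARS_PER_TOKEN = 4           # crude assumption; real tokenizers differ
CONTEXT_LIMIT_TOKENS = 8_000  # hypothetical limit, not taken from the source


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def schema_text(schema: Dict[str, List[str]]) -> str:
    """Render every table as 'table(col1, col2, ...)', one table per line."""
    return "\n".join(f"{table}({', '.join(cols)})" for table, cols in schema.items())


def fits_in_prompt(query: str, schema: Dict[str, List[str]]) -> bool:
    """Check whether the query plus the full schema stays within the assumed limit."""
    return estimate_tokens(query + "\n" + schema_text(schema)) <= CONTEXT_LIMIT_TOKENS


def synthetic_schema(n_tables: int, n_cols: int) -> Dict[str, List[str]]:
    """Build a made-up schema with n_tables tables of n_cols columns each."""
    return {f"table_{i}": [f"col_{j}" for j in range(n_cols)] for i in range(n_tables)}


small = synthetic_schema(5, 10)       # a handful of narrow tables
large = synthetic_schema(1000, 100)   # the scale the description worries about

print(fits_in_prompt("List all customers", small))  # True
print(fits_in_prompt("List all customers", large))  # False
```

Under these assumptions the small schema fits comfortably, while the full schema of a thousand-table database exceeds the limit many times over, which is why sending every table schema in one prompt does not scale.
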
Furthermore, a single table often contains hundreds of columns or more; in such cases, including even a single table schema in the prompt may exceed the LLM's token limit, making processing impossible. In other words, although SQL generation methods that prevent hallucination are needed even in typical database environments with hundreds or thousands of tables, each having hundreds of columns, no appropriate solution has yet been presented.

The accompanying drawings, which are included as part of the detailed description to aid in understanding the present invention, provide embodiments of the present invention and explain the technical concept of the present invention together with the detailed description.

FIG. 1 illustrates an SQL generation process according to the prior art.
FIG. 2 illustrates an SQL generation process according to one embodiment of the present invention.
FIG. 3 illustrates SQL generated for a natural language query according to one embodiment of the present invention.
FIG. 4 illustrates a function for generating an SQL query that requests a search for desired data in a database according to one embodiment of the present invention.
FIG. 5 is a schematic diagram illustrating the process of generating SQL based on natural language query input according to one embodiment of the present invention.
FIG. 6 is a flowchart illustrating a method for generating SQL through a natural language query according to one embodiment of the present invention.
FIG. 7 is a signal flow diagram illustrating the operation of an SQL generation system according to one embodiment of the present invention.
FIG. 8 illustrates a prompt sent to an LLM to receive a final table list according to one embodiment of the present invention.
FIG. 9 illustrates a prompt sent to an LLM for SQL generation according to one embodiment of the present invention.
FIG. 10 illustrates an apparatus to which the proposed method of the present invention can be applied.

Hereinafter, embodiments disclosed in this specification will be described in detail with reference to the accompanying drawings. The objects, specific advantages, and novel features of the present invention will become more apparent from the following detailed description and preferred embodiments in conjunction with the accompanying drawings.

Prior to this, the terms and words used in this specification and claims are defined by the inventor as appropriate to best describe the invention and should be interpreted with a meaning and concept consistent with the technical spirit of the invention; they are intended merely to describe embodiments and should not be interpreted as limiting the invention.

In assigning reference numerals to components, identical or similar components are assigned the same reference numeral regardless of the figure number, and redundant descriptions thereof are omitted. The suffixes "module" and "part" used for components in the following description are assigned or used interchangeably for the sake of ease of drafting the specification; they do no