
CN-116756167-B - Method for generating a query statement set of a database and method for training a ranking model


Abstract

The application relates to the field of information technology and discloses a method for generating a query statement set of a database and a method for training a ranking model. The method for generating the query statement set comprises: a labeling step, in which a plurality of keywords are extracted from a plurality of sample Structured Query Language (SQL) statements of the database and a semantic annotation is acquired for each keyword; a reorganization step, in which each sample SQL statement is split into a plurality of query units and the query units are recombined multiple times based on a preset reorganization rule to obtain a plurality of candidate SQL statements; and a conversion step, in which each candidate SQL statement is converted into a template language sentence based on the semantic annotations, yielding a plurality of template language sentences. The candidate SQL statements and the template language sentences together form the query statement set. The application can improve the accuracy of data query results.
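As a rough illustration of the labeling step, the Python sketch below pulls table names, qualified column names, and join relations (the keyword types named in claim 1) out of sample SQL with regular expressions and attaches a semantic annotation to each. The toy schema (`article`, `template`), the `ANNOTATIONS` dictionary, and all helper names are hypothetical; the patent does not say how annotations are acquired, and a real system would use a proper SQL parser rather than regexes.

```python
import re
from dataclasses import dataclass, field

# Hypothetical semantic annotations for a toy schema. The patent does not say
# how annotations are acquired; here they are simply looked up in a dict.
ANNOTATIONS = {
    "article": "article",
    "template": "template",
    "article.template_id": "template used by the article",
    "article.title": "article title",
    "template.id": "template ID",
    "template.code": "template code",
}

# Naive patterns that only cover simple SQL (assumption, not a real parser).
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+(\w+)", re.IGNORECASE)
COLUMN_RE = re.compile(r"\b(\w+\.\w+)\b")
JOIN_RE = re.compile(r"\bON\s+(\w+\.\w+)\s*=\s*(\w+\.\w+)", re.IGNORECASE)


@dataclass
class Keywords:
    tables: set[str] = field(default_factory=set)
    columns: set[str] = field(default_factory=set)
    joins: set[tuple[str, str]] = field(default_factory=set)


def extract_keywords(sql_statements: list[str]) -> Keywords:
    """Collect table names, qualified column names and join relations."""
    kw = Keywords()
    for sql in sql_statements:
        kw.tables.update(TABLE_RE.findall(sql))
        kw.columns.update(COLUMN_RE.findall(sql))
        for left, right in JOIN_RE.findall(sql):
            # Store each join once, independent of the order of its sides.
            kw.joins.add(tuple(sorted((left, right))))
    return kw


def annotate(kw: Keywords, annotations: dict[str, str]) -> dict[str, str]:
    """Attach a semantic annotation to every keyword (fallback: raw name)."""
    labels = {name: annotations.get(name, name.replace("_", " "))
              for name in kw.tables | kw.columns}
    for left, right in kw.joins:
        labels[f"{left}={right}"] = f"{labels[left]} matches {labels[right]}"
    return labels


samples = [
    "SELECT template.code FROM article JOIN template "
    "ON article.template_id = template.id WHERE article.title = ?",
]
labels = annotate(extract_keywords(samples), ANNOTATIONS)
print(labels["article.template_id=template.id"])
# prints: template used by the article matches template ID
```

Keeping join relations as their own keywords matters for data like "templates used in articles", which only exists through a key relationship between two tables.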

Inventors

  • HE ZHENYING
  • WANG XIAOYANG
  • FAN YUANKAI
  • REN TONGHUI

Assignees

  • Fudan University (复旦大学)

Dates

Publication Date
2026-05-08
Application Date
2023-05-15

Claims (15)

  1. A method for generating a set of query statements of a database, for an electronic device, the method comprising: a labeling step of extracting a plurality of keywords from a plurality of sample structured query language statements of a database and acquiring semantic annotations of the keywords, wherein the keywords comprise table names, column names, and table join relations in the database; a reorganization step of splitting each sample structured query language statement into a plurality of query units based on grammar rules of the structured query language, and reorganizing the plurality of query units a plurality of times based on preset reorganization rules to obtain a plurality of candidate structured query language statements, wherein each query unit carries semantics and is a minimal constituent structure of a structured query language statement, and the preset reorganization rules comprise grammar rules of the structured query language, occurrence frequency rules of the query units, statement word-count rules, or any combination thereof; and a conversion step of converting each candidate structured query language statement into a template language sentence based on the plurality of semantic annotations to obtain a plurality of template language sentences, the template language sentences being natural-language-like sentences, wherein the plurality of candidate structured query language statements and the plurality of template language sentences form the query statement set in one-to-one correspondence.
  2. The method of claim 1, wherein each structured query language statement consists of a plurality of query units, and the keywords are associated with the query units.
  3. The method of claim 2, wherein, in the reorganization step, each sample structured query language statement is split into a plurality of query units based on grammar rules of the structured query language, and the occurrence frequency of each query unit is counted.
  4. The method of claim 3, wherein the occurrence frequency rule of the query units specifies that query units with a high occurrence frequency are reorganized more times than query units with a low occurrence frequency.
  5. The method of claim 2, wherein the conversion step further comprises: a splitting step of splitting each candidate structured query language statement into a plurality of query units; a translation step of translating each query unit of each candidate structured query language statement into the semantic annotation of its associated keyword; and a combining step of combining the plurality of translated semantic annotations to obtain the template language sentence of each candidate structured query language statement.
  6. A method of training a ranking model, for an electronic device, the method comprising: an extracting step of extracting a part of the template language sentences, as training data, from the plurality of template language sentences obtained by the method according to any one of claims 1 to 5; and a training step of training the ranking model using the training data and standard natural language sentences to obtain a trained ranking model.
  7. The method of claim 6, wherein the extracting step further comprises: a splitting step of splitting the candidate structured query language statement corresponding to each template language sentence into a plurality of query units; a first calculation step of comparing each query unit with a standard query unit to calculate a similarity for each query unit; a second calculation step of calculating a score for each template language sentence based on the plurality of similarities; and a sorting step of sorting the plurality of template language sentences by their scores and extracting a part of the sorted template language sentences as the training data.
  8. A data query method, for an electronic device, the method comprising: a first acquisition step of acquiring a query request described in natural language; a generating step of, when the current database to be queried is a new database, generating the query statement set of the current database using the method according to any one of claims 1 to 5, the query statement set comprising a plurality of current candidate structured query language statements and a plurality of current template language sentences; a ranking step of inputting the query request and the plurality of current template language sentences into the trained ranking model obtained by the method according to any one of claims 6 to 7, thereby ranking the plurality of current template language sentences; a determining step of acquiring a target template language sentence from the ranked current template language sentences and determining the current candidate structured query language statement corresponding to the target template language sentence as the target structured query language statement; and a query step of performing a query based on the target structured query language statement to obtain a query result corresponding to the query request.
  9. The method of claim 8, wherein the target template language sentence is the first-ranked sentence among the plurality of ranked current template language sentences.
  10. An apparatus for generating a set of query statements of a database, the apparatus comprising: a labeling unit that extracts a plurality of keywords from a plurality of sample structured query language statements of a database and acquires semantic annotations of the keywords, wherein the keywords comprise table names, column names, and table join relations in the database; a reorganization unit that splits each sample structured query language statement into a plurality of query units based on grammar rules of the structured query language, and reorganizes the plurality of query units a plurality of times based on preset reorganization rules to obtain a plurality of candidate structured query language statements, wherein each query unit carries semantics and is a minimal constituent structure of a structured query language statement, and the preset reorganization rules comprise grammar rules of the structured query language, occurrence frequency rules of the query units, statement word-count rules, or any combination thereof; and a conversion unit that converts each candidate structured query language statement into a template language sentence based on the plurality of semantic annotations to obtain a plurality of template language sentences, the template language sentences being natural-language-like sentences, wherein the plurality of candidate structured query language statements and the plurality of template language sentences form the query statement set in one-to-one correspondence.
  11. An apparatus for training a ranking model, the apparatus comprising: an extracting unit that extracts a part of the template language sentences, as training data, from the plurality of template language sentences obtained by the apparatus according to claim 10; and a training unit that trains the ranking model using the training data and standard natural language sentences to obtain a trained ranking model.
  12. A data query apparatus, the apparatus comprising: a first acquisition unit that acquires a query request described in natural language; a generation unit that, when the current database to be queried is a new database, generates the query statement set of the current database using the apparatus according to claim 10, the query statement set comprising a plurality of current candidate structured query language statements and a plurality of current template language sentences; a ranking unit that inputs the query request and the plurality of current template language sentences into the trained ranking model obtained by the apparatus according to claim 11, thereby ranking the plurality of current template language sentences; a determining unit configured to acquire a target template language sentence from the ranked current template language sentences and determine the current candidate structured query language statement corresponding to the target template language sentence as the target structured query language statement; and a query unit that performs a query based on the target structured query language statement to obtain a query result corresponding to the query request.
  13. A computer-readable storage medium having instructions stored thereon which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 5, the method of any one of claims 6 to 7, or the method of any one of claims 8 to 9.
  14. An electronic device comprising one or more processors and one or more memories, the one or more memories storing one or more programs that, when executed by the one or more processors, cause the electronic device to perform the method of any one of claims 1 to 5, the method of any one of claims 6 to 7, or the method of any one of claims 8 to 9.
  15. A computer program product comprising computer-executable instructions, characterized in that the instructions, when executed by a processor, implement the method of any one of claims 1 to 5, the method of any one of claims 6 to 7, or the method of any one of claims 8 to 9.
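The extracting step of claim 7 might look roughly like the sketch below. The claims fix neither the similarity measure nor the scoring formula, so several choices here are assumptions made for illustration: the standard query units come from a reference ("standard") SQL statement paired with a standard natural-language question, each candidate unit is compared with its best-matching standard unit by token-level Jaccard similarity, and the score normalizes the summed similarities by the larger unit count so that missing or extra units lower it. `split_units`, `score_candidate`, and `select_training_data` are hypothetical names.

```python
import re


def split_units(sql: str) -> list[str]:
    """Clause-level split before FROM / WHERE / AND (simplified grammar)."""
    return re.split(r"\s+(?=(?:FROM|WHERE|AND)\b)", sql.strip().rstrip(";"))


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two query units."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def score_candidate(sql: str, standard_units: list[str]) -> float:
    """First calculation: similarity of each unit to its best standard unit.
    Second calculation: sum normalized by the larger unit count (assumed)."""
    units = split_units(sql)
    sims = [max(jaccard(u, s) for s in standard_units) for u in units]
    return sum(sims) / max(len(units), len(standard_units))


def select_training_data(pairs, standard_sql, top_k):
    """pairs: (template sentence, candidate SQL). Sort by score, highest
    first, and keep the top_k (template, score) pairs as training data."""
    standard_units = split_units(standard_sql)
    scored = [(t, score_candidate(sql, standard_units)) for t, sql in pairs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


standard_sql = ("SELECT template.code FROM article JOIN template "
                "ON article.template_id = template.id WHERE article.title = ?")
pairs = [
    ("find the article title of articles where the article year is given",
     "SELECT article.title FROM article WHERE article.year = ?"),
    ("find the template code of templates used in articles",
     "SELECT template.code FROM article JOIN template "
     "ON article.template_id = template.id"),
    ("find the template code of templates used in articles "
     "where the article title is given", standard_sql),
]
for template, score in select_training_data(pairs, standard_sql, top_k=2):
    print(f"{score:.3f}  {template}")
```

Taking only the top-ranked part keeps template sentences whose SQL is structurally close to the reference, which is the intent of the sorting step; the cut-off (`top_k`) is a free parameter in this sketch.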
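For claims 8 and 9, the query path can be illustrated with an in-memory SQLite database. In this sketch the trained ranking model is replaced by a hypothetical word-overlap scorer (`overlap_score`), the query statement set is assumed to have already been generated for the new database, and `answer_query` is a hypothetical name; the point is only to show how the first-ranked template sentence selects the SQL statement that is actually executed.

```python
import sqlite3


def overlap_score(request: str, template: str) -> int:
    """Hypothetical stand-in for the trained ranking model: shared words."""
    return len(set(request.lower().split()) & set(template.lower().split()))


def answer_query(request, query_set, rank_score, conn):
    """query_set: (candidate SQL, template sentence) pairs for the database."""
    # Ranking step: order template sentences by relevance to the request.
    ranked = sorted(query_set, key=lambda pair: rank_score(request, pair[1]),
                    reverse=True)
    # Determining step (claim 9): the first-ranked template is the target,
    # and its paired candidate SQL becomes the target SQL statement.
    target_sql, _target_template = ranked[0]
    # Query step: execute the target SQL statement.
    return target_sql, conn.execute(target_sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE template (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, year INTEGER,
                          template_id INTEGER REFERENCES template(id));
    INSERT INTO template VALUES (1, 'T-001'), (2, 'T-002');
    INSERT INTO article VALUES (1, 'Intro', 2021, 1), (2, 'Survey', 2022, 2);
""")
query_set = [
    ("SELECT article.title FROM article",
     "find the article title of articles"),
    ("SELECT template.code FROM article JOIN template "
     "ON article.template_id = template.id",
     "find the template code of templates used in articles"),
]
sql, rows = answer_query("which template code is used in articles",
                         query_set, overlap_score, conn)
print(sql)
print(sorted(rows))
```

Because the ranking model only ever compares natural-language request text against natural-language-like template text, the model never has to emit SQL itself; the one-to-one pairing in the query statement set supplies the executable statement.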

Description

Method for Generating a Query Statement Set of a Database and Method for Training a Ranking Model

Technical Field

The present application relates to the field of information technology, and more particularly to a method for generating a query statement set of a database, a method for training a ranking model, a data query method, and a corresponding apparatus, medium, electronic device, and computer program product.

Background

Data in many industries is currently growing explosively, and large amounts of it are stored in structured and semi-structured knowledge bases such as databases. The traditional way to analyze and retrieve such data is to interact with the database through Structured Query Language (SQL). Using SQL, however, requires specialized training and a good working knowledge of the database schema. This shuts out untrained, non-technical users and sets a high barrier to data analysis and use. How to let such users interact with databases in natural language has therefore attracted wide attention in industry and academia, and it remains a challenging problem.

With massive databases, the data users want is no longer limited to simple data in a single table; it is often data with a special structure and specific semantics. For example, a user may want to find the different codes of the templates used in articles. "Templates used in articles" can be regarded as data with special semantics: it can only be analyzed and retrieved through the primary-key/foreign-key associations between tables in the database. A core problem of existing models is that such data is not adequately represented, so the models cannot help non-technical users interact with the database.
As a result, existing models are ill-suited to data that has a special structure and specific semantics.

Disclosure of Invention

Embodiments of the present application provide a method for generating a query statement set of a database, a method for training a ranking model, a data query method, and a corresponding apparatus, medium, electronic device, and computer program product.

In a first aspect, an embodiment of the present application provides a method for generating a query statement set of a database, for an electronic device, the method including: a labeling step of extracting a plurality of keywords from a plurality of sample Structured Query Language (SQL) statements of a database and acquiring semantic annotations of the keywords; a reorganization step of splitting each sample SQL statement into a plurality of query units and reorganizing the query units a plurality of times based on a preset reorganization rule to obtain a plurality of candidate SQL statements; and a conversion step of converting each candidate SQL statement into a template language sentence based on the plurality of semantic annotations to obtain a plurality of template language sentences, wherein the plurality of candidate SQL statements and the plurality of template language sentences form the query statement set.

In a possible implementation of the first aspect, the template language sentence is a natural-language-like sentence.

In a possible implementation of the first aspect, each SQL statement consists of a plurality of query units, and the keywords are associated with the query units, wherein each query unit carries semantics and is the smallest constituent structure of an SQL statement.
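The reorganization and conversion steps of the first aspect can be sketched as follows. This is a minimal illustration under assumptions not taken from the patent: SQL is limited to `SELECT ... FROM ... [WHERE ...]`; a query unit is one selected column, the whole `FROM` clause (joins included), or one `AND`-separated condition; each candidate combines one `SELECT` unit, one `FROM` unit, and at most one `WHERE` unit; the frequency rule is modeled as a budget of `per_use` times a `SELECT` unit's frequency; the word-count rule is a `max_words` cap; and the hypothetical `LABELS` mapping stands in for the semantic annotations. All function names are hypothetical.

```python
import re
from collections import Counter

Unit = tuple[str, str]  # (clause, text), e.g. ("SELECT", "template.code")

# Simplified grammar: SELECT <cols> FROM <source> [WHERE <cond> AND ...]
CLAUSE_RE = re.compile(
    r"^SELECT\s+(.*?)\s+FROM\s+(.*?)(?:\s+WHERE\s+(.*))?$", re.IGNORECASE)


def split_units(sql: str) -> list[Unit]:
    """Split a simple SELECT statement into clause-level query units."""
    m = CLAUSE_RE.match(sql.strip().rstrip(";"))
    if m is None:
        raise ValueError(f"unsupported SQL: {sql}")
    select, source, where = m.groups()
    units = [("SELECT", col.strip()) for col in select.split(",")]
    units.append(("FROM", source.strip()))
    if where:
        conds = re.split(r"\s+AND\s+", where, flags=re.IGNORECASE)
        units += [("WHERE", cond.strip()) for cond in conds]
    return units


def tables_used(text: str) -> set[str]:
    """Tables referenced by qualified column names such as article.title."""
    return set(re.findall(r"(\w+)\.\w+", text))


def tables_in_from(source: str) -> set[str]:
    """The first table of a FROM clause plus every joined table."""
    joined = re.findall(r"\bJOIN\s+(\w+)", source, re.IGNORECASE)
    return {source.split()[0], *joined}


def assemble(units: list[Unit]) -> str:
    select = ", ".join(t for c, t in units if c == "SELECT")
    source = next(t for c, t in units if c == "FROM")
    where = " AND ".join(t for c, t in units if c == "WHERE")
    return f"SELECT {select} FROM {source}" + (f" WHERE {where}" if where else "")


def reorganize(samples: list[str], per_use: int = 2, max_words: int = 25) -> list[str]:
    freq = Counter(u for sql in samples for u in split_units(sql))
    ranked = [u for u, _ in freq.most_common()]
    # Frequency rule (assumed form): a SELECT unit seen n times may appear in
    # at most per_use * n candidates, so frequent units are recombined more.
    budget = {u: per_use * n for u, n in freq.items()}
    candidates: list[str] = []
    for source in (u for u in ranked if u[0] == "FROM"):
        allowed = tables_in_from(source[1])
        # Grammar rule: every referenced table must appear in the FROM clause.
        wheres = [None] + [u for u in ranked
                           if u[0] == "WHERE" and tables_used(u[1]) <= allowed]
        selects = [u for u in ranked
                   if u[0] == "SELECT" and tables_used(u[1]) <= allowed]
        for sel in selects:
            for where in wheres:
                if budget[sel] == 0:
                    break
                sql = assemble([sel, source] + ([where] if where else []))
                # Word-count rule, plus de-duplication of candidates.
                if len(sql.split()) <= max_words and sql not in candidates:
                    candidates.append(sql)
                    budget[sel] -= 1
    return candidates


# Hypothetical semantic annotations keyed by query-unit text.
LABELS = {
    "template.code": "the template code",
    "article.title": "the article title",
    "article": "articles",
    "article JOIN template ON article.template_id = template.id":
        "templates used in articles",
    "article.title = ?": "the article title is given",
    "article.year = ?": "the article year is given",
}


def to_template(sql: str, labels: dict[str, str]) -> str:
    """Translate each query unit into its annotation, then combine them."""
    parts: dict[str, list[str]] = {"SELECT": [], "FROM": [], "WHERE": []}
    for clause, text in split_units(sql):
        parts[clause].append(labels.get(text, text))
    sentence = f"find {' and '.join(parts['SELECT'])} of {parts['FROM'][0]}"
    if parts["WHERE"]:
        sentence += " where " + " and ".join(parts["WHERE"])
    return sentence


samples = [
    "SELECT template.code FROM article JOIN template "
    "ON article.template_id = template.id WHERE article.title = ?",
    "SELECT template.code FROM article JOIN template "
    "ON article.template_id = template.id",
    "SELECT article.title FROM article WHERE article.year = ?",
]
# One-to-one (candidate SQL, template sentence) pairs form the query set.
query_set = [(sql, to_template(sql, LABELS)) for sql in reorganize(samples)]
for sql, template in query_set:
    print(f"{template}  <->  {sql}")
```

Here `template.code` occurs in two samples and `article.title` in one as a selected column, so the budget lets the former appear in more candidates, which is the behavior the frequency rule describes.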
In a possible implementation of the first aspect, in the reorganization step, each sample SQL statement is split into a plurality of query units based on the grammar rules of SQL, and the occurrence frequency of each query unit is counted; the preset reorganization rule comprises a grammar rule of SQL, an occurrence frequency rule of the query units, a statement word-count rule, or any combination thereof.

In a possible implementation of the first aspect, the occurrence frequency rule of the query units specifies that query units with a high occurrence frequency are reorganized more times than query units with a low occurrence frequency.

In a possible implementation of the first aspect, the conversion step further includes: a splitting step of splitting each candidate SQL statement into a plurality of query units; a translation step of translating each query unit of each candidate SQL statement into the semantic annotation of its associated keyword; and a combining step of combining the translated semantic annotations to obtain the template language sentence of each candidate SQL statement.

In a possible implementation of the first aspect, the keywords include table names, column names, and tabl