
CN-122019565-A - Data query statement generation method, device, equipment, medium and program product


Abstract

The data query statement generation method comprises three steps. First, a query question and predicted result description information corresponding to the query question are determined based on a received natural language query request, where the predicted result description information represents the query features expected to appear in the predicted query result for the query question. Second, target reference information matching the query question is determined in a preset knowledge base based on the predicted result description information, where the knowledge base contains at least one of metadata information describing the structure of a target database and historical data query records. Finally, a target data query statement is generated based on the query question and the target reference information. The method and the device can effectively improve the accuracy of data query statement generation.

Inventors

  • LI ZHUOHAO
  • WANG JINJIE
  • LIN AJUN
  • HAN AIDONG
  • LIAN ZONGMIN
  • JIANG HONGXIANG
  • QU XIANGDONG
  • ZHU JUNHUA

Assignees

  • 杭州网易智企科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-29

Claims (10)

  1. A method for generating a data query statement, the method comprising: determining a query question and predicted result description information corresponding to the query question based on a received natural language query request, wherein the predicted result description information represents query features expected to be contained in a predicted query result for the query question; determining, based on the predicted result description information, target reference information matching the query question in a preset knowledge base, wherein the preset knowledge base comprises at least one of metadata information describing a target database structure and historical data query records; and generating a target data query statement based on the query question and the target reference information.
  2. The method according to claim 1, wherein determining the query question and the predicted result description information corresponding to the query question based on the received natural language query request comprises: performing semantic disambiguation on the natural language query request to generate a standardized query question conforming to a preset specification; and inputting the standardized query question into a preset large language model for hypothetical reasoning to obtain a hypothetical reply, and determining the hypothetical reply as the predicted result description information, wherein the hypothetical reply comprises predicted key metrics, predicted result features, and predicted field attributes for the standardized query question.
  3. The method according to claim 2, wherein performing semantic disambiguation on the natural language query request comprises: judging whether the natural language query request meets a preset rewrite condition, wherein the preset rewrite condition comprises at least one of vague expression, semantic ambiguity, missing key filter conditions, and non-compliance with business semantic specifications; and if the preset rewrite condition is met, rewriting the natural language query request based on a preset question template to generate the standardized query question, wherein the data structure of the standardized query question comprises a requirement target description, an output field list, and a filter condition field.
  4. The method according to claim 1, wherein determining the target reference information matching the query question in the preset knowledge base comprises: performing, based on the predicted result description information, vector semantic matching and exact keyword matching in the knowledge base; and performing, based on a preset weighting strategy, weighted ranking of the semantic matching results and the keyword matching results to determine the target reference information.
  5. The method according to claim 1, wherein constructing the historical data query records in the preset knowledge base comprises: obtaining a target data table, wherein the target data table is a data table in the target database whose read count exceeds a preset count; determining, based on the target data table, historical query statements that read from the target data table; parsing the historical query statements and extracting query fragments and usage scenario descriptions corresponding to the query fragments; and storing the query fragments and the usage scenario descriptions in the knowledge base as the historical data query records.
  6. The method according to claim 1, wherein constructing the metadata information in the preset knowledge base comprises: acquiring original table structure information and field attribute information from a heterogeneous data source; converting the original table structure information and the field attribute information into unified intermediate-format data; and acquiring business semantic information, establishing an association mapping between the intermediate-format data and the business semantic information, and generating the metadata information, wherein the business semantic information comprises at least one of business metric definitions, data tags, and subject area divisions.
  7. A data query statement generation apparatus, the apparatus comprising: a receiving module configured to determine a query question and predicted result description information corresponding to the query question based on a received natural language query request, wherein the predicted result description information represents query features expected to be contained in a predicted query result for the query question; a search module configured to determine, based on the predicted result description information, target reference information matching the query question in a preset knowledge base, wherein the preset knowledge base comprises at least one of metadata information describing a target database structure and historical data query records; and a generation module configured to generate a target data query statement based on the query question and the target reference information.
  8. An electronic device, comprising: a memory and a processor communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the data query statement generation method of any one of claims 1 to 6.
  9. A computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the data query statement generation method of any one of claims 1 to 6.
  10. A computer program product comprising computer instructions for causing a computer to perform the data query statement generation method of any one of claims 1 to 6.
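
For illustration only, the weighted ranking recited in claim 4 might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the bag-of-words `embed` function stands in for a real embedding model, and the weights (0.6 and 0.4), the function names, and the sample knowledge base entries are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Assumption: a bag-of-words count vector stands in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(keywords, text):
    """Fraction of the keywords that appear verbatim as terms of the text."""
    terms = set(text.lower().split())
    if not keywords:
        return 0.0
    return sum(1 for k in keywords if k.lower() in terms) / len(keywords)


def rank(description, keywords, knowledge_base,
         w_semantic=0.6, w_keyword=0.4, top_k=2):
    """Fuse semantic and exact-keyword scores with fixed (hypothetical) weights."""
    query_vec = embed(description)
    scored = []
    for entry in knowledge_base:
        semantic = cosine(query_vec, embed(entry["text"]))
        exact = keyword_score(keywords, entry["text"])
        scored.append((w_semantic * semantic + w_keyword * exact, entry["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry_id for _, entry_id in scored[:top_k]]


# Hypothetical knowledge base: two metadata entries and one historical query record.
KB = [
    {"id": "orders_table",
     "text": "table orders fields order_id amount pay_time gmv daily"},
    {"id": "users_table",
     "text": "table users fields user_id register_time city"},
    {"id": "hist_gmv",
     "text": "select sum amount from orders group by day daily gmv"},
]

top = rank("daily gmv sum of amount from orders", ["gmv", "orders"], KB)
print(top)
```

In this toy data the historical query record ranks first because it overlaps the predicted result description on more terms than the table metadata does, while the unrelated users table scores zero on both signals.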

Description

Data query statement generation method, device, equipment, medium and program product

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular to a method, an apparatus, a device, a medium, and a program product for generating a data query statement.

Background

With the spread of big data technology, generating structured query statements directly from natural language has become a key technique for lowering the barrier to using data. Existing natural-language-to-query techniques attempt to convert a user's natural language directly into a query statement, but they often rely on direct parsing of the user's original question or on simple keyword matching. Because natural language is usually ambiguous, and a large semantic gap exists between a user's question and the physical structure of the underlying database or the logic of historical queries, direct conversion struggles to capture the user's real intent, so the generated query statements often contain field mapping errors or logical deviations.

A method for generating data query statements is therefore needed to address the low generation accuracy in the related art.

Disclosure of Invention

The present disclosure provides a data query statement generation method, device, equipment, medium and program product to address the low accuracy of data query statement generation in the related art.
In a first aspect, the present disclosure provides a method for generating a data query statement, the method comprising: determining a query question and predicted result description information corresponding to the query question based on a received natural language query request, wherein the predicted result description information represents query features expected to be contained in a predicted query result for the query question; determining, based on the predicted result description information, target reference information matching the query question in a preset knowledge base, wherein the preset knowledge base comprises at least one of metadata information describing a target database structure and historical data query records; and generating a target data query statement based on the query question and the target reference information.

In this method, the predicted result description information is determined from the natural language query request, the target reference information is determined in the preset knowledge base from the predicted result description information, and the target data query statement is then generated. The predicted result description information (that is, an advance estimate of the query result) serves as an intermediate medium that bridges the semantic gap between the user's natural language question and the underlying storage structure of the database. Combining it with a knowledge base containing metadata and historical query records also makes full use of existing data assets and past experience, which improves the accuracy and usability of the generated data query statement.
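
The three steps of the first aspect can be sketched as a small pipeline. This is a hedged illustration under stated assumptions rather than the disclosed implementation: the `llm` callable, the prompt wording, the term-overlap retrieval, the `Reference` type, and the sample data are all hypothetical, and `fake_llm` returns canned text so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Reference:
    kind: str  # "metadata" or "history"
    text: str


def predict_result_description(question: str, llm: Callable[[str], str]) -> str:
    # Step 1: ask the model what the answer is expected to contain.
    return llm(f"Describe the metrics and fields expected in the answer to: {question}")


def retrieve(description: str, kb: List[Reference], top_k: int = 2) -> List[Reference]:
    # Step 2: match the predicted description (not the raw question) against
    # the knowledge base. Assumption: plain term overlap stands in for retrieval.
    desc_terms = set(description.lower().split())

    def overlap(ref: Reference) -> int:
        return len(desc_terms & set(ref.text.lower().split()))

    ranked = sorted(kb, key=overlap, reverse=True)
    return [ref for ref in ranked[:top_k] if overlap(ref) > 0]


def build_generation_prompt(question: str, refs: List[Reference]) -> str:
    context = "\n".join(f"[{ref.kind}] {ref.text}" for ref in refs)
    return f"Write a SQL query for: {question}\nReference information:\n{context}"


def generate_query(request: str, kb: List[Reference], llm: Callable[[str], str]) -> str:
    question = request.strip()
    description = predict_result_description(question, llm)
    refs = retrieve(description, kb)
    # Step 3: generate the statement from the question plus the references.
    return llm(build_generation_prompt(question, refs))


def fake_llm(prompt: str) -> str:
    # Canned stand-in for a large language model (hypothetical outputs).
    if prompt.startswith("Describe"):
        return "metric gmv ; fields amount pay_time ; table orders ; grouped by day"
    return ("SELECT DATE(pay_time) AS day, SUM(amount) AS gmv "
            "FROM orders GROUP BY DATE(pay_time)")


KB = [
    Reference("metadata", "table orders : order_id amount pay_time"),
    Reference("metadata", "table users : user_id city register_time"),
    Reference("history", "SELECT SUM(amount) FROM orders -- total gmv"),
]

refs = retrieve(fake_llm("Describe the answer"), KB)
sql = generate_query("  What is the daily GMV?  ", KB, fake_llm)
print([ref.kind for ref in refs])
print(sql)
```

The point of the sketch is the ordering: retrieval is driven by the predicted description, which already uses database vocabulary such as field and table names, rather than by the user's original wording.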
In an alternative embodiment, determining the query question and the predicted result description information corresponding to the query question based on the received natural language query request includes: performing semantic disambiguation on the natural language query request to generate a standardized query question conforming to a preset specification; and inputting the standardized query question into a preset large language model for hypothetical reasoning to obtain a hypothetical reply, and determining the hypothetical reply as the predicted result description information, where the hypothetical reply includes predicted key metrics, predicted result features, and predicted field attributes for the standardized query question.

This embodiment converts a vague user request into a canonical expression and explicitly generates potential database-related terms through hypothetical reasoning, so that a semantic association between the natural language and database-domain knowledge is established before the knowledge base is searched, which greatly improves the accuracy of subsequent knowledge recall.

In an alternative embodiment, performing semantic disambiguation on the natural language query request includes: judging whether the natural language query request meets a preset rewrite condition, where the preset rewrite condition includes at least one of vague expression, semantic ambiguity, missing key filter conditions, and non-compliance with business semantic specifications; and if the preset rewrite condition is met, rewriting the natural language query request based on a preset question template to generate the standardized query question, where