
CN-122019568-A - Data query method and device based on natural language


Abstract

The application discloses a data query method and device based on natural language. The method comprises: receiving natural language input of a user; performing semantic analysis on the natural language input to determine a user intention and a plurality of semantic elements; determining, based on the user intention, the target query operators respectively corresponding to the semantic elements in a query operator sequence, where the query operator sequence comprises a plurality of query operators arranged according to an execution sequence; generating a query plan based on the target query operators, their corresponding semantic elements, and the execution sequence of the target query operators; generating a query statement based on the query plan and the target database object identifiers respectively corresponding to the semantic elements, where the target database object identifiers comprise at least one of a table name and a field name; and performing a data query based on the query statement. Embodiments of the application can improve the accuracy and interpretability of the query statement, and in turn the accuracy and reliability of the query result.

Inventors

  • NIU YUANYUAN
  • WANG YINGZHUO
  • ZOU YONG
  • CHEN ZHONGYAN
  • QI WEIZHENG
  • PAN JUN

Assignees

  • China UnionPay Co., Ltd. (中国银联股份有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-26

Claims (20)

  1. A data query method based on natural language, comprising: receiving natural language input of a user; performing semantic analysis on the natural language input to determine a user intention and a plurality of semantic elements; determining, based on the user intention, target query operators respectively corresponding to the plurality of semantic elements in a query operator sequence, wherein the query operator sequence comprises a plurality of query operators arranged according to an execution sequence; generating a query plan based on the target query operators, their corresponding semantic elements, and the execution sequence of the target query operators; generating a query statement based on the query plan and target database object identifiers respectively corresponding to the plurality of semantic elements, wherein the target database object identifiers comprise at least one of a table name and a field name; and performing a data query based on the query statement.
  2. The method of claim 1, wherein before generating the query statement based on the query plan and the target database object identifiers respectively corresponding to the plurality of semantic elements, the method further comprises: determining, based on a hybrid retrieval mode, candidate database object identifiers corresponding to the plurality of semantic elements in an original database object identifier set, to obtain candidate sets respectively corresponding to the plurality of semantic elements; and for each of the semantic elements, performing the following operations: determining, through a reordering model and based on database schema indexes corresponding to the candidate set, confidence degrees respectively corresponding to a plurality of candidate database object identifiers in the candidate set; and determining the candidate database object identifier whose confidence degree satisfies a confidence condition as the target database object identifier.
  3. The method of claim 2, wherein determining the candidate database object identifier whose confidence degree satisfies the confidence condition as the target database object identifier comprises: outputting a database object identifier confirmation request when, among the plurality of confidence degrees, the difference between the largest confidence degree and the second-largest confidence degree is smaller than a difference threshold, wherein the database object identifier confirmation request asks the user to choose the target database object identifier from the database object identifier with the largest confidence degree and the database object identifier with the second-largest confidence degree.
  4. The method of claim 2, wherein the original database object identifier set comprises a plurality of original database object identifiers, and determining, based on the hybrid retrieval mode, the candidate database object identifiers respectively corresponding to the plurality of semantic elements in the original database object identifier set comprises: for each of the semantic elements, performing the following operations: determining a first similarity between the semantic element and each original database object identifier by literal (lexical) matching; determining a second similarity between the semantic element and each original database object identifier by vector matching; determining, through the reordering model, a third similarity between the semantic element and each original database object identifier based on a database schema index corresponding to each original database object identifier; performing a weighted summation of the first similarity, the second similarity, and the third similarity to obtain a retrieval score for each original database object identifier; and determining the original database object identifiers whose retrieval scores satisfy a retrieval score condition as the candidate database object identifiers.
  5. The method of claim 2, wherein generating the query statement based on the query plan and the target database object identifiers respectively corresponding to the plurality of semantic elements comprises: performing a logical structure consistency check based on the query plan and the target database object identifiers to obtain a first check result, wherein the logical structure consistency check comprises at least one of a connection reachability check, a data object existence check, a data type consistency check, and an aggregation dimension consistency check; and generating the query statement based on the query plan and the target database object identifiers respectively corresponding to the plurality of semantic elements when the first check result indicates that the check is passed.
  6. The method of claim 5, wherein after obtaining the first check result, the method further comprises: when the first check result indicates a check failure and the failure reason is that a connection is unreachable or a data object does not exist, redetermining the target database object identifiers, and returning to the step of performing the logical structure consistency check based on the query plan and the target database object identifiers to obtain the first check result, until the first check result indicates that the check is passed; and when the first check result indicates a check failure and the failure reason is a data type inconsistency or an aggregation dimension inconsistency, regenerating the query plan, and returning to the step of performing the logical structure consistency check based on the query plan and the target database object identifiers to obtain the first check result, until the first check result indicates that the check is passed.
  7. The method of claim 1, wherein performing the data query based on the query statement comprises: matching a plurality of query elements in the query statement against the plurality of semantic elements to obtain a matching result; and performing the data query based on the query statement when the matching result indicates a successful match.
  8. The method of claim 7, wherein after obtaining the matching result, the method further comprises: when the matching result indicates a failed match, returning to the step of performing semantic analysis on the natural language input to determine the user intention and the plurality of semantic elements, until the matching result indicates a successful match.
  9. The method of claim 1, wherein performing the data query based on the query statement comprises: performing two-channel introspection verification on the query statement to obtain a second verification result, wherein the two-channel introspection verification comprises formal rule verification and semantic consistency verification; when the second verification result indicates that the verification is passed, performing a pre-execution performance evaluation on the query statement to obtain a performance evaluation result; and when the performance evaluation result indicates that the query statement meets an execution condition, performing the data query based on the query statement.
  10. The method of claim 9, wherein performing the two-channel introspection verification on the query statement to obtain the second verification result comprises: performing the formal rule verification on the query statement to obtain a third verification result, wherein the formal rule verification comprises at least one of a missing key filter condition verification, an aggregation function rationality verification, a window function integrity verification, and a table connection reachability verification; performing the semantic consistency verification based on a semantic similarity between the query statement and the natural language input to obtain a fourth verification result; and determining that the second verification result indicates a pass when both the third verification result and the fourth verification result indicate a pass.
  11. The method of claim 10, wherein after obtaining the second verification result, the method further comprises: when the third verification result indicates a verification failure, determining a first verification type that caused the failure; determining a first query optimization strategy based on the first verification type; reconstructing the query plan based on the first query optimization strategy to obtain a first optimized query plan; regenerating the query statement based on the first optimized query plan, and returning to the step of performing the two-channel introspection verification on the query statement, until the third verification result indicates a pass; and when the fourth verification result indicates a verification failure, returning to the step of performing semantic analysis on the natural language input to determine the user intention and the plurality of semantic elements, until the fourth verification result indicates a pass.
  12. The method of claim 9, wherein after obtaining the performance evaluation result, the method further comprises: when the performance evaluation result indicates that the query statement does not meet the execution condition, determining a performance bottleneck operation based on the performance evaluation result; determining a second query optimization strategy based on the performance bottleneck operation; reconstructing the query plan based on the second query optimization strategy to obtain a second optimized query plan; and regenerating the query statement based on the second optimized query plan, and returning to the step of performing the two-channel introspection verification on the query statement, until the performance evaluation result indicates that the query statement meets the execution condition.
  13. The method of claim 1, wherein performing semantic analysis on the natural language input to determine the user intention and the plurality of semantic elements comprises: performing semantic analysis on the natural language input to determine the user intention and a query complexity; when the query complexity indicates that the natural language input is a complex query, decomposing the natural language input into a plurality of sub-query tasks based on the user intention of the natural language input; and performing semantic analysis on each sub-query task to determine the user intention and a plurality of semantic elements of each sub-query task.
  14. The method of claim 13, wherein performing semantic analysis on the natural language input to determine the user intention and the query complexity comprises: performing semantic analysis on the natural language input to obtain chain-of-thought data representing the user intention; determining, in the chain-of-thought data, target description text related to query complexity; determining a complexity label based on the target description text and the user intention represented by the chain-of-thought data; and determining the query complexity based on the complexity label.
  15. The method of claim 13, wherein generating the query statement based on the query plan and the target database object identifiers respectively corresponding to the plurality of semantic elements comprises: for each sub-query task, generating a sub-query statement corresponding to the sub-query task based on the query plan and the target database object identifiers respectively corresponding to the plurality of semantic elements; and generating the query statement based on dependency relationships among the plurality of sub-query tasks and the plurality of sub-query statements.
  16. The method of claim 15, wherein generating the query statement based on the dependency relationships among the plurality of sub-query tasks and the plurality of sub-query statements comprises: performing a global consistency check on the plurality of sub-query statements based on the dependency relationships among the plurality of sub-query tasks to obtain a fifth check result, wherein the global consistency check comprises at least one of a time range alignment check, a business definition consistency check, a data granularity consistency check, and a data flow reachability check; and generating the query statement based on the dependency relationships among the plurality of sub-query tasks and the plurality of sub-query statements when the fifth check result indicates that the check is passed.
  17. The method of claim 16, wherein after obtaining the fifth check result, the method further comprises: when the fifth check result indicates a check failure, determining a second check type and a first sub-query task that caused the failure; determining, based on the second check type, a target level at which to correct the first sub-query task, wherein the target level is one of a plurality of processing levels involved in generating the sub-query statement corresponding to the first sub-query task; correcting the output of the first sub-query task at the target level to obtain a correction result; and regenerating the sub-query statement corresponding to the first sub-query task based on the correction result, and returning to the step of performing the global consistency check on the plurality of sub-query statements based on the dependency relationships among the plurality of sub-query tasks, until the fifth check result indicates that the check is passed.
  18. The method of any one of claims 1-17, wherein after performing the data query based on the query statement, the method further comprises: displaying a query result corresponding to the query statement, wherein the query result comprises at least one interactable element corresponding to target data, and the target data is query data corresponding to a variable query condition; receiving a first input by which the user selects a target interactable element from the at least one interactable element; in response to the first input, displaying a target interface corresponding to the target interactable element; receiving a second input by which the user, in the target interface, modifies the variable query condition corresponding to the target interactable element from first query data to second query data; updating the query statement in response to the second input; and performing a data query based on the updated query statement.
  19. The method of claim 18, wherein before displaying the query result corresponding to the query statement, the method further comprises: determining a plurality of initial variable query conditions based on an abstract syntax tree of the query statement; performing a value evaluation on the plurality of initial variable query conditions based on the semantic criticality and the interaction value of each initial variable query condition to obtain a value evaluation result; and determining at least one variable query condition from among the plurality of initial variable query conditions based on the value evaluation result.
  20. The method of claim 18, wherein performing the data query based on the updated query statement comprises: performing two-channel introspection verification on the updated query statement to obtain a sixth verification result; when the sixth verification result indicates a verification failure, determining a third verification type that caused the failure; determining a third query optimization strategy based on the third verification type; reconstructing the query plan based on the third query optimization strategy to obtain a third optimized query plan; regenerating the query statement based on the third optimized query plan; and performing the data query based on the regenerated query statement.
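
The weighted retrieval score of claim 4 and the confidence-gap test of claim 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the applicant's implementation: token overlap stands in for literal matching, character-bigram cosine similarity stands in for vector matching, the third similarity is passed in as a plain number because the claims do not specify the reordering model, and the weights and gap threshold are arbitrary placeholder values.

```python
import math
from collections import Counter


def lexical_similarity(element, identifier):
    """Jaccard overlap of lowercase tokens (a stand-in for literal matching)."""
    a = set(element.lower().replace("_", " ").split())
    b = set(identifier.lower().replace("_", " ").split())
    return len(a & b) / len(a | b) if a | b else 0.0


def vector_similarity(element, identifier):
    """Cosine similarity of character-bigram counts (a stand-in for embeddings)."""
    def bigrams(text):
        text = text.lower()
        return Counter(text[i:i + 2] for i in range(len(text) - 1))

    va, vb = bigrams(element), bigrams(identifier)
    dot = sum(va[key] * vb[key] for key in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def retrieval_score(element, identifier, rerank_similarity, weights=(0.3, 0.3, 0.4)):
    """Weighted sum of the first, second, and third similarities.

    `rerank_similarity` is assumed to come from an external reordering model;
    the default weights are placeholders, not values from the application.
    """
    w1, w2, w3 = weights
    return (
        w1 * lexical_similarity(element, identifier)
        + w2 * vector_similarity(element, identifier)
        + w3 * rerank_similarity
    )


def needs_confirmation(confidences, gap_threshold=0.1):
    """True when the two largest confidences are closer than the threshold."""
    if len(confidences) < 2:
        return False
    top, second = sorted(confidences, reverse=True)[:2]
    return (top - second) < gap_threshold
```

With these placeholder values, confidences of 0.82 and 0.80 would trigger a confirmation request to the user, while 0.9 and 0.5 would not.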

Description

Data query method and device based on natural language

Technical Field

The application belongs to the technical field of data query, and particularly relates to a data query method and device based on natural language.

Background

With the breakthroughs of large language models (LLMs) in natural language understanding and structured data generation, querying databases in natural language has become an important technical direction for making enterprise data analysis more intelligent. In current practice, natural language database question answering converts natural language into a query statement by combining a large model with prompts, a knowledge base, and preset question-answer pairs, and obtains the data query result by executing that statement, so that a user can query the database directly in natural language.
However, in real complex business scenarios, especially those involving multi-table joins and complex business logic, query statements generated this way are prone to semantic deviation and logical confusion. The accuracy of the query statement is therefore low, which directly affects the accuracy and reliability of the query result.

Disclosure of Invention

Embodiments of the application provide a natural-language-based data query method and device, an electronic device, a computer-readable storage medium, and a computer program product, which can improve the accuracy and interpretability of query statements and in turn the accuracy and reliability of query results.
In a first aspect, an embodiment of the present application provides a data query method based on natural language, where the method includes: receiving natural language input of a user; performing semantic analysis on the natural language input to determine a user intention and a plurality of semantic elements; determining, based on the user intention, target query operators respectively corresponding to the plurality of semantic elements in a query operator sequence, where the query operator sequence includes a plurality of query operators arranged according to an execution sequence; generating a query plan based on the target query operators, their corresponding semantic elements, and the execution sequence of the target query operators; generating a query statement based on the query plan and target database object identifiers respectively corresponding to the plurality of semantic elements, where the target database object identifiers include at least one of a table name and a field name; and performing a data query based on the query statement.
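
As a rough illustration of the steps of the first aspect, the sketch below maps hand-parsed semantic elements to operators in a fixed execution order, builds a plan, and renders a SQL string. Everything concrete in it is an assumption made for the example: the operator names, the element roles, the `txn` table, and the field names do not come from the application, and semantic analysis and identifier resolution are replaced by hard-coded inputs.

```python
from dataclasses import dataclass

# Hypothetical operator sequence, listed in execution order (an assumption;
# the application does not name the operators).
OPERATOR_SEQUENCE = ["FROM", "WHERE", "GROUP_BY", "SELECT", "ORDER_BY", "LIMIT"]

# Hypothetical mapping from a semantic element's role to its target operator.
ROLE_TO_OPERATOR = {
    "entity": "FROM",
    "filter": "WHERE",
    "dimension": "GROUP_BY",
    "metric": "SELECT",
    "sort": "ORDER_BY",
    "limit": "LIMIT",
}


@dataclass(frozen=True)
class SemanticElement:
    role: str  # hypothetical role label, e.g. "metric" or "filter"
    text: str  # phrase extracted from the natural language input


def build_plan(elements):
    """Pair each element with its operator, sorted by execution order."""
    steps = [(ROLE_TO_OPERATOR[e.role], e) for e in elements]
    return sorted(steps, key=lambda step: OPERATOR_SEQUENCE.index(step[0]))


def render_sql(plan, fragments):
    """Render a plan as SQL text.

    `fragments` maps each element's text to a SQL fragment built around the
    resolved table or field name (resolution itself is out of scope here).
    """
    by_op = {}
    for op, element in plan:
        by_op.setdefault(op, []).append(fragments[element.text])
    group_cols = by_op.get("GROUP_BY", [])
    select_cols = group_cols + by_op.get("SELECT", [])
    sql = f"SELECT {', '.join(select_cols)} FROM {', '.join(by_op['FROM'])}"
    if "WHERE" in by_op:
        sql += " WHERE " + " AND ".join(by_op["WHERE"])
    if group_cols:
        sql += " GROUP BY " + ", ".join(group_cols)
    if "ORDER_BY" in by_op:
        sql += " ORDER BY " + ", ".join(by_op["ORDER_BY"])
    if "LIMIT" in by_op:
        sql += " LIMIT " + by_op["LIMIT"][0]
    return sql


# "Total transaction amount by region in 2025", parsed by hand for the example.
elements = [
    SemanticElement("metric", "total amount"),
    SemanticElement("entity", "transactions"),
    SemanticElement("filter", "in 2025"),
    SemanticElement("dimension", "by region"),
]
fragments = {
    "total amount": "SUM(amount)",
    "transactions": "txn",
    "in 2025": "txn_year = 2025",
    "by region": "region",
}
plan = build_plan(elements)
print([op for op, _ in plan])
print(render_sql(plan, fragments))
```

Keeping the plan as an ordered list of (operator, element) pairs is one way to make the generated statement traceable back to the phrases in the user's input, which is the kind of interpretability the application aims for.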
In a second aspect, an embodiment of the present application provides a data query device based on natural language, where the device includes: a receiving module, configured to receive natural language input of a user; an analysis module, configured to perform semantic analysis on the natural language input and determine a user intention and a plurality of semantic elements; a determining module, configured to determine, based on the user intention, target query operators respectively corresponding to the plurality of semantic elements in a query operator sequence, where the query operator sequence includes a plurality of query operators arranged according to an execution sequence; a generating module, configured to generate a query plan based on the target query operators, their corresponding semantic elements, and the execution sequence of the target query operators, and further configured to generate a query statement based on the query plan and target database object identifiers respectively corresponding to the plurality of semantic elements, where the target database object identifiers include at least one of a table name and a field name; and a query module, configured to perform a data query based on the query statement.
In a third aspect, an embodiment of the present application provides an electronic device including a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the method of any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method of any possible implementation of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, instructions in which, when executed by a processor of an electronic device, cause the electronic device to perform a method as in any of the possible implementati