CN-122019594-A - Automatic conversion method and system for query sentences among heterogeneous databases
Abstract
The invention relates to the technical field of big data processing, in particular to an automatic conversion method of query sentences among heterogeneous databases, which comprises the steps of firstly analyzing and reconstructing a logic query request of a user to obtain a logic query object, and carrying out information association and matching by utilizing components of the logic query object so as to quickly find out an optimal physical table and a corresponding target data source engine, thereby overcoming the problem of low response speed caused by direct query in the prior art. Compared with the prior art, the method can automatically select the optimal physical carrier based on semantic analysis, accurately complete data processing of dialect-level grammar conversion and improve query efficiency and universality among heterogeneous databases.
Inventors
- HUA JIANHE
Assignees
- 钛动科技股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260202
Claims (10)
- 1. The automatic query statement conversion method between heterogeneous databases is characterized by comprising the following steps: Responding to the receiving of a logic query request, generating a corresponding abstract syntax tree by adopting a preset parser, and traversing the abstract syntax tree to construct a logic query object, wherein the logic query object comprises a logic topic name, a query column and filtering conditions; Acquiring a plurality of candidate physical table information associated with the logical theme name from preset metadata configuration; Taking the query column and the filtering condition as evaluation items, carrying out matching evaluation on the candidate physical table information, and selecting an optimal physical table and a corresponding target data source engine; The converter is used for converting each component in the logic query object into a grammar fragment conforming to the target data source dialect specification, and generating a physical query instruction by combining the real name of the optimal physical table in a splicing way; executing the physical query instruction through a preset standard database access interface to obtain an original result set; And converting the heterogeneous data types in the original result set into preset unified data types, and packaging the unified data types into standard results to return.
- 2. The method for automatically converting query sentences among heterogeneous databases according to claim 1, wherein performing matching evaluation on the candidate physical table information comprises: Judging whether the candidate physical table corresponding to the candidate physical table information meets the following conditions at the same time: the field set contained in the candidate physical table covers the query column; The time span of the candidate physical table covers the time interval appointed by the filtering condition; And acquiring the pre-counted data line number from the candidate physical table meeting the condition, and selecting the candidate physical table with the minimum data line number as the optimal physical table.
- 3. The method for automatically converting query statements among heterogeneous databases according to claim 2, wherein the optimal physical table is configured with table names of the target data source engines.
- 4. The method for automatically converting query statements among heterogeneous databases according to claim 1, wherein the physical query instruction comprises an objective function name and an objective operator, and the process of converting the logical query object into the physical query instruction by the converter comprises: traversing the logical query object by adopting a visitor mode; extracting a logical column name in the query column, and mapping the logical column name into a physical field name or an alias in the optimal physical table; Replacing the logical theme name with the real table name of the optimal physical table; recursively processing the expression tree in the filter term, identifying a generic function or generic operator, replacing the generic function or generic operator with a corresponding objective function or objective operator in the objective data source dialect specification.
- 5. The method for automatically converting query terms between heterogeneous databases according to claim 4, wherein the objective function comprises a date formatting function, and the process for generating the date formatting function comprises: if the target data source engine is of MySQL type, generating a standard date formatting function instruction and reserving original formatting parameters; And if the target data source engine is of a Presto type, generating a date formatting function instruction in a full lowercase form.
- 6. The method for automatically converting query sentences among heterogeneous databases according to claim 4, wherein said objective function comprises a string concatenation function, said objective operator comprises a string concatenation operator, and the process for generating physical query instructions comprises: If the target data source engine supports a standard splicing function, generating a character string splicing function instruction corresponding to the character string splicing function, wherein the character string splicing function takes commas as separation parameters; And if the target data source engine does not support the standard splicing function, generating a character string splicing operator instruction corresponding to the character string splicing operator, wherein the character string splicing operator takes a double-vertical-line connector as a connection parameter.
- 7. The method for automatically converting query sentences among heterogeneous databases according to claim 1, wherein converting heterogeneous data types in the original result set into preset unified data types comprises: Loading a preset global type mapping table, wherein the mapping table defines the mapping relation between the native type and the unified type of each target data source engine; And traversing the original result set, and converting the heterogeneous data types into preset unified data types according to the mapping table.
- 8. The method for automatically converting query statements among heterogeneous databases according to claim 7, wherein the global type mapping table supports mapped data types including integer type, decimal type, string type, date type and boolean type.
- 9. The method for automatically converting query sentences among heterogeneous databases according to claim 1, wherein said target data source engine comprises at least any one or more of MySQL engine, prest engine and Doris engine.
- 10. An automatic query sentence conversion system between heterogeneous databases, characterized by comprising a processor and a memory, wherein the memory stores computer program instructions, which when executed by the processor, implement the automatic query sentence conversion method between heterogeneous databases as claimed in any one of claims 1-9.
Description
Automatic conversion method and system for query sentences among heterogeneous databases Technical Field The invention relates to the technical field of big data processing. More particularly, the invention relates to a method and a system for automatically converting query sentences among heterogeneous databases. Background With the development of big data technology, digital transformation is becoming a necessary way for enterprises to develop. To facilitate data management, businesses typically employ a hybrid architecture to store data, e.g., real-time detail data using MySQL, and historical summary data or user/customer portraits using prest or Doris. In practical applications, the upper-layer business analysis generally uses unified logic SQL to initiate the query, and the following technical defects exist in the way: first, lacking intelligent routing capabilities, traditional middleware often passes queries directly through to the underlying physical table. When the data volume is huge (for example, hundreds of millions of levels of detail tables), the speed of directly querying the detail tables and returning the response is extremely slow. While pre-computed materialized views may be configured in a database, the prior art fails to automatically make optimal choices between "detail table" or "materialized view" depending on the time range or granularity of the query. Secondly, the grammar conversion across databases is hard, the processing difference of different databases on functions (such as DATE_FORMAT and date_parameter) and character string splicing is huge, the scheme is realized by adopting simple regular replacement, the function signature difference of complex nested logic or specific dialect cannot be processed, and the generated physical SQL execution error is easy to cause. Therefore, how to automatically select the optimal physical carrier based on semantic analysis and accurately complete the data processing of dialect-level grammar conversion so as to improve the query efficiency and universality between heterogeneous databases is a technical problem to be solved in the prior art. Disclosure of Invention In order to solve the technical problems, the invention discloses a method and a system for automatically converting query sentences among heterogeneous databases. In a first aspect, the invention discloses a method for automatically converting query sentences among heterogeneous databases, which comprises the following steps: Responding to the receiving of the logic query request, adopting a preset parser to generate a corresponding abstract syntax tree, and traversing the abstract syntax tree to construct a logic query object, wherein the logic query object comprises a logic topic name, a query column and filtering conditions; acquiring a plurality of candidate physical table information associated with the logical theme name from preset metadata configuration; Taking the query column and the filtering condition as evaluation items, carrying out matching evaluation on candidate physical table information, and selecting an optimal physical table and a corresponding target data source engine; the method comprises the steps of calling a converter which is matched with a target data source engine, wherein the converter is used for converting each component in a logic query object into a grammar fragment which accords with target data source dialect specification, combining the real names of an optimal physical table, and splicing to generate a physical query instruction; Executing a physical query instruction through a preset standard database access interface to obtain an original result set; And converting the heterogeneous data types in the original result set into preset unified data types, and packaging the unified data types into standard results to return. Preferably, the matching evaluation of the candidate physical table information includes: Judging whether the candidate physical table corresponding to the candidate physical table information meets the following conditions at the same time: the field set contained in the candidate physical table covers the query column; The time span of the candidate physical table covers the time interval specified by the filtering condition; and acquiring the pre-counted data line number from the candidate physical table meeting the condition, and selecting the candidate physical table with the minimum data line number as the optimal physical table. Preferably, the optimal physical table is configured with the table name of the target data source engine. Preferably, the physical query instruction includes a target function name and a target operator, and the process of converting the logical query object into the physical query instruction by the converter includes: Traversing the logical query object by adopting a visitor mode; Extracting a logical column name in the query column, and mapping the logical column name into a physical field