CN-117149866-B - Method and device for exporting multi-source heterogeneous data, readable storage medium and terminal
Abstract
A method and device for exporting multi-source heterogeneous data, a readable storage medium, and a terminal. The method comprises: when a data export task is received, acquiring description information of the data to be exported from the data export task, the description information indicating attribute information of the data to be exported and each data source from which the data comes, the attribute information being set based on preset metadata; selecting a target execution engine according to each data source from which the data to be exported comes and the current resource load of the execution engines; converting the data export task into an executable task adapted to the target execution engine and distributing the executable task to the target execution engine, which queries the data at the corresponding data sources; and obtaining and outputting an export result based on the data query results. The scheme enables fused export of heterogeneous data from different data sources and makes data export more convenient.
Inventors
- GUO JIAWEI
- SONG XIANGPING
Assignees
- 杭州数云信息技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20230721
Claims (7)
- 1. A method for exporting multi-source heterogeneous data, characterized in that the method is used in an e-commerce scenario in which different e-commerce platforms may use different data sources for data storage, the method comprising: when a data export task is received, obtaining description information of the data to be exported from the data export task, wherein the description information indicates attribute information of the data to be exported and each data source from which the data comes, and the attribute information of the data to be exported is set based on preset metadata; wherein the metadata is generated by: obtaining storage configuration information of each data source, the storage configuration information indicating the storage definition and structure definition of the data; establishing association information and processing rule information between the attribute information of each piece of data and the storage definition and structure definition of the corresponding data in each data source; and obtaining the metadata based on the association information and the processing rule information; wherein the attribute information of the data comprises the fields of the data in each data source, table information of the data from each data source, column information in the data tables, and association information among multiple tables, and the processing rule information describes which tables are processed, the intersection, union and difference relations among the tables, and which fields are extracted from which tables; selecting a target execution engine according to each data source from which the data to be exported comes and the current resource load of the execution engine; according to the metadata and the data sources from which the data to be exported comes, converting the data export task into an executable task adapted to the target execution engine and distributing the executable task to the target execution engine, the target execution engine performing a data query on the corresponding data source according to the executable task; and obtaining and outputting an export result based on the data query result returned by the target execution engine; wherein converting the data export task into an executable task adapted to the target execution engine comprises: determining, according to the metadata and the data sources from which the data to be exported comes, the fields, table information, association information among tables, and processing rule information corresponding to each piece of data in the data to be exported; judging whether the processing rule information has a corresponding user-defined function; if the processing rule information has a corresponding user-defined function, generating an execution statement adapted to the target execution engine and the corresponding data source according to the type of the target execution engine, the user-defined function, and the fields, table information and inter-table association information of each piece of data in the data to be exported, and taking the execution statement as the executable task; if the processing rule information does not have a corresponding user-defined function, registering a user-defined function corresponding to the processing rule information, generating an execution statement adapted to the target execution engine and the corresponding data source according to the registered user-defined function, the type of the target execution engine, and the fields, table information and inter-table association information of each piece of data in the data to be exported, and taking the execution statement as the executable task; wherein the data export task further comprises processing rule information, and obtaining the export result based on the data query result returned by the target execution engine comprises: integrating the data query results returned by one or more target execution engines according to the processing rule information to obtain the export result.
- 2. The method for exporting multi-source heterogeneous data according to claim 1, wherein selecting the target execution engine according to each data source from which the data to be exported comes and the current resource load of the execution engine comprises: selecting the target execution engine according to the type of each data source from which the data to be exported comes, the current remaining resources of the execution engine, and the attribute information of the data to be exported.
- 3. The method for exporting multi-source heterogeneous data according to claim 1, further comprising: after the executable task is distributed to the target execution engine, if no data query result returned by the target execution engine is received, determining the failure type of the export task; estimating a retry duration according to the failure type of the export task; and after the retry duration is reached, redistributing the executable task to the target execution engine.
- 4. The method for exporting multi-source heterogeneous data according to claim 3, wherein redistributing the executable task to the target execution engine after the retry duration is reached comprises: after the retry duration is reached, reselecting an execution engine as the target execution engine according to the failure type of the export task, each data source from which the data to be exported comes, and the current resource load of the execution engine; and converting the data export task into an executable task adapted to the reselected target execution engine and distributing the executable task to that target execution engine.
- 5. A device for exporting multi-source heterogeneous data, used in an e-commerce scenario in which different e-commerce platforms may use different data sources for data storage, the exporting device comprising an acquisition unit, a data export unit and a storage unit, wherein the acquisition unit is configured to obtain, when a data export task is received, description information of the data to be exported from the data export task, the description information indicating attribute information of the data to be exported and each data source from which the data comes, and the attribute information of the data to be exported being set based on preset metadata; wherein the metadata is generated by: obtaining storage configuration information of each data source, the storage configuration information indicating the storage definition and structure definition of the data; establishing association information and processing rule information between the attribute information of each piece of data and the storage definition and structure definition of the corresponding data in each data source; and obtaining the metadata based on the association information and the processing rule information; wherein the attribute information of the data comprises the fields of the data in each data source, table information of the data from each data source, column information of the data tables, and association information among multiple tables, and the processing rule information describes which tables are processed, the cross-correlation relations among the tables, and which fields are extracted from which tables; a selecting unit, configured to select a target execution engine according to each data source from which the data to be exported comes and the current resource load of the execution engine; a distribution unit, configured to convert the data export task into an executable task adapted to the target execution engine and distribute the executable task to the target execution engine, the target execution engine performing a data query on the corresponding data source according to the executable task; and an output unit, configured to obtain and output an export result based on the data query result returned by the target execution engine; wherein the distribution unit is configured to: determine, according to the metadata and the data sources from which the data to be exported comes, the fields, table information, association information among tables, and processing rule information corresponding to each piece of data in the data to be exported; judge whether the processing rule information has a corresponding user-defined function; if the processing rule information has a corresponding user-defined function, generate an execution statement adapted to the target execution engine and the corresponding data source according to the type of the target execution engine, the user-defined function, and the fields, table information and inter-table association information of each piece of data in the data to be exported, and take the execution statement as the executable task; and if the processing rule information does not have a corresponding user-defined function, register a user-defined function corresponding to the processing rule information, generate an execution statement adapted to the target execution engine and the corresponding data source according to the registered user-defined function, the type of the target execution engine, and the fields, table information and inter-table association information of each piece of data in the data to be exported, and take the execution statement as the executable task; wherein the data export task further comprises processing rule information, and the output unit is configured to integrate the data query results returned by one or more target execution engines according to the processing rule information to obtain the export result.
- 6. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, performs the steps of the method of exporting multi-source heterogeneous data according to any of claims 1 to 4.
- 7. A terminal comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor, when executing the computer program, performs the steps of the method of exporting multi-source heterogeneous data according to any of claims 1 to 4.
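The conversion step recited in claim 1 (check whether a processing rule already has a user-defined function, register one if not, then generate an engine-specific execution statement from fields, tables and inter-table associations) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Column` and `ExportTask` types, the `QUOTES` table, the engine labels `spark` and `presto`, the rule name `mask_phone`, and the use of a plain set to stand in for engine-side UDF registration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Identifier quoting per engine type (illustrative assumption, not from the patent).
QUOTES: Dict[str, str] = {"spark": "`", "presto": '"'}


@dataclass
class Column:
    table: str                  # table the field is read from
    name: str                   # column name within that table
    rule: Optional[str] = None  # processing rule applied to this column, if any


@dataclass
class ExportTask:
    columns: List[Column]
    base_table: str
    joins: List[Tuple[str, str]]  # (table to join, join condition)


def build_statement(task: ExportTask, engine_type: str, registered_udfs: Set[str]) -> str:
    """Build a SQL-like execution statement adapted to the given engine type."""
    q = QUOTES[engine_type]
    select_parts = []
    for col in task.columns:
        ref = f"{q}{col.table}{q}.{q}{col.name}{q}"
        if col.rule is not None:
            # If the rule has no UDF yet, register one first. Adding the name
            # to a set stands in for real, engine-specific registration.
            if col.rule not in registered_udfs:
                registered_udfs.add(col.rule)
            ref = f"{col.rule}({ref}) AS {q}{col.name}{q}"
        select_parts.append(ref)
    sql = f"SELECT {', '.join(select_parts)} FROM {q}{task.base_table}{q}"
    for table, condition in task.joins:
        sql += f" JOIN {q}{table}{q} ON {condition}"
    return sql


if __name__ == "__main__":
    task = ExportTask(
        columns=[Column("orders", "phone", rule="mask_phone"), Column("users", "name")],
        base_table="orders",
        joins=[("users", "orders.user_id = users.id")],
    )
    print(build_statement(task, "spark", set()))
```

In this sketch the same task yields a differently quoted statement for each engine type, which is one simple way to read "an execution statement adapted to the target execution engine"; a real system would also have to translate functions, types and join syntax per engine.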
Description
Method and device for exporting multi-source heterogeneous data, readable storage medium and terminal
Technical Field
The embodiments of the invention relate to the technical field of data processing, and in particular to a method and a device for exporting multi-source heterogeneous data, a readable storage medium, and a terminal.
Background
With the continuous development of storage technology, more and more databases with different characteristics have emerged. As businesses grow and databases diversify, a single storage system often cannot meet business needs, and the coexistence of multiple storage systems has become the norm. As a result, the required data may come from different storage systems, that is, from different data sources, and cross-correlation operations on data from different data sources may be needed. Because different data sources are independent of one another and store data in different formats, data cannot be fused at export time. In the prior art, when data must be exported across data sources, data can only be generated locally at each data source, and the user then fuses the data from each source according to actual requirements to obtain the required data, which makes data export cumbersome.
Disclosure of Invention
The technical problem solved by the embodiments of the invention is that existing ways of exporting multi-source heterogeneous data are cumbersome.
To solve the above technical problem, an embodiment of the invention provides a method for exporting multi-source heterogeneous data, comprising: when a data export task is received, acquiring description information of the data to be exported from the data export task, wherein the description information indicates attribute information of the data to be exported and each data source from which the data comes, and the attribute information of the data to be exported is set based on preset metadata; selecting a target execution engine according to each data source from which the data to be exported comes and the current resource load of the execution engine; converting the data export task into an executable task adapted to the target execution engine and distributing the executable task to the target execution engine, the target execution engine querying data at the corresponding data sources according to the executable task; and obtaining an export result based on the data query result returned by the target execution engine and outputting the export result.
Optionally, selecting the target execution engine according to each data source from which the data to be exported comes and the current resource load of the execution engine comprises: selecting the target execution engine according to the type of each data source from which the data to be exported comes, the current remaining resources of the execution engine, and the attribute information of the data to be exported.
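As a rough illustration of the engine selection just described, the following Python sketch keeps only engines that can read every data source type involved and then prefers the one with the most remaining resources. The `Engine` type, its fields, the `min_free` threshold, and the "most free resources wins" policy are assumptions for this example; the text does not specify the selection policy or how resource load is measured, and the attribute information of the data is left out for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass
class Engine:
    name: str
    supported_sources: FrozenSet[str]  # data source types this engine can query
    free_resources: float              # fraction of capacity currently unused, 0.0 to 1.0


def select_engine(engines: List[Engine], source_types: Set[str], min_free: float = 0.0) -> Engine:
    """Pick an engine that covers all source types, preferring the least loaded one."""
    candidates = [
        e for e in engines
        if source_types <= e.supported_sources and e.free_resources >= min_free
    ]
    if not candidates:
        raise LookupError("no execution engine can serve all requested data sources")
    # Assumed policy: the engine with the largest remaining capacity wins.
    return max(candidates, key=lambda e: e.free_resources)


if __name__ == "__main__":
    # Engine names and source types below are placeholders.
    engines = [
        Engine("engine_a", frozenset({"mysql", "hive"}), 0.2),
        Engine("engine_b", frozenset({"mysql", "hive", "es"}), 0.6),
        Engine("engine_c", frozenset({"mysql"}), 0.9),
    ]
    print(select_engine(engines, {"mysql", "hive"}).name)
```

A task touching both `mysql` and `hive` skips `engine_c` even though it is the least loaded, because that engine cannot read one of the sources; compatibility acts as a filter and load acts as the tie-breaker.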
Optionally, the metadata is generated by: acquiring storage configuration information of each data source, wherein the storage configuration information indicates the storage definition and structure definition of the data; establishing association information and processing rule information between the attribute information of each piece of data and the storage definition and structure definition of the corresponding data in each data source; and obtaining the metadata based on the association information and the processing rule information.
Optionally, the attribute information of the data includes the fields of the data in each data source, table information of the data from each data source, column information in the data tables, and association information among multiple tables.
Optionally, the method for exporting multi-source heterogeneous data further comprises: after the executable task is distributed to the target execution engine, if no data query result returned by the target execution engine is received, determining the failure type of the export task; estimating a retry duration according to the failure type of the export task; and after the retry duration is reached, redistributing the executable task to the target execution engine.
Optionally, redistributing the executable task to the target execution engine after the retry duration is reached comprises: after the retry duration is reached, reselecting an execution engine as the target execution engine according to the failure type of the export task, each data source from which the data to be exported comes, and the current resource load of the execution engine; and reconverting the data export task into an executable task adapted to the reselected target execution engine and distributing the executable task to that target execution engine.
Optionally, the data export task is converted into an executable task adapte