CN-116257567-B - ClickHouse-based data analysis method, device, and system
Abstract
The invention discloses a ClickHouse-based data analysis method, device, and system, belonging to the field of data analysis. Data containing a node edge relation list and a node list is obtained from a user's selection of a data set and selection of operations on that data set. A ClickHouse core analyzer performs node relation analysis on the node edge relation list and the node list in the Data to obtain a directed acyclic graph, SQL assembly is performed on the directed acyclic graph to obtain a target SQL, and the target SQL is then executed automatically to obtain the target data. Because ClickHouse is chosen as the underlying OLAP engine for the data analysis service, data throughput is high. Because the directed acyclic graph is derived from the node edge relation list and the node list and is then converted automatically into target SQL executable on ClickHouse, data analysts do not need to write any SQL. This greatly lowers the demands placed on business analysts and the cost of using data analysis: the user only needs to select a data set and the operations to apply to it, so the analysis workflow is simple.
Inventors
- LUO CHAOXIN
- WU XIAOQIAN
Assignees
- 北京滴普科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20221229
Claims (8)
- 1. A ClickHouse-based data analysis method, comprising the steps of: receiving a user's selection of a data set and selection of operations on the data set to obtain Data, wherein the Data comprises a node edge relation list and a node list; performing node relation analysis on the node edge relation list and the node list in the Data using a ClickHouse core analyzer to obtain a directed acyclic graph, which comprises obtaining the order of each node according to the node edge relation list and the node list; performing SQL assembly on the directed acyclic graph in accordance with the ClickHouse SQL specification and functions to obtain a target SQL, wherein a fixed SQL template is preset for each operation according to the ClickHouse SQL specification and functions, each SQL template being a statement executable on ClickHouse; and executing the target SQL to obtain target data for data analysis.
- 2. The method of claim 1, wherein receiving the user's selection of a data set and selection of operations on the data set to obtain the Data comprises: taking each operation and the data set corresponding to the operation as a node to obtain the node list; setting an edge between the two nodes corresponding to any two associated operations to obtain the node edge relation list; and taking the node list and the node edge relation list as the Data.
- 3. The method of claim 1, wherein obtaining the target SQL from the SQL statement of each node and the directed acyclic graph comprises: traversing the directed acyclic graph using a depth-first traversal algorithm while storing and parsing the nodes and edges of the directed acyclic graph using a stack and a HashMap to obtain the target SQL.
- 4. The method of claim 3, wherein traversing the directed acyclic graph using the depth-first traversal algorithm while storing and parsing the nodes and edges of the directed acyclic graph using a stack and a HashMap to obtain the target SQL comprises: generating, according to the directed acyclic graph, a stack for storing the nodes visited by the graph traversal, a HashMap for storing the association relations parsed during edge traversal, and a LIST for temporarily storing the nodes popped from the stack; stopping the graph traversal when a JOIN node is encountered, marking the JOIN node, popping nodes from the stack into the LIST, and stopping the popping when the stack is empty or a SUB SELECT subquery is encountered, to obtain one branch of the directed acyclic graph; traversing the LIST from back to front and combining the SQL statements of the nodes in the branch to obtain the SQL statement of that branch; recursively looking up the preceding node of the marked JOIN node in the HashMap until no associated key is found, to obtain the starting node of the adjacent branch; performing a depth-first traversal of the new branch until the currently marked JOIN node is reached, and generating the SQL statement of the new branch; connecting the SQL statements of all branches associated with the JOIN node using JOIN; and, starting from the marked JOIN node, traversing the subsequent nodes depth-first until the whole graph has been traversed, to obtain the target SQL.
- 5. The method of claim 4, wherein generating, according to the directed acyclic graph, the stack for storing the nodes visited by the graph traversal and the HashMap for storing the association relations parsed during edge traversal comprises: pushing the nodes onto the stack in traversal order, and storing the node IDs of the FROM and END nodes parsed from each edge as key-value pairs of the HashMap.
- 6. The method of claim 1, wherein the operation selection comprises associated queries, filtering, GROUP BY aggregation, and ranking.
- 7. A ClickHouse-based data analysis device, comprising: a Data acquisition module for receiving a user's selection of a data set and selection of operations on the data set to obtain Data, wherein the Data comprises a node edge relation list and a node list; a node relation analysis module for performing node relation analysis on the node edge relation list and the node list in the Data using a ClickHouse core analyzer to obtain a directed acyclic graph, and specifically for obtaining the order of each node according to the node edge relation list and the node list; a target SQL acquisition module for performing SQL assembly on the directed acyclic graph in accordance with the ClickHouse SQL specification and functions to obtain a target SQL, and specifically for presetting a fixed SQL template for each operation according to the ClickHouse SQL specification and functions, each SQL template being a statement executable on ClickHouse; and a target data acquisition module for executing the target SQL to obtain target data for data analysis.
- 8. A ClickHouse-based data analysis system, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the method of any one of claims 1-6.
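As a rough illustration of the template-based SQL assembly recited in claims 1 and 3 to 5, the Python sketch below keeps one fixed template per operation and fills it from each node. It is a simplification under stated assumptions, not the claimed procedure: the template strings, the field names (`operation`, `params`, `from`, `to`), and the `assemble_sql` helper are all hypothetical, and the graph is walked recursively through a predecessor map rather than with the stack, HashMap, and LIST steps of claim 4. The generated SQL is intended to follow ClickHouse syntax but has not been run against a server.

```python
# Hypothetical templates: one fixed pattern per operation (names are assumptions).
TEMPLATES = {
    "dataset": "SELECT * FROM {table}",
    "filter": "SELECT * FROM ({src}) WHERE {condition}",
    "group_by": "SELECT {keys}, {aggregates} FROM ({src}) GROUP BY {keys}",
    "join": "SELECT * FROM ({left}) AS l INNER JOIN ({right}) AS r ON {condition}",
}


def assemble_sql(nodes, edges, target_id):
    """Build one nested SQL string for the node `target_id` of a DAG."""
    by_id = {node["id"]: node for node in nodes}
    # HashMap-like lookup: END node id -> list of FROM node ids.
    upstream = {}
    for edge in edges:
        upstream.setdefault(edge["to"], []).append(edge["from"])

    def build(node_id):
        node = by_id[node_id]
        op = node["operation"]
        params = node.get("params", {})
        if op == "dataset":
            return TEMPLATES["dataset"].format(table=node["dataset"])
        # Resolve every upstream branch first, then fill this node's template.
        sources = [build(src) for src in upstream.get(node_id, [])]
        if op == "join":
            left, right = sources  # a JOIN node merges exactly two branches here
            return TEMPLATES["join"].format(left=left, right=right, **params)
        (src,) = sources  # all other operations have a single input
        return TEMPLATES[op].format(src=src, **params)

    return build(target_id)


# Example graph: orders -> filter -> join <- users, then join -> group_by.
nodes = [
    {"id": "n1", "operation": "dataset", "dataset": "orders"},
    {"id": "n2", "operation": "filter", "params": {"condition": "amount > 100"}},
    {"id": "n3", "operation": "dataset", "dataset": "users"},
    {"id": "n4", "operation": "join", "params": {"condition": "l.user_id = r.id"}},
    {"id": "n5", "operation": "group_by",
     "params": {"keys": "city", "aggregates": "sum(amount) AS total"}},
]
edges = [
    {"from": "n1", "to": "n2"},
    {"from": "n2", "to": "n4"},
    {"from": "n3", "to": "n4"},
    {"from": "n4", "to": "n5"},
]

sql = assemble_sql(nodes, edges, "n5")
print(sql)
```

Each JOIN node here simply nests its two input branches as subqueries. The claimed method instead splits the graph into branches at JOIN nodes and joins the branch statements; that difference in traversal strategy is deliberate to keep the sketch short.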
Description
ClickHouse-based data analysis method, device, and system
Technical Field
The present invention relates to the field of data analysis, and in particular to a ClickHouse-based data analysis method, device, and system.
Background
In the field of big data analysis, traditional approaches rely on combining several frameworks and technologies to achieve the final result, which makes big data analysis expensive in terms of labor, technical expertise, hardware, and maintenance. Some other systems can also store different columns separately, but because they are optimized for other scenarios they cannot process analytical queries efficiently; examples include HBase, BigTable, Cassandra, and HyperTable. These systems deliver throughput on the order of hundreds of thousands of rows per second, whereas in some cases users need throughput of hundreds of millions of rows per second, so existing data analysis speeds do not meet user needs. In addition, data analysis mainly serves business analysts. Existing data analysis requires them to write SQL statements by hand, and the complexity of doing so places high demands on them, so existing data analysis is costly to use and its analysis workflow is complex.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a ClickHouse-based data analysis method, device, and system. They address two problems: existing data analysis speeds cannot meet user needs, and the high demands placed on business analysts make existing data analysis costly to use and its analysis workflow complex.
The technical solution adopted by the invention to solve the technical problem is as follows. In a first aspect, a ClickHouse-based data analysis method is provided, comprising the steps of: receiving a user's selection of a data set and selection of operations on the data set to obtain Data, wherein the Data comprises a node edge relation list and a node list; parsing node relations based on the node edge relation list and the node list in the Data using a ClickHouse core parser to obtain a directed acyclic graph; performing SQL assembly on the directed acyclic graph in accordance with the ClickHouse SQL specification and functions to obtain a target SQL; and executing the target SQL to obtain target data for data analysis.
Further, receiving the user's selection of a data set and selection of operations on the data set to obtain the Data comprises: taking each operation and the data set corresponding to the operation as a node to obtain the node list; setting an edge between the two nodes corresponding to any two associated operations to obtain the node edge relation list; and taking the node list and the node edge relation list as the Data.
Further, parsing node relations based on the node edge relation list and the node list in the Data using the ClickHouse core parser to obtain the directed acyclic graph comprises: obtaining the order of each node according to the node edge relation list and the node list; and connecting the nodes with edges in that order to obtain the directed acyclic graph.
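The Data structure and the node-ordering step described above can be sketched in Python as follows. The payload shape is an assumption: the text only states that the Data holds a node list and a node edge relation list, so the field names (`id`, `operation`, `dataset`, `from`, `to`) and the `node_order` helper are hypothetical. The sketch also assumes that "obtaining the order of each node" can be modeled as a topological sort, which is done here with the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical Data payload: a node list plus a node edge relation list.
data = {
    "nodes": [
        {"id": "n1", "operation": "dataset", "dataset": "orders"},
        {"id": "n2", "operation": "filter", "dataset": "orders"},
        {"id": "n3", "operation": "group_by", "dataset": "orders"},
        {"id": "n4", "operation": "ranking", "dataset": "orders"},
    ],
    "edges": [
        {"from": "n1", "to": "n2"},
        {"from": "n2", "to": "n3"},
        {"from": "n3", "to": "n4"},
    ],
}


def node_order(data):
    """Return node ids in dependency order; raise ValueError on a cycle."""
    sorter = TopologicalSorter()
    for node in data["nodes"]:
        sorter.add(node["id"])  # register nodes that have no edges as well
    for edge in data["edges"]:
        # add(node, *predecessors): edge["from"] must precede edge["to"].
        sorter.add(edge["to"], edge["from"])
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError("edge list contains a cycle, so it is not a DAG") from exc


order = node_order(data)
print(order)
```

Rejecting cyclic input at this stage means the later SQL assembly can rely on every node's inputs having been resolved before the node itself.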
Further, performing SQL assembly on the directed acyclic graph in accordance with the ClickHouse SQL specification and functions to obtain the target SQL comprises: presetting a fixed SQL template for each operation according to the ClickHouse SQL specification and functions, each SQL template being a statement executable on ClickHouse; associating the data set corresponding to the operation in each node of the directed acyclic graph with the SQL template to obtain the SQL statement of each node; and obtaining the target SQL from the SQL statement of each node and the directed acyclic graph.
Further, obtaining the target SQL from the SQL statement of each node and the directed acyclic graph comprises: traversing the directed acyclic graph using a depth-first traversal algorithm while storing and parsing the nodes and edges of the directed acyclic graph using a stack and a HashMap to obtain the target SQL.
Further, traversing the directed acyclic graph using the depth-first traversal algorithm while storing and parsing the nodes and edges of the directed acyclic graph using a stack and a HashMap to obtain the target SQL comprises: generating, according to the directed acyclic graph, a stack for storing the nodes visited by the graph traversal, a HashMap for storing the association relations parsed during edge traversal, and a LIST for temporarily storing the nodes popped from the stack; stopping the graph traversal when a JOIN node is encountered, marking the JOIN node, popping nodes from the stack into the LIST, and stopping pop wh