CN-122019628-A - Distributed query method and related equipment
Abstract
The embodiments of the application provide a distributed query method and related equipment. The distributed query method comprises: in response to an SQL task submitted by a submitting end, splitting the SQL task to obtain a plurality of SQL subtasks; executing the SQL subtasks in a distributed manner through cluster nodes, which are nodes of a computer cluster, to obtain corresponding subtask data results; and aggregating the subtask data results to obtain a final query result. In this technical scheme, the SQL query task submitted by the submitting end is split so that a complex query is decomposed into a plurality of subtasks that can be processed in parallel, and the parallel computing capability of the computer cluster is used to significantly improve query speed. During execution, the split SQL subtasks are distributed to a plurality of computing nodes, and the parallel execution mechanism of a cluster framework allows several subtasks to be processed simultaneously, which shortens the overall processing time.
Inventors
- WANG YI
Assignees
- 三六零数字安全科技集团有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-11-08
Claims (10)
- 1. A distributed query method, applied to a computer cluster, the distributed query method comprising: in response to an SQL task submitted by a submitting end, splitting the SQL task to obtain a plurality of SQL subtasks; executing each SQL subtask in a distributed manner through cluster nodes to obtain a corresponding subtask data result, wherein the cluster nodes are nodes of the computer cluster; and aggregating the subtask data results to obtain a final query result.
- 2. The distributed query method of claim 1, wherein splitting the SQL task to obtain a plurality of SQL subtasks specifically comprises: preprocessing the SQL task into SQL statement text; and inputting the SQL statement text into a task segmentation model to obtain the plurality of SQL subtasks.
- 3. The distributed query method of claim 1, wherein splitting the SQL task to obtain a plurality of SQL subtasks specifically comprises: acquiring task information of the SQL task; if the task information meets a first splitting condition, splitting the SQL task in a first splitting mode to obtain the plurality of SQL subtasks; and if the task information meets a second splitting condition, splitting the SQL task in a second splitting mode to obtain the plurality of SQL subtasks.
- 4. The distributed query method of claim 3, wherein the task information includes a time range of the requested query, and splitting the SQL task in the first splitting mode if the task information meets the first splitting condition comprises: determining that the first splitting condition is met if the time range of the requested query exceeds a preset time range threshold; and splitting the SQL task by time period, according to the time range of the requested query, to obtain the plurality of SQL subtasks.
- 5. The distributed query method of claim 3, wherein the task information includes a length of a filtering condition, and splitting the SQL task in the second splitting mode if the task information meets the second splitting condition comprises: determining that the second splitting condition is met if the length of the filtering condition exceeds a preset length threshold; and splitting the SQL task according to the length of the filtering condition and the number of cluster nodes to obtain the plurality of SQL subtasks.
- 6. The distributed query method of claim 1, wherein executing each SQL subtask in a distributed manner through cluster nodes to obtain a corresponding subtask data result specifically comprises: parsing the SQL subtask with an SQL parser to obtain parsing information; and converting the parsing information into an executable API call, and calling an API interface to acquire data directly, thereby obtaining the corresponding subtask data result.
- 7. The distributed query method of claim 6, wherein parsing the SQL subtask with an SQL parser to obtain parsing information comprises: parsing the SQL subtask with the SQL parser to generate an abstract syntax tree; and extracting query information from the abstract syntax tree to form the parsing information.
- 8. A distributed query device, the distributed query device comprising: a task splitting module, configured to split an SQL task submitted by a submitting end, in response to the SQL task, to obtain a plurality of SQL subtasks; a task execution module, configured to execute each SQL subtask in a distributed manner through cluster nodes to obtain a corresponding subtask data result, wherein the cluster nodes are nodes of a computer cluster; and a result aggregation module, configured to aggregate the subtask data results to obtain a final query result.
- 9. A computer readable medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the distributed query method of any of claims 1 to 7.
- 10. An electronic device, comprising: one or more processors; storage means for storing one or more programs which when executed by the one or more processors cause the one or more processors to implement the distributed query method of any of claims 1 to 7.
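As a rough illustration of the parsing and API-call steps recited in claims 6 and 7, the sketch below parses a restricted SELECT statement into a small tree-like node and turns it into a request URL. Everything here is an assumption made for illustration: the regex-based `parse_select` is a stand-in for a real SQL parser, the `SelectNode` shape is hypothetical, and the base URL is a placeholder rather than an interface named in the application. The request is only constructed, not sent.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import urlencode


@dataclass
class SelectNode:
    """Minimal stand-in for an abstract syntax tree of one SELECT statement."""
    columns: List[str]
    table: str
    filters: Dict[str, str] = field(default_factory=dict)


# Only handles: SELECT a, b FROM t [WHERE x = 'v' AND y = 'w']
_SELECT_RE = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
_COND_RE = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'\s*$")


def parse_select(sql: str) -> SelectNode:
    """Parse a restricted SELECT into a SelectNode (toy parser, not full SQL)."""
    match = _SELECT_RE.match(sql)
    if match is None:
        raise ValueError(f"unsupported SQL: {sql!r}")
    columns = [col.strip() for col in match.group("cols").split(",")]
    filters: Dict[str, str] = {}
    where = match.group("where")
    if where:
        # Naive split: a quoted value containing " AND " is not supported.
        for cond in re.split(r"\s+AND\s+", where, flags=re.IGNORECASE):
            cond_match = _COND_RE.match(cond)
            if cond_match is None:
                raise ValueError(f"unsupported condition: {cond!r}")
            filters[cond_match.group(1)] = cond_match.group(2)
    return SelectNode(columns=columns, table=match.group("table"), filters=filters)


def to_api_request(node: SelectNode, base_url: str) -> str:
    """Build a GET URL whose query parameters carry the extracted query information."""
    params = {"fields": ",".join(node.columns), **node.filters}
    return f"{base_url}/{node.table}?{urlencode(params)}"


if __name__ == "__main__":
    node = parse_select("SELECT ip, ts FROM events WHERE severity = 'high'")
    # Hypothetical endpoint; replace with the real data API of the deployment.
    print(to_api_request(node, "https://api.example.invalid/v1"))
```

A production system would more plausibly rely on an established SQL parser and map the resulting tree onto whatever parameters the target API actually accepts.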
Description
Distributed query method and related equipment
Technical Field
The application relates to the technical field of computers and communication, and in particular to a distributed query method and related equipment.
Background
In the prior art, when a centralized processing system processes large-scale security data, the central system may become a bottleneck, resulting in longer response times. For long and complex security data analysis tasks in particular, the delay can increase significantly. In addition, computing resources (e.g., CPU and memory) may be unevenly distributed in a centralized system: some tasks may occupy excessive resources while others run inefficiently for lack of resources, and this imbalance can further reduce overall processing efficiency.
Disclosure of Invention
The embodiments of the application provide a distributed query method and related equipment, which can alleviate, at least to some extent, the low processing efficiency of centralized processing systems in the prior art.
Other features and advantages of the application will be apparent from the following detailed description, or may be learned by practice of the application.
According to one aspect of the embodiments of the application, a distributed query method is provided and applied to a computer cluster. The distributed query method comprises: in response to an SQL task submitted by a submitting end, splitting the SQL task to obtain a plurality of SQL subtasks; executing the SQL subtasks in a distributed manner through cluster nodes to obtain corresponding subtask data results, wherein the cluster nodes are nodes of the computer cluster; and aggregating the subtask data results to obtain a final query result.
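The split, execute, and aggregate flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: a thread pool stands in for the cluster nodes, each "node" queries a private in-memory SQLite copy of a toy `events` table, and the names `split_task`, `execute_subtask`, `aggregate`, and `run_distributed` are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, str]

# Toy data set; a real cluster node would query its actual storage.
EVENTS: List[Row] = [(1, "login"), (2, "scan"), (3, "login"), (4, "alert")]


def split_task(sql: str) -> List[str]:
    """Hypothetical splitter: cut the query into two fixed day ranges."""
    return [
        f"{sql} WHERE day BETWEEN 1 AND 2",
        f"{sql} WHERE day BETWEEN 3 AND 4",
    ]


def execute_subtask(sql: str) -> List[Row]:
    """Stand-in for one cluster node: run the subtask on a private SQLite copy."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (day INTEGER, name TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", EVENTS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def aggregate(partials: Sequence[List[Row]]) -> List[Row]:
    """Merge the per-subtask rows into one sorted result."""
    return sorted(row for part in partials for row in part)


def run_distributed(
    sql: str,
    split: Callable[[str], List[str]],
    execute: Callable[[str], List[Row]],
    merge: Callable[[Sequence[List[Row]]], List[Row]],
    max_workers: int = 4,
) -> List[Row]:
    """Split the task, run the subtasks in parallel, then aggregate the results."""
    subtasks = split(sql)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(execute, subtasks))
    return merge(partials)


if __name__ == "__main__":
    print(run_distributed("SELECT day, name FROM events",
                          split_task, execute_subtask, aggregate))
```

Passing the split, execute, and merge steps as callables keeps the orchestration independent of how any one step is realized, which mirrors the three-step structure of the method.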
In some embodiments of the application, splitting the SQL task to obtain a plurality of SQL subtasks specifically comprises: preprocessing the SQL task into SQL statement text, and inputting the SQL statement text into a task segmentation model to obtain the plurality of SQL subtasks.
In some embodiments of the application, splitting the SQL task to obtain a plurality of SQL subtasks specifically comprises: obtaining task information of the SQL task; if the task information meets a first splitting condition, splitting the SQL task in a first splitting mode to obtain the plurality of SQL subtasks; and if the task information meets a second splitting condition, splitting the SQL task in a second splitting mode to obtain the plurality of SQL subtasks.
In some embodiments of the application, the task information includes a time range of the requested query, and splitting the SQL task in the first splitting mode if the task information meets the first splitting condition comprises: determining that the first splitting condition is met if the time range of the requested query exceeds a predetermined time range threshold; and splitting the SQL task by time period, according to the time range of the requested query, to obtain the plurality of SQL subtasks.
In some embodiments of the application, the task information includes a length of a filtering condition, and splitting the SQL task in the second splitting mode if the task information meets the second splitting condition specifically comprises: determining that the second splitting condition is met if the length of the filtering condition exceeds a predetermined length threshold; and splitting the SQL task according to the length of the filtering condition and the number of cluster nodes to obtain the plurality of SQL subtasks.
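The two splitting modes described above can be sketched as follows. This is only an illustration under assumptions that the text does not state: the "length of the filtering condition" is interpreted here as the number of values in a filter list, the values are spread as evenly as possible over the nodes, and the table name, column name, and function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Window = Tuple[datetime, datetime]


def split_time_range(start: datetime, end: datetime, period: timedelta) -> List[Window]:
    """First mode: cut [start, end) into consecutive windows of at most `period`."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    windows: List[Window] = []
    cursor = start
    while cursor < end:
        upper = min(cursor + period, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


def split_filter_values(values: Sequence[str], node_count: int) -> List[List[str]]:
    """Second mode: spread filter values over at most `node_count` near-equal chunks."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    chunk_count = min(node_count, len(values))
    if chunk_count == 0:
        return []
    base, extra = divmod(len(values), chunk_count)
    chunks: List[List[str]] = []
    position = 0
    for index in range(chunk_count):
        size = base + (1 if index < extra else 0)  # first `extra` chunks get one more
        chunks.append(list(values[position:position + size]))
        position += size
    return chunks


def build_time_subtasks(
    table: str, time_column: str, start: datetime, end: datetime, period: timedelta
) -> List[Tuple[str, Window]]:
    """Pair one parameterized SQL subtask with each time window."""
    sql = f"SELECT * FROM {table} WHERE {time_column} >= ? AND {time_column} < ?"
    return [(sql, window) for window in split_time_range(start, end, period)]


if __name__ == "__main__":
    for sql, (lo, hi) in build_time_subtasks(
        "events", "ts", datetime(2024, 1, 1), datetime(2024, 1, 2), timedelta(hours=6)
    ):
        print(sql, lo.isoformat(), hi.isoformat())
    print(split_filter_values([f"10.0.0.{i}" for i in range(10)], node_count=3))
```

Half-open windows avoid counting a row twice when its timestamp falls exactly on a boundary between two subtasks.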
In some embodiments of the present application, executing each SQL subtask in a distributed manner through cluster nodes to obtain a corresponding subtask data result specifically comprises: parsing the SQL subtask with an SQL parser to obtain parsing information; and converting the parsing information into an executable API call, and calling an API interface to obtain data directly, thereby obtaining the corresponding subtask data result.
In some embodiments of the present application, parsing the SQL subtask with the SQL parser to obtain the parsing information comprises: parsing the SQL subtask with the SQL parser to generate an abstract syntax tree; and extracting query information from the abstract syntax tree to form the parsing information.
In some embodiments of the present application, converting the parsing information into an executable API call and calling an API interface to obtain data directly, thereby obtaining the corresponding subtask data result, specifically comprises: constructing an API call request according to the parsing information, where the API call request includes a query parameter; and sending the API call request to a target API interface to obtain data, thereby obtaining the corresponding subtask data result.
In some embodiments of the present application,