CN-121979916-A - Data processing method, apparatus and computer storage medium


Abstract

The application provides a data processing method, a data processing apparatus, and a computer storage medium. The data processing method comprises: obtaining an identification file that records at least some historically processed tasks, wherein the intermediate calculation result corresponding to each task in the identification file is stored in a target memory; determining, among a plurality of tasks to be processed in the current processing stage of a query flow, a first target task that is not recorded in the identification file and a second target task that is recorded in the identification file; executing each first target task to obtain its intermediate calculation result; and determining the intermediate calculation result of the second target task from the intermediate calculation results stored in the target memory.

Inventors

ZHAN WENPING
QIAN HAODONG
HU RONGBAO
SHU FAN
LI ZHENGZHENG
ZHAO PENGFEI
ZHANG DONGLIN
ZHANG XIAOLONG

Assignees

Zhejiang Dahua Technology Co., Ltd.

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-30

Claims (10)

1. A data processing method, characterized in that the data processing method comprises: acquiring an identification file that records at least some historically processed tasks, wherein an intermediate calculation result corresponding to each task in the identification file is stored in a target memory; determining, among a plurality of tasks to be processed corresponding to the current processing stage of a query flow, a first target task that is not recorded in the identification file and a second target task that is recorded in the identification file; executing each first target task to obtain an intermediate calculation result corresponding to the first target task; and determining an intermediate calculation result corresponding to the second target task according to the intermediate calculation result stored in the target memory.
2. The data processing method according to claim 1, wherein, after executing each first target task to obtain the intermediate calculation result corresponding to the first target task, the method further comprises: acquiring a task complexity of the first target task and a data volume of the corresponding intermediate calculation result; and determining whether to add the first target task to the identification file according to the task complexity and the data volume.
3. The data processing method according to claim 2, wherein determining whether to add the first target task to the identification file according to the task complexity and the data volume comprises: when the task complexity is greater than a preset complexity threshold and the data volume is smaller than a preset data volume threshold, adding the first target task to the identification file and storing the intermediate calculation result corresponding to the first target task in the target memory.
4. The data processing method according to claim 2, wherein acquiring the task complexity of the first target task comprises: acquiring a task type and metadata information corresponding to the first target task, wherein the metadata information comprises a data processing amount corresponding to the first target task; and determining the task complexity based on the task type and the data processing amount.
5. The data processing method according to claim 2, wherein, after the first target task is added to the identification file, the method further comprises: determining a time threshold corresponding to the first target task according to the task complexity, wherein the time threshold represents the maximum duration for which the intermediate calculation result corresponding to the first target task is statically stored in the target memory.
6. The data processing method according to claim 1, wherein the method further comprises: determining a target duration between the current time and the last called time of each task in the identification file, and acquiring a time threshold corresponding to each task in the identification file; and deleting each task whose target duration exceeds the corresponding time threshold from the identification file, and deleting the corresponding intermediate calculation result from the target memory.
7. The data processing method according to claim 6, wherein determining the target duration between the current time and the last called time of each task in the identification file comprises: acquiring the value of a life cycle corresponding to each task in the identification file at the current time; and determining the value of the life cycle as the corresponding target duration; wherein, when any task in the identification file is called, the value of the life cycle corresponding to that task is cleared and counting restarts.
8. The data processing method according to claim 1, wherein the method further comprises: acquiring the calling frequency, within a preset time period, of the intermediate calculation result corresponding to each task in the identification file; and deleting each task whose calling frequency is smaller than a frequency threshold from the identification file, and deleting the corresponding intermediate calculation result from the target memory.
9. A data processing apparatus, comprising a memory and a processor coupled to the memory, wherein the memory is configured to store program data and the processor is configured to execute the program data to implement the data processing method according to any one of claims 1 to 8.
10. A computer storage medium configured to store program data which, when executed by a computer, implements the data processing method according to any one of claims 1 to 8.
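
The two clean-up rules in claims 6 to 8 can be illustrated with a minimal sketch. It is not the applicant's implementation: the `Entry` record, the function names, and the choice to keep the identification-file record and the stored result in one dictionary are all assumptions made for brevity. The idle time since the last call stands in for the "life cycle" value of claim 7.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Entry:
    """Hypothetical record: one task listed in the identification file."""
    result: Any                 # intermediate calculation result (target memory)
    time_threshold: float       # maximum idle seconds before removal (claim 6)
    last_called: float          # timestamp of the most recent call
    call_times: List[float] = field(default_factory=list)


def touch(entries: Dict[str, Entry], key: str, now: float) -> Any:
    """Serve a stored result; calling it restarts the idle timer (claim 7)."""
    entry = entries[key]
    entry.last_called = now
    entry.call_times.append(now)
    return entry.result


def evict_idle(entries: Dict[str, Entry], now: float) -> List[str]:
    """Delete tasks whose idle duration exceeds their own time threshold."""
    removed = [k for k, e in entries.items()
               if now - e.last_called > e.time_threshold]
    for k in removed:
        del entries[k]
    return removed


def evict_infrequent(entries: Dict[str, Entry], now: float,
                     window: float, frequency_threshold: int) -> List[str]:
    """Delete tasks called fewer than frequency_threshold times in the window."""
    removed = [k for k, e in entries.items()
               if sum(1 for t in e.call_times if now - t <= window)
               < frequency_threshold]
    for k in removed:
        del entries[k]
    return removed
```

The two functions are kept separate because the claims present the idle-time rule and the calling-frequency rule as independent options; a real system might run either or both on a schedule.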

Description

Data processing method, apparatus and computer storage medium

Technical Field

The present application relates to the field of big data analysis, and in particular to a data processing method, apparatus, and computer storage medium.

Background

A batch query is a query mode in which a large amount of data is processed once, in a centralized manner, according to preset task logic. As one of the core scenarios of big data analysis, it is widely used in fields such as data warehouse construction, offline statistical analysis, and data mining and modeling. Trino, a high-performance distributed SQL (Structured Query Language) query engine, has attracted wide attention and adoption in batch query scenarios. The Trino Tardigrade architecture is a query architecture obtained by upgrading traditional Trino, that is, by adding capabilities on top of the original Trino framework; compared with the original framework, it improves fault tolerance and stability in batch query scenarios. However, with the explosive growth of business data in online analytical processing (OLAP) batch query scenarios, each Trino Tardigrade job involves a relatively large data volume and complex queries, and the task execution time of each processing stage is generally long. As a result, the completion period of a batch query becomes too long to meet the business's real-time requirements for data analysis results.

Disclosure of Invention

In order to solve the above technical problems, the application provides a data processing method, a data processing apparatus, and a computer storage medium.
In order to solve the above technical problems, the application provides a data processing method, which comprises: obtaining an identification file that records at least some historically processed tasks, wherein the intermediate calculation result corresponding to each task in the identification file is stored in a target memory; determining, among a plurality of tasks to be processed in the current processing stage of a query flow, a first target task that is not recorded in the identification file and a second target task that is recorded in the identification file; executing each first target task to obtain the intermediate calculation result corresponding to the first target task; and determining the intermediate calculation result corresponding to the second target task according to the intermediate calculation results stored in the target memory. After each first target task is executed to obtain its intermediate calculation result, the method further comprises obtaining the task complexity of the first target task and the data volume of the corresponding intermediate calculation result, and determining whether to add the first target task to the identification file according to the task complexity and the data volume. Determining whether to add the first target task to the identification file comprises: when the task complexity is greater than a preset complexity threshold and the data volume is smaller than a preset data volume threshold, adding the first target task to the identification file and storing the intermediate calculation result corresponding to the first target task in the target memory.
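
The flow just described (split the stage's tasks into recorded and unrecorded ones, execute only the unrecorded ones, then decide whether to record each new result) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: `ResultCache`, `run_stage`, the use of a Python set and dict for the identification file and target memory, and measuring data volume with `len()` are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple


@dataclass
class ResultCache:
    """Hypothetical stand-in for the identification file plus target memory."""
    complexity_threshold: float
    data_volume_threshold: int
    identification_file: Set[str] = field(default_factory=set)
    target_memory: Dict[str, Any] = field(default_factory=dict)

    def partition(self, task_keys) -> Tuple[List[str], List[str]]:
        """Return (first targets: not recorded, second targets: recorded)."""
        first = [k for k in task_keys if k not in self.identification_file]
        second = [k for k in task_keys if k in self.identification_file]
        return first, second

    def maybe_admit(self, key: str, result: Any,
                    complexity: float, data_volume: int) -> bool:
        """Record a task only if it was costly to compute and small to store."""
        if (complexity > self.complexity_threshold
                and data_volume < self.data_volume_threshold):
            self.identification_file.add(key)
            self.target_memory[key] = result
            return True
        return False


def run_stage(cache: ResultCache,
              tasks: Dict[str, Tuple[Callable[[], list], float]]) -> Dict[str, Any]:
    """tasks maps a task key to (execute function, assumed complexity score)."""
    first, second = cache.partition(tasks)
    # Second target tasks: reuse the stored intermediate results.
    results = {k: cache.target_memory[k] for k in second}
    # First target tasks: execute, then apply the admission rule.
    for k in first:
        execute, complexity = tasks[k]
        result = execute()
        results[k] = result
        cache.maybe_admit(k, result, complexity, data_volume=len(result))
    return results
```

Under this rule a complex task with a small result is recorded and skipped on the next run, while a task with a large result or low complexity is re-executed each time, which keeps the target memory from filling with results that are cheap to recompute or expensive to hold.
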
Obtaining the task complexity of the first target task comprises: obtaining the task type and metadata information corresponding to the first target task, the metadata information comprising the data processing amount corresponding to the first target task; and determining the task complexity based on the task type and the data processing amount. After the first target task is added to the identification file, the method further comprises determining a time threshold corresponding to the first target task according to the task complexity, wherein the time threshold represents the maximum duration for which the intermediate calculation result corresponding to the first target task is statically stored in the target memory. The method further comprises: determining a target duration between the current time and the last called time of each task in the identification file, and obtaining the time threshold corresponding to each task in the identification file; and deleting each task whose target duration exceeds the corresponding time threshold from the identification file, and deleting the corresponding intermediate calculation result from the target memory. Determining the target duration comprises: obtaining the value of the life cycle corresponding to each task in the identification file at the current time, and determining the value of the life cycle as the corresponding target duration; when any task in the identification file is called, the value of the life cycle corresponding to that task is cleared and counting restarts. The met