CN-114489974-B - Method and device for processing real-time data
Abstract
The invention discloses a real-time data processing method comprising: executing a real-time acquisition task according to task configuration information and storing the acquired data; determining task scheduling attribute information according to the summarizing requirements on the time dimension, granularity and data volume of the acquisition task, and generating a corresponding task sub-scheduling list; scheduling the acquisition task according to the task sub-scheduling list and updating the list in real time according to the execution status of the acquisition task; and scanning the task sub-scheduling list in real time and summarizing the acquired data according to the summarizing requirements when the task sub-scheduling list is fully completed or the delay time set in the task scheduling attribute information is reached. The invention also discloses a device for processing real-time data. The invention enables real-time data acquisition while ensuring data accuracy.
Inventors
- Song shuanglong
Assignees
- 北京亿阳信通科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20211230
Claims (4)
- 1. A method of real-time data processing, the method comprising: executing a real-time acquisition task according to task configuration information, and storing the acquired data; determining task scheduling attribute information according to the summarizing requirements on the time dimension, granularity and data volume of the acquisition task, and generating a corresponding task sub-scheduling list, wherein the granularity represents the number of task sub-scheduling times in a period: the finer the granularity, the more task sub-scheduling times, and the coarser the granularity, the fewer task sub-scheduling times; when data is summarized over a period of one hour at a granularity of 15 minutes, the number of task scheduling times in the period is 4; the summarizing granularity is represented by time-point marks, such that, when the data of one hour is summarized every 15 minutes, time flag bits are set at 00 minutes, 15 minutes, 30 minutes and 45 minutes; scheduling the acquisition task according to the task sub-scheduling list, and updating the task sub-scheduling list in real time according to the execution status of the acquisition task; scanning the task sub-scheduling list in real time, and summarizing the acquired data according to the summarizing requirements when the task sub-scheduling list is fully completed or the delay time set in the task scheduling attribute information is reached; and updating the task sub-scheduling list after the summarizing is completed; wherein determining the task scheduling attribute information and generating the corresponding task sub-scheduling list according to the summarizing requirements on the time dimension, granularity and data volume of the acquisition task comprises: determining the period and the maximum delay of task scheduling according to the summarizing requirement on the time dimension of the acquisition task; determining the number of task scheduling times in a period according to the summarizing requirement on the granularity of the acquisition task; and generating the corresponding task sub-scheduling list according to the number of task scheduling times in the period; and wherein updating the task sub-scheduling list in real time according to the execution status of the acquisition task comprises: when execution of the current acquisition task in the task sub-scheduling list is completed, updating the current task sub-scheduling completion flag bit; and when all the acquisition tasks in the task sub-scheduling list have been executed, updating the task completion flag bit.
- 2. The method of claim 1, wherein scanning the task sub-scheduling list in real time and summarizing the acquired data when the task sub-scheduling list is fully completed or the delay time set in the task scheduling attribute information is reached comprises: scanning the task sub-scheduling list in real time; when the task completion flag bit of the task sub-scheduling list indicates completion, summarizing the acquired data according to the data volume requirement in the summarizing requirements; and when the delay time set in the task scheduling attribute information is reached, summarizing the acquired data according to the data volume requirement in the summarizing requirements.
- 3. An apparatus for real-time data processing, the apparatus comprising: a task management unit for executing a real-time acquisition task according to task configuration information and storing the acquired data; a task splitting unit for determining task scheduling attribute information according to the summarizing requirements on the time dimension, granularity and data volume of the acquisition task and generating a corresponding task sub-scheduling list, wherein the granularity represents the number of task sub-scheduling times in a period: the finer the granularity, the more task sub-scheduling times, and the coarser the granularity, the fewer task sub-scheduling times; when data is summarized over a period of one hour at a granularity of 15 minutes, the number of task scheduling times in the period is 4; the summarizing granularity is represented by time-point marks, such that, when the data of one hour is summarized every 15 minutes, time flag bits are set at 00 minutes, 15 minutes, 30 minutes and 45 minutes; a task progress updating unit for scheduling the acquisition task according to the task sub-scheduling list and updating the task sub-scheduling list in real time according to the execution status of the acquisition task; a summarizing unit for scanning the task sub-scheduling list in real time and, when the task sub-scheduling list is fully completed or the delay time set in the task scheduling attribute information is reached, summarizing the acquired data according to the summarizing requirements; and an updating unit for updating the task sub-scheduling list after the summarizing unit finishes summarizing; wherein the task splitting unit further comprises: a parameter setting module for determining the scheduling period and the maximum delay of the real-time acquisition task in the task management unit according to the summarizing requirement on the time dimension of the acquisition task; and a task sub-scheduling list determining module for generating the corresponding task sub-scheduling list according to the number of task scheduling times in the period set by the parameter setting module; and wherein the task progress updating unit further comprises: a verification module for checking the execution state of the current acquisition task and of all the acquisition tasks; and a flag bit updating module for updating the task flag bits according to the task execution state checked by the verification module, wherein, when execution of the current acquisition task in the task sub-scheduling list is completed, the current task sub-scheduling completion flag bit is updated, and when all the acquisition tasks in the task sub-scheduling list have been executed, the task completion flag bit is updated.
- 4. The apparatus of claim 3, wherein the summarizing unit further comprises: a real-time scanning module for scanning the task sub-scheduling list in real time; and a summarizing module for summarizing the acquired data according to the data volume requirement in the summarizing requirements when the task completion flag bit of the task sub-scheduling list scanned by the real-time scanning module indicates completion, and for summarizing the acquired data according to the data volume requirement in the summarizing requirements when the delay time set in the task scheduling attribute information is reached.
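As an illustration of the worked example in claims 1 and 3 (a one-hour period summarized at 15-minute granularity), the following minimal Python sketch builds a task sub-scheduling list and maintains its flag bits. It is not the patented implementation: the names `SubSchedule`, `TaskSchedule`, `build_schedule` and `mark_done` are hypothetical, and the 5-minute maximum delay is an assumed value that the claims do not specify.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class SubSchedule:
    time_mark: datetime   # time-point mark of the slot, e.g. 10:15
    done: bool = False    # task sub-scheduling completion flag bit


@dataclass
class TaskSchedule:
    period_start: datetime
    period: timedelta
    max_delay: timedelta  # assumed to be configured per task
    subs: List[SubSchedule] = field(default_factory=list)
    all_done: bool = False  # task completion flag bit


def build_schedule(period_start, period, granularity, max_delay):
    """Split one period into sub-schedules, one per granularity step."""
    if granularity <= timedelta(0) or period % granularity != timedelta(0):
        raise ValueError("period must be a positive multiple of granularity")
    count = period // granularity  # number of task scheduling times in the period
    subs = [SubSchedule(period_start + i * granularity) for i in range(count)]
    return TaskSchedule(period_start, period, max_delay, subs)


def mark_done(schedule, time_mark):
    """Set the flag of one finished sub-schedule, then refresh the task flag."""
    for sub in schedule.subs:
        if sub.time_mark == time_mark:
            sub.done = True
    schedule.all_done = all(sub.done for sub in schedule.subs)


# One-hour period, 15-minute granularity, assumed 5-minute maximum delay.
schedule = build_schedule(
    datetime(2021, 12, 30, 10, 0),
    timedelta(hours=1),
    timedelta(minutes=15),
    timedelta(minutes=5),
)
print([s.time_mark.strftime("%M") for s in schedule.subs])  # ['00', '15', '30', '45']
```

A finer granularity passed to `build_schedule` yields more sub-schedules and a coarser one yields fewer, which mirrors the relationship between granularity and scheduling count stated in the claims.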
Description
Method and device for processing real-time data
Technical Field
The invention relates to the computer software industry, in particular to a real-time data processing technology.
Background
With the rise of big data, demand across industries for summarized data presentation services keeps growing, and the timeliness of data processing and presentation has become a focus of attention. Systems have gradually evolved from simple delayed display to real-time data monitoring, and the requirements on the timeliness and accuracy of data keep rising: data delay has shrunk from days to hours to minutes, and finally to quasi-real-time provision and display. The traditional task scheduling approach is simplistic: it relies on timed scheduling, which has the following problems.
First, if the data is scheduled early, the data delay is short, but the accuracy of the data is questionable. Second, if the data is scheduled late, the accuracy of the data is not a problem, but the data delay is long. It is difficult to find a time point that satisfies both accurate data and short delay. Even if a relatively reasonable time is found, delays in the data interface can still prevent the expected result from being obtained.
Timed scheduling and single-threaded scheduling are currently used.
The timed scheduling mode uses crontab, the periodic scheduler of Linux: a timed processing script is written to schedule the processing procedure, which, once scheduled, computes over the source data and inserts the result into the target table. During processing, the data items do not depend on one another and each runs independently.
The single-threaded scheduling mode runs tasks serially. The serial mode is simple to configure: tasks are scheduled one by one, each task completing before the next begins, after which the results are summarized. Resources are not fully utilized, and a delay in a single acquisition task often delays several subsequent summary tasks, seriously affecting presentation.
Therefore, how to acquire and process data accurately and in real time has become a problem to be solved.
Disclosure of Invention
The invention provides a method for processing real-time data, which comprises the following steps:
executing a real-time acquisition task according to the task configuration information, and storing the acquired data;
determining task scheduling attribute information according to the summarizing requirements on the time dimension, granularity and data volume of the acquisition task, and generating a corresponding task sub-scheduling list;
scheduling the acquisition task according to the task sub-scheduling list, and updating the task sub-scheduling list in real time according to the execution status of the acquisition task; and
scanning the task sub-scheduling list in real time, and summarizing the acquired data according to the summarizing requirements when the task sub-scheduling list is fully completed or the delay time set in the task scheduling attribute information is reached.
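The scan-and-summarize condition in the last step can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the function names `summarize_trigger` and `scan_once` are hypothetical, the delay is assumed to be measured from the end of the scheduling period, and a plain sum stands in for whatever aggregation the summarizing requirements call for.

```python
from datetime import datetime, timedelta


def summarize_trigger(done_flags, period_end, max_delay, now):
    """Return why summarizing should start, or None to keep scanning.

    done_flags maps each time-point mark to its completion flag.
    Assumption: the delay deadline is period_end + max_delay.
    """
    if done_flags and all(done_flags.values()):
        return "complete"        # every sub-schedule has finished
    if now >= period_end + max_delay:
        return "delay_reached"   # stop waiting for late sub-schedules
    return None


def scan_once(done_flags, samples, period_end, max_delay, now):
    """One pass of the real-time scan: summarize only when triggered."""
    reason = summarize_trigger(done_flags, period_end, max_delay, now)
    if reason is None:
        return None
    return reason, sum(samples)  # assumed aggregation: a plain sum


period_end = datetime(2021, 12, 30, 11, 0)
max_delay = timedelta(minutes=5)   # assumed value
flags = {"00": True, "15": True, "30": True, "45": False}
samples = [10, 20, 30]             # made-up values from the finished slots

# 11:02, one slot still running and the deadline not reached: keep waiting.
print(scan_once(flags, samples, period_end, max_delay, datetime(2021, 12, 30, 11, 2)))  # None
# 11:05, deadline reached: summarize what has been stored so far.
print(scan_once(flags, samples, period_end, max_delay, datetime(2021, 12, 30, 11, 5)))  # ('delay_reached', 60)
```

Checking completion before the deadline means a period whose sub-schedules all finish on time is summarized immediately, while the deadline bounds how long a late acquisition task can hold up the summary.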
Further, determining the task scheduling attribute information and generating the corresponding task sub-scheduling list according to the summarizing requirements on the time dimension, granularity and data volume of the acquisition task specifically comprises: determining the task scheduling period and the maximum delay according to the summarizing requirement on the time dimension of the acquisition task; determining the number of task scheduling times in a period according to the summarizing requirement on the granularity of the acquisition task; and generating the corresponding task sub-scheduling list according to the number of task scheduling times in the period.
Further, updating the task sub-scheduling list in real time according to the execution status of the acquisition task specifically comprises: when execution of the current acquisition task in the task sub-scheduling list is completed, updating the current task sub-scheduling completion flag bit; and when all the acquisition tasks in the task sub-scheduling list have been executed, updating the task completion flag bit.
Further, scanning the task sub-scheduling list in real time and summarizing the acquired data according to the summarizing requirements when the task sub-scheduling list is fully completed or the delay time set in the task scheduling attribute information is reached comprises: scanning the task sub-scheduling list in real time; when the task completion flag bit of the task sub-scheduling list indicates completion, summarizing the acquired data according to the data volume requirement in the summarizing requirements; and when the delay time set in the task scheduling attribute information is reached, summarizing the acquired data according to t