CN-122027529-A - Data acquisition method, system, equipment and medium
Abstract
The invention discloses a data acquisition method, system, device and medium in the field of data processing. The method comprises: performing throughput analysis on the operation data of a business system to determine target periods; analyzing the content and state of the real-time acquisition data and the batch acquisition data of each target period to obtain repeated acquisition data and state conflict data, calculating a conflict probability from them, and determining each target period whose conflict probability is greater than a conflict threshold to be a conflict period; obtaining, for each conflict period, the interference value of batch acquisition on real-time acquisition by analyzing the acquisition time and acquisition quantity of the real-time acquisition tasks; if the interference value is greater than an interference threshold, obtaining a period to be scheduled by combining the time-sequence position of the conflict period with the interference value; determining the priority of each period to be scheduled according to all the interference values; and dynamically scheduling the periods to be scheduled based on the priorities to obtain and execute the current data acquisition scheme.
Inventors
- LIN JIAXIN
- YAN YUPING
- YANG YONGJIAO
- HUANG CHAOLIN
- CHEN YANGPING
- ZHANG XIAOYE
- HUANG YUETIAN
- LI RUIQI
- JIANG ZIWEI
Assignees
- Guangdong Power Grid Co., Ltd. (广东电网有限责任公司)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-13
Claims (10)
- 1. A data acquisition method, comprising: acquiring historical operation data of a plurality of historical data processing cycles in a business system, and determining target analysis periods by performing throughput analysis on the historical operation data; for the real-time acquisition data and the batch acquisition data of each target analysis period, obtaining repeated acquisition data and state conflict data by analyzing data content and service state respectively, calculating an acquisition conflict probability based on the repeated acquisition data and the state conflict data, and determining each target analysis period whose acquisition conflict probability is greater than a preset conflict threshold to be an acquisition conflict period; for each acquisition conflict period, obtaining an interference value of the batch acquisition tasks on the real-time acquisition tasks by analyzing the acquisition time and the acquisition quantity of the real-time acquisition tasks, and, if the interference value is greater than a preset interference threshold, obtaining a period to be scheduled by combining the time-sequence positions of the acquisition conflict period in the plurality of historical data processing cycles with the interference value; and determining the priority of each period to be scheduled according to all the interference values, dynamically scheduling the periods to be scheduled based on the priorities, and obtaining and executing a data acquisition scheme for the current data processing cycle.
- 2. The data acquisition method according to claim 1, wherein acquiring the historical operation data of the plurality of historical data processing cycles in the business system and determining the target analysis periods by performing throughput analysis on the historical operation data specifically comprises: dividing each historical data processing cycle into a plurality of historical processing periods; calculating, from the first operation data of each historical processing period and the historical operation data of the historical data processing cycle to which it belongs, a first proportion representing the share of throughput, and taking each historical processing period whose first proportion is greater than or equal to a preset proportion as a period to be analyzed; calculating a mean value and a standard deviation from the first proportions of the period to be analyzed in all the historical data processing cycles; and inputting the mean value and the standard deviation into a preset function to obtain a classification value, and determining each period to be analyzed whose classification value is greater than or equal to a preset classification threshold to be a target analysis period.
- 3. The data acquisition method according to claim 1, wherein obtaining the repeated acquisition data and the state conflict data by analyzing data content and service state respectively specifically comprises: for each business object, comparing the first real-time acquisition data corresponding to the business object with the first batch acquisition data; if the data types of the first real-time acquisition data and the first batch acquisition data are the same, determining that the first real-time acquisition data and the first batch acquisition data are repeated acquisition data; and if the state values corresponding to the business object are different, determining that the data corresponding to the business object are state conflict data.
- 4. The data acquisition method according to claim 3, wherein calculating the acquisition conflict probability based on the repeated acquisition data and the state conflict data specifically comprises: acquiring a first data volume of the target analysis period, and acquiring a second data volume corresponding to all the repeated acquisition data and all the state conflict data; and calculating the ratio of the first data volume to the second data volume to obtain the acquisition conflict probability.
- 5. The data acquisition method according to claim 1, wherein obtaining the interference value of the batch acquisition tasks on the real-time acquisition tasks by analyzing the acquisition time and the acquisition quantity of the real-time acquisition tasks specifically comprises: for each real-time acquisition task, acquiring the actual completion time and the preset completion time and calculating their difference to obtain the time difference of the real-time acquisition task, and aggregating the time differences of all the real-time acquisition tasks in the acquisition conflict period to obtain an acquisition delay; for each real-time acquisition task, acquiring the actual acquisition total of the real-time acquisition task and calculating the ratio of the actual acquisition total to the preset acquisition total to obtain a data loss rate; and combining the acquisition delay and the data loss rate to obtain the interference value.
- 6. The data acquisition method according to claim 1, wherein obtaining the period to be scheduled by combining the time-sequence positions of the acquisition conflict period in the plurality of historical data processing cycles with the interference value specifically comprises: calculating a difference from the time-sequence positions of the acquisition conflict period in any two historical data processing cycles to obtain a first value representing the stability of the time-sequence position; calculating the standard deviation of the interference values corresponding to the historical data processing cycles to obtain a second value representing the fluctuation of the interference value; fusing the first value and the second value to obtain a target value; and if the target value is greater than a preset stability threshold, determining the acquisition conflict period to be a period to be scheduled.
- 7. The data acquisition method according to claim 1, wherein determining the priority of each period to be scheduled specifically comprises: for each period to be scheduled, calculating the interference values of the period to be scheduled in all the historical data processing cycles and averaging them to obtain the average interference value of the period to be scheduled; and determining the priority of each period to be scheduled according to its average interference value.
- 8. A data acquisition system, characterized by comprising a peak identification module, a conflict identification module, a scheduling analysis module and an acquisition scheme module; the peak identification module is configured to acquire historical operation data of a plurality of historical data processing cycles in a business system and to determine target analysis periods by performing throughput analysis on the historical operation data; the conflict identification module is configured to, for the real-time acquisition data and the batch acquisition data of each target analysis period, obtain repeated acquisition data and state conflict data by analyzing data content and service state respectively, calculate an acquisition conflict probability based on the repeated acquisition data and the state conflict data, and determine each target analysis period whose acquisition conflict probability is greater than a preset conflict threshold to be an acquisition conflict period; the scheduling analysis module is configured to, for each acquisition conflict period, obtain an interference value of the batch acquisition tasks on the real-time acquisition tasks by analyzing the acquisition time and the acquisition quantity of the real-time acquisition tasks, and, if the interference value is greater than a preset interference threshold, obtain a period to be scheduled by combining the time-sequence positions of the acquisition conflict period in the plurality of historical data processing cycles with the interference value; and the acquisition scheme module is configured to determine the priority of each period to be scheduled according to all the interference values, dynamically schedule the periods to be scheduled based on the priorities, and obtain and execute a data acquisition scheme for the current data processing cycle.
- 9. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the data acquisition method according to any one of claims 1-7 when executing the computer program.
- 10. A computer readable storage medium comprising a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the data acquisition method according to any one of claims 1-7.
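The peak-identification steps recited in claim 2 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the claimed implementation: the function name `target_analysis_periods`, the dictionary input layout, the default thresholds and, in particular, the scoring formula `mean / (1 + std)` are hypothetical. The claim refers only to "a preset function" and does not specify it.

```python
from statistics import mean, pstdev


def target_analysis_periods(cycles, min_share=0.2, class_threshold=0.5):
    """Pick candidate peak periods from historical throughput.

    cycles: list of dicts, one per historical processing cycle, mapping a
            period label (e.g. "08-16") to the volume processed in it.
    Both thresholds are hypothetical defaults chosen for illustration.
    """
    shares = {}  # period label -> throughput shares from qualifying cycles
    for cycle in cycles:
        total = sum(cycle.values())
        if total == 0:
            continue  # nothing was processed in this cycle, skip it
        for period, volume in cycle.items():
            share = volume / total  # the "first proportion" of claim 2
            if share >= min_share:  # period to be analyzed
                shares.setdefault(period, []).append(share)

    targets = []
    for period, values in shares.items():
        mu = mean(values)
        sigma = pstdev(values)
        # Assumed stand-in for the unspecified "preset function":
        # reward a high average share, penalize unstable shares.
        score = mu / (1.0 + sigma)
        if score >= class_threshold:
            targets.append(period)
    return sorted(targets)


if __name__ == "__main__":
    history = [
        {"00-08": 10, "08-16": 60, "16-24": 30},
        {"00-08": 5, "08-16": 65, "16-24": 30},
    ]
    print(target_analysis_periods(history))
```

In this sketch the mean and standard deviation are taken only over the cycles in which a period met the preset proportion, which is one possible reading of the claim.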
Description
Data acquisition method, system, equipment and medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a data acquisition method, system, device and medium.
Background
In the modern data-driven operation of enterprises, real-time acquisition and batch acquisition of operation data are the two basic tasks that support business analysis and decision making. As data volumes grow and business complexity increases, the two kinds of task, when executed concurrently and especially during periods of concentrated data processing demand, often compete directly for computing, storage and network resources. Acquisition tasks are then delayed or fail, which in turn affects the accuracy of data analysis results and the timeliness of business decisions. At present, historical load data are analyzed to identify the business peak periods and business valley periods within an acquisition cycle, and batch acquisition tasks are moved in advance from the peak periods to the valley periods, so that a new acquisition scheme is obtained and executed and acquisition conflicts are avoided. However, real-time acquisition tasks must respond to business dynamics at high frequency, so their execution periods largely coincide with the business peak periods, and some batch acquisition tasks also normally need to be scheduled and executed during the peak periods. The prior art therefore ignores the business timeliness of batch acquisition tasks: the delayed data distort the batch analysis results, which further reduces the accuracy of subsequent data processing tasks.
Disclosure of Invention
The invention provides a data acquisition method, system, device and medium that avoid resource conflicts between real-time acquisition and batch acquisition while preserving the timeliness of the data collected by batch acquisition tasks, thereby improving the accuracy of data analysis results.
The invention provides a data acquisition method, which comprises the following steps:
acquiring historical operation data of a plurality of historical data processing cycles in a business system, and determining target analysis periods by performing throughput analysis on the historical operation data;
for the real-time acquisition data and the batch acquisition data of each target analysis period, obtaining repeated acquisition data and state conflict data by analyzing data content and service state respectively, calculating an acquisition conflict probability based on the repeated acquisition data and the state conflict data, and determining each target analysis period whose acquisition conflict probability is greater than a preset conflict threshold to be an acquisition conflict period;
for each acquisition conflict period, obtaining an interference value of the batch acquisition tasks on the real-time acquisition tasks by analyzing the acquisition time and the acquisition quantity of the real-time acquisition tasks, and, if the interference value is greater than a preset interference threshold, obtaining a period to be scheduled by combining the time-sequence positions of the acquisition conflict period in the plurality of historical data processing cycles with the interference value;
and determining the priority of each period to be scheduled according to all the interference values, dynamically scheduling the periods to be scheduled based on the priorities, and obtaining and executing a data acquisition scheme for the current data processing cycle.
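The conflict-screening, interference and priority steps above can be sketched in Python as follows. This is an illustrative sketch under assumptions rather than the disclosed implementation: the `Record` and `RealtimeTask` types, all function names, the weights, the 60-second delay scale and the rule that a higher average interference means a higher priority are hypothetical. The sketch also computes the conflict probability as flagged volume divided by total volume so that the result lies between 0 and 1; claim 4 words the ratio as the first data volume to the second, so the direction used here is an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Record:
    object_id: str  # business object the record belongs to
    data_type: str
    state: str


@dataclass(frozen=True)
class RealtimeTask:
    planned_finish: float  # preset completion time, in seconds
    actual_finish: float   # actual completion time, in seconds
    planned_count: int     # preset acquisition total (assumed > 0)
    actual_count: int      # actual acquisition total


def conflict_probability(realtime, batch):
    """Share of a period's records that are repeated or state-conflicting."""
    total = len(realtime) + len(batch)
    if total == 0:
        return 0.0
    batch_by_object = {r.object_id: r for r in batch}
    flagged_realtime = 0
    flagged_batch_ids = set()
    for rt in realtime:
        other = batch_by_object.get(rt.object_id)
        if other is None:
            continue  # object collected by one channel only
        repeated = rt.data_type == other.data_type
        state_conflict = rt.state != other.state
        if repeated or state_conflict:
            flagged_realtime += 1
            flagged_batch_ids.add(other.object_id)
    return (flagged_realtime + len(flagged_batch_ids)) / total


def interference_value(tasks, delay_scale=60.0, w_delay=0.5, w_loss=0.5):
    """Combine average delay and average data loss into one score."""
    delay = mean([max(0.0, t.actual_finish - t.planned_finish) for t in tasks])
    loss = mean([1.0 - t.actual_count / t.planned_count for t in tasks])
    # Assumed fusion: weighted sum of capped, normalized delay and loss rate.
    return w_delay * min(delay / delay_scale, 1.0) + w_loss * loss


def rank_periods(history):
    """history: period label -> interference values, one per past cycle.

    Returns labels ordered from highest to lowest average interference.
    """
    averages = {period: mean(values) for period, values in history.items()}
    return sorted(averages, key=averages.get, reverse=True)
```

A scheduler built on these helpers would first discard periods whose `conflict_probability` or `interference_value` falls below its threshold, then walk the list from `rank_periods` in order when moving batch tasks.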
The embodiment of the invention identifies business peak periods through quantitative analysis of historical data, which provides a time-window basis for the subsequent conflict identification and avoids blind scheduling. It quantifies the conflict probability by accurately identifying the types of data conflict (repetition and inconsistent state), which improves the accuracy with which conflict periods are identified. By quantifying the interference of batch acquisition on real-time acquisition in terms of delay and data loss, and combining it with an evaluation of historical stability, it screens out the key unstable periods that actually need scheduling and avoids over-scheduling. It thereby realizes closed-loop control: batch acquisition times are adjusted dynamically, interference with real-time acquisition is reduced, and the overall operation throughput and stability are improved. The embodiment forms a complete closed loop of historical data analysis, conflict identification, impact evaluation, stability screening and priority scheduling; on the premise that the timeliness, integrity and consistency of the data are guaranteed first, resource competition is relieved through intelligent scheduling and collaborative optimization between the data quality target and the efficiency target is realized, t