EP-4742056-A1 - METHOD AND APPARATUS FOR PRE-AGGREGATING TIME SERIES DATA


Abstract

A method and an apparatus for pre-aggregating time series data are provided. The method includes: obtaining at least one piece of time series data from a first device, where the first device is a generation device of the at least one piece of time series data, or the first device is a node device of a time series database, and any one of the at least one piece of time series data includes an identifier; determining a pre-aggregation method based on the identifier of the at least one piece of time series data, where a preset binding relationship exists between the pre-aggregation method and the identifier of the at least one piece of time series data; processing the at least one piece of time series data by using the pre-aggregation method, to generate pre-aggregated data; and when a usage of a memory satisfies a trigger condition, writing the pre-aggregated data and the at least one piece of time series data into a disk. The time series data and the pre-aggregated data can be evenly distributed, and storage space utilization can be improved.

Inventors

WANG, Chao
ZHI, Wei
YI, Chuan

Assignees

Huawei Cloud Computing Technologies Co., Ltd.

Dates

Publication Date: 2026-05-13
Application Date: 2024-02-23

Claims (13)

1. A method for pre-aggregating time series data, comprising: obtaining at least one piece of time series data from a first device, wherein the first device is a generation device of the at least one piece of time series data, or the first device is a node device of a time series database, and any one of the at least one piece of time series data comprises an identifier; determining a pre-aggregation method based on the identifier of the at least one piece of time series data, wherein a preset binding relationship exists between the pre-aggregation method and the identifier of the at least one piece of time series data; processing the at least one piece of time series data by using the pre-aggregation method, to generate pre-aggregated data; writing the pre-aggregated data and the at least one piece of time series data into a memory; and when a usage of the memory satisfies a trigger condition, writing the pre-aggregated data and the at least one piece of time series data into a disk.
2. The method according to claim 1, wherein the writing the pre-aggregated data and the at least one piece of time series data into the disk comprises: writing the at least one piece of time series data into a first file in the disk; and writing the pre-aggregated data into a second file in the disk.
3. The method according to claim 2, wherein the first file and the second file are stored in a same directory.
4. The method according to any one of claims 1 to 3, wherein the writing the pre-aggregated data and the at least one piece of time series data into the disk comprises: writing the pre-aggregated data and the at least one piece of time series data into the disk in parallel.
5. The method according to any one of claims 1 to 4, wherein the at least one piece of time series data comprises a plurality of pieces of time series data, the plurality of pieces of time series data are time series data of a same type, and the plurality of pieces of time series data are stored in a same memory block in the memory.
6. The method according to any one of claims 1 to 5, wherein the trigger condition comprises: the usage of the memory is greater than or equal to a usage threshold; or a usage of the memory block in the memory is greater than or equal to a usage threshold, and the memory block is used for storing the pre-aggregated data and the at least one piece of time series data.
7. The method according to any one of claims 1 to 6, wherein before the processing the at least one piece of time series data by using the pre-aggregation method, the method further comprises: receiving first indication information, wherein the first indication information indicates the binding relationship; and storing the identifier of the at least one piece of time series data and an identifier of the pre-aggregation method in a metadata module based on the first indication information; and the determining the pre-aggregation method based on the identifier of the at least one piece of time series data comprises: determining the identifier of the pre-aggregation method from the metadata module based on the identifier of the at least one piece of time series data; and determining the pre-aggregation method based on the identifier of the pre-aggregation method.
8. The method according to claim 7, wherein after the writing the pre-aggregated data and the at least one piece of time series data into the disk, the method further comprises: receiving second indication information, wherein the second indication information indicates to remove the binding relationship; and deleting, based on the second indication information, the identifier of the pre-aggregation method stored in the metadata module.
9. The method according to claim 8, wherein the method further comprises: receiving third indication information, wherein the third indication information indicates to deregister the pre-aggregation method; and deleting, based on the third indication information, the pre-aggregation method stored in the metadata module.
10. An apparatus for pre-aggregating time series data, comprising: a module, configured to perform the method according to any one of claims 1 to 9.
11. An apparatus for pre-aggregating time series data, comprising: a processor, configured to execute a computer program stored in a storage, wherein the apparatus is caused to perform the method according to any one of claims 1 to 9; and a communication interface, wherein the communication interface is coupled to the processor, and is configured to input or output information.
12. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program; and when the computer program is executed by a processor, the processor is caused to perform the method according to any one of claims 1 to 9.
13. A computer program product, wherein when the computer program product is executed by a processor, the method according to any one of claims 1 to 9 is caused to be performed.
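The write path recited in claims 1 to 3, 6, and 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The class name `PreAggregatingWriter`, the `METHODS` table, the JSON file layout, and the use of a buffered point count as a stand-in for memory usage are all assumptions made for illustration only.

```python
import json
import os
from collections import defaultdict

# Hypothetical pre-aggregation methods, keyed by method identifier.
# Each one folds a new value into a running aggregate.
METHODS = {
    "sum": lambda acc, v: acc + v,
    "max": max,
    "min": min,
}


class PreAggregatingWriter:
    """Illustrative sketch: buffer raw and pre-aggregated data, flush on a threshold."""

    def __init__(self, data_dir, usage_threshold):
        self.data_dir = data_dir
        # Assumption: "usage of the memory" is approximated by buffered point count.
        self.usage_threshold = usage_threshold
        self.bindings = {}            # metadata module: identifier -> method identifier
        self.raw = defaultdict(list)  # one in-memory block per identifier
        self.agg = {}                 # identifier -> running pre-aggregate
        self.flushes = 0

    def bind(self, identifier, method_id):
        """Record a binding (the 'first indication information' of claim 7)."""
        if method_id not in METHODS:
            raise KeyError(f"unknown pre-aggregation method: {method_id}")
        self.bindings[identifier] = method_id

    def unbind(self, identifier):
        """Remove a binding (the 'second indication information' of claim 8)."""
        self.bindings.pop(identifier, None)

    def usage(self):
        return sum(len(block) for block in self.raw.values())

    def write(self, identifier, timestamp, value):
        """Pre-aggregate one point, buffer it, and flush if the threshold is reached."""
        self.raw[identifier].append([timestamp, value])
        method_id = self.bindings.get(identifier)
        if method_id is not None:
            entry = self.agg.get(identifier)
            if entry is None:
                self.agg[identifier] = {"method": method_id, "start": timestamp,
                                        "end": timestamp, "value": value}
            else:
                entry["value"] = METHODS[entry["method"]](entry["value"], value)
                entry["start"] = min(entry["start"], timestamp)
                entry["end"] = max(entry["end"], timestamp)
        if self.usage() >= self.usage_threshold:
            self.flush()

    def flush(self):
        """Write raw data and pre-aggregated data to two files in the same directory."""
        if not self.raw:
            return None
        os.makedirs(self.data_dir, exist_ok=True)
        raw_path = os.path.join(self.data_dir, f"{self.flushes:06d}.raw.json")
        agg_path = os.path.join(self.data_dir, f"{self.flushes:06d}.agg.json")
        with open(raw_path, "w") as f:
            json.dump(self.raw, f)
        with open(agg_path, "w") as f:
            json.dump(self.agg, f)
        self.raw = defaultdict(list)
        self.agg = {}
        self.flushes += 1
        return raw_path, agg_path
```

In this sketch the time range covered by each pre-aggregate is set by when the buffer fills, not by a manually configured window, so each flushed pair of files holds a bounded amount of data regardless of how fast points arrive. The parallel write of claim 4 and the deregistration of claim 9 are omitted for brevity.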

Description

This application claims priority to Chinese Patent Application No. 202310966765.X, filed with the China National Intellectual Property Administration on August 2, 2023 and entitled "PRE-AGGREGATION METHOD AND PRE-AGGREGATION APPARATUS", and to Chinese Patent Application No. 202311235046.7, filed with the China National Intellectual Property Administration on September 22, 2023 and entitled "METHOD AND APPARATUS FOR PRE-AGGREGATING TIME SERIES DATA", both of which are incorporated herein by reference in their entireties.

TECHNICAL FIELD

This application relates to the computer field, and in particular, to a method and an apparatus for pre-aggregating time series data.

BACKGROUND

Time series data is a series of data that is continuously generated over time. With the continuous development of 5th generation mobile communication technology (5G) and internet of things (IoT) technology, the amount of data is increasing explosively. Time series data is widely used in common scenarios, including the IoT, the internet of vehicles, the industrial internet, and application performance monitoring. In these scenarios, the time series data may be used for recording key information such as device running status, operation data, and monitoring data. Analyzing and processing the time series data can help enterprises predict faults and optimize production, and thereby support enterprise decision-making.

Time series data features high-frequency data generation and continuous high-concurrency writes. These features lead to long processing times for the time series data. Pre-aggregation is a method for addressing the long processing time of the time series data.
In this method, the time series data is pre-aggregated while it is being written, to generate pre-aggregated data, and during querying the time series data is re-aggregated by using the pre-aggregated data, so that the efficiency of querying the time series data can be improved.

When pre-aggregation is performed on the time series data, a pre-aggregation time range needs to be set manually. However, the time series data is generated at different frequencies in different time periods, so a manually set pre-aggregation time range makes it difficult to ensure that the amounts of time series data and pre-aggregated data are the same across the different time ranges. Uneven distribution of the time series data and the pre-aggregated data reduces storage space utilization.

SUMMARY

Embodiments of this application provide a method and an apparatus for pre-aggregating time series data, a computer-readable storage medium, and a computer program product, to evenly distribute the time series data and pre-aggregated data, and improve storage space utilization.

According to a first aspect, an embodiment of this application provides a method for pre-aggregating time series data. The method may be performed by a server or by a chip used in a server. The following description uses an example in which the method is performed by the server.
The method includes: obtaining at least one piece of time series data from a first device, where the first device is a generation device of the at least one piece of time series data, or the first device is a node device of a time series database, and any one of the at least one piece of time series data includes an identifier; determining a pre-aggregation method based on the identifier of the at least one piece of time series data, where a preset binding relationship exists between the pre-aggregation method and the identifier of the at least one piece of time series data; processing the at least one piece of time series data by using the pre-aggregation method, to generate pre-aggregated data; writing the pre-aggregated data and the at least one piece of time series data into a memory; and when a usage of the memory satisfies a trigger condition, writing the pre-aggregated data and the at least one piece of time series data into a disk.

The server may directly obtain the time series data from the generation device of the time series data, or may obtain the time series data from the node device in a server cluster. The time series data generally includes the identifier (or may be referred to as a "metric"). Based on the binding relationship between the pre-aggregation method and the identifier of the time series data, the server may determine the pre-aggregation method that needs to be used. Then, the server processes the time series data based on the determined pre-aggregation method, and writes a pre-aggregation result and the time series data into the memory. When the memory does not satisfy the trigger condition, the server may continuously obtain the time series data and pre-aggregate the time series data. When the memory satisfies the trigger condition, the server may write the time series data and the pre-aggreg