CN-122026919-A - Time sequence data compression method, system and computer program product

Abstract

The present disclosure relates to the field of data processing technology, and in particular to a time-series data compression method, system, and computer program product. The method includes: structuring original time-series data; compressing the structured time-series data separately with at least two compression algorithms; calculating the compression rate of each compression algorithm based on the data length of the structured time-series data before and after compression; selecting a target compression algorithm according to the compression rate of each compression algorithm; and compressing the structured time-series data with the target compression algorithm. The data compression scheme provided by the present disclosure offers good compatibility, a high compression rate, and fast compression and decompression, and improves the efficiency of managing and using time-series data.

Inventors

  • LI WEI
  • HE PENG
  • LI YANJIE
  • TANG JIE
  • YAO NIYU
  • LIU CHENYONG
  • ZHAO MINGDONG

Assignees

  • AECC Hunan Aviation Powerplant Research Institute (中国航发湖南动力机械研究所)

Dates

Publication Date
2026-05-12
Application Date
2026-01-13

Claims (10)

  1. A time-series data compression method, comprising: structuring original time-series data; compressing the structured time-series data separately with at least two compression algorithms; calculating the compression rate of each compression algorithm based on the data length of the structured time-series data before and after compression; selecting a target compression algorithm according to the compression rate of each compression algorithm; and compressing the structured time-series data with the target compression algorithm.
  2. The time-series data compression method of claim 1, wherein structuring the original time-series data comprises: storing the original time-series data as a binary file, wherein the binary file stores time-series data information and the original time-series data as 32-bit floating-point values, and the time-series data information comprises the number of sampling channels, the sampling frequency, and the test number corresponding to the original time-series data.
  3. The time-series data compression method of claim 2, wherein structuring the original time-series data further comprises: dividing and storing the original time-series data by sampling channel.
  4. The time-series data compression method of claim 1, wherein the compression algorithms used to compress the structured time-series data comprise: a Swing compression algorithm and a Gorilla compression algorithm.
  5. The time-series data compression method of claim 4, wherein selecting the target compression algorithm according to the compression rate of each compression algorithm comprises: judging whether the compression rate of the Swing compression algorithm is smaller than a first threshold to obtain a first judgment result; if the first judgment result is yes, taking the Swing compression algorithm as the target compression algorithm; if the first judgment result is no, judging whether the compression rate of the Gorilla compression algorithm is smaller than a second threshold to obtain a second judgment result; and if the second judgment result is yes, taking the Gorilla compression algorithm as the target compression algorithm.
  6. The time-series data compression method of claim 5, wherein selecting the target compression algorithm according to the compression rate of each compression algorithm further comprises: if the second judgment result is no, not compressing the structured time-series data.
  7. The time-series data compression method of any one of claims 1 to 6, wherein selecting the target compression algorithm according to the compression rate of each compression algorithm comprises: dividing the original time-series data into a plurality of data blocks by sampling channel; and determining the target compression algorithm for each data block based on the compression rate of each compression algorithm on that data block.
  8. A time-series data compression system, comprising: a data processing module configured to structure original time-series data; a first compression module configured to compress the structured time-series data separately with at least two compression algorithms; a benefit calculation module configured to calculate the compression rate of each compression algorithm based on the data length of the structured time-series data before and after compression; an algorithm selection module configured to select a target compression algorithm according to the compression rate of each compression algorithm; and a second compression module configured to compress the structured time-series data with the target compression algorithm.
  9. The time-series data compression system of claim 8, wherein the algorithm selection module is configured to: divide the original time-series data into a plurality of data blocks by sampling channel; and determine the target compression algorithm for each data block based on the compression rate of each compression algorithm on that data block.
  10. A computer program product stored in a computer-readable storage medium, wherein the computer program product, when executed by a processor, implements at least the time-series data compression method of any one of claims 1 to 7.
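
The flow recited in claim 1 (trial compression, rate calculation, selection, final compression) can be illustrated with a minimal Python sketch. It rests on two assumptions that are not taken from the claims: the compression rate is treated as compressed length divided by original length, so lower is better, and the standard-library codecs zlib and bz2 stand in for Swing and Gorilla, which the Python standard library does not provide. The function names are hypothetical.

```python
import bz2
import zlib
from typing import Callable, Dict, Tuple

# Placeholder codecs only: the claims name Swing and Gorilla, which are not
# available in the standard library, so zlib and bz2 are used as stand-ins.
CODECS: Dict[str, Callable[[bytes], bytes]] = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
}


def compression_rate(original: bytes, compressed: bytes) -> float:
    """Assumed definition: compressed length / original length (lower is better)."""
    if not original:
        raise ValueError("original data must not be empty")
    return len(compressed) / len(original)


def select_and_compress(data: bytes) -> Tuple[str, bytes, Dict[str, float]]:
    """Trial-compress with every codec, pick the lowest rate, then compress."""
    rates = {name: compression_rate(data, fn(data)) for name, fn in CODECS.items()}
    target = min(rates, key=rates.get)
    # The final pass repeats the winning codec to mirror the claimed steps;
    # a real implementation could simply reuse the trial output.
    return target, CODECS[target](data), rates
```

Calling `select_and_compress` on a block of structured bytes returns the name of the chosen codec, the compressed payload, and the per-codec rates, which makes the selection auditable.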

Description

Time-series data compression method, system and computer program product

Technical Field

The present disclosure belongs to the technical field of data processing, and in particular relates to a time-series data compression method, a time-series data compression system, and a computer program product.

Background

Turboshaft engine test and simulation data are growing massively because of high-frequency sampling, long test durations, and multi-physics coupled simulation, and the prior art struggles to manage them efficiently. Traditional text formats such as CSV carry serious storage redundancy: character encoding and row-based organization waste more than 50% of the space, and metadata mixed in with the data is hard to retrieve. A single compression algorithm adapts poorly: it compresses low-entropy data such as constants and square waves inefficiently, and on random data it tends to expand the data or lose features. Serial processing takes a long time, while multithreading without resource protection easily causes data races. Data management is weak, which makes retrieval and location difficult and hinders reuse across scenarios, and annual storage costs exceed the petabyte level. The conflict between massive data volumes and storage redundancy, low compression efficiency, slow processing, and weak management is prominent, and a solution that adapts to the data characteristics is needed.
Disclosure of Invention

To solve the above problems, the present disclosure provides a time-series data compression method that offers fast compression and a high compression benefit. The method includes: structuring the original time-series data; compressing the structured time-series data separately with at least two compression algorithms; calculating the compression rate of each compression algorithm based on the data length of the structured time-series data before and after compression; selecting a target compression algorithm according to the compression rate of each compression algorithm; and compressing the structured time-series data with the target compression algorithm.

Further, structuring the original time-series data includes: storing the original time-series data as a binary file, wherein the binary file stores time-series data information and the original time-series data as 32-bit floating-point values, and the time-series data information comprises the number of sampling channels, the sampling frequency, and the test number corresponding to the original time-series data.

Further, structuring the original time-series data also includes: dividing and storing the original time-series data by sampling channel.

Further, the compression algorithms used to compress the structured time-series data include the Swing compression algorithm and the Gorilla compression algorithm.
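
As an illustration of the binary structuring step described above, the following Python sketch writes a small header followed by 32-bit floating-point samples stored channel by channel. The exact byte layout is an assumption: the source only states that the file holds the number of sampling channels, the sampling frequency, and the test number. The field order, field widths, little-endian byte order, and the extra samples-per-channel field are hypothetical choices made so the example can be read back.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical little-endian header: channel count (uint32), sampling
# frequency in Hz (float64), test number (uint32), samples per channel
# (uint32). Only the first three fields are named in the source.
HEADER = struct.Struct("<IdII")


def pack_structured(channels: List[List[float]], sampling_hz: float,
                    test_no: int) -> bytes:
    """Serialize the header, then each channel's samples as float32."""
    if not channels or any(len(c) != len(channels[0]) for c in channels):
        raise ValueError("need at least one channel, all of equal length")
    n = len(channels[0])
    parts = [HEADER.pack(len(channels), sampling_hz, test_no, n)]
    for samples in channels:
        # One contiguous float32 run per sampling channel.
        parts.append(struct.pack(f"<{n}f", *samples))
    return b"".join(parts)


def unpack_structured(blob: bytes) -> Tuple[Dict[str, float], List[List[float]]]:
    """Read back the header and the per-channel float32 samples."""
    n_ch, hz, test_no, n = HEADER.unpack_from(blob, 0)
    channels = []
    offset = HEADER.size
    for _ in range(n_ch):
        channels.append(list(struct.unpack_from(f"<{n}f", blob, offset)))
        offset += 4 * n  # each float32 sample occupies 4 bytes
    info = {"channels": n_ch, "sampling_hz": hz, "test_no": test_no}
    return info, channels
```

Storing each channel as one contiguous run keeps every channel addressable as an independent data block, which is what per-channel algorithm selection needs.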
Further, selecting a target compression algorithm according to the compression rate of each compression algorithm includes: judging whether the compression rate of the Swing compression algorithm is smaller than a first threshold to obtain a first judgment result; if the first judgment result is yes, taking the Swing compression algorithm as the target compression algorithm; if the first judgment result is no, judging whether the compression rate of the Gorilla compression algorithm is smaller than a second threshold to obtain a second judgment result; and if the second judgment result is yes, taking the Gorilla compression algorithm as the target compression algorithm.

Further, selecting a target compression algorithm according to the compression rate of each compression algorithm also includes: if the second judgment result is no, not compressing the structured time-series data.

Further, selecting a target compression algorithm according to the compression rate of each compression algorithm includes: dividing the original time-series data into a plurality of data blocks by sampling channel; and determining the target compression algorithm for each data block based on the compression rate of each compression algorithm on that data block.
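
The two-threshold selection and its per-channel application can be sketched in Python as follows. The threshold values and function names are hypothetical, since the source gives no numeric thresholds. The sketch takes already-measured compression rates as input and does not implement Swing or Gorilla themselves.

```python
from typing import Dict, Optional

# Hypothetical threshold values; the source does not specify any numbers.
FIRST_THRESHOLD = 0.5   # applied to the Swing compression rate
SECOND_THRESHOLD = 0.8  # applied to the Gorilla compression rate


def choose_algorithm(swing_rate: float, gorilla_rate: float,
                     first_threshold: float = FIRST_THRESHOLD,
                     second_threshold: float = SECOND_THRESHOLD) -> Optional[str]:
    """Return "swing", "gorilla", or None (leave the data uncompressed)."""
    if swing_rate < first_threshold:
        return "swing"    # first judgment result is yes
    if gorilla_rate < second_threshold:
        return "gorilla"  # second judgment result is yes
    return None           # both judgments are no: do not compress


def choose_per_channel(rates: Dict[str, Dict[str, float]]) -> Dict[str, Optional[str]]:
    """Apply the cascade independently to each sampling-channel block."""
    return {
        channel: choose_algorithm(r["swing"], r["gorilla"])
        for channel, r in rates.items()
    }
```

Swing is tested first, so it is selected whenever its rate clears the first threshold, even if Gorilla would have produced a lower rate on the same block.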
The present disclosure also proposes a time-series data compression system comprising: a data processing module configured to structure the original time-series data; a first compression module configured to compress the structured time-series data separately with at least two compression algorithms; a benefit calculation module configured to calculate the compression rate of each compression algorithm based on the data length of the structured time-series data before and after compression; an algorithm selection module configured to select a target compression algorithm according to the compression rate of each compression algorithm; and the second comp