
US-12625610-B2 - Data compression method and apparatus, computing device, and storage system


Abstract

In a data compression method, a computing device determines a compression feature value of to-be-compressed data based on a first parameter that affects a compression result of the to-be-compressed data. The computing device determines, based on the compression feature value, a compression policy for compressing the to-be-compressed data. The computing device then compresses the to-be-compressed data according to the compression policy to obtain compressed data, and stores the compressed data.
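The abstract describes three steps: derive a compression feature value, map it to a compression policy, then compress and store the data. The Python sketch below illustrates that flow under assumptions that do not come from the patent. The feature value is taken to be the Shannon entropy of the bytes. The policies are a hypothetical choice among storing raw, `zlib`, and `lzma`. The thresholds are invented for illustration, and a dictionary stands in for the storage device.

```python
import lzma
import math
import zlib
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical compression policy: an algorithm name plus a level/preset."""
    algorithm: str  # "store", "zlib", or "lzma" (illustrative choices only)
    level: int


def feature_value(data: bytes) -> float:
    """Illustrative feature value: Shannon entropy of the bytes, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())


def choose_policy(feature: float) -> Policy:
    """Map the feature value to a policy using hypothetical thresholds."""
    if feature > 7.5:  # close to random: compression is unlikely to pay off
        return Policy("store", 0)
    if feature > 4.0:  # moderately redundant: a fast, general-purpose codec
        return Policy("zlib", 6)
    return Policy("lzma", 6)  # highly redundant: spend more CPU for a better ratio


def compress(data: bytes, policy: Policy) -> bytes:
    """Apply the chosen policy to the data."""
    if policy.algorithm == "zlib":
        return zlib.compress(data, policy.level)
    if policy.algorithm == "lzma":
        return lzma.compress(data, preset=policy.level)
    return data  # "store": keep the bytes unchanged


def compress_and_store(data: bytes, store: dict, key: str) -> Policy:
    """Feature value -> policy -> compressed data -> storage (a dict stands in for it)."""
    policy = choose_policy(feature_value(data))
    store[key] = (policy, compress(data, policy))
    return policy


store = {}
print(compress_and_store(b"abc" * 1000, store, "log-1").algorithm)  # prints "lzma"
```

Repetitive input such as `b"abc" * 1000` has an entropy of about 1.58 bits per byte, so this sketch routes it to the slower, stronger codec. Near-random input would be stored as-is.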

Inventors

  • Sizhe Luo
  • Ruliang DONG
  • Hongde Zhu
  • Yong Sun

Assignees

  • HUAWEI TECHNOLOGIES CO., LTD.

Dates

Publication Date
2026-05-12
Application Date
2024-01-02
Priority Date
2021-07-08

Claims (18)

  1. A data compression method performed by a computing device, the method comprising: determining a compression feature value of to-be-compressed data based on a first parameter that affects a compression result of the to-be-compressed data, wherein the to-be-compressed data is to be sent by the computing device to a storage device; determining, based on the compression feature value, a compression policy for compressing the to-be-compressed data, wherein the compression policy comprises a plurality of compression windows; compressing the to-be-compressed data according to the compression policy to obtain compressed data, wherein the step of compressing the to-be-compressed data according to the compression policy comprises: separately compressing the to-be-compressed data based on the plurality of compression windows to obtain a plurality of pieces of compressed data; comparing compression rates of the plurality of pieces of compressed data; and selecting, from the plurality of pieces of compressed data based on the comparing, first compressed data with a highest compression rate as the compressed data; and sending a data packet to the storage device, wherein the data packet comprises a packet header and a payload, and the payload includes the first compressed data and a compression window of the first compressed data.
  2. The method according to claim 1, wherein the first parameter comprises a parameter of a hardware resource used when the to-be-compressed data is compressed, or a parameter of a data feature that describes the to-be-compressed data.
  3. The method according to claim 2, wherein the parameter of the data feature comprises a data type, a data block size, or distribution of characters comprised in the to-be-compressed data.
  4. The method according to claim 2, wherein the parameter of the hardware resource comprises a usage ratio of a processor of the computing device, a network bandwidth between the computing device and the storage device when the compressed data is sent to the storage device, or an available storage capacity of the storage device.
  5. The method according to claim 1, wherein the computing device stores correspondences between a plurality of compression feature values and compression policies, and wherein the step of determining the compression policy for compressing the to-be-compressed data comprises: determining a compression feature value that is in the correspondences and that corresponds to the compression feature value of the to-be-compressed data; and determining, based on the compression feature value determined based on the correspondences, a compression policy corresponding to the compression feature value as the compression policy for compressing the to-be-compressed data.
  6. The method according to claim 5, wherein the correspondences between the plurality of compression feature values and the compression policies are obtained based on neural network training.
  7. The method according to claim 1, wherein after obtaining the compressed data, the method further comprises: determining a compression rate of the compression policy used when data is compressed; and adjusting the compression feature value and a parameter of the compression policy in the correspondence based on the compression rate.
  8. A computing device comprising: a memory storing executable instructions; and a processor configured to execute the executable instructions to perform operations of: determining a compression feature value of to-be-compressed data based on a first parameter that affects a compression result of the to-be-compressed data, wherein the to-be-compressed data is to be sent by the computing device to a storage device; determining, based on the compression feature value, a compression policy for compressing the to-be-compressed data, wherein the compression policy comprises a plurality of compression windows; compressing the to-be-compressed data according to the compression policy to obtain compressed data, wherein the step of compressing the to-be-compressed data according to the compression policy comprises: separately compressing the to-be-compressed data based on the plurality of compression windows to obtain a plurality of pieces of compressed data; comparing compression rates of the plurality of pieces of compressed data; and selecting, from the plurality of pieces of compressed data based on the comparing, first compressed data with a highest compression rate as the compressed data; and sending a data packet to the storage device, wherein the data packet comprises a packet header and a payload, and the payload includes the first compressed data and a compression window of the first compressed data.
  9. The computing device according to claim 8, wherein the first parameter comprises a parameter of a hardware resource used when the to-be-compressed data is compressed, or a parameter of a data feature that describes the to-be-compressed data.
  10. The computing device according to claim 9, wherein the parameter of the data feature comprises a data type, a data block size, or distribution of characters comprised in the to-be-compressed data.
  11. The computing device according to claim 9, wherein the parameter of the hardware resource comprises a usage ratio of a processor of the computing device, a network bandwidth between the computing device and the storage device when the compressed data is sent to the storage device, or an available storage capacity of the storage device.
  12. The computing device according to claim 8, wherein the processor is configured to store correspondences between a plurality of compression feature values and compression policies, and wherein the operation of determining the compression policy for compressing the to-be-compressed data comprises: determining a compression feature value that is in the correspondences and that corresponds to the compression feature value of the to-be-compressed data; and determining, based on the compression feature value determined based on the correspondences, a compression policy corresponding to the compression feature value as the compression policy for compressing the to-be-compressed data.
  13. The computing device according to claim 12, wherein the correspondences between the plurality of compression feature values and the compression policies are obtained based on neural network training.
  14. The computing device according to claim 8, wherein after obtaining the compressed data, the processor is configured to perform operations of: determining a compression rate of the compression policy used when data is compressed; and adjusting the compression feature value and a parameter of the compression policy in the correspondence based on the compression rate.
  15. A storage system, comprising: a computing node; and a storage node, wherein the computing node is configured to perform operations of: determining a compression feature value of to-be-compressed data based on a first parameter that affects a compression result of the to-be-compressed data, wherein the to-be-compressed data is to be sent by the computing node to the storage node; determining, based on the compression feature value, a compression policy for compressing the to-be-compressed data, wherein the compression policy comprises a plurality of compression windows; compressing the to-be-compressed data according to the compression policy to obtain compressed data, and storing the compressed data, wherein the step of compressing the to-be-compressed data according to the compression policy comprises: separately compressing the to-be-compressed data based on the plurality of compression windows to obtain a plurality of pieces of compressed data; comparing compression rates of the plurality of pieces of compressed data; and selecting, from the plurality of pieces of compressed data based on the comparing, first compressed data with a highest compression rate as the compressed data; and sending a data packet to the storage node, wherein the data packet comprises a packet header and a payload, and the payload includes the first compressed data and a compression window of the first compressed data; and wherein the storage node is configured to store the first compressed data.
  16. The storage system according to claim 15, wherein the first parameter comprises a parameter of a hardware resource used when the to-be-compressed data is compressed, or a parameter of a data feature that describes the to-be-compressed data.
  17. The storage system according to claim 16, wherein the parameter of the data feature comprises a data type, a data block size, or distribution of characters comprised in the to-be-compressed data.
  18. The method according to claim 16, wherein the parameter of the hardware resource comprises a usage ratio of a processor of the computing device, a network bandwidth between the computing device and the storage device when the compressed data is sent to the storage device, or an available storage capacity of the storage device.
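Claims 1, 8, and 15 describe trying several compression windows, keeping the output with the highest compression rate, and sending it in a packet whose payload carries both the compressed data and the chosen window. The following is a minimal Python sketch of that idea under several assumptions. It uses DEFLATE through the standard `zlib` module as the codec, although the patent does not name one. It reads "compression rate" as original size divided by compressed size. The header layout (`MAGIC` plus payload length) and the candidate window sizes are hypothetical.

```python
from __future__ import annotations

import struct
import zlib

# Hypothetical policy: candidate DEFLATE windows as log2 sizes (512 B, 4 KiB, 32 KiB).
CANDIDATE_WINDOWS = (9, 12, 15)
# Hypothetical header layout: 2-byte magic, 4-byte payload length (network byte order).
HEADER = struct.Struct("!HI")
MAGIC = 0xC0DE


def compress_with_window(data: bytes, wbits: int) -> bytes:
    """Compress into a zlib stream whose history window is 2**wbits bytes."""
    c = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=wbits)
    return c.compress(data) + c.flush()


def best_window(data: bytes, windows=CANDIDATE_WINDOWS) -> tuple[int, bytes]:
    """Compress once per window and keep the result with the highest compression ratio."""
    results = [(w, compress_with_window(data, w)) for w in windows]
    # ratio = original size / compressed size, so the highest ratio is the smallest output
    return max(results, key=lambda r: len(data) / len(r[1]))


def build_packet(data: bytes) -> bytes:
    """Sender side: header + payload, where the payload holds the window and the data."""
    wbits, compressed = best_window(data)
    payload = bytes([wbits]) + compressed
    return HEADER.pack(MAGIC, len(payload)) + payload


def parse_packet(packet: bytes) -> tuple[int, bytes]:
    """Receiver side: read the window back and, for verification, decompress."""
    magic, length = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("not a compressed-data packet")
    payload = packet[HEADER.size:HEADER.size + length]
    wbits = payload[0]
    return wbits, zlib.decompress(payload[1:], wbits=wbits)
```

Because the window travels in the payload, the receiving side can pass it straight back to `zlib.decompress`. With DEFLATE, a larger window only pays off when repeated content lies farther apart than a smaller window can reach. For example, the 512-byte window cannot exploit a 2000-byte block repeated several times, while the 4 KiB and 32 KiB windows can.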
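Claims 2 to 4 list the parameters that may feed the compression feature value. On the data side, these are the data type, the data block size, and the character distribution. On the hardware side, they are processor usage, network bandwidth, and available storage capacity. The claims do not say how these parameters are combined. The sketch below therefore buckets each one into a tuple that could serve as a lookup key. The bucket boundaries, the units, and the `HardwareState` type are assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareState:
    """Hardware-resource parameters named in claim 4 (units are assumptions)."""
    cpu_usage: float            # processor usage ratio, 0.0 to 1.0
    bandwidth_mbps: float       # network bandwidth to the storage device, Mbit/s
    free_capacity_ratio: float  # available capacity of the storage device, 0.0 to 1.0


def byte_entropy(block: bytes) -> float:
    """Character distribution summarized as Shannon entropy in bits per byte."""
    n = len(block)
    return sum(c / n * math.log2(n / c) for c in Counter(block).values()) if n else 0.0


def tier(value: float, bounds: tuple) -> int:
    """Bucket a value: the number of bounds it meets or exceeds."""
    return sum(value >= b for b in bounds)


def feature_value(block: bytes, data_type: str, hw: HardwareState) -> tuple:
    """Combine data-feature and hardware parameters into a hashable feature value."""
    return (
        data_type,                                 # data type label (hypothetical values)
        len(block).bit_length(),                   # data block size as a power-of-two bucket
        round(byte_entropy(block)),                # character distribution, whole bits/byte
        tier(hw.cpu_usage, (0.5, 0.8)),            # 0 idle, 1 busy, 2 saturated
        tier(hw.bandwidth_mbps, (100.0, 1000.0)),  # 0 slow, 1 medium, 2 fast link
        tier(hw.free_capacity_ratio, (0.1, 0.3)),  # 0 nearly full, 1 tight, 2 plenty
    )
```

For example, `feature_value(b"a" * 4096, "text", HardwareState(0.9, 50.0, 0.2))` yields `("text", 13, 0, 2, 0, 1)`. This describes a highly redundant 4 KiB text block on a saturated processor with a slow link and moderately full storage.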
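Claims 5 to 7 describe two operations: matching the observed feature value against stored correspondences, and then adjusting both the stored feature value and the policy parameter based on the achieved compression rate. The sketch below makes four assumptions: numeric feature vectors, a nearest-neighbour (Euclidean) match, a zlib level as the only policy parameter, and a simple hypothetical update rule. The table here is written by hand. Claim 6 says the correspondences may be obtained through neural network training, which is not shown.

```python
from __future__ import annotations

import math
import zlib
from dataclasses import dataclass


@dataclass
class Entry:
    """One correspondence: stored feature value -> policy parameter."""
    feature: list[float]   # stored compression feature value (numeric vector)
    level: int             # policy parameter, here a zlib level from 1 to 9
    expected_ratio: float  # ratio this policy is expected to reach (hypothetical)


class PolicyTable:
    """Hand-written correspondence table; claim 6 would obtain it by training instead."""

    def __init__(self, entries: list[Entry]):
        self.entries = entries

    def lookup(self, feature: list[float]) -> Entry:
        """Find the stored feature value closest (Euclidean) to the observed one."""
        return min(self.entries, key=lambda e: math.dist(e.feature, feature))

    def compress(self, data: bytes, feature: list[float], alpha: float = 0.2):
        """Compress with the matched policy, then adjust the entry from the result."""
        entry = self.lookup(feature)
        out = zlib.compress(data, entry.level)
        ratio = len(data) / len(out)
        # Move the stored feature value a step toward the observed one.
        entry.feature = [(1 - alpha) * s + alpha * f for s, f in zip(entry.feature, feature)]
        # Raise the level when the achieved ratio falls short of expectations.
        if ratio < entry.expected_ratio and entry.level < 9:
            entry.level += 1
        return out, entry
```

The moving-average update and the one-step level increase are illustrative choices only. The claims state only that the feature value and the policy parameter are adjusted based on the compression rate.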

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application PCT/CN2022/100440, filed on Jun. 22, 2022, which claims priority to Chinese Patent Application No. 202111109332.X, filed on Sep. 22, 2021, and Chinese Patent Application No. 202110773759.3, filed on Jul. 8, 2021. The aforementioned priority applications are hereby incorporated by reference in their entirety.

TECHNICAL FIELD

This application relates to the field of data compression, and in particular, to a data compression method and apparatus, a computing device, and a storage system.

BACKGROUND

As the cloud computing and big data industries flourish, data centers grow larger and store more data. A larger data size and a longer storage period mean higher storage costs. Currently, a computing device may compress to-be-stored data by using a compression algorithm (such as LZ4, LZO, or Snappy) before storing it. This reduces the data size and the corresponding costs of hard disks, nodes, racks, sites, and operation and maintenance. However, after a user sets the compression algorithm, the computing device compresses all data with that one algorithm, resulting in a low data compression rate.

SUMMARY

This application provides a data compression method and apparatus, a computing device, and a storage system, to improve the data compression rate.

According to a first aspect, a data compression method is provided.
The method may be executed by a computing device and specifically includes the following steps. When compressing to-be-compressed data, the computing device determines a compression feature value of the to-be-compressed data based on a parameter that affects the compression result of the data. It then determines, based on the compression feature value, a compression policy for compressing the data, compresses the data according to the compression policy to obtain compressed data, and stores the compressed data. A conventional computing device compresses data with different features by using a single compression algorithm. In contrast, the method provided in this application selects the compression policy based on the features that affect the compression result of the to-be-compressed data. Because the data is compressed by using a policy that matches its features, the data compression rate can be effectively improved.

In a possible implementation, the parameter includes a parameter of a hardware resource used when the to-be-compressed data is compressed and/or a parameter of a data feature that describes the to-be-compressed data. The parameter of the data feature includes at least one of a data type, a data block size, and the distribution of characters in the to-be-compressed data. The parameter of the hardware resource includes at least one of a usage ratio of a processor of the computing device, the network bandwidth between the computing device and a storage device when the compressed data is stored in the storage device, and the available storage capacity of the storage device.

In another possible implementation, the computing device stores correspondences between a plurality of compression feature values and compression policies.
Determining, based on the compression feature value, the compression policy for compressing the to-be-compressed data includes the following steps. The computing device determines, among the stored correspondences, the compression feature value that corresponds to the compression feature value of the to-be-compressed data. It then determines that the compression policy corresponding to that stored compression feature value is the compression policy for compressing the to-be-compressed data. Because the correspondences between the compression feature values and the compression policies are preconfigured, the computing device can quickly and accurately select the compression policy that matches the features of the to-be-compressed data when compressing it in real time. The correspondences between the plurality of compression feature values and the compression policies may be obtained through neural network training.

In another possible implementation, each compression policy includes a plurality of compression windows. Compressing the to-be-compressed data according to the compression policy includes: the computing device separately compresses the to-be-compressed data based on the plurality of compression windows to obtain a plurality of pieces of compressed data; and compares compression rates of the plurality of pieces of compressed data, and selects compressed data with a highest com