CN-122001385-A - Sparse data compression method based on bitmap dictionary and related equipment
Abstract
The invention discloses a sparse data compression method and related equipment based on a bitmap dictionary, belonging to the technical field of data compression, wherein the method comprises the steps of defining a default value according to an original data sequence; traversing the original data sequence, marking the data units in the original data sequence as a first state or a second state according to a default value to generate a bitmap dictionary, wherein one of the first state and the second state is defined as a valid state, extracting the data units marked as the valid state in the original data sequence and forming a valid data load according to the original sequence, and packaging the bitmap dictionary and the valid data load according to a preset format to form a compressed data block. The invention marks the distribution of the effective data by introducing the bitmap dictionary, realizes the compression and decompression of the sparse data with extremely high compression ratio and extremely high encoding and decoding speeds, and eliminates the storage and transmission overhead of a large number of repeated default values.
Inventors
- LI HAITAO
- LIU ZHEN
- HU XUHUI
- PAN NENGGANG
Assignees
- 徐工汉云技术股份有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20251231
Claims (10)
- 1. The sparse data compression method based on the bitmap dictionary is characterized by comprising the following steps of: Receiving an original data sequence to be compressed, wherein the original data sequence comprises a plurality of data units; Defining a default value according to the original data sequence; Traversing the original data sequence, marking data units in the original data sequence as a first state or a second state according to the default value to generate a bitmap dictionary, wherein one of the first state and the second state is defined as a valid state; Extracting data units marked as effective states in an original data sequence, and forming effective data load according to the original sequence; And packaging the bitmap dictionary and the effective data load according to a preset format to form a compressed data block.
- 2. The bitmap dictionary-based sparse data compression method according to claim 1, further comprising decompressing the compressed data blocks, in particular: receiving and analyzing the compressed data block, and separating a bitmap dictionary and a valid data load; constructing an output sequence, and initializing all positions in the output sequence to the default value; Sequentially reading the bitmap dictionary, sequentially taking out one data unit from the effective data load every time the bit of the effective state is read, and filling the data unit into the corresponding position of the output sequence; And when the bitmap dictionary is traversed, generating and outputting a decompressed data sequence.
- 3. The bitmap dictionary-based sparse data compression method of claim 2, wherein the output sequence, bitmap dictionary, and original data sequence are identical in length.
- 4. The bitmap dictionary-based sparse data compression method of claim 1, wherein the traversing the original data sequence marks data units in the original data sequence as a first state or a second state according to the default value to generate a bitmap dictionary, and wherein the traversing rule is that in the original data sequence, if the value of the data units is equal to the default value, the corresponding data units are marked as the first state, otherwise, the corresponding data units are marked as the second state, and the data units are marked in turn.
- 5. The bitmap dictionary-based sparse data compression method of claim 1, wherein the bitmap dictionary and the payload are encapsulated in a predetermined format to form a compressed data block, and wherein the bitmap dictionary and the payload are encapsulated sequentially.
- 6. The bitmap dictionary-based sparse data compression method of claim 2, wherein traversing the original data sequence, marking data units in the original data sequence as either a first state or a second state according to the default value to generate a bitmap dictionary comprises: traversing the original data sequence, marking the data units in the original data sequence as a first state or a second state according to the default value, and generating a bitmap dictionary in a binary form; Converting the bitmap dictionary in the binary form into bytes and storing the bytes; In the decompression process, the bitmap dictionary in byte form is converted into binary form before the bitmap dictionary is sequentially read.
- 7. The bitmap dictionary-based sparse data compression method according to claim 1, wherein a default value is defined according to the original data sequence, specifically: A number is identified from the original data sequence and defined as a default value.
- 8. The bitmap dictionary-based sparse data compression method of claim 1, wherein a default value is defined according to the original data sequence, specifically, a number with the largest occurrence number in the original data sequence is defined as a default value.
- 9. A computer readable storage medium, having stored thereon a computer program which when executed by a processor performs the steps of the bitmap dictionary-based sparse data compression method of any one of claims 1 to 8.
- 10. A computer device, comprising: a memory for storing computer programs/instructions; A processor for executing the computer program/instructions to implement the steps of the bitmap dictionary-based sparse data compression method as claimed in any one of claims 1 to 8.
Description
Sparse data compression method based on bitmap dictionary and related equipment Technical Field The application relates to the technical field of data compression, in particular to a sparse data compression method based on a bitmap dictionary and related equipment. Background The information technology field, data compression is a key technology for saving storage space and improving transmission efficiency. In the prior art, there are various general compression algorithms, such as LZ77, LZ78, huffman coding, etc. However, these algorithms are not optimally efficient in processing a class of special data, sparse data. Sparse data is data where there are a large number of identical values (e.g., 0, NULL, or a particular default value) in the data sequence, and there are only a small number of positions where there are valid non-default values. Such data are commonly found in scientific computation, namely matrix with most areas unchanged in large-scale numerical simulation, internet of things sensing data, namely repeated numerical values continuously uploaded by a sensor in a stable state, image processing, namely sparse images or binary images, and database storage, namely sparse columns with most records being empty. The existing general compression algorithm still needs to calculate and encode each data unit when compressing sparse data, even if the data is a default value for a large number of repetitions. This results in unnecessary computational overhead and there is still room for improvement in compression rate. For example, for a sequence containing 8 bytes and mostly 0x00, the general algorithm still needs to process all 8 bytes, and the "skip" mechanism cannot be implemented from the data structure level. Therefore, there is an urgent need in the art for a dedicated compression method capable of directly recognizing and utilizing the data sparseness characteristics, realizing a higher compression ratio and a faster processing speed. Disclosure of Invention The invention aims to overcome the defects in the prior art and provides a sparse data compression method and related equipment based on a bitmap dictionary, which are used for identifying the distribution of effective data by introducing the bitmap dictionary, so that the storage and transmission cost of a large number of repeated default values is eliminated. In order to achieve the above purpose, the invention is realized by adopting the following technical scheme: In a first aspect, the present invention provides a bitmap dictionary-based sparse data compression method, including the steps of: Receiving an original data sequence to be compressed, wherein the original data sequence comprises a plurality of data units; Defining a default value according to the original data sequence; Traversing the original data sequence, marking data units in the original data sequence as a first state or a second state according to the default value to generate a bitmap dictionary, wherein one of the first state and the second state is defined as a valid state; Extracting data units marked as effective states in an original data sequence, and forming effective data load according to the original sequence; And packaging the bitmap dictionary and the effective data load according to a preset format to form a compressed data block. Further, the method further comprises decompressing the compressed data block, in particular: receiving and analyzing the compressed data block, and separating a bitmap dictionary and a valid data load; constructing an output sequence, and initializing all positions in the output sequence to the default value; Sequentially reading the bitmap dictionary, sequentially taking out one data unit from the effective data load every time the bit of the effective state is read, and filling the data unit into the corresponding position of the output sequence; And when the bitmap dictionary is traversed, generating and outputting a decompressed data sequence. Further, the output sequence, the bitmap dictionary and the original data sequence are consistent in length. Further, the traversing of the original data sequence marks the data units in the original data sequence as a first state or a second state according to the default value to generate a bitmap dictionary, wherein the traversing rule is that in the original data sequence, if the value of the data units is equal to the default value, the corresponding data units are marked as the first state, otherwise, the corresponding data units are marked as the second state, and the data units are marked in sequence. Further, the bitmap dictionary and the payload are encapsulated according to a predetermined format to form a compressed data block, specifically, the bitmap dictionary and the payload are encapsulated in sequence. Further, traversing the original data sequence, marking the data units in the original data sequence as a first state or a second state according to the default value, so