CN-115913246-B - Lossless data compression algorithm based on self-adaptive instantaneous entropy

CN115913246BCN 115913246 BCN115913246 BCN 115913246BCN-115913246-B

Abstract

The invention belongs to the technical field of data processing, in particular to a lossless data compression algorithm based on self-adaptive instantaneous entropy, which comprises a compression end and a decompression end, wherein the compression end consists of five parts of an input data stream, a lookup table, instantaneous entropy coding, a highest zone bit and a serializer, the decompression end consists of five parts of an deserializer, a highest zone bit, instantaneous entropy decoding, a lookup table and an output data stream, and the lossless data compression algorithm specifically comprises a compression flow, a decompression flow and a lookup table updating flow.

Inventors

XI CAIPING
WU YANXIA

Assignees

江苏科技大学

Dates

Publication Date: 20260512
Application Date: 20220907

Claims (5)

1. The lossless data compression algorithm based on the self-adaptive instantaneous entropy is characterized by comprising a compression end and a decompression end, wherein the compression end consists of five parts of an input data stream, a lookup table, instantaneous entropy coding, a highest flag bit and a serializer, the decompression end consists of five parts of an deserializer, the highest flag bit, instantaneous entropy decoding, the lookup table and an output data stream, the total number of lines of the lookup table is set to be K, the number of occupied lines is K, data symbols of each line are N bits, each line of the lookup table has a line index of M bits corresponding to the data symbols, and M is calculated as follows: , The decompression end decodes the compressed data and outputs the original data symbol by using the same lookup table as the compression end, the instantaneous entropy coding is a process of coding the data stream by calculating the instantaneous entropy self-adaption of the original data symbol, the calculation of the instantaneous entropy E depends on the number k occupied by the lookup table, then , The method comprises the specific steps of deleting high bits of a successfully matched line identifier string in a lookup table to reduce the high bits into E bits, reconstructing the obtained data stream into a D bit compressed data stream by a serializer, receiving the compressed data stream by a decompression end, performing deserialization and extraction on the compressed data by using instantaneous entropy decoding, and outputting an original symbol, wherein the algorithm specifically comprises a compression flow, a decompression flow and a lookup table updating flow, the compression flow specifically comprises the steps that when the input data stream arrives at a compression end, the compression end receives the first N bits of the original data symbol in the input data stream, matches the data symbol in the lookup table, if the same data symbol is not matched, the instantaneous entropy E=M indicates that the symbol is not occupied by the lookup table, only the symbol is output without compression, meanwhile, the data symbol is stored in the lowest bit in the lookup table, updating the lookup table, if the matching is successful, indicating that the symbol is occupied by the lookup table, taking the corresponding line identifier string as data to be compressed, calculating the instantaneous entropy to reduce the compressed data to encode the data, and generating the compressed data.
2. The adaptive instantaneous entropy-based lossless data compression algorithm according to claim 1, wherein in the compression process, a highest flag bit is set to distinguish whether data is compressed, if the data is not compressed, a flag position "0" is output as 0+ original data, if the data is compressed, a flag position "1" is output as 1+ compressed data, and when all input data is data-compressed, a serializer receives all data generated by the data compression operation and reconstructs the data into a D-bit compressed data stream output.
3. The adaptive instantaneous entropy-based lossless data compression algorithm of claim 2, wherein the decompression process specifically comprises the steps of starting decoding operation when the decompression end receives the first bit of the D-bit compressed data stream, extracting E-bit data from the D-bit compressed data stream and expanding the E-bit data to M-bit data according to the instantaneous entropy E obtained by the compression end if the highest flag bit is 1, outputting the data in the corresponding table item as an original symbol after decompressing the row identifier string, extracting N-bit data from the compressed data stream as the original data symbol if the highest flag bit is 0 and indicating the original data symbol appears, and outputting the N-bit data as the original data symbol.
4. The lossless data compression algorithm based on the adaptive instantaneous entropy of claim 3, wherein the lookup table updating process specifically comprises the steps of storing an input data symbol in a lowest bit of the lookup table to update the lookup table when the matching between the input data symbol and the lookup table is unsuccessful at a compression end, and moving the matched data symbol to the lowest bit of the lookup table to update the lookup table when the matching between the input data symbol and the lookup table is successful.
5. The adaptive instantaneous entropy-based lossless data compression algorithm according to claim 4, wherein in the lookup table update flow, when the table is full, the instantaneous entropy is at this time , A match counter C is defined, and when the input data symbol matches the data symbol in the table for C times, the most significant bit in the lookup table is popped up and discarded, and the value of C can be customized.

Description

Lossless data compression algorithm based on self-adaptive instantaneous entropy Technical Field The invention belongs to the technical field of data processing, and particularly relates to a lossless data compression algorithm based on self-adaptive instantaneous entropy. Background The advent of the information age has produced hundreds of millions of information data each day, which presents a significant challenge for the storage, transmission and processing of data. In order to save memory space and increase data transmission speed, it is necessary to use data compression techniques. The data compression technology can reduce the data storage cost and improve the data transmission speed by removing excessive redundant information, so that the data compression technology has important research significance and practical value. The data compression algorithm is a premise and a foundation for realizing data compression, and can be divided into lossless compression and lossy compression. Lossy compression refers to compressing a part of insignificant data under the condition that a certain amount of information is allowed to be lost, and lossless compression preserves the original information by encoding the data, so that the original value can be recovered from the compressed data without loss of data quality. Lossless data compression algorithms are mainly divided into two categories according to compression models, namely a statistical-based compression algorithm and a dictionary-based compression algorithm. Arithmetic coding is a well-known compression algorithm based on a statistical model, which adopts shannon information entropy, and the aim of data compression is achieved by converting the whole input character string into a numerical value with a very long length and the number of binary bits required for representing the numerical value is smaller than that of source data; huffman coding is also a compression algorithm based on a statistical model, and the working principle of the Huffman coding is that after the frequency of occurrence of data characters is ordered, a binary tree is built from bottom to top, then the binary tree is traversed from a root node until all characters appear at the nodes of the binary tree, then all left subtrees are assigned with 0, all right subtrees are assigned with 1, the code word of an original symbol is determined, and the data compression is completed by assigning the shortest code to the most common data mode. Another dictionary model-based lossless compression algorithm is to replace complex original data strings with simple codes, such as the LZW algorithm, which stores each first occurring string in a string table, and uses unique numbers as its codes instead of strings, and stores only its codes instead of the original strings during compression, thereby performing data compression, and uses a string table generated during compression, which is discarded after compression or decompression is completed, without taking up additional space. The Huffman coding technique has simple operation, can reduce the average length of the character string after coding, but needs to calculate the frequency of the symbol appearing in the data stream, and has longer compression time. And when the probability difference of occurrence of the data symbols is not obvious, the coding technology effect is not ideal. Arithmetic coding can theoretically achieve a good compression rate, but in practice, the implementation is very complex, requiring a large memory space and power consumption. The compression effect based on the dictionary compression algorithm is closely related to the repeatability of the occurrence of data and the size of the dictionary, so that the compression effect is not fixed. Compression algorithms based on deep learning can use a large number of computations that are specific to processing image data, but do not process the data stream, and cannot be applied to text data stream compression. Disclosure of Invention In order to solve the technical problems, the invention provides a lossless data compression algorithm based on adaptive instantaneous entropy. The algorithm receives continuous and rapid input data streams at a compression end, calculates instantaneous entropy based on the occupancy rate of a lookup table, adaptively entropy codes the input data streams, outputs compressed data streams, starts decoding operation when the decompression end receives the first bit of the compressed data streams, always keeps the same lookup table as the compression end, and finally recovers original data symbols. The invention adopts the following specific technical scheme: a lossless data compression algorithm based on self-adaptive instantaneous entropy is characterized in that the matching result and occupied rate of a lookup table are considered in the operation of the compression algorithm, the instantaneous entropy is calculated according to a received data s