Search

CN-120528436-B - Huffman coding method, coding device, chip and storage medium

CN120528436BCN 120528436 BCN120528436 BCN 120528436BCN-120528436-B

Abstract

The invention discloses a Huffman coding method, a coding device, a chip and a storage medium. The Huffman coding method comprises the steps of obtaining a character string to be coded and associated coding information, wherein the coding information comprises the length of the character string to be coded, an alphabet to which the character string belongs and a limit code length to be coded, scanning symbols of the character string based on a preset scanning length, counting the occurrence frequency of each symbol in the scanned symbols, generating a symbol frequency table corresponding to the character string to be coded based on the preset scanning length, determining a reference frequency for frequency adjustment according to the preset scanning length and the limit code length, optimizing the symbol frequency table based on the reference frequency, generating a Huffman codebook for the character string to be coded according to the optimized symbol frequency table, coding the character string to be coded by utilizing the Huffman codebook, and outputting a corresponding coding result. The method can adapt to different scanning progress, constructs the codebook meeting the limit code length, and effectively improves the Huffman coding speed.

Inventors

  • Tang Nianqi

Assignees

  • 希奥端(深圳)计算技术有限公司

Dates

Publication Date
20260512
Application Date
20250326

Claims (16)

  1. 1. A Huffman coding method characterized in that the Huffman coding method comprises: Acquiring a character string to be encoded and associated encoding information, wherein the encoding information comprises the length of the character string to be encoded, an alphabet to which the character string to be encoded belongs and a limit code length to be encoded; Scanning symbols of the character string to be encoded based on a preset scanning length, counting the occurrence frequency of each symbol in the scanned symbols, and generating a symbol frequency table corresponding to the character string to be encoded based on the preset scanning length, wherein the preset scanning length is smaller than the length of the character string to be encoded; Determining a reference frequency for frequency adjustment by utilizing a Fibonacci sequence according to the preset scanning length, an alphabet to which the character string to be encoded belongs and the limit code length, so as to optimize the symbol frequency table based on the reference frequency; generating a Huffman codebook aiming at the character string to be coded according to the optimized symbol frequency table, which comprises the following steps: performing ascending order on the optimized symbol frequency table, and recording position information of each symbol frequency after ordering; according to the ordered symbol frequency table and the position information, circularly combining the frequencies of the symbols, and determining the Huffman code length of each symbol of the character string to be encoded so as to generate a code length table of the character string to be encoded; Generating a Huffman codebook used for encoding the character string to be encoded based on the code length table; If the maximum code length in the generated code length table is greater than the limit code length, adjusting the code length table based on the maximum code length and the limit code length, so that all the code lengths in the code length table are smaller than or equal to the limit code length; and coding the character string to be coded by utilizing the Huffman codebook, and outputting a corresponding coding result.
  2. 2. The Huffman coding method according to claim 1, characterized in that the optimizing the symbol frequency table based on the reference frequency comprises: Traversing the counted frequency of occurrence of each symbol, comparing the frequency of occurrence of the symbol with the reference frequency, if the frequency of occurrence of the symbol is smaller than the reference frequency, adjusting the frequency of occurrence of the symbol to the reference frequency, otherwise, keeping the frequency of occurrence of the symbol unchanged.
  3. 3. The Huffman coding method according to claim 1, characterized in that the reference frequency is calculated using the following formula: Wherein, the For the reference frequency to be used, For the number of Fibonacci, For the purpose of the limit code length, For the predetermined scan length to be a function of the scan length, And the alphabet to which the character string to be encoded belongs.
  4. 4. The Huffman coding method according to claim 1, wherein the cyclically combining the frequencies of the symbols according to the ordered symbol frequencies and the position information, determining the Huffman code length of each symbol of the character string to be coded to generate the code length table of the character string to be coded comprises: acquiring and combining two symbols with the lowest frequency in the symbol frequency table; adjusting the symbol frequency based on the frequency obtained by combining the two symbols, and maintaining the ascending arrangement of the symbol frequency sequence; incrementally adjusting the code length of the combined symbol and updating the position of the combined symbol; Repeating the above process until merging to the last symbol to obtain the Huffman code length of all symbols; And adjusting the code length sequence of all the symbols by using a permutation function so that the code length sequence of the generated code length table corresponds to the original arrangement sequence of the symbols.
  5. 5. The Huffman coding method according to claim 1, characterized in that the generating a codebook for coding the character string to be coded based on the code length table comprises: Traversing the code length table, and counting the occurrence times of each code length; determining the initial prefix of the code length according to the occurrence times of the code length; For each symbol of the character string to be coded, converting the initial prefix of the corresponding code length into binary representation, and intercepting low bits based on the corresponding code length to be used as a code word of the symbol; the initial prefix of the corresponding code length is incrementally updated to allocate a different codeword for the next symbol with the code length based on the updated initial prefix.
  6. 6. The Huffman coding method according to claim 1, characterized in that the adjusting the code length table comprises: acquiring two symbols in the code length table corresponding to a first preset code length, wherein the first preset code length is associated with the maximum code length in the code length table; acquiring a symbol corresponding to a second preset code length in the code length table, wherein the second preset code length is associated with the limit code length; Adjusting the code length of the obtained symbol based on the first preset code length, the second preset code length, and preset increasing depth and decreasing depth; Repeating the above process, and circularly adjusting the code length table until the maximum code length in the code length table is smaller than or equal to the limit code length.
  7. 7. The Huffman coding method according to claim 1, wherein the frequency of occurrence of each symbol in the scanned symbols is characterized by the following formula: Wherein, the Is an illustrative variable, and the illustrative variable is characterized by the following formula: Wherein, the Is a symbol Is used for the frequency of occurrence of (a), For the predetermined scan length to be a function of the scan length, For the scanned position is Is a symbol sequence of (a).
  8. 8. A Huffman coding apparatus, characterized in that the Huffman coding apparatus comprises: The device comprises an acquisition unit, a coding unit and a coding unit, wherein the acquisition unit is used for acquiring a character string to be coded and associated coding information, and the coding information comprises the length of the character string to be coded, an alphabet to which the character string to be coded belongs and a limit code length to be coded; The statistics unit is used for scanning the symbols of the character string to be encoded based on a preset scanning length, counting the occurrence frequency of each symbol in the scanned symbols, and generating a symbol frequency table corresponding to the character string to be encoded based on the preset scanning length, wherein the preset scanning length is smaller than the length of the character string to be encoded; An optimizing unit, configured to determine a reference frequency for frequency adjustment by using a Fibonacci sequence according to the preset scan length, an alphabet to which the character string to be encoded belongs, and the constraint code length, so as to optimize the symbol frequency table based on the reference frequency; The generating unit is configured to generate a Huffman codebook for the character string to be encoded according to the optimized symbol frequency table, and includes: performing ascending order on the optimized symbol frequency table, and recording position information of each symbol frequency after ordering; according to the ordered symbol frequency table and the position information, circularly combining the frequencies of the symbols, and determining the Huffman code length of each symbol of the character string to be encoded so as to generate a code length table of the character string to be encoded; Generating a Huffman codebook used for encoding the character string to be encoded based on the code length table; If the maximum code length in the generated code length table is greater than the limit code length, adjusting the code length table based on the maximum code length and the limit code length, so that all the code lengths in the code length table are smaller than or equal to the limit code length; and the output unit is used for encoding the character string to be encoded by utilizing the Huffman codebook and outputting a corresponding encoding result.
  9. 9. The Huffman coding apparatus according to claim 8, wherein the optimizing unit is configured to optimize the symbol frequency table based on the reference frequency, comprising: Traversing the counted frequency of occurrence of each symbol, comparing the frequency of occurrence of the symbol with the reference frequency, if the frequency of occurrence of the symbol is smaller than the reference frequency, adjusting the frequency of occurrence of the symbol to the reference frequency, otherwise, keeping the frequency of occurrence of the symbol unchanged.
  10. 10. The Huffman coding apparatus according to claim 8, wherein the reference frequency is calculated using the following formula: Wherein, the For the reference frequency to be used, For the number of Fibonacci, For the purpose of the limit code length, For the predetermined scan length to be a function of the scan length, And the alphabet to which the character string to be encoded belongs.
  11. 11. The Huffman coding apparatus according to claim 8, wherein the generating unit is configured to determine a Huffman code length of each symbol of the character string to be coded to generate the code length table of the character string to be coded by circularly combining the frequencies of the symbols according to the ordered symbol frequencies and the position information, comprising: acquiring and combining two symbols with the lowest frequency in the symbol frequency table; Adjusting the symbol frequency based on the frequency obtained by combining the two symbols, and maintaining the ascending arrangement of the symbol frequencies; incrementally adjusting the code length of the combined symbol and updating the position of the combined symbol; Repeating the above process until merging to the last symbol to obtain the Huffman code length of all symbols; And adjusting the code length sequence of all the symbols by using a permutation function so that the code length sequence of the generated code length table corresponds to the original arrangement sequence of the symbols.
  12. 12. The Huffman coding apparatus according to claim 8, wherein the generating unit is configured to generate a codebook for coding the character string to be coded based on the code length table, comprising: Traversing the code length table, and counting the occurrence times of each code length; determining the initial prefix of the code length according to the occurrence times of the code length; For each symbol of the character string to be coded, converting the initial prefix of the corresponding code length into binary representation, and intercepting low bits based on the corresponding code length to be used as a code word of the symbol; the initial prefix of the corresponding code length is incrementally updated to allocate a different codeword for the next symbol with the code length based on the updated initial prefix.
  13. 13. The Huffman coding apparatus according to claim 8, wherein the generating unit is configured to adjust the code length table, comprising: acquiring two symbols in the code length table corresponding to a first preset code length, wherein the first preset code length is associated with the maximum code length in the code length table; acquiring a symbol corresponding to a second preset code length in the code length table, wherein the second preset code length is associated with the limit code length; Adjusting the code length of the obtained symbol based on the first preset code length, the second preset code length, and preset increasing depth and decreasing depth; Repeating the above process, and circularly adjusting the code length table until the maximum code length in the code length table is smaller than or equal to the limit code length.
  14. 14. The Huffman coding apparatus of claim 8, wherein the frequency of occurrence of each of the scanned symbols is characterized by the following formula: Wherein, the Is an illustrative variable, and the illustrative variable is characterized by the following formula: Wherein, the Is a symbol Is used for the frequency of occurrence of (a), For the predetermined scan length to be a function of the scan length, For the scanned position is Is a symbol sequence of (a).
  15. 15. A chip comprising a processor and a memory, wherein the memory stores a program capable of running on the processor, the processor being configured to implement the Huffman coding method according to any of the preceding claims 1-7 when executing the program.
  16. 16. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the Huffman coding method according to any of claims 1-7.

Description

Huffman coding method, coding device, chip and storage medium Technical Field The present invention relates to the field of communications computing technologies, and in particular, to a Huffman coding method, a coding device, a chip, and a storage medium. Background Huffman codes are a type of prefix coding with compression effects that obtain a codebook based on statistical distribution by assigning shorter codewords to more frequently occurring symbols, while assigning longer codewords to less frequently occurring symbols. Thus, the original character string can be encoded based on the obtained codebook, and the original character string can be uniquely restored by decoding. Therefore, the length after coding can be smaller than the length of the original character string through Huffman coding, thereby achieving the compression effect. In practical applications, the maximum code length in Huffman codes is typically limited. This is advantageous for preserving the codebook and for constructing a compatible coding scheme. For example, in Deflate standard (rfc 1951), the maximum code length of Huffman code for the natural/length or distance alphabet is limited to 15. In addition, huffman codes are entropy codes, and statistics is needed to be carried out on the occurrence frequency of symbols, so that a codebook with better compression effect is constructed. However, in the implementation process, the data to be compressed needs to be scanned and counted, and this process limits the speed of Huffman coding. Therefore, the existing coding method cannot complete statistics of symbol frequencies through less scanning, or cannot quickly generate a codebook meeting the limit length. Disclosure of Invention The object of the present invention is to propose a coding scheme that solves at least to some extent one of the technical problems mentioned above. In order to achieve the above object, a first aspect of the present invention provides a Huffman coding method, which includes obtaining a character string to be coded and associated coding information, where the coding information includes a length of the character string to be coded, an alphabet to which the character string belongs, and a limited code length to be coded, scanning symbols of the character string based on a preset scanning length, counting occurrence frequencies of each symbol in the scanned symbols, generating a symbol frequency table corresponding to the character string to be coded based on the preset scanning length, determining a reference frequency for frequency adjustment according to the preset scanning length and the limited code length, optimizing the symbol frequency table based on the reference frequency, generating a Huffman codebook for the character string to be coded according to the optimized symbol frequency table, and coding the character string to be coded by using the Huffman codebook, and outputting a corresponding coding result. According to one embodiment of the invention, the optimizing the symbol frequency table based on the reference frequency comprises traversing the counted occurrence frequency of each symbol, comparing the occurrence frequency of the symbol with the reference frequency, adjusting the occurrence frequency of the symbol to the reference frequency if the occurrence frequency of the symbol is smaller than the reference frequency, otherwise keeping the occurrence frequency of the symbol unchanged. According to one embodiment of the invention, the reference frequency is calculated using the following formula: wherein p is the reference frequency, F k+3 is the Fibonacci number, k is the limit code length, n is the preset scan length, And the alphabet to which the character string to be encoded belongs. According to one embodiment of the invention, the Huffman codebook for the character string to be encoded is generated according to the optimized symbol frequency table, and comprises the steps of ascending sort of symbol frequencies in the adjusted symbol frequency table, recording position information of each symbol frequency after sorting, circularly combining the frequencies of the symbols according to the sorted symbol frequencies and the position information, determining the Huffman code length of each symbol of the character string to be encoded to generate a code length table of the character string to be encoded, and generating the Huffman codebook for encoding the character string to be encoded based on the code length table. According to one embodiment of the invention, the method comprises the steps of circularly combining the frequencies of the symbols according to the ordered symbol frequencies and the position information, determining the Huffman code length of each symbol of the character string to be encoded to generate a code length table of the character string to be encoded, acquiring and combining the frequencies of the two lowest-frequency symbols in the symbol frequency table, adjustin