US-20260127437-A1 - METHOD AND APPARATUS FOR DYNAMIC DETERMINATION OF DATA COMPRESSION AND DECOMPRESSION METHOD IN NEURAL NETWORK MODEL

US20260127437A1

Abstract

A method and apparatus for a dynamic determination of a data compression and decompression method in a neural network model are provided. The apparatus for a dynamic determination of a data compression method computes an importance value based on input data and information related to the input data, determines, based on the importance value, whether to perform lossy compression or lossless compression on the input data, and performs, using the compression parameter, the lossy compression or the lossless compression on the input data, based on a result of the determination. In addition, the apparatus for a dynamic determination of a data decompression method decompresses data that is compressed by the apparatus for a dynamic determination of a data compression method.
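The flow the abstract describes can be sketched in a few lines. This is a hedged illustration only: the filing fixes neither the importance function, the threshold, nor the codecs, so the threshold value, the mean-magnitude score, the 4-bit uniform quantizer (lossy branch), and zlib (lossless branch) are all assumptions of this sketch.

```python
import zlib

import numpy as np

LOSSY_THRESHOLD = 0.5  # hypothetical value; the claims only call it "predetermined"

def importance(data: np.ndarray, layer_info: dict) -> float:
    """Toy importance score. The claims only require that it be computed from
    the input data and related information (e.g. layer-block info)."""
    return float(np.abs(data).mean()) * layer_info.get("weight", 1.0)

def compress(data: np.ndarray, layer_info: dict, bits: int = 4):
    """Dispatch as in the abstract: lossy below the threshold, lossless at or above it."""
    score = importance(data, layer_info)
    if score < LOSSY_THRESHOLD:
        # Lossy branch: coarse uniform quantization; the scale stands in for a
        # compression parameter generated for the lossy path.
        scale = float(np.abs(data).max()) / (2 ** (bits - 1) - 1) or 1.0
        q = np.round(data / scale).astype(np.int8)
        return "lossy", scale, q.tobytes()
    # Lossless branch: zlib entropy coding as a stand-in codec.
    return "lossless", None, zlib.compress(data.astype(np.float32).tobytes())
```

Low-magnitude activations take the lossy branch; important ones round-trip exactly through the lossless branch.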

Inventors

  • Suji KIM
  • Hyoa Kang
  • Sung Kwang Cho
  • Hee Min CHOI
  • Dokwan OH

Assignees

  • SAMSUNG ELECTRONICS CO., LTD.

Dates

Publication Date
2026-05-07
Application Date
2025-04-30
Priority Date
2024-11-06

Claims (20)

  1. A method for determining a data compression method in a neural network model, the method comprising: computing an importance value based on input data and information related to the input data; determining, based on the importance value, whether to perform lossy compression or lossless compression on the input data; and performing, using a compression parameter, the lossy compression or the lossless compression on the input data, based on a result of the determination.
  2. The method of claim 1, wherein the determining of whether to perform the lossy compression or the lossless compression comprises: determining to perform the lossy compression on the input data based on the importance value being less than a predetermined threshold; and determining to perform the lossless compression on the input data based on the importance value being greater than or equal to the predetermined threshold.
  3. The method of claim 1, wherein the compression parameter comprises: a first compression parameter corresponding to the lossy compression or a second compression parameter corresponding to the lossless compression, generated based on the importance value, and wherein the performing of the lossy compression or the lossless compression comprises: performing the lossy compression using the first compression parameter; and performing the lossless compression using the second compression parameter.
  4. The method of claim 1, wherein the compression parameter comprises: a predetermined third compression parameter corresponding to the lossy compression or a predetermined fourth compression parameter corresponding to the lossless compression, wherein the lossy compression is performed using the predetermined third compression parameter; and the lossless compression is performed using the predetermined fourth compression parameter.
  5. The method of claim 1, wherein the computing of the importance value comprises: computing the importance value based on at least one of information on a layer block that outputs the input data or information on the neural network model.
  6. The method of claim 1, wherein the computing of the importance value is performed by a first neural network model, wherein the first neural network model is trained based on data obtained through the neural network model and an importance value corresponding to the data.
  7. The method of claim 6, wherein the performing of the lossy compression is performed by a second neural network model, the performing of the lossless compression is performed by a third neural network model, and the second neural network model and the third neural network model are trained using an objective function that reduces a data rate of the input data.
  8. The method of claim 7, wherein the neural network model is configured to perform, based on a training result of at least one of the first neural network model, the second neural network model, or the third neural network model, at least one of: updating a parameter value of the neural network model; or changing a structure of the neural network model.
  9. The method of claim 8, wherein the changing of the structure of the neural network model comprises at least one of: pruning a layer block of low importance among a plurality of layer blocks of the neural network model; or changing a channel of the neural network model for the layer block of the low importance, wherein the layer block of the low importance has a lowest importance among the plurality of layer blocks or has an importance below a predetermined threshold.
  10. The method of claim 1, wherein the neural network model comprises: a plurality of layer blocks, and each of at least some of the plurality of layer blocks is configured to: transfer data that is output from a corresponding layer block to a next layer block, based on predetermined information.
  11. The method of claim 1, wherein the neural network model comprises: a plurality of layer blocks, wherein the computing of the importance value comprises: computing the importance value based on data that is output from a corresponding layer block, among the plurality of layer blocks, and information on the output data, and wherein the determining of whether to perform the lossy compression or the lossless compression further comprises: transmitting the output data to a next layer block, among the plurality of layer blocks, based on the importance value of the corresponding layer block.
  12. The method of claim 1, wherein the determining of whether to perform the lossy compression or the lossless compression on the input data comprises: determining whether to perform the lossy compression or the lossless compression based on the importance value and hardware resources.
  13. A method for determining a data decompression method in a neural network model, the method comprising: determining, based on input compressed data and a compression parameter, whether lossy compression or lossless compression has been performed on the input compressed data; and based on a result of the determination, performing, using the compression parameter, lossy decompression or lossless decompression on the input compressed data to obtain decompressed data.
  14. The method of claim 13, wherein the lossy decompression is performed by a second neural network model, the lossless decompression is performed by a third neural network model, and the second neural network model is trained using an objective function that reduces a difference between the decompressed data and original data.
  15. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.
  16. An apparatus for determining a data compression method in a neural network model, the apparatus comprising: at least one memory configured to store compressed input data and a compression parameter; and at least one processor configured to execute instructions retrieved from the at least one memory to: compute an importance value based on input data and information related to the input data; determine, based on the importance value, whether to perform lossy compression or lossless compression on the input data; and perform, using the compression parameter, the lossy compression or the lossless compression on the input data, based on a result of the determination.
  17. The apparatus of claim 16, wherein the at least one processor is further configured to: select, based on the importance value, a compression method from among a plurality of compression methods for performing the lossy compression or the lossless compression.
  18. The apparatus of claim 17, wherein the at least one memory comprises at least one main memory and at least one cache memory, the at least one processor is executed using the at least one cache memory, and the input data and the compression parameter are stored in the at least one main memory.
  19. An apparatus for determining a data decompression method in a neural network model, the apparatus comprising: at least one memory configured to store compressed data and a compression parameter; and at least one processor configured to execute instructions retrieved from the at least one memory to: determine, based on the compressed data and the compression parameter, whether lossy compression or lossless compression has been performed on the compressed data; and perform, using the compression parameter, lossy decompression or lossless decompression on the compressed data, based on a result of the determination.
  20. The apparatus of claim 19, wherein the at least one memory comprises at least one main memory and at least one cache memory, the at least one processor is executed using the at least one cache memory, and the compressed data and the compression parameter are stored in the at least one main memory.
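Claims 13 and 19 mirror the dispatch on the decoder side: the compression parameter stored with the payload tells the decoder which path the encoder took. A minimal sketch, assuming a hypothetical record of (mode tag, lossy scale, payload bytes) produced by a quantize-or-zlib encoder; the record format and both inverse codecs are assumptions of this sketch, not the filing's method:

```python
import zlib

import numpy as np

def decompress(mode: str, scale, payload: bytes) -> np.ndarray:
    """Determine from the stored parameter whether lossy or lossless
    compression was applied, then run the matching inverse."""
    if mode == "lossy":
        # Lossy inverse: dequantize int8 codes with the stored scale.
        return np.frombuffer(payload, dtype=np.int8).astype(np.float32) * scale
    # Lossless inverse: zlib round-trips the original float32 bytes exactly.
    return np.frombuffer(zlib.decompress(payload), dtype=np.float32)
```

Lossless payloads reconstruct bit-exactly; lossy payloads recover values only up to the quantization step, which is why claim 14's decoder-side objective penalizes the difference from the original data.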

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority from Korean Patent Application No. 10-2024-0156300, filed on Nov. 6, 2024, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.

BACKGROUND

1. Field

Methods and apparatuses consistent with embodiments relate to a method and apparatus for a dynamic determination of a data compression and decompression method in a neural network model.

2. Description of the Related Art

In recent years, artificial neural networks based on the Transformer architecture have become the dominant structure for large-scale generative models in domains such as language, vision, and multimodal processing. Transformer models can process large amounts of data and provide advanced prediction and generation capabilities, but they require large amounts of hardware resources. To use limited hardware resources effectively, efficient data compression and decompression techniques are essential.

SUMMARY

One or more embodiments may address at least the above problems and/or disadvantages and other disadvantages not described above. However, the embodiments are not required to overcome the disadvantages described above, and an embodiment may not overcome any of the problems described above.

According to an aspect of an embodiment, there is provided a method for a dynamic determination of a data compression method in a neural network model, the method including: computing an importance value based on input data and information related to the input data; determining, based on the importance value, whether to perform lossy compression or lossless compression on the input data; and performing, using a compression parameter, the lossy compression or the lossless compression on the input data, based on a result of the determination.
The compression parameter may include a first compression parameter corresponding to the lossy compression or a second compression parameter corresponding to the lossless compression, generated based on the importance value, and the performing of lossy compression or lossless compression may include performing lossy compression using the first compression parameter and performing lossless compression using the second compression parameter.

The compression parameter may include a predetermined third compression parameter corresponding to the lossy compression or a predetermined fourth compression parameter corresponding to the lossless compression, and the performing of lossy compression or lossless compression may include performing lossy compression using the third compression parameter and performing lossless compression using the fourth compression parameter.

The computing of an importance value may include deriving the importance value based on at least one of information on a layer block that outputs the input data or information on the neural network model.

The computing of an importance value may be performed by a first neural network model, wherein the first neural network model may be trained based on a plurality of pieces of data obtained through the neural network model and an importance value corresponding to each of the plurality of pieces of data.

The performing of lossy compression may be performed by a second neural network model, the performing of lossless compression may be performed by a third neural network model, and the second neural network model and the third neural network model may be trained using an objective function that reduces a data rate of the input data.
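The filing does not give the training objective a concrete form; it only requires that the encoder-side models reduce the data rate and, per the decoder-side description, that reconstructions stay close to the original. A common rate-distortion shape consistent with that, with a hypothetical trade-off weight `lam`, would be:

```python
import numpy as np

def rate_distortion_loss(original: np.ndarray, reconstructed: np.ndarray,
                         code_length_bits: float, lam: float = 0.1) -> float:
    """Hedged sketch of such an objective: a rate term (bits per element)
    plus a weighted distortion term. Neither the exact form nor the value
    of `lam` is specified in the filing."""
    rate = code_length_bits / original.size
    distortion = float(np.mean((original - reconstructed) ** 2))
    return rate + lam * distortion
```

Minimizing the rate term drives shorter codes; the distortion term keeps the lossy branch from discarding too much.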
The neural network model may be configured to perform, based on a training result of at least one of the first neural network model, the second neural network model, or the third neural network model, at least one of updating a parameter value of the neural network model or changing a structure of the neural network model.

The changing of the structure of the neural network model may include at least one of pruning a layer block of low importance among a plurality of layer blocks of the neural network model or changing a channel of the neural network model for the layer block of the low importance. The layer block of the low importance has a lowest importance among the plurality of layer blocks or has an importance below a predetermined threshold.

The neural network model may include a plurality of layer blocks, and each of at least some of the plurality of layer blocks may be configured to transfer data that is output from a corresponding layer block to a next layer block, based on predetermined information.

The neural network model may include a plurality of layer blocks, the computing of an importance value may include computing the importance value based on data that is output from a corresponding layer block, among the plurality of layer blocks, and information on the output data, and the determining of whether to perform lossy compression or lossless compression may further include transmitting the output data to a next layer block, among the plurality of layer blocks, based on the importance value of the corresponding layer block.