KR-20260067760-A - METHOD AND APPARATUS FOR DYNAMIC DETERMINATION OF DATA COMPRESSION AND DECOMPRESSION METHOD IN NEURAL NETWORK MODEL
Abstract
A method and apparatus for dynamically determining a data compression and restoration method in a neural network model according to one embodiment are disclosed. The method and apparatus derive an importance value based on input data and information related to the input data, determine whether to perform lossy compression or lossless compression of the input data based on the importance value, and, in response to that determination, perform lossy or lossless compression of the input data using compression parameters. Correspondingly, the method and apparatus for dynamically determining a data restoration method can restore data compressed according to the data compression method.
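The decision flow in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the importance function, the threshold, and the nibble-quantization lossy step are all illustrative assumptions, with `zlib` standing in for the lossless coder.

```python
import zlib

THRESHOLD = 0.5  # hypothetical importance threshold


def derive_importance(data: bytes, metadata: dict) -> float:
    """Toy importance score: fraction of non-zero bytes.

    The patent leaves the scoring open-ended (e.g. a learned model
    could produce it); this stand-in only illustrates the interface.
    """
    if not data:
        return 0.0
    return sum(1 for b in data if b != 0) / len(data)


def compress(data: bytes, metadata: dict) -> tuple[bytes, bool]:
    """Return (compressed_bytes, is_lossless) chosen by importance."""
    importance = derive_importance(data, metadata)
    if importance >= THRESHOLD:
        return zlib.compress(data), True  # lossless path
    # Lossy path (illustrative): quantize each byte to 16 levels by
    # zeroing the low nibble, then pack the result losslessly.
    lossy = bytes((b >> 4) << 4 for b in data)
    return zlib.compress(lossy), False
```

High-importance inputs round-trip exactly through the lossless path, while low-importance inputs trade precision for a smaller compressed payload.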
Inventors
- 김수지
- 강효아
- 조성광
- 최희민
- 오도관
Assignees
- 삼성전자주식회사 (Samsung Electronics Co., Ltd.)
Dates
- Publication Date
- 2026-05-13
- Application Date
- 2024-11-06
Claims (20)
- A method for dynamically determining a data compression method in a neural network model, the method comprising: deriving an importance value based on input data and information related to the input data; determining, based on the importance value, whether to perform lossy compression or lossless compression of the input data; and, in response to the determination, performing lossy compression or lossless compression of the input data using compression parameters.
- The method of claim 1, wherein determining whether to perform lossy or lossless compression comprises: determining to perform lossy compression of the input data when the importance value is less than a predetermined threshold; and determining to perform lossless compression of the input data when the importance value is greater than or equal to the predetermined threshold.
- The method of claim 1, wherein the compression parameters include, based on the importance value, a first compression parameter generated corresponding to the lossy compression or a second compression parameter generated corresponding to the lossless compression, and wherein performing lossy or lossless compression comprises: performing lossy compression using the first compression parameter; and performing lossless compression using the second compression parameter.
- The method of claim 1, wherein the compression parameters include a predetermined third compression parameter corresponding to the lossy compression or a predetermined fourth compression parameter corresponding to the lossless compression, and wherein performing lossy or lossless compression comprises: performing lossy compression using the third compression parameter; and performing lossless compression using the fourth compression parameter.
- The method of claim 1, wherein deriving the importance value comprises deriving the importance value based on at least one of information of a layer block that outputs the input data and information of the neural network model.
- The method of claim 1, wherein deriving the importance value is performed by a first neural network model, the first neural network model being trained based on a plurality of data obtained through the neural network model and importance values corresponding to each of the plurality of data.
- The method of claim 6, wherein performing the lossy compression is performed by a second neural network model and performing the lossless compression is performed by a third neural network model, the second neural network model and the third neural network model being trained using an objective function that reduces the amount of information (data rate) of the input data.
- The method of claim 7, further comprising performing, based on training results of at least one of the first neural network model, the second neural network model, and the third neural network model, at least one of: updating parameter values of the neural network model; and changing a structure of the neural network model.
- The method of claim 8, wherein changing the structure of the neural network model comprises at least one of: pruning layer blocks of low importance from among a plurality of layer blocks of the neural network model; and changing channels of the neural network model for the layer blocks of low importance.
- The method of claim 1, wherein the neural network model includes a plurality of layer blocks, and at least some of the plurality of layer blocks transmit data output from the corresponding layer block to a next layer block based on predetermined information.
- The method of claim 1, wherein the neural network model includes a plurality of layer blocks; deriving the importance value comprises, for each of at least some of the plurality of layer blocks, deriving an importance value based on data output from the corresponding layer block and information related to the output data; and determining whether to perform lossy or lossless compression comprises transferring the output data to a next layer block based on the importance value of the corresponding layer block.
- The method of claim 1, wherein determining whether to perform lossy or lossless compression comprises determining whether to perform lossy compression or lossless compression based on the importance value and hardware resources.
- A method for dynamically determining a data decompression method in a neural network model, the method comprising: determining whether lossy compression or lossless compression has been performed based on input compressed data and compression parameters; and obtaining restored data by lossily or losslessly restoring the input compressed data using the compression parameters in response to a result of the determination.
- The method of claim 13, wherein the lossy restoration is performed by a second neural network model and the lossless restoration is performed by a third neural network model, the second neural network model being trained using an objective function that reduces the difference (distortion) between the restored data and original data.
- A computer program stored on a computer-readable storage medium, in combination with hardware, to execute the method of any one of claims 1 to 14.
- An apparatus for dynamically determining a compression method in a neural network model, the apparatus comprising: at least one memory storing compressed input data and compression parameters; and at least one processor connected to the at least one memory and configured to execute a computer-readable program contained in the at least one memory, wherein the apparatus includes: a calculation module that derives an importance value based on input data and information related to the input data; a determination module that determines whether to perform lossy compression or lossless compression of the input data based on the importance value; and a compression module that, in response to the determination, performs lossy or lossless compression of the input data using the compression parameters.
- The apparatus of claim 16, wherein the apparatus includes a plurality of compression modules, and the determination module selects, based on the importance value, a compression module from among the plurality of compression modules to perform the lossy or lossless compression.
- The apparatus of claim 17, wherein the at least one memory includes at least one main memory and at least one cache memory; the at least one processor operates using the at least one cache memory; and the input data and the compression parameters are stored in the at least one main memory.
- An apparatus for dynamically determining a data restoration method in a neural network model, the apparatus comprising: at least one memory storing compressed data and compression parameters; and at least one processor connected to the at least one memory and configured to execute a computer-readable program contained in the at least one memory, wherein the apparatus includes: a determination module that determines whether lossy compression or lossless compression has been performed based on the compressed data and the compression parameters; and a restoration module that lossily or losslessly restores the compressed data using the compression parameters in response to a result of the determination.
- The apparatus of claim 19, wherein the at least one memory includes at least one main memory and at least one cache memory; the at least one processor operates using the at least one cache memory; and the compressed data and the compression parameters are stored in the at least one main memory.
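The restoration side described in the decompression claims can be sketched as follows. This is a minimal illustration, not the patented method: the `lossless` flag standing in for the compression parameters, the `zlib` codec, and the nibble-dequantization step are all assumptions made for the example.

```python
import zlib


def restore(compressed: bytes, params: dict) -> bytes:
    """Determine from the compression parameters whether lossy or
    lossless compression was performed, then restore accordingly."""
    data = zlib.decompress(compressed)
    if params.get("lossless", True):
        return data  # lossless path: exact reconstruction
    # Lossy path (illustrative): assume inputs were quantized to 16
    # levels by zeroing the low nibble, so restore each value to the
    # midpoint of its quantization bin.
    return bytes(b | 0x08 for b in data)
```

Lossless inputs round-trip exactly; lossy inputs are reconstructed only approximately, which is the distortion a learned restoration model would be trained to minimize per claim 14.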
Description
The following embodiments relate to a method and apparatus for dynamically determining a data compression and restoration method in a neural network model. Recently, in the field of artificial intelligence, artificial neural networks based on the Transformer architecture have established themselves as the representative structure for large generative models across various domains, including language, vision, and multimodal processing. While Transformer models offer powerful performance, capable of processing large-scale data and providing advanced prediction and generation capabilities, their deployment requires massive hardware resources. To make effective use of these limited hardware resources, efficient data compression and decompression technologies are indispensable.

FIG. 1 is a schematic flowchart of a dynamic determination method for a data compression and restoration method according to one embodiment. FIG. 2 is a diagram illustrating an encoding module for dynamic determination of a data compression method according to one embodiment. FIG. 3 is a diagram illustrating a decoding module for dynamic determination of a data restoration method according to one embodiment. FIGS. 4A to 4C are diagrams illustrating a training process for a dynamic determination method of a data compression and restoration method according to one embodiment. FIGS. 5A and 5B are configuration diagrams explaining the dynamic determination of a data compression and restoration method according to one embodiment. FIG. 6 is a flowchart of a dynamic determination method for a data compression and restoration method according to one embodiment. FIG. 7 is a configuration diagram explaining a data processing device according to one embodiment. FIG. 8 is a diagram of a device configuration for performing a dynamic determination method of a data compression and restoration method according to one embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, or substitutions within the technical concept described by the embodiments.

Terms such as "first" or "second" may be used to describe various components, but these terms should be interpreted solely as distinguishing one component from another. For example, a first component may be named a second component and, similarly, a second component may be named a first component. When a component is said to be "connected" to another component, it may be directly connected or joined to that other component, or intervening components may be present. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprising" or "having" specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should not be understood as precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by those skilled in the art.
Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an idealized or overly formal sense unless explicitly so defined in this specification. Hereinafter, embodiments are described in detail with reference to the attached drawings. In the description with reference to the drawings, identical components are given the same reference numeral regardless of the figure number, and redundant descriptions are omitted.

Neural network models are designed with a structure that includes multiple layers, and each layer can process input data and, based on it, generate output data to be passed to the next layer. A neural network model can be composed of an input layer, one or more intermediate (hidden) layers, and an output layer, and each layer can perform various operations depending on the purpose of the network. Input data is processed in each layer based on weights and biases, and through this process the model can progressively abstract and learn from the data. The number and configuration of layers may vary depending on the type or purpose of the specific neural network. Each layer contains one or more neurons, and learning can
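The per-layer computation described above, in which inputs are processed using weights and biases, can be sketched minimally. The function name, the specific weight layout, and the ReLU activation are assumptions for illustration; the specification does not prescribe them.

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer: y_j = act(sum_i x_i * w[i][j] + b_j).

    `weights[i][j]` is the weight from input neuron i to output
    neuron j; ReLU is used as an example activation.
    """
    outputs = []
    for j, b in enumerate(biases):
        z = b + sum(x * w[j] for x, w in zip(inputs, weights))
        outputs.append(max(0.0, z))  # ReLU activation
    return outputs
```

Stacking such layers, each feeding its outputs to the next, yields the multi-layer structure the description refers to.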