KR-20260063650-A - APPARATUS AND METHOD FOR CONTEXT-AWARE SPATIAL VARIABLE-LENGTH CODING/DECODING
Abstract
The present invention relates to image compression and transmission using a neural network. The spatial dimension context-aware variable-length encoding device according to the present invention receives an image or a feature of the image as input, outputs a pixel-wise importance map of the image using a neural network, samples the image based on the importance map, and encodes the sampled image into a communication symbol. By concentrating resources according to importance while using a fixed communication symbol length, the device has the effect of improving image compression efficiency and image quality.
Inventors
- 채찬병
- 김윤태
- 양시윤
- 유한주
Assignees
- 연세대학교 산학협력단
Dates
- Publication Date
- 20260507
- Application Date
- 20241030
Claims (10)
- A spatial dimension context-aware variable-length encoding device comprising: a memory containing one or more instructions; and a processor that executes the one or more instructions stored in the memory, wherein the processor receives an image or a feature of the image as input, outputs a pixel-wise importance map of the image using a neural network, samples the image based on the importance map, and encodes the sampled image into a communication symbol.
- The device of claim 1, wherein sampling the image comprises calculating pixel coordinates within the original image that correspond to the sampled image, and determining actual image coordinates within the original image by interpolating the calculated pixel coordinate values.
- The device of claim 2, wherein the interpolation is performed by an inverse distance weighting method.
- The device of claim 1, wherein the neural network for generating the importance map uses a loss function separate from the loss function of another neural network.
- The device of claim 4, wherein the loss function of the neural network for generating the importance map is calculated such that the entropy of the pixel-wise error of the image is maximized.
- A spatial dimension context-aware variable-length decoding device comprising: a memory containing one or more instructions; and a processor that executes the one or more instructions stored in the memory, wherein the processor decodes a sampled image from communication symbols generated by an encoder, and inverse samples the decoded image using the pixel-wise importance map of the image that was used to sample the decoded image.
- A spatial dimension context-aware variable-length encoding method performed by an encoding device comprising one or more processors and a memory, the method comprising: a step of receiving an image or a feature of the image as input and outputting a pixel-wise importance map of the image using a neural network; a step of sampling the image based on the importance map; and a step of encoding the sampled image into a communication symbol.
- The method of claim 7, wherein the neural network for generating the importance map uses a loss function separate from the loss function of another neural network.
- The method of claim 8, wherein the loss function of the neural network for generating the importance map is calculated such that the entropy of the pixel-wise error of the image is maximized.
- A spatial dimension context-aware variable-length decoding method performed by a decoding device comprising one or more processors and a memory, the method comprising: a step of decoding a sampled image from a communication symbol generated by an encoder; and a step of inverse sampling the decoded image using the pixel-wise importance map of the image that was used to sample the image.
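The claims name inverse distance weighting as the interpolation method but give no formula. The following is a minimal sketch of generic inverse distance weighting, assuming a two-dimensional Euclidean distance and a weighting exponent of 2. The function name `idw_interpolate`, the `power` parameter, and the four-corner example are illustrative assumptions and are not taken from the patent; the sketch does not claim to reproduce how the claimed device applies the interpolation to coordinate values.

```python
import math


def idw_interpolate(points, values, query, power=2.0):
    """Estimate a value at `query` from known (point, value) pairs.

    Each known value is weighted by 1 / distance**power, so nearby
    points dominate the estimate. `power` is an assumed parameter.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (py, px), value in zip(points, values):
        distance = math.hypot(query[0] - py, query[1] - px)
        if distance == 0.0:
            # The query coincides with a known point: return it exactly.
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical example: four integer grid positions around a fractional one.
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
corner_values = [10.0, 20.0, 30.0, 40.0]

center = idw_interpolate(corners, corner_values, (0.5, 0.5))
near_origin = idw_interpolate(corners, corner_values, (0.1, 0.1))
print(center, near_origin)
```

At the center of the cell all four weights are equal, so the result is the plain average of the corner values; as the query moves toward one corner, the estimate moves toward that corner's value.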
Description
Apparatus and Method for Context-Aware Spatial Variable-Length Coding/Decoding
The present invention relates to deep learning-based image compression technology, and in particular to image compression technology optimized for compressing and transmitting images in a communication environment.
A deep learning-based image compression system uses a deep neural network to convert images into encoded bits so that they can be compressed and transmitted efficiently. Entropy coding can be used in the compression process. Entropy coding is a coding method that varies the length of the bit sequence representing a symbol according to the probability of that symbol appearing. In deep learning-based compression and transmission systems, entropy coding is used when transmitting the feature values of an image: bit sequences of different lengths are allocated according to the probabilities of the feature values. Because of this property, however, the compressed size differs from image to image, and consequently the number of communication symbols required to transmit an image changes every time.
To solve this problem, deep joint source-channel coding is sometimes used instead. Unlike conventional techniques, in which compression and symbolization for transmission are performed separately, deep joint source-channel coding uses a neural network to convert images directly into communication symbols. It involves no entropy coding process; the features themselves are transmitted as communication symbols. Since the number of features is fixed, the number of communication symbols also remains constant regardless of the image. However, because entropy coding is not used, compression performance is reduced.
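The contrast described above, in which entropy-coded size depends on the content while a fixed feature count does not, can be illustrated with a small sketch. It assumes a Shannon-style code in which a symbol of probability p receives ceil(-log2 p) bits; this is a textbook stand-in and not the coder used in the invention. The names `shannon_code_lengths` and `coded_bits` and the two toy "images" are hypothetical.

```python
import math
from collections import Counter


def shannon_code_lengths(symbols):
    """Map each distinct symbol to an assumed code length of ceil(-log2 p) bits."""
    counts = Counter(symbols)
    total = len(symbols)
    # max(1, ...) keeps at least one bit when only one symbol value occurs.
    return {s: max(1, math.ceil(-math.log2(c / total))) for s, c in counts.items()}


def coded_bits(symbols):
    """Total number of bits needed to send `symbols` with the lengths above."""
    lengths = shannon_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


# Two toy "images" with the same number of feature values (16 each).
flat_image = [0] * 14 + [1] * 2   # skewed distribution, cheap to code
busy_image = [0, 1, 2, 3] * 4     # uniform distribution, costly to code

print(coded_bits(flat_image))  # 20
print(coded_bits(busy_image))  # 32
print(len(flat_image), len(busy_image))  # feature counts are identical
```

Both inputs hold 16 feature values, so a scheme that sends the features directly always uses the same number of symbols, whereas the entropy-coded bit count changes with the content.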
The inventors of the present invention have long conducted research to solve the problems of entropy coding and deep joint source-channel coding described above, and have completed a spatial dimension context-aware variable-length encoding device and method that improves compression efficiency through importance-based resource allocation while generating communication symbols of a fixed length.
FIG. 1 is a schematic diagram of a spatial dimension context-aware variable-length encoding device according to a preferred embodiment of the present invention.
FIG. 2 is a more detailed structural diagram of a processor of a spatial dimension context-aware variable-length encoding device according to a preferred embodiment of the present invention.
FIG. 3 shows an example of an image sampled according to an importance map by a spatial dimension context-aware variable-length encoding device according to a preferred embodiment of the present invention.
FIG. 4 shows a schematic flow of a method for training the importance generation unit of a spatial dimension context-aware variable-length encoding device with a separate loss function according to a preferred embodiment of the present invention.
FIG. 5 shows an example of a sampled image generated by a spatial dimension context-aware variable-length encoding device according to a preferred embodiment of the present invention.
FIG. 6 is a schematic diagram of a spatial dimension context-aware variable-length decoding device according to another preferred embodiment of the present invention.
FIG. 7 is a diagram schematically illustrating an inverse sampling method of a spatial dimension context-aware variable-length decoding device according to another preferred embodiment of the present invention.
FIG. 8 is a schematic flowchart of a spatial dimension context-aware variable-length encoding method according to another preferred embodiment of the present invention.
FIG. 9 is a schematic flowchart of a spatial dimension context-aware variable-length decoding method according to another preferred embodiment of the present invention.
The above-mentioned objectives, means, and resulting effects of the present invention will become clearer through the following detailed description taken in conjunction with the attached drawings, so that a person skilled in the art to which the present invention pertains will be able to easily implement the technical concept of the present invention. Furthermore, in describing the present invention, where a detailed description of known technology related to the present invention would unnecessarily obscure the essence of the invention, that description is omitted. The terms used herein are for describing the embodiments and are not intended to limit the invention. In this specification, the singular form includes the plural form unless specifically stated otherwise. In this specification, terms such as “comprising,” “providing,” “arranging,” or “having” do not exclude the presence or addition of one or more other components