KR-20260064607-A - METHOD OF ENCODING/DECODING FOR PERFORMING MACHINE VISION TASK AND COMPUTER READABLE RECORDING MEDIUM STORING INSTRUCTIONS FOR IMPLEMENTING ENCODING METHOD
Abstract
An image decoding method for performing a machine vision task according to the present disclosure may include: decoding an image; applying a post-processing filter to the decoded image; and restoring an image for a machine vision task based on the image to which the post-processing filter has been applied. Here, the post-processing filter may be trained using a differentiable neural-network-based codec that differs from the codec used to decode the image.
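The decoder-side pipeline in the abstract can be sketched as follows. This is a minimal illustration only: the function names (`decode_bitstream`, `post_filter`, `restore_for_task`) are hypothetical and not taken from the patent, which does not specify an API.

```python
# Hypothetical sketch of the abstract's decoder-side pipeline.
# All names here are illustrative assumptions, not from the patent.

def decode_for_machine_vision(bitstream, decode_bitstream, post_filter, restore_for_task):
    decoded = decode_bitstream(bitstream)  # conventional (non-differentiable) codec
    filtered = post_filter(decoded)        # neural post-processing filter, trained offline
                                           # against a differentiable proxy codec
    return restore_for_task(filtered)      # image restored for the machine vision task
```

The key point the abstract makes is that the post-processing filter runs after a conventional codec at inference time, even though it was trained end-to-end through a different, differentiable codec.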
Inventors
- 곽상운
- 정순흥
- 고종환
- 박찬웅
- 정성문
Assignees
- 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
Dates
- Publication Date: 2026-05-07
- Application Date: 2025-10-29
- Priority Date: 2024-10-30
Claims (16)
- An image decoding method for performing a machine vision task, comprising: decoding an image; applying a post-processing filter to the decoded image; and restoring an image for a machine vision task based on the image to which the post-processing filter has been applied, wherein the post-processing filter is trained using a differentiable neural-network-based codec different from the codec used to decode the image.
- The method of claim 1, wherein the post-processing filter comprises a first branch composed of a deep learning neural network for image segmentation, a second branch connected to basic processing modules, and a skip connection through which the input bypasses the first branch and the second branch.
- The method of claim 1, wherein the post-processing filter is trained using a loss function based on machine vision task performance.
- The method of claim 1, wherein a preprocessing filter on the encoder side is used when training the post-processing filter.
- The method of claim 1, wherein, when the decoded image is in a first format and the post-processing filter was trained on images in a second format, the decoded image is converted to the second format and the converted image is input to the post-processing filter.
- The method of claim 5, wherein the second-format image output by the post-processing filter is converted back to the original first format.
- The method of claim 1, wherein, when the decoded image has a format in which the luminance component image and the chroma component image differ in size, the chroma component image is upsampled to the size of the luminance component image, and the upsampled image is input to the post-processing filter.
- The method of claim 7, wherein the chroma component image output by the post-processing filter is downsampled back to its original size.
- An image encoding method for performing a machine vision task, comprising: applying a preprocessing filter to an input image; obtaining an image to be encoded from the output image of the preprocessing filter; and encoding the image to be encoded, wherein the preprocessing filter is trained based on a differentiable neural network model.
- The method of claim 9, wherein the preprocessing filter comprises a first branch composed of a deep learning neural network for image segmentation and a second branch connected to basic processing modules.
- The method of claim 9, wherein the preprocessing filter is trained using a first loss function based on bitrate and a second loss function based on machine vision task performance.
- The method of claim 11, wherein a weighted sum of a first loss according to the first loss function and a second loss according to the second loss function is used to train the preprocessing filter.
- The method of claim 9, wherein, when the input image is in a first format and the preprocessing filter was trained on images in a second format, the input image is converted to the second format and the converted image is input to the preprocessing filter.
- The method of claim 13, wherein the second-format image output by the preprocessing filter is converted back to the original first format.
- The method of claim 9, wherein, when the input image has a format in which the luminance component image and the chroma component image differ in size, the chroma component image is upsampled to the size of the luminance component image, and the upsampled image is input to the preprocessing filter.
- A computer-readable recording medium storing instructions for performing an image encoding method for a machine vision task, the method comprising: applying a preprocessing filter to an input image; obtaining an image to be encoded from the output image of the preprocessing filter; and encoding the image to be encoded, wherein the preprocessing filter is trained based on a differentiable neural network model.
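Two of the mechanics in the claims can be made concrete with a short sketch: the weighted-sum training loss of claims 11 and 12, and the chroma upsampling/downsampling of claims 7, 8, and 15. The weight values and the choice of nearest-neighbor resampling below are illustrative assumptions; the claims fix neither.

```python
def weighted_loss(rate_loss, task_loss, w_rate=0.01, w_task=1.0):
    """Weighted sum of a bitrate loss and a machine-vision task loss
    (claims 11-12). The weights here are illustrative, not from the patent."""
    return w_rate * rate_loss + w_task * task_loss

def upsample_chroma_420(chroma, luma_h, luma_w):
    """Nearest-neighbor upsampling of a 4:2:0 chroma plane to the luma size,
    one possible realization of the upsampling in claims 7 and 15."""
    return [[chroma[y // 2][x // 2] for x in range(luma_w)]
            for y in range(luma_h)]

def downsample_chroma_444(chroma):
    """Simple decimation back to the original chroma size (claim 8);
    a real codec would typically low-pass filter before decimating."""
    return [row[::2] for row in chroma[::2]]
```

Under this sketch, a YUV420 frame is upsampled so all three planes match the luma size before entering the filter network, and the filtered chroma planes are decimated back afterwards, round-tripping exactly for nearest-neighbor replication.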
Description
Method of encoding/decoding for performing machine vision tasks and computer-readable recording medium storing instructions for executing the encoding method

The present disclosure relates to an image encoding/decoding method and apparatus for performing machine vision tasks. Traditionally, video encoding and decoding technologies have achieved improved compression efficiency and image quality by considering the human visual system. Going forward, however, video coding technology is expected to be widely used not only for human vision but also in machine vision fields such as surveillance, intelligent transportation, smart cities, and intelligent factories. Accordingly, there is a need for image encoding/decoding technology that achieves both high compression efficiency and high recognition accuracy by considering human vision and machine vision simultaneously.

FIG. 1 is a block diagram of an image encoder according to one embodiment of the present disclosure.
FIG. 2 is a block diagram of an image decoder according to one embodiment of the present disclosure.
FIGS. 3 and 4 show the configuration of an image encoder and an image decoder with additional components for applying a preprocessing filter and a post-processing filter according to one embodiment of the present disclosure.
FIG. 5 illustrates the structure of a preprocessing filter network according to one embodiment of the present disclosure.
FIG. 6 illustrates the structure of a post-processing filter network according to one embodiment of the present disclosure.
FIG. 7 illustrates the training process of a preprocessing filter network according to one embodiment of the present disclosure.
FIG. 8 illustrates the training process of a post-processing filter network according to one embodiment of the present disclosure.
FIG. 9 shows an example in which a YUV-format image is input to a filter network trained on RGB-format images.
FIG. 10 shows an example of upsampling a YUV420 image to a YUV444 image for filter network training.
FIG. 11 is a flowchart of an image preprocessing method according to one embodiment of the present disclosure.
FIG. 12 is a flowchart of an image post-processing method according to one embodiment of the present disclosure.
FIG. 13 shows an example in which the preprocessing filter network and post-processing filter network proposed in the present disclosure are applied to a system that encodes/decodes a feature map.

The present disclosure is subject to various modifications and may have various embodiments, specific examples of which are illustrated in the drawings and described in detail below. This is not, however, intended to limit the present disclosure to specific embodiments; the disclosure should be understood to include all modifications, equivalents, and substitutions that fall within its spirit and scope. Like reference numerals in the drawings denote the same or similar functions across the various aspects, and the shapes and sizes of elements in the drawings may be exaggerated for clarity. The detailed description of exemplary embodiments below refers to the accompanying drawings, which illustrate specific embodiments by way of example. These embodiments are described in sufficient detail to enable those skilled in the art to practice them. The various embodiments are distinct but need not be mutually exclusive: specific shapes, structures, and characteristics described herein in relation to one embodiment may be implemented in other embodiments without departing from the spirit and scope of the present disclosure, and the location or arrangement of individual components within each disclosed embodiment may likewise be changed without departing from that spirit and scope.
Accordingly, the following detailed description is not to be taken in a limiting sense, and the scope of the exemplary embodiments is limited only by the appended claims, together with all equivalents of what is properly claimed therein. In this disclosure, terms such as "first" and "second" may be used to describe various components, but those components should not be limited by such terms, which serve only to distinguish one component from another. For example, without departing from the scope of this disclosure, a first component could be termed a second component, and similarly a second component could be termed a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of them. Where a component of the present disclosure is described as being "connected" or "coupled" to another component, it should be understood that it may be directly connected or coupled to that other component, or that the