
KR-102964370-B1 - OPERATING SERVER FOR PERFORMING IMAGE CODING BASED ON ARTIFICIAL NEURAL NETWORKS AND OPERATION METHOD THEREOF

KR 102964370 B1

Abstract

An operating server that performs image encoding based on an artificial neural network, and a method of operating the same, are disclosed. The operating server of the present invention includes: an image block generation unit that acquires a plurality of frames of an image, determines the number of blocks for each frame using the frame rate of the image and the frame characteristics of each frame, and generates blocks for each frame based on the determined number of blocks; an important block determination unit that determines, using the block characteristics of each block generated for each frame, an importance of each block serving as a criterion for whether the block is a target for encoding, and determines, based on the importance, an important block to be encoded among the plurality of blocks generated for each frame; and an artificial neural network-based encoding unit that extracts, from the important block, image information corresponding to an encoding target signal to be compressed, converts the extracted image information into a hidden vector through an artificial neural network-based encoder, and generates a compressed bitstream by entropy encoding a code obtained by quantizing the hidden vector.

Inventors

  • 문지호
  • 이채형
  • 김상현

Assignees

  • 주식회사 빅태블릿

Dates

Publication Date
2026-05-13
Application Date
2025-06-13
Priority Date
2025-06-10

Claims (9)

  1. An operating server that performs image encoding based on an artificial neural network (ANN), comprising: an image block generation unit that acquires a plurality of frames of an image, determines the number of blocks for each frame using the frame rate of the image and the frame characteristics of each frame, and generates blocks for each frame based on the determined number of blocks; an important block determination unit that determines, using the block characteristics of each block generated for each frame, an importance of each block serving as a criterion for whether the block is a target for encoding, and determines, based on the importance, an important block to be encoded among the plurality of blocks generated for each frame; and an artificial neural network-based encoding unit that extracts, from the important block, image information corresponding to an encoding target signal to be compressed, converts the extracted image information into a hidden vector through an artificial neural network-based encoder, and generates a compressed bitstream by entropy encoding a code obtained by quantizing the hidden vector.
  2. The operating server of claim 1, wherein the image block generation unit further comprises: a frame extraction unit that obtains, from a user terminal linked to the operating server, an image requiring image encoding through the operating server, and extracts the plurality of frames constituting the image; a frame characteristic analysis unit that obtains, as the frame characteristics, the amount of color change, the change in the number of objects, and the change in object size over time for the plurality of frames; a frame importance analysis unit that inputs a feature vector including the frame characteristics and the frame rate into a pre-trained artificial neural network to calculate an importance score for each frame; and a frame block analysis unit that determines a division depth of each frame based on the importance score calculated by the frame importance analysis unit, and generates a number of blocks according to the division depth.
  3. The operating server of claim 2, wherein the important block determination unit further comprises: a block importance map generation unit that calculates, as block characteristics for each block constituting each frame, a spatial correlation value, a temporal rate of change, an object inclusion index, a visual attention value, a distortion sensitivity index, and a restorability value, and generates a block importance map in which block-unit importance is applied using the calculated block characteristics; and an important block analysis unit that calculates an importance score for each block based on the block importance map, and determines, based on the importance score of each block, the important block to be encoded among the plurality of blocks.
  4. The operating server of claim 3, wherein the block-unit importance is generated in the form of a block characteristic vector F_i = f(Sc_i, Tv_i, Oi_i, Sa_i, Ds_i, Rc_i) to which the block characteristics of block i, one of the plurality of blocks, are applied, wherein: Sc_i represents a spatial correlation coefficient calculated from the similarity of one of color vectors and feature vectors between block i and an adjacent block; Tv_i represents one of the average change in pixel brightness and the average change in feature vectors between the previous frame and the subsequent frame along the time axis at the position of block i; Oi_i represents a binary value indicating whether an object exists in block i, determined through a pre-trained object recognition artificial neural network; Sa_i represents an attention value of block i normalized through an artificial neural network for visual attention prediction; Ds_i represents a distortion sensitivity index of block i determined based on high-frequency components and texture complexity corresponding to a preset range; and Rc_i represents a prediction reliability value indicating the degree to which block i can be restored from adjacent blocks if it is lost; wherein the block importance map is generated by inputting the block characteristic vector into a pre-trained importance evaluation artificial neural network to calculate the importance score for each block, and mapping the calculated importance score to the location of each block; and wherein the important block analysis unit determines, among the plurality of blocks, a block whose importance score is greater than or equal to a preset threshold score as the important block to be encoded.
  5. The operating server of claim 4, wherein the artificial neural network-based encoding unit further comprises: an image information extraction unit that extracts, from each block determined to be the important block, image information including at least one of pixel values, feature vectors, and residual information; a vector transformation unit that inputs the image information into the artificial neural network-based encoder and converts it into a hidden vector in a low-dimensional latent space; and a compressed bitstream generation unit that converts the hidden vector into a quantized code through a pre-trained artificial neural network, predicts a context-based probability distribution using the quantized code values of blocks horizontally and vertically adjacent to the important block within the same frame, and generates the compressed bitstream by performing entropy encoding using the probability distribution.
  6. A system that performs image encoding based on an artificial neural network (ANN), comprising: a user terminal that provides an image to be the target of image encoding; and an operating server that performs artificial neural network-based image encoding on the image provided by the user terminal, wherein the operating server comprises: an image block generation unit that acquires a plurality of frames of the image, determines the number of blocks for each frame using the frame rate of the image and the frame characteristics of each frame, and generates blocks for each frame based on the determined number of blocks; an important block determination unit that determines, using the block characteristics of each block generated for each frame, an importance of each block serving as a criterion for whether the block is a target for encoding, and determines, based on the importance, an important block to be encoded among the plurality of blocks generated for each frame; and an artificial neural network-based encoding unit that extracts, from the important block, image information corresponding to an encoding target signal to be compressed, converts the extracted image information into a hidden vector through an artificial neural network-based encoder, and generates a compressed bitstream by entropy encoding a code obtained by quantizing the hidden vector.
  7. A method of operating an operating server that performs image encoding based on an artificial neural network (ANN), comprising: acquiring a plurality of frames of an image, determining the number of blocks for each frame using the frame rate of the image and the frame characteristics of each frame, and generating blocks for each frame based on the determined number of blocks; determining, using the block characteristics of each block generated for each frame, an importance of each block serving as a criterion for whether the block is a target for encoding, and determining, based on the importance, an important block to be encoded among the plurality of blocks generated for each frame; and extracting, from the important block, image information corresponding to an encoding target signal to be compressed, converting the extracted image information into a hidden vector through an artificial neural network-based encoder, and generating a compressed bitstream by entropy encoding a code obtained by quantizing the hidden vector.
  8. A non-transitory computer-readable recording medium having recorded thereon a program for executing the operation method of claim 7.
  9. A computer program recorded on a non-transitory recording medium to execute the operation method of claim 7 on an operating server performing artificial neural network-based image encoding.
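The division-depth rule of claim 2 can be sketched in code. The patent does not fix how importance scores map to a division depth or how a depth maps to a block count, so the linear depth rule and the quadtree-style 4**depth block count below are illustrative assumptions, not the claimed trained network.

```python
# Hypothetical sketch of claim 2: per-frame importance score -> division
# depth -> block count. The linear mapping and the quadtree rule
# (4**depth blocks) are assumptions; the patent leaves both unspecified.

def division_depth(importance_score: float, max_depth: int = 4) -> int:
    """Assumed rule: more important frames are divided more deeply."""
    if not 0.0 <= importance_score <= 1.0:
        raise ValueError("importance score must be normalized to [0, 1]")
    return round(importance_score * max_depth)

def block_count(depth: int) -> int:
    """Assumed quadtree-style division: each level splits every block in four."""
    return 4 ** depth

# A low-, mid-, and high-importance frame under the assumed rules.
frame_scores = [0.1, 0.55, 0.95]
counts = [block_count(division_depth(s)) for s in frame_scores]
```

Under these assumptions a near-static frame stays whole while a highly important frame is divided into 256 blocks, so bits concentrate where the importance network says they matter.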
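The block characteristic vector F_i = f(Sc_i, Tv_i, Oi_i, Sa_i, Ds_i, Rc_i) of claim 4 and its threshold test can also be sketched. The weighted sum below stands in for the pre-trained importance-evaluation network; the weights, field names, and 0.5 threshold are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical sketch of the claim-4 block characteristic vector and the
# threshold-based selection of important blocks. The linear scoring
# function is a stand-in for the trained importance-evaluation network.

@dataclass
class BlockFeatures:
    sc: float  # Sc_i: spatial correlation with adjacent blocks
    tv: float  # Tv_i: temporal rate of change at the block position
    oi: int    # Oi_i: binary object-presence flag from an object recognizer
    sa: float  # Sa_i: normalized visual-attention value
    ds: float  # Ds_i: distortion-sensitivity index (high-frequency/texture)
    rc: float  # Rc_i: restorability from neighbours if the block is lost

def importance_score(f: BlockFeatures) -> float:
    # Illustrative weights: a block that changes quickly, contains an
    # object, draws attention, is distortion-sensitive, and is hard to
    # restore from its neighbours scores higher.
    return (0.1 * f.sc + 0.2 * f.tv + 0.25 * f.oi
            + 0.2 * f.sa + 0.15 * f.ds + 0.1 * (1.0 - f.rc))

def important_blocks(features: list[BlockFeatures], threshold: float = 0.5):
    """Indices of blocks whose score meets the preset threshold (claim 4)."""
    return [i for i, f in enumerate(features)
            if importance_score(f) >= threshold]

flat = BlockFeatures(0.9, 0.1, 0, 0.1, 0.1, 0.9)   # uniform background
busy = BlockFeatures(0.2, 0.8, 1, 0.9, 0.7, 0.2)   # moving object
selected = important_blocks([flat, busy])
```

Only the object-bearing, fast-changing block clears the threshold, so only it proceeds to the encoder, which is the selective-encoding behaviour the claim describes.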
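The claim-5 chain (encoder to hidden vector, quantization, context-based entropy coding using the horizontally and vertically adjacent codes) can be sketched as a minimal numeric toy. The mean/spread "encoder" and the Laplacian context model below are illustrative stand-ins for the patent's pre-trained networks, not its actual architecture.

```python
import math

# Hypothetical sketch of claim 5: pixels -> hidden vector -> quantized
# code -> bit cost under a context model built from the left and top
# neighbouring codes. All models here are toy stand-ins.

def encode_block(pixels: list[float]) -> list[float]:
    """Toy 'encoder': a 2-d hidden vector of (mean, dynamic range)."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]

def quantize(vec: list[float], step: float = 8.0) -> list[int]:
    """Uniform scalar quantization of the hidden vector to integer codes."""
    return [round(v / step) for v in vec]

def context_bits(code: int, left: int, top: int, b: float = 2.0) -> float:
    # Laplacian probability centred on the neighbour mean (the context
    # prediction of claim 5); entropy coding spends about -log2(p) bits.
    pred = (left + top) / 2.0
    p = 0.5 / b * math.exp(-abs(code - pred) / b)
    return -math.log2(max(p, 1e-12))

codes = quantize(encode_block([10.0] * 16))
```

When the neighbouring codes predict the current one exactly, the model assigns it high probability and few bits; codes far from their context cost more, which is how the context-based probability distribution saves rate.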

Description

Operating server for performing image coding based on artificial neural networks and method of operation thereof

The present invention relates to an operating server that performs image encoding using an artificial neural network and a method of operating the same, and more specifically, to an operating server that, during artificial neural network-based image encoding, selectively determines the blocks to be encoded based on the characteristics of each frame and each block, and a method of operating the same.

With the increasing resolution of video content and the surging demand for real-time streaming, video encoding technology has become a key element in saving storage space and improving transmission efficiency. In particular, as the generation and consumption of video data grow explosively in fields such as smartphones, digital cameras, surveillance cameras, autonomous vehicles, augmented reality (AR), and virtual reality (VR), there is growing demand for higher compression efficiency and more intelligent encoding strategies.

Existing video encoding technologies have improved compression efficiency primarily on the basis of standards such as MPEG, H.264/AVC, H.265/HEVC, and H.266/VVC, by combining spatial and temporal prediction, transformation, quantization, and entropy encoding. These standard codecs divide frames into fixed block units and apply the same algorithm to every block. Recently, encoding technologies based on artificial neural networks (ANN) have been actively researched, and attention has turned to a method that converts block-unit image information into a hidden vector, a latent representation, and then compresses it through quantization and entropy encoding based on a probabilistic model.
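The overall pipeline described above can be sketched end to end: block generation driven by the frame rate, importance-based selection, and encoding of only the kept blocks. Every component below (the frame-rate block rule, the dynamic-range importance test, the (mean, spread) code) is a toy stand-in; only the control flow mirrors the patent, not its trained networks.

```python
# Hypothetical end-to-end sketch: block generation -> importance-based
# selection -> encoding of important blocks only. All rules are toy
# assumptions standing in for the patent's pre-trained networks.

def encode_video(frames, frame_rate):
    """frames: list of frames, each a flat list of pixel values."""
    bitstream = []
    for frame in frames:
        # Block generation: more blocks at higher frame rates (toy rule).
        n_blocks = max(1, len(frame) * frame_rate // 240)
        size = -(-len(frame) // n_blocks)               # ceiling division
        blocks = [frame[i:i + size] for i in range(0, len(frame), size)]
        # Important-block selection: keep blocks with visible variation;
        # uniform blocks are skipped (stand-in for the importance network).
        important = [b for b in blocks if max(b) - min(b) > 10]
        # "Encoding": each kept block becomes a (mean, spread) code pair.
        for b in important:
            bitstream.append((sum(b) / len(b), max(b) - min(b)))
    return bitstream

stream = encode_video([[0, 0, 100, 0, 5, 5, 5, 5]], frame_rate=60)
```

In this toy run only the block containing the bright pixel is encoded; the flat block contributes nothing to the bitstream, which is the selective-encoding effect the invention targets.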
However, existing technologies tend to process the entire video in uniform units, or to determine importance from simple criteria such as inter-frame prediction accuracy or pixel differences. This fails to give adequate weight to important areas in frames or blocks containing complex objects or rapid scene changes, and it hurts compression efficiency by encoding unnecessary areas at high bit rates. Furthermore, if block importance is not determined precisely on the basis of artificial neural networks, transmission bandwidth may be wasted or the quality of the restored image degraded. Therefore, there is a need for an artificial neural network-based video encoding technology that determines importance precisely by comprehensively considering various characteristics at the frame and block level (spatial correlation, temporal variation, object presence, visual attention, distortion sensitivity, restorability, etc.), and that selectively encodes only the important blocks of each frame.

FIG. 1 is a schematic diagram showing the environment of an operating server that performs image encoding based on an artificial neural network according to one embodiment.
FIG. 2 is a diagram illustrating an exemplary hardware configuration of the operating server according to one embodiment.
FIG. 3 is a diagram showing the functional components of the operating server according to one embodiment.
FIG. 4 is a diagram showing the functional components of an image block generation unit included in the operating server according to one embodiment.
FIG. 5 is a diagram showing the functional components of an important block determination unit included in the operating server according to one embodiment.
FIG. 6 is a diagram showing the functional components of an encoding unit included in the operating server according to one embodiment.
FIG. 7 is a flowchart illustrating an operation method performed in the operating server according to one embodiment.
FIG. 8 is a diagram showing a wireless communication system that can be applied in a communication process according to one embodiment of the present invention.
FIG. 9 is a diagram showing a base station in the wireless communication system of FIG. 8.
FIG. 10 is a diagram showing a terminal in the wireless communication system of FIG. 8.
FIG. 11 is a diagram showing a communication interface in the wireless communication system of FIG. 8.

The present invention is susceptible to various modifications and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the invention to specific embodiments, and it should be