CN-116055729-B - Compression artifact suppression method combining multi-level inter-frame correlation
Abstract
The invention provides a compression artifact suppression method that combines multi-level inter-frame correlation, relating chiefly to quality enhancement of HEVC-decoded video through multi-level motion compensation and deep convolutional mapping. Exploiting the correlation between video frames, the method designs a multi-level motion compensation network that motion-compensates the two frames adjacent to the current frame and fuses their features with the current frame; attention-based feature extraction is then applied separately to the current frame and to the fused multi-frame feature map; finally, deep residual learning and cross-layer hierarchical connections fuse, map, and reconstruct the multi-level features to enhance the quality of the current frame.
Inventors
- WU XIAOHONG
- TANG BOWEN
- XIONG SHUHUA
- HE XIAOHAI
- CHEN HONGGANG
- TENG QIZHI
Assignees
- Sichuan University (四川大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2021-10-27
Claims (1)
- 1. The compression artifact suppression method combining multi-level inter-frame correlation is characterized by comprising the following steps:
(1) Constructing a multi-level motion compensation network to perform inter-frame motion compensation: the motion compensation network adopts a four-level structure that progressively upsamples from 1/8 resolution to the full picture; the compensated output of each level is expressed as [formula omitted in source], and the total output of the multi-level motion compensation network is expressed as [formula omitted in source].
(2) Constructing a multi-dimensional feature extraction and reconstruction network for feature processing: the network comprises a multi-dimensional feature extraction part and a deep feature mapping and reconstruction part connected in series. The multi-dimensional feature extraction part processes the output of step (1): the input feature map is first expanded in dimension; at each pixel position of the expanded feature map, the values in the different channels are average-pooled to obtain a channel mean, and the expanded feature map is then channel-weighted according to this mean. The deep feature mapping and reconstruction part processes the feature map produced by multi-dimensional feature extraction; it comprises a seven-layer structure fusing residual learning with cross-layer dense connections, where residual learning adds the convolution results preceding the current convolution layer and feeds the sum as the input of that layer; each convolution layer is a Convolution-ReLU structure with a 3×3 kernel.
(3) Training the network with a loss function: the objective function of the network can be expressed as [formula omitted in source]; the loss function Loss adopts a strategy combining the mean absolute error loss and the mean squared error loss, expressed as [formula omitted in source].
Wherein λ1 and λ2 are the large and small balancing weights, respectively. The seven-frame sequences of Vimeo90K are selected as the training data set. All video sequences are processed according to the ITU-R BT.601 standard: the consecutive images are first converted into YUV-format video sequences; compressed video is then obtained by HM16.0 compression using the reference configuration file encoder_lowdelay_P_main.cfg with the quantization parameter set to 37 and the IPPP inter-frame coding mode; finally, the compressed video and the original video are each converted into RGB-domain images to facilitate experimental input and observation of the results.
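The channel-weighting operation in claim step (2) can be sketched as follows. This is a minimal pure-Python illustration of the described behavior (average-pool the same pixel position across channels, then weight every channel by that per-pixel mean); the function name, shapes, and list-based representation are assumptions for illustration, not the patent's implementation.

```python
def channel_weighting(feature_map):
    """feature_map: list of C channels, each an H x W grid (list of lists).
    For every pixel position, average the C channel values to obtain a
    channel mean, then scale each channel element-wise by that mean map,
    as described in step (2) of the claim."""
    C = len(feature_map)
    H = len(feature_map[0])
    W = len(feature_map[0][0])
    # per-pixel mean across channels
    mean = [[sum(feature_map[c][i][j] for c in range(C)) / C
             for j in range(W)] for i in range(H)]
    # weight each channel by the channel-mean map
    return [[[feature_map[c][i][j] * mean[i][j]
              for j in range(W)] for i in range(H)] for c in range(C)]
```

For a two-channel 1×1 input with values 2 and 4, the channel mean is 3, so the weighted output channels are 6 and 12.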
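The combined loss of claim step (3) can be sketched as below. Since the patent's formula is omitted in the source, the exact form and the values of λ1 and λ2 are assumptions; the sketch simply weights mean absolute error and mean squared error as the claim describes.

```python
def combined_loss(pred, target, lam1=1.0, lam2=0.1):
    """Loss = lam1 * MAE + lam2 * MSE over flat pixel lists.
    lam1 and lam2 play the roles of the claim's balancing weights
    lambda1 and lambda2 (the default values here are illustrative)."""
    n = len(pred)
    mae = sum(abs(p - t) for p, t in zip(pred, target)) / n
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    return lam1 * mae + lam2 * mse
```

With pred = [1, 2] and target = [0, 0], MAE = 1.5 and MSE = 2.5, giving 1.0·1.5 + 0.1·2.5 = 1.75 under the illustrative weights.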
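The HM16.0 compression step of the training-data pipeline might be driven as sketched below. The flag names follow HM TAppEncoder conventions; the file paths and the helper itself are hypothetical, and the 448×256, 7-frame geometry matches the Vimeo90K septuplets mentioned above.

```python
def hm_encode_cmd(yuv_in, width, height, frames, qp=37,
                  cfg="encoder_lowdelay_P_main.cfg"):
    """Builds an HM reference encoder (TAppEncoder) command line for the
    low-delay P configuration at QP 37 used in the training setup.
    Output paths stream.bin / recon.yuv are placeholders."""
    return ["TAppEncoder", "-c", cfg,
            "-i", yuv_in, "-wdt", str(width), "-hgt", str(height),
            "-f", str(frames), "-q", str(qp),
            "-b", "stream.bin", "-o", "recon.yuv"]

# e.g. for one Vimeo90K septuplet converted to YUV:
cmd = hm_encode_cmd("in.yuv", 448, 256, 7)
```

The resulting list can be passed to `subprocess.run(cmd)` once the HM binary and configuration file are available.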
Description
Compression artifact suppression method combining multi-level inter-frame correlation
Technical Field
The invention relates to research on video quality improvement methods in the field of video coding, in particular to a compression artifact suppression method combining multi-level inter-frame correlation.
Background
As the amount of internet information data increases, the amount of data transmitted between terminals grows, and how to transmit data under limited bandwidth has become a concern. In daily life we are exposed to ever more video data; it was estimated that by 2022 global video transmission would account for 80% of all internet traffic, and high-definition video has become a basic requirement for daily viewing. Because of the limitations of video codec technology, it is difficult to maintain video quality while ensuring transmission efficiency, so a technology focused on efficient quality improvement of compressed video is important. Conventional codec standards such as H.264 and H.265 integrate deblocking filtering and sample adaptive offset, and although these have some effect on improving the quality of compressed video, the visual enhancement actually achieved is limited. With the development of deep learning, various image and video processing technologies based on deep learning have emerged, and quality improvement methods for compressed image and video have developed along with them. Conventional image denoising techniques mostly adopt linear or nonlinear filters such as Gaussian filters, which weight each pixel value in an image; they handle certain specific kinds of noise well, but once the noise characteristics change, the filter often has to be replaced. Noise removal methods based on convolutional neural networks can largely avoid this defect.
DnCNN proposed a single-model approach to multiple tasks such as blind Gaussian denoising, SISR, and JPEG deblocking, trained on a single noise type such as Gaussian noise; residual learning is adopted to avoid gradient vanishing and improve training stability, reflecting to some extent the generalization capability of neural networks. Meanwhile, Yang designed a network that directly processes HEVC-decoded video streams, realizing targeted denoising of compressed video under the HEVC standard. A neural network based on adaptive partition-block transformation (ASN) and the multi-frame video quality enhancement network MFQE were proposed in turn: the ASN uses the CU partition information of H.265 block coding to enhance compression-artifact removal, while MFQE uses multi-frame input and motion compensation of adjacent frames to enhance the quality of the current frame. However, most current video quality enhancement methods have limitations. Some focus on single-frame enhancement and emphasize image quality enhancement techniques, thereby ignoring the temporal and spatial correlation between video frames. Others incorporate the CU partition information of the video encoder into the quality enhancement stage, but the effect is still not ideal. Still others exploit inter-frame correlation but ignore the low-quality characteristics of decoded video frames and do not consider the influence of compression artifacts of different sizes, so the feature extraction does not match the artifact distribution and the final enhanced frame shows no significant improvement.
Disclosure of Invention
With the increasing amount of internet information data, the amount of data transmitted between terminals is growing, and how to transmit data under limited bandwidth is a concern. In daily life we are exposed to ever more video data; it was estimated that by 2022 global video transmission would account for 80% of all internet traffic, and ultra-high-definition video has become a basic requirement for daily viewing. As 4K and even 8K video grows, this creates a strong demand for efficient video codec technology. However, a high compression ratio brings high distortion, so much related research is directed at quality improvement of distorted image frames. Conventional codec standards such as H.264 and H.265 integrate deblocking filtering and sample adaptive offset, and although these have some effect on improving the quality of compressed video, the visual enhancement actually achieved is limited. With the development of deep learning, various image and video processing technologies based on deep learning have emerged, and quality improvement methods for compressed image and video