CN-116112681-B - Video decompression method based on multi-scale interactive communication space-time network
Abstract
The invention discloses a video decompression method based on a multi-scale interactive communication space-time network. The method mainly comprises: constructing a multi-scale space-time feature alignment network module, which implicitly aligns an input reference frame sequence with the target frame in a multi-scale feature space to obtain aligned multi-scale features; constructing a multi-scale feature enhancement network module, which enhances the multi-scale features obtained in the previous step to obtain enhanced features; constructing a source feature selection enhancement module, which takes the target frame and the enhanced features as inputs; connecting the multi-scale space-time feature alignment network module, the multi-scale feature enhancement network module and the source feature selection enhancement module to jointly form the multi-scale space-time network; uniformly training the multi-scale space-time network with a data set; and finally outputting a video with the compression effect removed. The video decompression method based on the multi-scale space-time network obtains good subjective and objective results. The present invention is therefore an efficient video decompression method.
Inventors
- HE XIAOHAI
- ZHANG TINGRONG
- TENG QIZHI
- REN CHAO
- XIONG SHUHUA
- CHEN HONGGANG
- CHEN ZHENGXIN
Assignees
- Sichuan University (四川大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20211110
Claims (3)
- 1. The video decompression method based on the multi-scale interactive communication space-time network is characterized in that, after video compression is completed, artifacts introduced by the quantization, prediction and coding processes in the compressed video are repaired and reconstructed by a deep neural network model, and the implementation of the method comprises the following steps. Step one, constructing a multi-scale space-time feature alignment module: a basic multi-scale residual block is formed from a multi-scale convolution and a multi-scale feature interactive communication module, the multi-scale space-time feature alignment module is formed from the multi-scale residual block and deformable convolution, and the input reference frames are aligned with the target frame in feature space. Step two, constructing a multi-scale feature enhancement module from the multi-scale residual block: the output of the multi-scale space-time feature alignment module is taken as the input of this module, the aligned multi-scale features are enhanced and preliminarily fused, and enhanced deep features are output; the multi-scale residual block consists of a multi-scale convolution with channel number C, a rectified linear unit (ReLU), two ordinary convolutions with channel number 0.5C, and a multi-scale feature interactive communication module. The multi-scale feature interactive communication module first upsamples the low-scale feature F_l by a factor of 2 and sums it element-wise with the high-scale feature F_h to obtain the fused feature F; next, global average pooling (GAP) encodes the global information of F to generate the inter-channel statistics s; then s is passed through a 1×1 convolution to obtain a compact feature representation z; the representation z then flows through two parallel convolution layers with 1×1 kernels that restore the channel dimension to 0.5C, and the feature representations of the two branches are normalized with a Softmax function to generate a relative attention weight matrix among the channels; finally, the input multi-scale features are rescaled with the attention weights to obtain the interaction-enhanced multi-scale features, which are input to subsequent modules for further feature fusion or enhancement. Step three, constructing a source feature selection enhancement module comprising a feature selection branch and a feature preservation branch, wherein the feature selection branch generates a channel weighting coefficient matrix from the statistics of the shallow features of the input target frame and performs channel-wise rescaling of the enhanced deep features. Step four, sequentially connecting the multi-scale space-time feature alignment module, the multi-scale feature enhancement module and the source feature selection enhancement module to form the final multi-scale interactive communication space-time network, wherein the output of the multi-scale feature enhancement module serves as the input of the source feature selection enhancement module. Step five, training the network of step four with the training data set. Step six, during testing, taking the compressed video as the input of the network and outputting the high-quality video with the compression effect removed.
- 2. The method as claimed in claim 1, wherein the multi-scale space-time feature alignment module of step one uses deformable convolution to align the reference frames with the target frame in feature space at different scales.
- 3. The video decompression method based on the multi-scale interactive communication space-time network as claimed in claim 1, wherein the source feature selection enhancement module of the third step enhances the deep features. The module consists of a feature selection branch and a feature preservation branch. The feature selection branch extracts shallow features of the input target frame with a CRC network module, where a CRC consists of 2 convolution layers and 1 ReLU; a channel weighting coefficient matrix with values in the range (0, 1) is then generated with an energy function E and a sigmoid activation function that model each channel independently, so that the channels are adaptively enhanced and the absolute importance of the channel features is reflected; this matrix is a channel-independent attention weight matrix. The input deep features F_d are then scaled channel by channel according to the attention weights w, and the scaled features are enhanced with a CRC module, which can be expressed as F_sel = f_CRC(w ⊙ F_d), where F_sel denotes the output features of the feature selection branch, f_CRC denotes the function of the CRC module and ⊙ denotes channel-wise multiplication. Meanwhile, the feature preservation branch adaptively preserves the input features through a CRC module, yielding F_keep. Finally, the output features of the feature preservation branch and of the feature selection branch are added, and the final enhanced features F_out are obtained through a CRC module.
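The inter-channel attention described in claim 1 (2× upsampling, element-wise summation, global average pooling, a channel-reducing projection, two parallel restoring projections and a Softmax across the two branches) can be sketched as follows. This is a minimal NumPy illustration only, not the patented network: nearest-neighbour upsampling, 1×1 projections in place of the convolutions, and the weight shapes are all assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def upsample2x(x):
    # nearest-neighbour 2x spatial upsampling; x has shape (C, H, W)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def interaction_module(f_low, f_high, w_reduce, w_a, w_b):
    """Sketch of the multi-scale feature interactive communication module.

    f_low    : (C, H/2, W/2) low-scale features F_l
    f_high   : (C, H, W)     high-scale features F_h
    w_reduce : (Cr, C) 1x1-conv stand-in producing the compact representation z
    w_a, w_b : (C, Cr) 1x1-conv stand-ins of the two parallel branches
    """
    f = upsample2x(f_low) + f_high         # element-wise fusion: F = Up(F_l) + F_h
    s = f.mean(axis=(1, 2))                # GAP -> inter-channel statistics s, shape (C,)
    z = w_reduce @ s                       # compact feature representation z, shape (Cr,)
    a, b = w_a @ z, w_b @ z                # two parallel branch representations, shape (C,)
    w = softmax(np.stack([a, b]), axis=0)  # relative attention weights across the branches
    # rescale each input scale by its per-channel attention weight
    return w[0][:, None, None] * f_low, w[1][:, None, None] * f_high
```

For each channel the two branch weights sum to 1, so the module redistributes attention between the low-scale and high-scale streams rather than amplifying both.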
Description
Video decompression method based on multi-scale interactive communication space-time network
Technical Field
The invention relates to compressed-video post-processing technology, in particular to a video decompression method based on a multi-scale interactive communication space-time network, and belongs to the field of image/video processing.
Background
Video compression techniques are widely used to reduce the temporal and spatial redundancy of video for better storage and transmission. However, the video compression algorithms currently in common use (e.g., H.264 and H.265) inevitably introduce various compression artifacts (e.g., blocking artifacts, ringing, etc.) into the compressed video. Especially at low code rates, compression artifacts greatly reduce video quality, seriously degrade the quality of experience, and also harm the accuracy of subsequent high-level visual tasks. Video decompression, which aims to reconstruct high-quality video from low-quality compressed video, is a major research hotspot in the field of image/video processing. Video decompression not only enhances the visual quality of compressed video but can also improve the performance of subsequent visual tasks (such as recognition, detection and tracking). Video decompression methods mainly comprise 3 types: single-image decompression methods, single-frame video decompression methods and multi-frame video decompression methods.
The single-image decompression method is mainly designed to enhance JPEG images and can be adapted to video by reconstructing each frame independently. The single-frame video decompression method reconstructs a high-quality frame using only the information of the current video frame; it does not consider the temporal correlation between video frames and is suited to the intra-frame coding mode. The multi-frame video decompression method takes the frames adjacent to a target frame as reference frames and reconstructs the high-quality frame from the target frame and the reference frames together; this method exploits the temporal information of the video and obtains better reconstruction results for video compressed in the inter-frame coding mode. Current video decompression methods do not fully mine the rich multi-scale information in video frames, and their performance is therefore limited to a certain extent.
Disclosure of Invention
The invention aims to provide an effective decompression method for compressed video.
The invention provides a video decompression method based on a multi-scale interactive communication space-time network, which comprises the following steps: (1) constructing a multi-scale space-time feature alignment module, namely forming a basic multi-scale residual block from a multi-scale convolution and a multi-scale feature interactive communication module, forming the multi-scale space-time feature alignment module from the multi-scale residual block and deformable convolution, and aligning the input reference frames with the target frame in feature space; (2) constructing a multi-scale feature enhancement module, namely building the module from multi-scale residual blocks, taking the output of the multi-scale space-time feature alignment module as its input, enhancing and preliminarily fusing the aligned multi-scale features, and outputting enhanced deep features; (3) constructing a source feature selection enhancement module, in which the feature selection branch generates an importance map from the statistics of the shallow features of the input target frame to rescale the enhanced deep features, and the feature preservation branch adaptively preserves the important information of the deep features; (4) combining the multi-scale space-time feature alignment module, the multi-scale feature enhancement module and the source feature selection enhancement module into the final multi-scale interactive communication space-time network; (5) training the network of step (4) with the training data; (6) during testing, taking the compressed video as the input of the network and outputting the high-quality video with the compression effect removed.
Drawings
FIG. 1 is an overview of the decompression method based on the multi-scale interactive communication space-time network, wherein (a) is a structural block diagram of the multi-scale interactive communication space-time network, (b), (c) and (d) are multi-scale convolutions, and (e) is a structural diagram of the source feature selection enhancement module. FIG. 2 is a block diagram of the multi-scale feature interactive communication module of the present invention. FIG. 3 is a subjective visual-effect comparison of the present invention and four other methods performing video decompression after compressing the test video "FourPeople".
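The source feature selection enhancement module of step (3) can be sketched as follows. This is a NumPy illustration under stated assumptions, not the patented implementation: the CRC block is approximated by two 1×1 projections around a ReLU, and the per-channel energy function E of claim 3 is approximated by global average pooling of the CRC output before the sigmoid; the function names and parameter layout are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crc(x, w1, b1, w2, b2):
    # "CRC" block (Conv -> ReLU -> Conv), approximated with 1x1 convolutions
    # acting per pixel on a feature map of shape (C, H, W)
    h = relu(np.einsum('oc,chw->ohw', w1, x) + b1[:, None, None])
    return np.einsum('oc,chw->ohw', w2, h) + b2[:, None, None]

def source_feature_selection(target_shallow, deep, params):
    """target_shallow: shallow features of the target frame, (C, H, W)
    deep: enhanced deep features from the enhancement module, (C, H, W)
    params: dict of (w1, b1, w2, b2) tuples for the four CRC stand-ins."""
    # feature selection branch: per-channel statistic (GAP stands in for the
    # energy function E) -> sigmoid -> channel weights in (0, 1)
    stats = crc(target_shallow, *params['sel']).mean(axis=(1, 2))   # (C,)
    w = sigmoid(stats)                                              # channel weights
    selected = crc(w[:, None, None] * deep, *params['scale'])       # F_sel
    kept = crc(deep, *params['keep'])                               # F_keep (preservation branch)
    return crc(selected + kept, *params['out'])                     # final enhanced features
```

The sigmoid (rather than Softmax) weighting matches the claim's intent of modeling each channel independently: every channel receives an absolute importance score in (0, 1) instead of competing with the others.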