
CN-122002040-A - Parallel processing method and system for video decoding and video prediction

CN122002040A

Abstract

The invention discloses a parallel processing method and system for video decoding and video prediction, belonging to the technical field of computer vision. The method comprises: obtaining a compressed code stream; reconstructing the current frame image by combining the compressed code stream with previously decoded video frames to obtain the current decoded video frame; and recursively predicting video frame images at future times based on the previously decoded video frame and the current decoded video frame. The invention reduces system latency and computational cost, meets the requirements of low-latency video transmission and prediction, and can be widely applied in low-latency scenarios such as autonomous driving, remote cooperative control, and robot navigation.

Inventors

  • LUO DINGSHENG
  • WANG JIA
  • HU WENFEI
  • KANG JIALIANG

Assignees

  • Peking University (北京大学)

Dates

Publication Date
2026-05-08
Application Date
2026-03-13

Claims (10)

  1. A method of parallel processing of video decoding and video prediction, the method comprising: obtaining a compressed code stream; reconstructing the current frame image by combining the compressed code stream and the previously decoded video frames to obtain the current decoded video frame; and recursively predicting video frame images at future times based on the previously decoded video frame and the current decoded video frame.
  2. The method of claim 1, wherein the generating of the compressed code stream comprises: acquiring a video stream; and encoding the video stream to obtain the compressed code stream, wherein the encoding mode comprises an H.264 encoding mode.
  3. The method of claim 1, wherein reconstructing the current frame image by combining the compressed code stream and the previously decoded video frames to obtain the current decoded video frame comprises: converting the motion and residual information contained in the compressed code stream into tensor form to obtain the compression coding of the t-th frame; at the decoding stage, when the t-th frame is to be decoded, taking the decoded (t-1)-th frame reconstruction image and the decoded (t-2)-th frame reconstruction image respectively as reference frames; and obtaining the t-th frame reconstruction image based on the reference frames and the compression coding corresponding to the t-th frame.
  4. The method according to claim 3, wherein obtaining the t-th frame reconstruction image based on the reference frames and the compression coding corresponding to the t-th frame comprises: extracting multi-layer convolution features from the compression coding by a first convolutional neural network; reshaping the feature vectors into two initial optical flow fields by an anti-flattening operation; inputting the initial optical flow fields and the reference frames into a second convolutional neural network to obtain two enhanced optical flow fields and a mask matrix, wherein each enhanced optical flow field is composed of two channels corresponding respectively to the compensated horizontal and vertical components of the corresponding initial optical flow field; performing reverse optical flow mapping on the first reference frame with its enhanced optical flow field to obtain a first warped reference frame; performing reverse optical flow mapping on the second reference frame with its enhanced optical flow field to obtain a second warped reference frame; and weighting and fusing the two warped reference frames by the mask matrix to obtain the t-th frame reconstruction image.
  5. The method according to claim 3, wherein recursively predicting the video frame images at future times based on the previously decoded video frame and the current decoded video frame comprises: constructing a decoding prediction network composed of a cascade of optical flow prediction modules, wherein each optical flow prediction module, based on the previously decoded (t-1)-th frame reconstruction image, the t-th frame reconstruction image, and the optical flow increments output by the preceding optical flow prediction modules, outputs an optical flow increment and a fusion weight increment matrix; summing the optical flow increments and fusion weight increment matrices output by the optical flow prediction modules to obtain the optical flow fields and the fusion weight matrix; performing reverse optical flow mapping on the (t-1)-th frame reconstruction image with the corresponding optical flow field to obtain a first reference frame; performing reverse optical flow mapping on the t-th frame reconstruction image with the corresponding optical flow field to obtain a second reference frame; and weighting and fusing the two reference frames by the fusion weight matrix to obtain the (t+1)-th frame prediction image.
  6. The method of any one of claims 1 to 5, wherein after recursively predicting the video frame images at future times based on the previously decoded video frame and the current decoded video frame, the method further comprises: when the compressed code stream transmitted over the network is delayed or data is temporarily lost, using the predicted video frame image at the future moment for compensation or as a reference frame, so as to decode the video frame.
  7. A parallel processing system for video decoding and video prediction, the system comprising: an acquisition module for obtaining a compressed code stream; a reconstruction module for reconstructing the current frame image by combining the compressed code stream and the previously decoded video frames to obtain the current decoded video frame; and a prediction module for recursively predicting video frame images at future times based on the previously decoded video frame and the current decoded video frame.
  8. A computer device comprising a processor and a memory storing computer program instructions that, when executed by the processor, implement the method of parallel processing of video decoding and video prediction according to any one of claims 1 to 6.
  9. A computer-readable storage medium, wherein computer program instructions are stored on the computer-readable storage medium which, when executed by a processor, implement the method of parallel processing of video decoding and video prediction according to any one of claims 1 to 6.
  10. A computer program product which, when run on a computer device, causes the computer device to perform the method of parallel processing of video decoding and video prediction according to any one of claims 1 to 6.
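The recursive prediction of claim 1 and the delay-compensation fallback of claim 6 can be sketched as follows. This is a minimal illustration, not the patent's implementation: `predict_next` is a hypothetical linear-extrapolation stand-in for the learned decoding prediction network, and `decode_fn` is a placeholder for the actual decoder.

```python
def predict_next(prev_frame, cur_frame):
    # Hypothetical stand-in for the learned predictor: linear extrapolation
    # from the last two frames (the real predictor is the cascaded
    # optical-flow network of claim 5).
    return 2 * cur_frame - prev_frame

def recursive_predict(prev_frame, cur_frame, steps):
    """Recursive prediction (claim 1): each predicted frame becomes an
    input for predicting the next one."""
    out = []
    for _ in range(steps):
        nxt = predict_next(prev_frame, cur_frame)
        out.append(nxt)
        prev_frame, cur_frame = cur_frame, nxt
    return out

def next_decoded_frame(code, prev_frame, cur_frame, decode_fn):
    """Claim-6 style compensation: if the compressed code for the next
    frame is delayed or lost (code is None), substitute the predicted
    frame; otherwise decode normally from the code stream."""
    if code is None:
        return predict_next(prev_frame, cur_frame)
    return decode_fn(code, cur_frame)
```

With scalar "frames", `recursive_predict(0, 1, 3)` rolls the extrapolation forward three steps, and `next_decoded_frame` falls back to the prediction exactly when the code is missing.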

Description

Parallel processing method and system for video decoding and video prediction

Technical Field

The invention belongs to the technical field of computer vision, and in particular relates to a parallel processing method and system for video decoding and video prediction.

Background

In the context of rapid advances in information technology, video data has become an integral part of modern communications and entertainment. With the dramatic increase in video content, efficiently decoding, transmitting, and predicting video data has become an important topic in the field. The video coding industry currently follows various video coding standards, including H.264, H.265, H.266 (i.e., Versatile Video Coding, VVC), AV1, and the like. These standards provide different efficiencies and qualities for compressing and transmitting video data. The process of video decoding includes a number of steps such as entropy decoding, inverse transformation, motion compensation, and deblocking filtering. Conventional decoding techniques tend to process these steps sequentially, which makes them susceptible to performance bottlenecks when processing large-scale video data. To address this problem, the industry has in recent years gradually shifted toward parallel processing frameworks, seeking to improve decoding performance through multithreading or multi-core processing. However, due to the nature of video data, pure parallel decoding is not yet sufficient to address the challenges in all scenarios. Video prediction is an effective data compression technique whose core idea is to predict the current frame from previous frames, thereby reducing the redundant data stored and transmitted. Existing methods focus on video prediction using emerging techniques such as deep learning. However, these methods still face problems of latency and computational resource consumption when processing real-time video streams.
Therefore, a new parallel processing framework that can effectively combine video decoding and video prediction has become a technical requirement to be addressed.

Disclosure of Invention

The invention provides a parallel processing method and system for video decoding and video prediction, which can reduce system latency and computational cost, meet the requirements of low-latency video transmission and prediction, and can be widely applied in low-latency scenarios such as autonomous driving, remote cooperative control, and robot navigation. In order to achieve the above object, the technical scheme of the present invention includes the following. A method of parallel processing of video decoding and video prediction, the method comprising: obtaining a compressed code stream; reconstructing the current frame image by combining the compressed code stream and the previously decoded video frames to obtain the current decoded video frame; and recursively predicting video frame images at future times based on the previously decoded video frame and the current decoded video frame. Further, the generating process of the compressed code stream includes: acquiring a video stream; and encoding the video stream to obtain the compressed code stream, wherein the encoding mode comprises an H.264 encoding mode.
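The "parallel" aspect of the method — decoding the incoming stream while prediction runs concurrently on the already-decoded frames — might be organized along the following lines. This is a sketch under stated assumptions: `decode_frame` and `predict_frame` are hypothetical scalar stand-ins for the decoder and the learned predictor, and the single worker thread stands in for whatever parallel execution the patent's system uses.

```python
import queue
import threading

def decode_frame(code, prev_frame):
    # Placeholder decoder: combines the compressed code with the previous frame.
    return prev_frame + code

def predict_frame(prev_frame, cur_frame):
    # Placeholder predictor: linear extrapolation from the last two frames.
    return 2 * cur_frame - prev_frame

def run_pipeline(codes, f0, f1):
    """Decode frames sequentially while a worker thread predicts the next
    frame from each newly decoded pair, so prediction never blocks decoding."""
    decoded = [f0, f1]          # two bootstrap frames, indices 0 and 1
    predictions = {}            # t+1 -> predicted frame
    pred_queue = queue.Queue()

    def predictor():
        while True:
            item = pred_queue.get()
            if item is None:    # sentinel: no more frames
                break
            t, prev_f, cur_f = item
            predictions[t + 1] = predict_frame(prev_f, cur_f)

    worker = threading.Thread(target=predictor)
    worker.start()
    for t, code in enumerate(codes, start=2):
        cur = decode_frame(code, decoded[-1])
        decoded.append(cur)
        # Hand the newest frame pair to the predictor without waiting for it.
        pred_queue.put((t, decoded[-2], cur))
    pred_queue.put(None)
    worker.join()
    return decoded, predictions
```

Reading `predictions` only after `join()` keeps the sketch free of data races; a real system would instead consume predictions as they arrive.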
Further, reconstructing the current frame image by combining the compressed code stream with the previously decoded video frames to obtain the current decoded video frame includes: converting the motion and residual information contained in the compressed code stream into tensor form to obtain the compression coding of the t-th frame; at the decoding stage, when the t-th frame is to be decoded, taking the decoded (t-1)-th frame reconstruction image and the decoded (t-2)-th frame reconstruction image respectively as reference frames; and obtaining the t-th frame reconstruction image based on the reference frames and the compression coding corresponding to the t-th frame. Further, obtaining the t-th frame reconstruction image based on the reference frames and the compression coding corresponding to the t-th frame comprises: extracting multi-layer convolution features from the compression coding by a first convolutional neural network; reshaping the feature vectors into two initial optical flow fields by an anti-flattening operation; inputting the initial optical flow fields and the reference frames into a second convolutional neural network to obtain two enhanced optical flow fields and a mask matrix, wherein each enhanced optical flow field is composed of two channels corresponding respectively to the compensated horizontal and vertical components of the corresponding initial optical flow field; combining an enhanced optical flow field with its reference frame, reverse optical flow mapping is carried out
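The two core tensor operations in this reconstruction step — reverse (backward) optical-flow mapping of a reference frame and mask-weighted fusion of the two warped results — can be sketched in NumPy. This is an illustrative simplification, not the patent's networks: single-channel frames, nearest-neighbour sampling instead of the bilinear interpolation a real warper would use, and a scalar-valued mask in [0, 1].

```python
import numpy as np

def backward_warp(ref, flow):
    """Reverse optical-flow mapping: for each target pixel (y, x), sample the
    reference frame at (y + flow_v, x + flow_h), nearest-neighbour.
    ref: (H, W) array; flow: (2, H, W) array, channel 0 = horizontal
    component, channel 1 = vertical component (as in the enhanced flow
    fields' two channels)."""
    H, W = ref.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.clip(np.round(xs + flow[0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[1]).astype(int), 0, H - 1)
    return ref[src_y, src_x]

def fuse(warp_a, warp_b, mask):
    """Weighted fusion of the two warped reference frames by a mask
    matrix with entries in [0, 1]."""
    return mask * warp_a + (1.0 - mask) * warp_b
```

With zero flow the warp is the identity, and a uniform 0.5 mask averages the two warped references; in the patent both the flow fields and the mask are produced by the second convolutional neural network.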