US-12621486-B2 - Computer-implemented method and system for video coding


Abstract

A computer-implemented method for processing a video includes: (a) determining, based on one or more rate-distortion models and a number of bits for a frame of the video, coding parameters for processing the frame, the coding parameters comprising a rescale parameter r and a video compression model λ; and (b) processing the frame based on the rescale parameter r and the video compression model λ determined in (a) to form at least part of a bitstream of the video.

Inventors

Shiqi Wang
Jiancong Chen

Assignees

CITY UNIVERSITY OF HONG KONG

Dates

Publication Date: 20260505
Application Date: 20230314

Claims (20)

1 . A method for processing a video, comprising: (a) determining, based on one or more rate-distortion models and number of bits for a frame of the video, coding parameters for processing the frame, the coding parameters comprising a rescale parameter r and a video compression model λ, wherein the rescale parameter r comprises a rescale value or ratio for affecting resolution of the frame; (b) employing the rescale parameter r as a controlling parameter with the video compression model λ for frame-level resolution adjustment and rate adaption, thereby obtaining an increased set of rate-distortion operating points; (c) processing the frame by approximating a rate-distortion curve with the increased set of operating points obtained in (b), and selecting an optimal pair of the rescale parameter r and the video compression model λ for rescaling and encoding the frame, thereby forming at least part of a bitstream of the video; wherein the one or more rate-distortion models are updated after the frame is processed, based on the processing of the frame in (c), to adjust at least the rescale parameter r for processing a next frame of the video.
2 . The method of claim 1 , wherein (c) comprises steps in an order of: (c)(i) rescaling the frame based on the rescale parameter r determined in (a) to form a rescaled frame; and (c)(ii) encoding the rescaled frame based on the video compression model λ determined in (a).
3 . The method of claim 1 , wherein the rescale parameter r further comprises a reference frame associated with the frame to facilitate encoding of the frame.
4 . The method of claim 1 , wherein the rescale value or ratio is arranged to match resolution of the frame and resolution of the reference frame.
5 . The method of claim 3 , wherein the reference frame is a frame of the video before the frame of the video; and/or wherein the frame and the reference frame are consecutive and/or continuous frames of the video.
6 . The method of claim 1 , wherein the one or more rate-distortion models comprises a rate model f_R and a distortion model f_D.
7 . The method of claim 6 , wherein (a) comprises: (a)(i) determining a set of coding parameters based on the rate model f_R and the number of bits for the frame, the set of coding parameters comprising multiple pairs of rescale parameter r and video compression model λ; and (a)(ii) determining, based on the multiple pairs of rescale parameter r and video compression model λ and the distortion model f_D, the rescale parameter r and the video compression model λ for processing the frame of the video.
8 . The method of claim 7 , wherein the determining in (a)(ii) comprises: determining, from the multiple pairs of rescale parameter r and video compression model λ, the rescale parameter r and the video compression model λ arranged to minimize distortion based on the distortion model f_D.
9 . The method of claim 1 , further comprising: (d) processing the at least part of the bitstream to reconstruct a reconstructed frame corresponding to the frame.
10 . The method of claim 9 , wherein the processing in (d) comprises: decoding the at least part of the bitstream to form a decoded frame corresponding to the frame or the reconstructed frame corresponding to the frame.
11 . The method of claim 10 , wherein the processing in (d) further comprises: rescaling the decoded frame based on the rescale parameter r to form the reconstructed frame.
12 . The method of claim 9 , further comprising: (e) updating the one or more rate-distortion models based on the processing in (d), for processing the next frame of the video.
13 . The method of claim 1 , wherein the one or more rate-distortion models comprises a rate model f_R and a distortion model f_D; and wherein the updating comprises updating the rate model f_R based on the processing in (c), for processing the next frame of the video.
14 . The method of claim 13 , wherein updating the rate model f_R comprises: updating one or more parameters of the rate model f_R based on an actual coding rate used for coding the frame and an estimated coding rate R determined based on the rescale parameter r, the video compression model λ, and the rate model f_R.
15 . The method of claim 14 , wherein the rate model f_R is representable as $R = f_R(\lambda, r) = \alpha_1 \cdot r^{\beta_1}$, where R is coding rate, and α_1 and β_1 are video content dependent parameters of video compression model λ.
16 . The method of claim 15 , wherein the updating of the rate model f_R is based on $\alpha_1^{\mathrm{new}} = \alpha_1^{\mathrm{old}} + \delta_\alpha (\ln R_{\mathrm{real}} - \ln R_{\mathrm{est}}) \times \alpha_1^{\mathrm{old}}$ and $\beta_1^{\mathrm{new}} = \beta_1^{\mathrm{old}} + \delta_B (\ln R_{\mathrm{real}} - \ln R_{\mathrm{est}}) \times \ln r$, where α_1^old and β_1^old are video content dependent parameters of the rate model f_R before the update, α_1^new and β_1^new are video content dependent parameters of the rate model f_R after the update, δ_α is a constant, δ_B is a constant, R_real is the actual coding rate, and R_est is the estimated coding rate.
17 . The method of claim 12 , wherein the one or more rate-distortion models comprises a rate model f_R and a distortion model f_D; and wherein the updating comprises updating the distortion model f_D based on the processing in (d), for processing another frame of the video.
18 . The method of claim 17 , wherein updating the distortion model f_D comprises: updating one or more parameters of the distortion model f_D based on an estimated distortion measure of the frame determined based on the distortion model f_D and an actual distortion measure of the frame determined based on the processing in (d).
19 . The method of claim 18 , wherein the actual distortion measure is represented by an actual mean squared error (MSE) of the frame determined based on the processing in (d) and the estimated distortion measure is represented by an estimated mean squared error (MSE) determined based on the rescale parameter r, the video compression model λ, and the distortion model f_D.
20 . The method of claim 19 , wherein the distortion model f_D is representable as $D = f_D(\lambda, r) = \alpha_2 \cdot r^{\beta_2}$, where D is distortion measure, and α_2 and β_2 are video content dependent parameters of video compression model λ.
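The power-law models of claims 15 and 20, the rate-model update of claim 16, and the pair selection of claims 7 and 8 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: each λ is a hypothetical numeric key with its own (α, β) pair; the candidate r for each λ is obtained by inverting f_R at the frame's bit budget (the claims do not say how the candidate pairs are generated); the admissible range `r_min`..`r_max`, which here allows only downsampling, is hypothetical; and because the claims give an explicit update rule only for f_R, reusing `update` for f_D would be an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class PowerModel:
    """Power-law model y = alpha * r**beta, kept separately for each lambda.

    Used here for the rate model f_R (claim 15) and the distortion model f_D (claim 20).
    """
    alpha: float
    beta: float

    def predict(self, r: float) -> float:
        return self.alpha * r ** self.beta

    def invert(self, y: float) -> float:
        # Solve y = alpha * r**beta for r (assumes alpha > 0, y > 0, beta != 0).
        return (y / self.alpha) ** (1.0 / self.beta)

    def update(self, r: float, y_real: float, delta_a: float, delta_b: float) -> None:
        # Claim-16 form: delta_a and delta_b play the roles of the constants
        # delta_alpha and delta_B; the log-domain error uses the old parameters.
        err = math.log(y_real) - math.log(self.predict(r))
        new_alpha = self.alpha + delta_a * err * self.alpha
        new_beta = self.beta + delta_b * err * math.log(r)
        self.alpha, self.beta = new_alpha, new_beta


def select_pair(rate_models, dist_models, target_bits, r_min=0.25, r_max=1.0):
    """Sketch of claims 7-8: one candidate (r, lambda) per lambda, then minimise f_D.

    rate_models / dist_models map a (hypothetical) lambda key to a PowerModel.
    Returns (r, lambda, predicted distortion), or None if no pair is admissible.
    """
    best = None
    for lam, f_r in rate_models.items():
        r = f_r.invert(target_bits)          # (a)(i): r that spends the bit budget
        if not r_min <= r <= r_max:          # hypothetical admissible range of r
            continue
        d = dist_models[lam].predict(r)      # (a)(ii): predicted distortion of the pair
        if best is None or d < best[2]:
            best = (r, lam, d)
    return best


# Hypothetical models for two lambda values (numbers are illustrative only).
rate_models = {0.01: PowerModel(200000.0, 2.0), 0.02: PowerModel(120000.0, 2.0)}
dist_models = {0.01: PowerModel(20.0, -1.0), 0.02: PowerModel(30.0, -1.0)}
r, lam, d = select_pair(rate_models, dist_models, target_bits=60000.0)
# After coding the frame, feed back the actual bit count (hypothetical value).
rate_models[lam].update(r, y_real=66000.0, delta_a=0.1, delta_b=0.05)
```

Inverting f_R yields exactly one budget-matching r per λ, so every candidate pair spends the same predicted number of bits and the comparison in (a)(ii) is purely on predicted distortion. If a codec only supports a few fixed resolutions, a discrete grid of r values filtered by predicted rate would be an equally plausible way to form the candidate set.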

Description

TECHNICAL FIELD

The invention generally relates to processing, in particular coding, of videos.

BACKGROUND

Various techniques for end-to-end video coding are known. One problem associated with existing end-to-end video coding is a lack of sufficient operational rate-distortion (R-D) points in R-D models (functions that describe the relationship between the bitrate and the expected level of distortion in the reconstructed video stream). This problem complicates rate control in video coding optimization.

SUMMARY OF THE INVENTION

In a first aspect, there is provided a computer-implemented method for processing a video (digital video data), comprising: (a) determining, based on one or more rate-distortion models and a number of bits for a frame of the video, coding parameters for processing the frame, the coding parameters comprising a rescale parameter r and a video compression model λ; and (b) processing the frame based on the rescale parameter r and the video compression model λ determined in (a) to form at least part of a bitstream of the video.

The video compression model λ generally corresponds to a Lagrange multiplier. Optionally, the video compression model λ is an end-to-end video compression model.

Optionally, (b) comprises: (b)(i) rescaling the frame based on the rescale parameter r determined in (a) to form a rescaled frame; and (b)(ii) encoding the rescaled frame based on the video compression model λ determined in (a).

Optionally, the rescale parameter r comprises a rescale value or ratio for affecting resolution of the frame. For example, the rescale value or ratio may be arranged to downsample or to upsample the frame. Optionally, the rescale parameter r comprises a rescale value or ratio for affecting (e.g., increasing or decreasing) the resolution of the frame and/or of a reference frame associated with the frame to facilitate encoding of the frame.
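The order of (b)(i) and (b)(ii), and the use of r to down- or upsample a frame or to bring a reference frame to the current frame's resolution, can be illustrated with the following Python sketch. It is hypothetical throughout: the text does not specify a resampling filter (nearest-neighbour is used only for brevity), frames are plain 2-D lists of samples, and `encode` stands in for whatever end-to-end codec the selected λ denotes.

```python
from typing import Any, Callable, List, Tuple

Frame = List[List[int]]  # one 2-D sample plane, e.g. luma


def rescaled_size(width: int, height: int, r: float) -> Tuple[int, int]:
    """Resolution after rescaling by r: r < 1 downsamples, r > 1 upsamples."""
    return max(1, round(width * r)), max(1, round(height * r))


def resize_nearest(frame: Frame, new_w: int, new_h: int) -> Frame:
    """Nearest-neighbour resampling; an illustrative filter, not the patent's."""
    h, w = len(frame), len(frame[0])
    return [[frame[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def match_reference(reference: Frame, frame: Frame) -> Frame:
    """Rescale a reference frame so its resolution matches the current frame."""
    return resize_nearest(reference, len(frame[0]), len(frame))


def process_frame(frame: Frame, r: float, lam: float,
                  encode: Callable[[Frame, float], Any]) -> Any:
    """(b)(i) rescale the frame by r, then (b)(ii) encode it with the codec for lam."""
    new_w, new_h = rescaled_size(len(frame[0]), len(frame), r)
    return encode(resize_nearest(frame, new_w, new_h), lam)
```

For example, `rescaled_size(1920, 1080, 0.75)` gives a 1440 x 810 target, and a reference frame kept at full resolution would be passed through `match_reference` before being used to predict that downscaled frame.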
Optionally, the rescale value or ratio is arranged to match the resolution of the frame and the resolution of the reference frame such that the resolutions are substantially the same. Optionally, the reference frame is a frame of the video that precedes the frame. Optionally, the frame and the reference frame are consecutive and/or continuous frames of the video. Alternatively, the frame and the reference frame are not consecutive and/or continuous frames of the video.

Optionally, the computer-implemented method further comprises determining the number of bits for the frame. Optionally, determining the number of bits for the frame comprises: determining a number of bits for a group of pictures (GOP) structure of the video; and determining, based on the determined number of bits for the GOP structure, the number of bits for the frame.

Optionally, the number of bits for the GOP structure is determined based on:

$$R_{GOP} = \frac{R_{PicAvg} \times (N_{coded} + \Phi) - R_{coded}}{\Phi} \times N_{GOP}$$

where R_GOP denotes the number of bits for the GOP structure, R_PicAvg denotes the average number of bits per frame for the video, R_coded denotes the number of bits already used for the video, N_coded denotes the number of already encoded frames of the video, and Φ denotes the size of a sliding window.

Optionally, the number of bits for the frame is determined based on:

$$R_{Pic} = \frac{R_{GOP} - R_{\mathrm{cur\_GOP\_coded}}}{\sum_{\text{uncoded frames}} \omega_{Pic}} \times \omega_{\mathrm{cur\_pic}}$$

where R_Pic denotes the number of bits for the frame, R_GOP denotes the number of bits for the GOP structure, R_cur_GOP_coded denotes the number of bits already used for the GOP structure, ω_Pic denotes the weight(s) allocated to individual uncoded frame(s) of the video, and ω_cur_pic denotes the weight allocated to the frame.

Optionally, the one or more rate-distortion models comprises a rate model f_R and a distortion model f_D.
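The two bit-allocation formulas can be evaluated as in the following Python sketch. It assumes that N_GOP, which the text uses without defining, is the number of frames in the GOP structure, and that the weights ω_Pic of the uncoded frames include the current frame; all numbers are illustrative and the function names are hypothetical.

```python
def gop_bits(r_pic_avg: float, n_coded: int, r_coded: float, phi: int, n_gop: int) -> float:
    """GOP-level budget: (R_PicAvg * (N_coded + Phi) - R_coded) / Phi * N_GOP."""
    return (r_pic_avg * (n_coded + phi) - r_coded) / phi * n_gop


def frame_bits(r_gop: float, r_cur_gop_coded: float, uncoded_weights, w_cur: float) -> float:
    """Frame-level budget: (R_GOP - R_cur_GOP_coded) / sum(omega_Pic) * omega_cur_pic.

    uncoded_weights holds omega_Pic for every not-yet-coded frame of the GOP,
    assumed here to include the current frame itself.
    """
    return (r_gop - r_cur_gop_coded) / sum(uncoded_weights) * w_cur


# Illustrative numbers: 1 Mbit/s at 25 frames/s gives 40000 bits per frame on average.
r_gop = gop_bits(r_pic_avg=40000.0, n_coded=8, r_coded=340000.0, phi=40, n_gop=8)  # 316000.0
r_pic = frame_bits(r_gop, r_cur_gop_coded=100000.0,
                   uncoded_weights=[2.0, 1.0, 1.0, 2.0, 1.0, 1.0], w_cur=2.0)      # 54000.0
```

Here the 8 frames coded so far used 340000 bits instead of the 320000 the average allows; the sliding window spreads that 20000-bit overspend over Φ = 40 frames, so each frame of the new GOP is budgeted 39500 bits rather than 40000, and the current frame then receives its weighted share of what remains in the GOP.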
Optionally, the rate model f_R for each video compression model λ is respectively representable as a hyperbolic function of the rescale parameter r. Optionally, the distortion model f_D for each video compression model λ is respectively representable as a hyperbolic function of the rescale parameter r.

Optionally, (a) comprises: (a)(i) determining a set of coding parameters based on the rate model f_R and the number of bits for the frame, the set of coding parameters comprising multiple pairs of rescale parameter r and video compression model λ; and (a)(ii) determining, based on the multiple pairs of rescale parameter r and video compression model λ and the distortion model f_D, the rescale parameter r and the video compression model λ for processing the frame of the video. Optionally, the determining in (a)(ii) comprises: determining, from the multiple pairs of rescale parameter r and video compression model λ, the rescale parameter r and the video compression model λ arranged to minimize distortion based on the distortion model f_D. Optionally, the computer-implemented method further comprises: (c) processing the at least part of the bitstream to reconstruct