US-12627814-B2 - Video encoding rate control for intra and scene change frames using machine learning

US12627814B2

Abstract

Techniques related to quantization parameter estimation for coding intra and scene change frames are discussed. Such techniques include generating features based on an intra or scene change frame including a proportion of smooth blocks and one or both of a measure of block variance and a prediction distortion, and applying a machine learning model to generate an estimated quantization parameter for encoding the intra or scene change frame.
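The frame features named in the abstract (proportion of smooth blocks, block variance) can be sketched as below. This is an illustrative reconstruction only, not the patented implementation: the block size and the "very smooth" variance threshold are hypothetical choices, as the patent does not fix them here.

```python
import numpy as np

def block_features(frame, block=16, smooth_var=4.0):
    """Compute two abstract-level features for a luma frame:
    the average per-block variance and the proportion of
    'very smooth' blocks (variance below a threshold).
    Block size and threshold are illustrative assumptions."""
    h, w = frame.shape
    variances = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            variances.append(frame[y:y + block, x:x + block].var())
    variances = np.array(variances)
    avg_var = float(variances.mean())           # average block variance
    smooth_ratio = float((variances < smooth_var).mean())  # proportion of smooth blocks
    return avg_var, smooth_ratio
```

In this sketch a flat frame yields an average variance of zero and a smooth-block proportion of one, while textured blocks raise the average variance and lower the proportion.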

Inventors

  • Ximin Zhang
  • Sang-Hee Lee
  • Keith W. Rowe

Assignees

  • Intel Corporation

Dates

Publication Date
2026-05-12
Application Date
2024-08-23

Claims (20)

  1. A method for video encoding, comprising: inputting a feature vector associated with a frame of a video and a target frame size into a machine learning model, the feature vector comprising one or more features generated based on the frame of the video; generating, by the machine learning model, an estimated quantization parameter based on the feature vector and the target frame size; encoding the frame using the estimated quantization parameter to generate a bitstream for the frame having a number of bits; determining that a difference between the target frame size and the number of bits associated with the bitstream for the frame exceeds a threshold; inputting the feature vector and the number of bits associated with the bitstream for the frame into the machine learning model; generating, by the machine learning model, a further estimated quantization parameter based on the feature vector and the number of bits; and encoding the frame using an encode quantization parameter to generate a further bitstream, wherein the encode quantization parameter is based on the estimated quantization parameter and the further estimated quantization parameter.
  2. The method of claim 1, further comprising: in response to determining that the difference does not exceed the threshold, encoding the frame using the estimated quantization parameter.
  3. The method of claim 1, wherein the feature vector comprises an average block variance, a proportion of very smooth blocks, and a distortion.
  4. The method of claim 1, wherein the feature vector comprises one or more of an average block variance, a proportion of very smooth blocks, a number of encode bits, a proportion of syntax bits, and a distortion.
  5. The method of claim 1, wherein the frame is an intra and/or scene change frame.
  6. The method of claim 1, wherein the encode quantization parameter is a linear combination of the estimated quantization parameter and a difference between the estimated quantization parameter and the further estimated quantization parameter.
  7. The method of claim 1, wherein the encode quantization parameter is defined as the estimated quantization parameter subtracted by a difference between the estimated quantization parameter and the further estimated quantization parameter multiplied by a factor.
  8. The method of claim 7, wherein the factor is between 0.6 and 0.8.
  9. The method of claim 8, further comprising: updating parameters of the machine learning model using the feature vector and the target frame size as a training input and the encode quantization parameter as a training output, the training input and the training output forming a training input-output pair.
  10. The method of claim 9, wherein the updating is performed on the fly.
  11. An apparatus for video encoding, comprising: one or more processors to execute one or more instructions; and a memory to store a frame of a video and the one or more instructions, wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to: input a feature vector associated with the frame of the video and a target frame size into a machine learning model, the feature vector comprising one or more features generated based on the frame of the video; generate, by the machine learning model, an estimated quantization parameter based on the feature vector and the target frame size; encode the frame using the estimated quantization parameter to generate a bitstream for the frame having a number of bits; determine that a difference between the target frame size and the number of bits associated with the bitstream for the frame exceeds a threshold; input the feature vector and the number of bits associated with the bitstream for the frame into the machine learning model; generate, by the machine learning model, a further estimated quantization parameter based on the feature vector and the number of bits; and encode the frame using an encode quantization parameter to generate a further bitstream, wherein the encode quantization parameter is based on the estimated quantization parameter and the further estimated quantization parameter.
  12. The apparatus of claim 11, wherein the one or more instructions further cause the one or more processors to: in response to determining that the difference does not exceed the threshold, encode the frame using the estimated quantization parameter.
  13. The apparatus of claim 11, wherein the feature vector comprises an average block variance, a proportion of very smooth blocks, and a distortion.
  14. The apparatus of claim 11, wherein the feature vector comprises one or more of an average block variance, a proportion of very smooth blocks, a number of encode bits, a proportion of syntax bits, and a distortion.
  15. The apparatus of claim 11, wherein the frame is an intra and/or scene change frame.
  16. One or more non-transitory machine readable medium having instructions stored thereon, wherein the instructions, in response to being executed on a computing device, cause the computing device to perform video coding by: inputting a feature vector associated with a frame of a video and a target frame size into a machine learning model, the feature vector comprising one or more features generated based on the frame of the video; generating, by the machine learning model, an estimated quantization parameter based on the feature vector and the target frame size; encoding the frame using the estimated quantization parameter to generate a bitstream for the frame having a number of bits; determining that a difference between the target frame size and the number of bits associated with the bitstream for the frame exceeds a threshold; inputting the feature vector and the number of bits associated with the bitstream for the frame into the machine learning model; generating, by the machine learning model, a further estimated quantization parameter based on the feature vector and the number of bits; and encoding the frame using an encode quantization parameter to generate a further bitstream, wherein the encode quantization parameter is based on the estimated quantization parameter and the further estimated quantization parameter.
  17. The one or more non-transitory machine readable medium of claim 16, wherein the encode quantization parameter is a linear combination of the estimated quantization parameter and a difference between the estimated quantization parameter and the further estimated quantization parameter.
  18. The one or more non-transitory machine readable medium of claim 16, wherein the encode quantization parameter is defined as the estimated quantization parameter subtracted by a difference between the estimated quantization parameter and the further estimated quantization parameter multiplied by a factor.
  19. The one or more non-transitory machine readable medium of claim 18, wherein the factor is in a range between 0.6 and 0.8.
  20. The one or more non-transitory machine readable medium of claim 16, further comprising: updating parameters of the machine learning model using the feature vector and the target frame size as a training input and the encode quantization parameter as a training output, the training input and the training output forming a training input-output pair.
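The rate-control loop recited in the claims can be sketched as follows. This is a hypothetical illustration, not the patented implementation: the names `model` (a regressor mapping a feature vector and a bit budget to a QP) and `encode_fn` (a trial encode returning a bit count) are assumptions, and the default factor of 0.7 simply falls inside the 0.6 to 0.8 range of claims 8 and 19.

```python
def select_intra_qp(model, encode_fn, features, target_bits,
                    threshold, factor=0.7):
    """Two-step QP selection sketch for an intra/scene-change frame.

    model(features, bits) -> estimated QP (hypothetical regressor)
    encode_fn(qp)         -> number of bits produced at that QP
    factor                -> per claims 8/19, typically 0.6 to 0.8
    """
    # First estimate, driven by the target frame size.
    qp1 = model(features, target_bits)
    # Trial encode; compare produced bits against the target.
    actual_bits = encode_fn(qp1)
    if abs(target_bits - actual_bits) <= threshold:
        return qp1  # first estimate is close enough (claim 2)
    # Re-query the model with the bits actually produced at qp1.
    qp2 = model(features, actual_bits)
    # Encode QP combines the two estimates (claims 6-7):
    # qp_enc = qp1 - factor * (qp1 - qp2)
    return qp1 - factor * (qp1 - qp2)
```

Note how the correction term pulls the first estimate toward the second: if the trial encode overshoots the target, the re-queried model yields a lower-bit-budget estimate, and the factor damps the adjustment rather than jumping fully to it.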

Description

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation (and claims benefit of priority under 35 U.S.C. § 120) of U.S. application Ser. No. 16/950,367, filed Nov. 17, 2020, entitled "VIDEO ENCODING RATE CONTROL FOR INTRA AND SCENE CHANGE FRAMES USING MACHINE LEARNING," the disclosure of which is considered part of (and is incorporated by reference herein) the disclosure of the present application.

BACKGROUND

In compression/decompression (codec) systems, compression efficiency and video quality are important performance criteria. Visual quality is an important aspect of the user experience in many video applications, and compression efficiency impacts the amount of memory storage needed to store video files and/or the amount of bandwidth needed to transmit and/or stream video content. For example, a video encoder compresses video information so that more information can be sent over a given bandwidth or stored in a given memory space. The compressed signal or data may then be decoded via a decoder that decodes or decompresses the signal or data for display to a user. In most implementations, higher visual quality with greater compression is desirable.

In the context of video encoding, bit rate control (BRC) is a key factor differentiating one video solution from others. Under many circumstances, controlling the frame size to a predictable value is important, especially for network-related applications. Given a target frame size, BRC techniques adjust the quantization parameter (QP) value of each frame to control the number of bits generated from the frames. An ongoing challenge in BRC is handling intra frames and scene change frames. Since those frames are the reference anchors for subsequent frames, optimal target size selection can provide substantial subjective and objective improvement. However, such frames have no correlation with previous frames, which makes it difficult to predict the QP value.

When the predicted QP value differs substantially from the target value, the remaining frames in the same group of pictures (GOP) or beyond suffer poor quality, sometimes even causing video buffering verifier (VBV) buffer overflow/underflow with single-pass encoding. Even when second-pass encoding is allowed, BRC requires many first-pass encoding statistics, and collecting those statistics is computationally expensive, particularly for hardware solutions that must use large amounts of additional gate counts to collect them. In some contexts, more than two passes are needed to obtain an accurate QP for the target frame size. It may be advantageous to improve the accuracy and efficiency of QP selection for intra frames and scene change frames for improved compression efficiency and/or video quality. It is with respect to these and other considerations that the present improvements have been needed. Such improvements may become critical as the desire to compress and transmit video data becomes more widespread.

BRIEF DESCRIPTION OF THE DRAWINGS

The material described herein is illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. For example, the dimensions of some elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements. In the figures:

FIG. 1 is an illustrative diagram of an example system for video coding including machine learning based quantization parameter selection for intra and scene change frames;
FIG. 2 is an illustrative diagram of example feature vectors for use in quantization parameter prediction for video coding;
FIG. 3 is an illustrative diagram of exemplary processing to generate features for feature vectors for use in quantization parameter prediction for video coding;
FIG. 4 illustrates an example deep neural network for determination of a quantization parameter for an intra or scene change frame;
FIG. 5 is an illustrative diagram of an example training corpus generator for generating ground truth training data to train a machine learning model to generate estimated quantization parameters for intra or scene change frames;
FIG. 6 is an illustrative diagram of example data structures for providing an example ground truth mapping for training a machine learning model for quantization parameter estimation;
FIG. 7 is a flow diagram illustrating an example process for training a machine learning model for quantization parameter estimation;
FIG. 8 is a flow diagram illustrating an example process for video coding including determination of a quantization parameter for an intra or scene change frame;
FIG. 9 is an illustrative diagram of an example system for video coding including determination of a quantization parameter fo