CN-116157824-B - Image prediction for HDR imaging in open loop codec

Abstract

Given an input HDR image and an input SDR image representing the same scene, a predictive model for predicting the HDR image from a compressed representation of the input SDR image is generated as follows: a) noise data is generated based at least on features of the HDR image; b) a noisy SDR image is generated by adding the noise data to the SDR image; c) an enhanced HDR data set and an enhanced SDR data set are generated from the input HDR and SDR images and the noisy SDR image; d) a predictive model is generated for predicting the enhanced HDR data set based on the enhanced SDR data set; and e) the predictive model is solved according to a minimized error criterion to generate a set of prediction parameters, which are transmitted to a decoder together with the compressed representation of the input SDR image to reconstruct an approximation of the input HDR image.
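
Steps b) through e) can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a single-channel polynomial predictor, samples normalized to [0, 1], clipping of the noisy samples to that range, and an enhanced HDR set formed by pairing every clean and every noisy SDR sample with the original HDR sample. The names `solve_linear`, `fit_predictor`, and `predict` are hypothetical.

```python
import random


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_predictor(sdr, hdr, noise_std, order=2, seed=0):
    """Fit polynomial coefficients on clean plus noisy SDR samples."""
    rng = random.Random(seed)
    # Step b): noisy SDR samples (clipping to [0, 1] is an assumption).
    noisy = [min(1.0, max(0.0, s + rng.gauss(0.0, noise_std))) for s in sdr]
    # Step c): enhanced sets; duplicating the HDR targets is an assumption.
    aug_sdr = list(sdr) + noisy
    aug_hdr = list(hdr) + list(hdr)
    # Step d): design-matrix rows [1, s, s^2, ...].
    rows = [[s ** k for k in range(order + 1)] for s in aug_sdr]
    n = order + 1
    # Step e): least squares via the normal equations (S^T S) m = S^T v.
    sts = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    stv = [sum(r[i] * v for r, v in zip(rows, aug_hdr)) for i in range(n)]
    return solve_linear(sts, stv)


def predict(coeffs, s):
    """Decoder side: evaluate the polynomial on one decoded SDR sample."""
    return sum(c * s ** k for k, c in enumerate(coeffs))
```

In this sketch the noisy copies act as a stand-in for compression distortion that an open loop encoder never observes, so the fitted coefficients are less tuned to the uncompressed SDR samples alone.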

Inventors

SU GUANMING
H. Kadu

Assignees

Dolby Laboratories Licensing Corporation

Dates

Publication Date: 20260512
Application Date: 20210621
Priority Date: 20200624

Claims (15)

1. A method for generating prediction coefficients using a processor, the method comprising: accessing a first input image (120) in a first dynamic range and a second input image (125) in a second dynamic range, wherein the first input image and the second input image represent the same scene; calculating a dynamic range of one or more chroma color components of the first input image; generating noise data having a noise intensity based on the calculated dynamic range of the one or more chroma color components of the first input image; generating a noise input data set by adding the noise data to the second input image; generating a first enhanced input data set based on the first input image; combining the second input image and the noise input data set to generate a second enhanced input data set as training data; generating a prediction model to predict the first enhanced input data set based on the second enhanced input data set; solving the prediction model according to a minimized error criterion to generate a set of prediction model parameters; compressing the second input image to generate a compressed bitstream; and generating an output bitstream comprising the compressed bitstream and the prediction model parameters.
2. The method of claim 1, further comprising, in a decoder: receiving the output bitstream including the compressed bitstream and the prediction model parameters; decoding the output bitstream to generate a first output image in the second dynamic range; and applying the prediction model parameters to the first output image to generate a second output image in the first dynamic range.
3. The method of claim 1 or 2, wherein the first dynamic range comprises a high dynamic range and the second dynamic range comprises a standard dynamic range.
4. The method of claim 1 or 2, wherein generating the noise data comprises: calculating statistical data based on pixel values of the first input image; calculating a noise standard deviation based on the statistical data; and generating noise samples of the noise data using a Gaussian distribution having a zero mean and the noise standard deviation.
5. The method of claim 4, wherein calculating the noise standard deviation is further based on a target bitrate used to generate the compressed bitstream and/or characteristics of the second input image.
6. The method of claim 4, wherein calculating the statistical data comprises calculating one or more of: a total number of pixel values in the first input image, a range of pixel values in a luminance component of the first input image, a range of pixel values in a chrominance component of the first input image, or a number of bins characterizing a packet representing an average pixel value of the first input image.
7. The method of claim 1 or 2, wherein the prediction model comprises a single-channel predictor or a multi-channel multiple regression (MMR) predictor.
8. The method of claim 1 or 2, wherein solving the prediction model comprises minimizing an error metric between an output of the prediction model and the first input image.
9. The method of claim 8, wherein generating the set of prediction model parameters includes calculating [formula missing], wherein [symbol missing] denotes a vector representation of the prediction model parameters, [symbol missing] denotes the first enhanced input data set, and [symbol missing] denotes a matrix based on the second enhanced input data set.
10. The method of claim 9, wherein, for a color component ch, [formula missing] and [formula missing], wherein [symbol missing] denotes pixel values of the first enhanced input data set, [symbol missing] comprises pixel values of the first input image, and [symbol missing] comprises pixel values of the first input image, wherein [formula missing], or pixel values of the first input image with added noise.
11. The method of claim 1 or 2, further comprising: generating a first modified data set based on a modified representation of the first input image; generating a second modified data set based on a modified representation of the second input image; generating the noise input data set by adding the noise data to the second modified data set; generating the first enhanced input data set based on the first modified data set; and combining the second modified data set and the noise input data set to generate the second enhanced input data set.
12. The method of claim 11, wherein the first modified data set comprises a sub-sampled version of the first input image or a three-dimensional mapping table (3DMT) representation of the first input image.
13. The method of claim 11, wherein the second modified data set comprises a sub-sampled version of the second input image or a three-dimensional mapping table (3DMT) representation of the second input image.
14. A non-transitory computer readable storage medium having stored thereon computer executable instructions for performing the method of any of claims 1 to 13 using one or more processors.
15. An apparatus for generating prediction coefficients comprising a processor and configured to perform any of the methods of claims 1 to 13.
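
The noise-generation steps of claims 1 and 4 can be sketched as follows. The claims state only that the noise intensity is based on the chroma dynamic range of the first (HDR) image and that samples follow a zero-mean Gaussian distribution; the linear rule in `noise_std_from_ranges`, its `scale` value, and all function names are hypothetical placeholders, and the toy planes are invented for illustration.

```python
import random


def chroma_dynamic_range(plane):
    """Return max minus min over one chroma plane (a flat list of samples)."""
    return max(plane) - min(plane)


def noise_std_from_ranges(ranges, scale=0.05):
    """Hypothetical rule: the std grows linearly with the widest chroma range."""
    return scale * max(ranges)


def gaussian_noise(count, std, seed=0):
    """Draw `count` zero-mean Gaussian samples with standard deviation `std`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(count)]


# Toy chroma planes, normalized to [0, 1] (illustrative values only).
hdr_cb = [0.45, 0.50, 0.62, 0.48]
hdr_cr = [0.40, 0.55, 0.52, 0.70]
sdr_cb = [0.47, 0.50, 0.58, 0.49]

# Noise intensity is derived from the HDR chroma ranges ...
std = noise_std_from_ranges(
    [chroma_dynamic_range(hdr_cb), chroma_dynamic_range(hdr_cr)]
)
# ... and the noise itself is added to the SDR samples.
noise = gaussian_noise(len(sdr_cb), std)
noisy_sdr_cb = [s + n for s, n in zip(sdr_cb, noise)]
```

A real encoder could also fold in the target bitrate or SDR image characteristics, as claim 5 allows; the sketch leaves that out.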

Description

Image prediction for HDR imaging in open loop codec

Cross Reference to Related Applications

The present application claims priority to European patent application number 20182014.9 and U.S. provisional application number 63/043,198, both filed on June 24, 2020, each of which is incorporated herein by reference in its entirety.

Technical Field

The present invention relates generally to images. More particularly, embodiments of the invention relate to image prediction for High Dynamic Range (HDR) imaging in open loop codecs.

Background

As used herein, the term "dynamic range" (DR) may relate to the ability of the Human Visual System (HVS) to perceive a range of intensities (e.g., luminance, brightness) in an image, for example from the darkest grays (blacks) to the brightest whites (highlights). In this sense, DR relates to a "scene-referred" intensity. DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to a "display-referred" intensity. Unless a particular sense is explicitly specified at some point in the description herein, it should be inferred that the term may be used in either sense, e.g., interchangeably.

As used herein, the term "High Dynamic Range (HDR)" relates to a DR breadth that spans some 14 to 15 orders of magnitude of the HVS. In practice, the DR over which a human can simultaneously perceive an extensive breadth of intensities may be somewhat truncated relative to HDR.

In practice, an image includes one or more color components (e.g., luminance Y and chrominance Cb and Cr), where each color component is represented with a precision of n bits per pixel (e.g., n=8).
Using linear or gamma luminance coding, an image with n ≤ 8 (e.g., a 24-bit color JPEG image) is considered a standard dynamic range image, whereas an image with n > 8 may be considered an enhanced or high dynamic range image. HDR images may also be stored and distributed using high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.

Most consumer desktop displays currently support a luminance of 200 to 300 cd/m² (nits). Most consumer HDTVs range from 300 to 500 nits, with new models reaching 1,000 nits. Such conventional displays thus typify a Lower Dynamic Range (LDR), also referred to as Standard Dynamic Range (SDR), in relation to HDR. As the availability of HDR content grows due to advances in both capture equipment (e.g., cameras) and HDR displays (e.g., the PRM-4200 professional reference monitor from Dolby Laboratories), HDR content may be color graded and displayed on HDR displays that support higher dynamic ranges (e.g., from 1,000 nits to 5,000 nits or more).

As used herein, the term "reshaping" or "remapping" denotes a process of sample-to-sample or codeword-to-codeword mapping of a digital image from its original bit depth and original codeword distribution or representation (e.g., gamma, PQ, HLG) to an image of the same or a different bit depth and a different codeword distribution or representation. Reshaping allows for improved compressibility or improved image quality at a fixed bit rate. For example, without limitation, forward reshaping may be applied to 10-bit or 12-bit PQ-coded HDR video to improve coding efficiency in a 10-bit video coding architecture.
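
As a rough illustration of codeword-to-codeword mapping, forward reshaping from 12-bit to 10-bit codewords can be sketched as a lookup table. The power-law curve, the `gamma` parameter, and the function names are hypothetical; a practical encoder would derive the curve from the content rather than use a fixed exponent.

```python
def build_forward_lut(in_bits=12, out_bits=10, gamma=1.0):
    """Map every input codeword to an output codeword (hypothetical curve)."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    return [round(out_max * (cw / in_max) ** gamma) for cw in range(in_max + 1)]


def reshape(samples, lut):
    """Apply the codeword-to-codeword mapping to a list of integer samples."""
    return [lut[s] for s in samples]


lut = build_forward_lut()
reshaped = reshape([0, 2048, 4095], lut)
```

Because the table has one entry per input codeword, applying it costs a single lookup per sample regardless of how the curve was derived.
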
In a receiver, after decompressing the received signal (which may or may not be reshaped), the receiver may apply an inverse (or backward) reshaping function to restore the signal to its original codeword distribution and/or to achieve a higher dynamic range.

In HDR coding, image prediction (or reshaping) allows an HDR image to be reconstructed from a baseline Standard Dynamic Range (SDR) image and a set of prediction coefficients representing a backward reshaping function. A legacy device may simply decode the SDR image, whereas an HDR display may reconstruct the HDR image by applying the backward reshaping function to the SDR image. In video coding, such image prediction may be used to improve coding efficiency while maintaining backward compatibility. Such a system may be characterized as "closed loop," where the encoder includes a decoding path and the prediction coefficients are derived from both the original and the decoded SDR and HDR data, or as "open loop," where there is no such decoding loop and the prediction coefficients are derived from pairs of original data only. As appreciated by the inventors here, improved techniques for efficient image prediction in open loop codecs are desired.

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Thus, unless otherwise indicated, any approaches descr