
EP-4736444-A1 - METADATA FOR SIGNALING SOURCE PICTURE TIMING INFORMATION


Abstract

Methods, systems, and bitstream syntax are described for video coding and decoding using source picture timing information which is captured by an encoder and is signaled as metadata to a decoder to assist in decoding. The proposed methods include example syntax for signaling source picture timing metadata as supplemental enhancement information (SEI) messaging for both single-layer and multi-layer video sequences.

Inventors

  • MCCARTHY, Sean Thomas
  • YIN, PENG
  • SULLIVAN, GARY J.

Assignees

  • Dolby Laboratories Licensing Corporation

Dates

Publication Date
20260506
Application Date
20240617

Claims (20)

  1. A method for decoding a video bitstream, the method comprising: receiving a coded video bitstream comprising an encoded picture section including an encoding of a sequence of video pictures and a signaling section including source picture timing parameters, wherein the source picture timing parameters comprise: a source-picture time-scale parameter indicating the number of time units passing in one second; and a source-picture number-of-units-in-source-picture-interval parameter indicating a number of time units of a clock operating at the frequency of the source-picture time-scale parameter; and decoding the sequence of video pictures based on the source picture timing parameters.
  2. The method of claim 1, further comprising computing a source-picture-interval (SourcePictureInterval) value as the quotient of the source-picture number-of-units-in-source-picture-interval divided by the source-picture time-scale.
  3. A method for decoding a video bitstream, the method comprising: receiving a coded video bitstream comprising an encoded picture section including an encoding of a sequence of video pictures and a signaling section including source picture timing (SPT) parameters, wherein the source picture timing parameters comprise: a source-picture time-scale parameter indicating the number of time units passing in one second; a source-picture number-of-units-in-elemental-source-picture-interval parameter indicating the number of time units of a clock operating at the frequency of the source-picture time-scale parameter that corresponds to an indicated elemental source picture interval of consecutive output pictures; and a source-picture interval scale factor parameter specifying a scale factor; and decoding the sequence of video pictures based on the source picture timing parameters.
  4. The method of claim 3, further comprising: computing a source-picture-interval (SourcePictureInterval) value as the multiplication product of an elemental source picture interval with the source-picture interval scale factor, wherein the elemental source picture interval is computed as the quotient of the source-picture number-of-units-in-elemental-source-picture-interval parameter divided by the source-picture time-scale parameter.
  5. The method of claim 2 or claim 4, further comprising computing a source-picture time for picture n as: SourcePictureTime[ n ] = SourcePictureTime[ previousPicInOutputOrder ] + SourcePictureInterval, wherein picture n, n > 0, denotes an index of an output picture that is not the first output picture in the video bitstream, and variable previousPicInOutputOrder denotes the last picture that is output that precedes picture n in output order (if any).
  6. The method of claim 3, wherein the source picture timing parameters further comprise a source picture sublayer maximum temporal ID identifying a maximum sublayer for which the SPT parameters apply.
  7. The method of claim 3, wherein the source picture timing parameters further comprise: a source picture temporal sublayer ID for which SPT parameters apply; and a source picture sublayer delay factor which specifies a scale factor used in determining a temporal distance between a source picture corresponding to the first decoded output picture of a temporal sublayer having a value of TemporalId equal to 0 and a source picture corresponding to the first decoded output picture of a temporal sublayer having a value of TemporalId equal to the source picture temporal sublayer ID.
  8. The method of claim 7, further comprising computing a source-picture time for picture n as: SourcePictureTime[ n ] = SourcePictureTime[ previousPicInOutputOrder ] + SourcePictureInterval + SublayerSourcePictureDelay, wherein picture n, n > 0, denotes an index of an output picture that is not the first output picture in the video bitstream, variable previousPicInOutputOrder denotes the last picture that is output that precedes picture n in output order (if any), and SublayerSourcePictureDelay indicates the source picture sublayer delay factor.
  9. The method of claim 3, wherein the source-picture interval scale factor parameter is specified for each one of N sublayers, wherein N > 0.
  10. The method of claim 9, wherein the source picture timing parameters further comprise a source picture sublayer dyadic flag which, if set to 1, indicates that temporal sublayers are coded in a dyadic relationship and that the source-picture interval scale factor syntax element is not present.
  11. The method of claim 9, wherein the source picture timing parameters further include a source picture sublayer implicit timing flag which, when set to 1, indicates that sublayer implicit timing type information is present for each of the sublayers.
  12. The method of claim 9, wherein the source picture timing parameters further include a source picture sublayer synthesized picture flag for each one of the N sublayers, which, when set to 1, indicates that decoded output pictures belonging to the i-th temporal sublayer are synthesized and do not correspond to unmodified original source pictures.
  13. The method of claim 9, wherein the source picture timing parameters further include a source picture timing type parameter which indicates a timing relationship between source pictures and corresponding decoded output pictures according to a mapping table.
  14. The method of claim 9, wherein the source picture timing parameters further include a source picture-timing-equals-output-timing flag, which, when set to 1, indicates that timing of source pictures is the same as the timing of corresponding decoded output pictures.
  15. The method of claim 3, wherein the source-picture interval scale factor parameter is specified using an absolute magnitude value of the scale factor and a sign flag of the scale factor.
  16. The method of claim 9, wherein the source picture timing parameters further include a source picture timing discontinuity flag which, when set to 1, indicates that timing of source pictures corresponding to decoded output pictures is discontinuous.
  17. The method of claim 16, wherein the source picture timing parameters further include a source picture timing discontinuity type parameter and a source picture transition type parameter, wherein the source picture timing discontinuity type parameter indicates a discontinuity in source pictures according to a first table and the source picture transition type parameter indicates a transition type in the source pictures according to a second table.
  18. The method of claim 4, wherein the source picture timing parameters further include a source type parameter, wherein the source type parameter indicates the timing relationship between source pictures and corresponding decoded output pictures.
  19. The method of claim 18, wherein the source type parameter may indicate one or more of: slow motion, sped-up motion, high-speed imaging, time-lapse imaging, temporal reversal, a still image, or sporadic imaging.
  20. The method of claim 19, wherein the source type parameter may not indicate both high-speed imaging and time-lapse imaging.
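The interval and timing computations recited in claims 2, 4, 5, and 8 above can be sketched in code. The following is an illustrative reconstruction only, not normative decoder behavior: the function names are invented for this sketch, the per-sublayer scale factor and delay default values are assumptions, and anchoring the first output picture at time 0 is an assumption not stated in the claims.

```python
from fractions import Fraction

def source_picture_interval(num_units_in_interval: int, time_scale: int,
                            scale_factor: Fraction = Fraction(1)) -> Fraction:
    """Claims 2 and 4: the elemental interval is the quotient of the
    number-of-units parameter divided by the time-scale parameter,
    optionally multiplied by a sublayer scale factor (claim 4)."""
    elemental = Fraction(num_units_in_interval, time_scale)
    return elemental * scale_factor

def source_picture_times(n_pictures: int, interval: Fraction,
                         sublayer_delay: Fraction = Fraction(0)) -> list:
    """Claims 5 and 8: SourcePictureTime[ n ] =
    SourcePictureTime[ previousPicInOutputOrder ] + SourcePictureInterval
    (+ SublayerSourcePictureDelay when sublayer timing applies)."""
    times = [Fraction(0)]  # first output picture anchored at t = 0 (assumption)
    for _ in range(1, n_pictures):
        times.append(times[-1] + interval + sublayer_delay)
    return times

# Example: 120 fps capture signaled with time_scale = 120000 and
# num_units = 1000, i.e. a source picture interval of 1/120 second.
ival = source_picture_interval(1000, 120000)
print(ival)  # 1/120
print(source_picture_times(4, ival))
```

Exact rational arithmetic (`fractions.Fraction`) is used here because the signaled parameters define intervals as integer ratios, so floating-point rounding can be avoided entirely in the sketch.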

Description

METADATA FOR SIGNALING SOURCE PICTURE TIMING INFORMATION

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This patent application claims the benefit of priority from U.S. Provisional Patent Application Ser. No. 63/511,150, filed on June 29, 2023, and U.S. Provisional Patent Application Ser. No. 63/587,233, filed on October 2, 2023, each of which is incorporated by reference herein in its entirety.

TECHNOLOGY

[0002] The present document relates generally to image and video coding and decoding. More particularly, embodiments of the present invention relate to metadata for signaling source picture timing information.

BACKGROUND

[0003] In 2020, the MPEG group in the International Organization for Standardization (ISO), jointly with the International Telecommunication Union (ITU), released the first version of the Versatile Video Coding standard (VVC), also known as H.266 (Ref. [1]). More recently, the same group has been working on the development of the next-generation coding standard that provides improved coding performance over existing video coding technologies. As part of this investigation, new coding techniques are also examined.

[0004] In many applications, given a sequence of decoded pictures, it is of interest to determine the actual temporal distance between corresponding source pictures prior to encoding. For example, for camera-captured content, the temporal distance between source pictures is the difference between the time at which an image sensor was exposed to produce the source picture associated with the current decoded picture and the time at which the image sensor was exposed to produce the source picture associated with a previous decoded picture in output order.

[0005] As appreciated by the inventors here, improved techniques for signaling such source picture timing information (SPTI) are needed and are presented herein.
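The mismatch described in paragraph [0004] can be made concrete with a small numerical sketch. The frame rates below are hypothetical, chosen only for illustration; the specification does not prescribe them.

```python
from fractions import Fraction

# Hypothetical slow-motion scenario in the spirit of paragraph [0004]:
# pictures captured at 120 fps but output at 30 fps. The decoder alone
# sees output timestamps spaced 1/30 s apart, while the true source
# pictures were exposed only 1/120 s apart; SPTI metadata is what would
# let the decoder recover the capture-side spacing.
output_interval = Fraction(1, 30)    # spacing of decoded pictures at output
source_interval = Fraction(1, 120)   # spacing at capture (signaled via SPTI)

# Ratio of output timing to source timing: 4x slow motion in this sketch.
slowdown = output_interval / source_interval
print(slowdown)  # 4
```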
[0006] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] An embodiment of the present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:

[0008] FIGs. 1A through 1E depict example scenarios in which the output timing of decoded pictures is different from the timing with which source pictures were captured or otherwise created, thus requiring source picture timing information (SPTI); and

[0009] FIGs. 2A through 2C depict example encoding and decoding processes using SPTI messaging according to embodiments of this invention.

DESCRIPTION OF EXAMPLE EMBODIMENTS

[00010] Example embodiments that relate to signaling source picture timing information in video coding are described herein. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the various embodiments of the present invention. It will be apparent, however, that the various embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating embodiments of the present invention.
SUMMARY

Example embodiments described herein relate to signaling source picture timing information in image and video coding, which is captured by an encoder and is signaled as metadata to a decoder to assist in decoding. The proposed methods include example syntax for signaling source picture timing metadata as supplemental enhancement information (SEI) messaging for both single-layer and multi-layer video sequences.

SOURCE PICTURE TIMING INFORMATION (SPTI)

INTRODUCTION

[00011] Refs. [2-3] represent earlier proposals to provide some form of picture timing information via supplemental enhancement information (SEI) messaging. In both contributions, the proposed SEI message was intended to indicate the actual motion speed of the content at capture time for the case in which the video bitstream contains slow-motion scenes. Both contributions failed to indicate or signal the timing scale factor between the actual capture timing and the output timing. The proposed messaging could also interfere with conformance issues in a hypothetical reference decoder (HRD).

[00012] Embodiments of the proposed SPTI signaling presented herein intend to address a similar but broader range of use cases than Refs. [2-3], whilst avoiding all HRD confor