EP-4398488-B1 - TECHNIQUES FOR MULTIPLE CONFORMANCE POINTS IN MEDIA CODING

Inventors

WENGER, STEPHAN
LIU, SHAN

Dates

Publication Date: 20260506
Application Date: 20191016

Claims (10)

1. A method for media encoding by an encoder, characterized by comprising: encoding, in a syntax structure which includes a syntax element that indicates a number of sub-profiles, and includes a list of the sub-profiles in a form of a loop with a fixed number of iterations, a first indication indicative of a first sub-profile that identifies a first defined set of tools capable of being used by the encoder to encode a video sequence to generate a coded video sequence that conforms to the first sub-profile and capable of being used by a decoder to decode the coded video sequence that conforms to the first sub-profile; encoding, in the syntax structure, a second indication indicative of a second sub-profile that identifies a second defined set of tools capable of being used by the encoder to encode the video sequence to generate the coded video sequence that conforms to the second sub-profile and capable of being used by the decoder to decode the coded video sequence that conforms to the second sub-profile; and providing the syntax structure inside or outside of the coded video sequence to a decoder which is capable of selectively decoding the coded video sequence based on the first indication and the second indication.
2. The method of claim 1, wherein the first indication is coded as an octet string of at least three octets in length, and the first indication is a sub-profile of the second indication.
3. The method of claim 1 or 2, wherein a media bitstream conforms to both of the first sub-profile and the second sub-profile.
4. A device for media encoding, characterized by comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code including: first encoding code configured to cause the at least one processor to encode, in a syntax structure which includes a syntax element that indicates a number of sub-profiles, and includes a list of the sub-profiles in a form of a loop with a fixed number of iterations, a first indication, said first indication is indicative of a first sub-profile that identifies a first defined set of tools capable of being used by an encoder to encode a video sequence to generate a coded video sequence that conforms to the first sub-profile and capable of being used by a decoder to decode the coded video sequence that conforms to the first sub-profile; second encoding code configured to cause the at least one processor to encode, in the syntax structure, a second indication, said second indication is indicative of a second sub-profile that identifies a second defined set of tools capable of being used by the encoder to encode the video sequence to generate the coded video sequence that conforms to the second sub-profile and capable of being used by the decoder to decode the coded video sequence that conforms to the second sub-profile; and providing code configured to cause the at least one processor to provide the syntax structure inside or outside of the coded video sequence to a decoder which is capable of selectively decoding the coded video sequence based on the first indication and the second indication.
5. The device of claim 4, wherein the first indication is coded as an octet string of at least three octets in length, and the first indication is a sub-profile of the second indication.
6. The device of claim 4 or 5, wherein a media bitstream conforms to both of the first sub-profile and the second sub-profile.
7. A non-transitory computer-readable medium storing instructions, the instructions characterized by comprising: one or more instructions that, when executed by one or more processors of a device, cause the one or more processors to: encode, in a syntax structure which includes a syntax element that indicates a number of sub-profiles, and includes a list of the sub-profiles in a form of a loop with a fixed number of iterations, a first indication, said first indication being indicative of a first sub-profile that identifies a first defined set of tools capable of being used by an encoder to encode a video sequence to generate a coded video sequence that conforms to the first sub-profile and capable of being used by a decoder to decode the coded video sequence that conforms to the first sub-profile; encode, in the syntax structure, a second indication, said second indication being indicative of a second sub-profile that identifies a second defined set of tools capable of being used by the encoder to encode the video sequence to generate the coded video sequence that conforms to the second sub-profile and capable of being used by the decoder to decode the coded video sequence that conforms to the second sub-profile; and provide the syntax structure inside or outside of the coded video sequence to a decoder which is capable of selectively decoding the coded video sequence based on the first indication and the second indication.
8. The non-transitory computer-readable medium of claim 7, wherein the first indication is coded as an octet string of at least three octets in length, and the first indication is a sub-profile of the second indication.
9. The non-transitory computer-readable medium of claim 7 or 8, wherein a media bitstream conforms to both of the first sub-profile and the second sub-profile.
10. A coded video sequence obtained by the method of any of claims 1 to 3.

Description

Field

The disclosed subject matter relates to media coding and decoding, and more specifically, to the representation of multiple conformance points such as profiles, sub-profiles, tiers, or levels, in a bitstream.

Background

Video coding and decoding using inter-picture prediction with motion compensation has been known for decades. Uncompressed digital video can consist of a series of pictures, each picture having a spatial dimension of, for example, 1920 x 1080 luminance samples and associated chrominance samples. The series of pictures can have a fixed or variable picture rate (informally also known as frame rate) of, for example, 60 pictures per second or 60 Hz. Uncompressed video has significant bitrate requirements. For example, 1080p60 4:2:0 video at 8 bits per sample (1920 x 1080 luminance sample resolution at 60 Hz frame rate) requires close to 1.5 Gbit/s of bandwidth. An hour of such video requires more than 600 GB of storage space.
One purpose of video coding and decoding can be the reduction of redundancy in the input video signal through compression. Compression can help reduce the aforementioned bandwidth or storage space requirements, in some cases by two orders of magnitude or more. Both lossless and lossy compression, as well as a combination thereof, can be employed. Lossless compression refers to techniques where an exact copy of the original signal can be reconstructed from the compressed original signal. When using lossy compression, the reconstructed signal may not be identical to the original signal, but the distortion between the original and the reconstructed signal is small enough to make the reconstructed signal useful for the intended application. In the case of video, lossy compression is widely employed. The amount of distortion tolerated depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television contribution applications.
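The bandwidth and storage figures quoted above for uncompressed 1080p60 4:2:0 video can be checked with a short calculation. The following is a minimal sketch, not part of the disclosure: the function names are hypothetical, 4:2:0 sampling is taken to mean 1.5 samples per luminance sample (two chrominance planes at quarter resolution each), and 1 GB is assumed to mean 10^9 bytes.

```python
# Worked check of the uncompressed-video figures (illustrative sketch only).
# Assumptions: 4:2:0 sampling = 1.5 samples per luma sample; 1 GB = 10**9 bytes.

WIDTH, HEIGHT = 1920, 1080        # luminance samples per picture
PICTURE_RATE_HZ = 60              # pictures per second
BITS_PER_SAMPLE = 8
SAMPLES_PER_LUMA_SAMPLE = 1.5     # 1 luma + 2 chroma planes at 1/4 resolution each


def uncompressed_bitrate_bps(width, height, rate_hz, bits_per_sample, samples_per_luma):
    """Return the raw bitrate in bits per second."""
    return width * height * samples_per_luma * bits_per_sample * rate_hz


def storage_bytes(bitrate_bps, seconds):
    """Return the number of bytes needed to store `seconds` of video."""
    return bitrate_bps * seconds / 8


bitrate = uncompressed_bitrate_bps(
    WIDTH, HEIGHT, PICTURE_RATE_HZ, BITS_PER_SAMPLE, SAMPLES_PER_LUMA_SAMPLE
)
one_hour = storage_bytes(bitrate, 3600)

print(f"{bitrate / 1e9:.2f} Gbit/s")      # 1.49 Gbit/s
print(f"{one_hour / 1e9:.0f} GB per hour")  # 672 GB per hour
```

Under these assumptions the result is about 1.49 Gbit/s and about 672 GB per hour, consistent with "close to 1.5 Gbit/s" and "more than 600 GB".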
The achievable compression ratio can reflect the distortion an application tolerates: higher allowable/tolerable distortion can yield higher compression ratios. A video encoder and decoder can utilize techniques from several broad categories, including, for example, motion compensation, transform, quantization, and entropy coding, some of which will be introduced below.
In order to help a decoder or an underlying system determine whether a given coded media bitstream is decodable, and also to assist in tasks such as capability negotiation, conformance points have been introduced. For example, in MPEG and similar standards, a profile may indicate a defined subset of a collection of tools that may be present in a bitstream. For example, in H.264, the baseline profile does not include tools related to interlace coding, whereas the main profile includes such tools. Similarly, a level may indicate an upper bound of bitstream complexity. Similarly, a tier may indicate a bitstream complexity (maximum bitrate for a given temporal-spatial resolution) of a given standard.
Until around 2003, standards often introduced profiles that were onion-shaped. Levels and tiers are defined as onion-shaped even today. Onion-shaped here implies that all tools defined for "lower" profiles (usually, though not always, indicated by a numerically lower profile indicator value) were included in a higher profile. Referring to FIG. 1, a baseline profile (101) is shown as a small circle, fully enclosed by a larger circle indicating a main profile (102). This figure illustrates that all tools of the baseline profile (101) are also included in the main profile (102). Assume the baseline profile is represented by a profile ID of 0, and the main profile by a profile ID of 1.
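The onion-shaped relationship of FIG. 1 can be modeled as set inclusion between tool sets. The following is a minimal sketch under stated assumptions: the tool names, the `PROFILE_TOOLS` table, and both function names are hypothetical and chosen only to mirror the baseline (ID 0) and main (ID 1) example above; they are not taken from any standard or from the disclosure.

```python
# Illustrative model of onion-shaped profiles (hypothetical tool names).
# Each profile ID maps to the set of tools that may appear in a bitstream.
PROFILE_TOOLS = {
    0: frozenset({"intra_prediction", "inter_prediction"}),                      # baseline
    1: frozenset({"intra_prediction", "inter_prediction", "interlace_coding"}),  # main
}


def is_onion_shaped(profile_tools):
    """Return True if every profile contains all tools of the next lower-ID profile.

    Checking adjacent pairs is enough, because set inclusion is transitive.
    """
    ids = sorted(profile_tools)
    return all(profile_tools[lo] <= profile_tools[hi] for lo, hi in zip(ids, ids[1:]))


def decoder_supports(decoder_profile_id, bitstream_profile_id):
    """For onion-shaped profiles, one ID comparison decides tool support."""
    return bitstream_profile_id <= decoder_profile_id


print(is_onion_shaped(PROFILE_TOOLS))  # True
print(decoder_supports(1, 0))          # True: main decoder, baseline bitstream
print(decoder_supports(0, 1))          # False: baseline decoder, main bitstream
```

The single comparison in `decoder_supports` is valid only while `is_onion_shaped` holds. If two profiles each contain a tool the other lacks, the ordering of IDs no longer implies inclusion of tool sets.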
With onion-shaped profiles, comparing the single profile ID value coded in the video bitstream against the profile ID that the decoder or underlying system is capable of decoding was sufficient to establish whether or not a given bitstream is decodable from a profile viewpoint. For example, if a decoder were able to decode a main profile (with ID = 1), then, when exposed to a baseline bitstream with ID = 0, the decoder is able to decode the bitstream.
Levels offer an additional dimension of bitstream complexity, often measured in a combination of processing requirements (such as samples per second) and memory requirements (such as maximum number of samples in a picture, or bit depth, ...). Levels in MPEG standards are generally onion-shaped. In order to decode a given bitstream, both the profile and the level of the bitstream have to be lower than or equal to the profile and level of the decoder.
With the finalization of H.264 in 2003, profiles were introduced that are not onion-shaped. For example, the (unconstrained) baseline profile of H.264 (201) includes a tool known as Flexible Macroblock Ordering (FMO) (202), whereas a main profile (203) does not include that tool. Similarly, a main profile includes tools to support interlace coding (204) that are not included in the baseline profile. Many other tools are