EP-4738852-A2 - FILE PARSER, FILE GENERATOR, ENCODER, DECODER, CLIENT, SERVER, AND METHODS USING PARAMETER SETS FOR CODED VIDEO SEQUENCES
Abstract
The present invention concerns file parsers, file generators, encoders, decoders, clients, servers and methods using parameter sets for coded video sequences. Said parameter sets may comprise decoding parameter information and may be conveyed in-band or out-of-band. Some embodiments provide solutions to problems related to correctly initializing a decoder for decoding coded video sequences in open GOP switching scenarios. Some embodiments may provide for hierarchical track switching levels.
Inventors
- SÁNCHEZ DE LA FUENTE, Yago
- SKUPIN, Robert
- HELLGE, Cornelius
- SCHIERL, Thomas
- GRÜNEBERG, Karsten
- WIEGAND, Thomas
Assignees
- Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Dates
- Publication Date: 2026-05-06
- Application Date: 2021-12-20
Claims (12)
- A method for reconstructing a video bitstream (310) from a video file (201), wherein the video file (201) comprises one or more tracks, each having one or more input video bitstreams (211, 212) embedded therein, wherein each of said one or more input video bitstreams (211, 212) comprises one or more coded video sequences (203), wherein the method comprises steps of retrieving from the video file (201) one or more Sample Groups (ParameterSetSamplegroup) to which one or more samples of said one or more tracks belong, and an indication which samples belong to which Sample Group (ParameterSetSamplegroup), wherein each Sample Group (ParameterSetSamplegroup) is associated with at least one initialization parameter set (204a, ..., 204h) to be inserted into a reconstructed bitstream (310) and to be referenced by samples thereof, and wherein the method further comprises a step of providing a decoder with a predetermined initialization parameter set (204h) of a predetermined Sample Group (ParameterSetSamplegroup).
- The method according to claim 1, further comprising a step of selecting said predetermined initialization parameter set (204h) of said predetermined Sample Group based on a desired operation point and/or a desired decoding behavior of the decoder for decoding said one or more coded video sequences (203).
- The method according to claim 2, wherein said desired operation point and/or decoding behavior of the decoder depends on at least one of the following attributes, and wherein at least one of said attributes is contained in the video file (201) and associated with the respective Sample Group:
  - a predetermined codec profile,
  - a predetermined codec level,
  - a predetermined picture width of the pictures contained in said one or more coded video sequences (203) to be decoded,
  - a predetermined picture height of the pictures contained in said one or more coded video sequences (203) to be decoded,
  - a predetermined bitdepth of the pictures contained in said one or more coded video sequences (203) to be decoded,
  - a predetermined color format of the pictures contained in said one or more coded video sequences (203) to be decoded, and
  - a predetermined size of a decoded picture buffer (DPB) for the pictures contained in said one or more coded video sequences (203) to be decoded.
- The method according to claim 1, further comprising a step of selecting said predetermined initialization parameter set (204h) of said predetermined Sample Group based on a desired operation point and/or maximum decoding capability of the decoder for decoding said one or more coded video sequences (203).
- The method according to claim 4, wherein said desired operation point and/or maximum decoding capability of the decoder depends on at least one of the following attributes, and wherein at least one of said attributes is contained in the video file (201) and associated with the respective Sample Group:
  - a maximum codec profile,
  - a maximum codec level,
  - a maximum picture width of the pictures contained in said one or more coded video sequences (203) to be decoded,
  - a maximum picture height of the pictures contained in said one or more coded video sequences (203) to be decoded,
  - a maximum bitdepth of the pictures contained in said one or more coded video sequences (203) to be decoded,
  - a maximum color format of the pictures contained in said one or more coded video sequences (203) to be decoded, and
  - a maximum size of a decoded picture buffer (DPB) for the pictures contained in said one or more coded video sequences (203) to be decoded.
- The method according to one of claims 1 to 5, wherein if one or more samples within a track refer to a default in-band initialization parameter set (204g) that is associated by default with said track, and if said one or more samples are marked as belonging to at least one of the one or more Sample Groups, then the method further comprises steps of deriving a predetermined one (204h) of the at least one initialization parameter sets (204a, ..., 204h) being associated with the one or more Sample Groups, and replacing the otherwise in-band transmitted initialization parameter set (204g) by said predetermined one initialization parameter set (204h) of the respective Sample Group, and providing this predetermined one initialization parameter set (204h) of the Sample Group, which replaced the otherwise in-band transmitted initialization parameter set (204g), as the predetermined initialization parameter set within the reconstructed bitstream (310) to the decoder.
- The method according to one of claims 1 to 6, wherein if a track is not associated, by default, with any default in-band initialization parameter set (204a, ..., 204g), and if one or more of the samples which are contained in said track are marked as belonging to at least one of the one or more Sample Groups, then the method comprises further steps of inserting a predetermined one (204h) of the at least one initialization parameter sets which are associated with the one or more Sample Groups into the reconstructed bitstream (310), and providing said predetermined one initialization parameter set (204h) of the one or more Sample Groups as the predetermined initialization parameter set within the reconstructed bitstream (310) to the decoder.
- The method according to claim 7, further comprising a step of deriving a signal for a track, indicating that said track is not associated, by default, with any default in-band initialization parameter set (204a, ..., 204g), and/or that an initialization parameter set (204h) for this track has to be taken from at least one of the available one or more Sample Groups, and that this taken initialization parameter set (204h) has to be provided as the predetermined initialization parameter set to the decoder.
- The method according to claim 7 or 8, further comprising a step of deriving a signal for a track, indicating that those samples of the track, which are marked as belonging to one or more Sample Groups, do not refer, by default, to any default in-band initialization parameter set (204a, ..., 204g), and/or that for those samples an initialization parameter set (204h) from at least one of the available one or more Sample Groups, to which said those samples belong, has to be inserted into the reconstructed bitstream (310) and to be provided as the predetermined initialization parameter set to the decoder.
- The method according to any one of claims 1 to 9, wherein one sample can be assigned to more than one Sample Group.
- The method according to any one of claims 1 to 10, wherein the predetermined initialization parameter set (204a, ..., 204h) is at least one of a picture parameter set (PPS), a sequence parameter set (SPS) and a video parameter set (VPS).
- A computer program for implementing the method of any one of the preceding claims when being executed on a computer or signal processor.
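For illustration only, the reconstruction steps recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class and function names (ParameterSetSampleGroup, DecoderCaps, Sample, select_group, reconstruct), the choice of level, width and height as the only compared attributes, the "most demanding group that still fits" selection rule, and the fallback to the in-band parameter set when no group fits are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ParameterSetSampleGroup:
    """Hypothetical model of a Sample Group and its attributes."""
    level: int            # codec level, as a plain integer for this sketch
    width: int            # picture width in luma samples
    height: int           # picture height in luma samples
    parameter_set: bytes  # initialization parameter set (e.g. an SPS)


@dataclass(frozen=True)
class DecoderCaps:
    """Hypothetical maximum decoding capability of the decoder."""
    max_level: int
    max_width: int
    max_height: int


@dataclass(frozen=True)
class Sample:
    payload: bytes
    group_ids: Tuple[int, ...] = ()               # a sample may be in several groups
    inband_parameter_set: Optional[bytes] = None  # default in-band set, if any


def select_group(group_ids: Tuple[int, ...],
                 groups: Dict[int, ParameterSetSampleGroup],
                 caps: DecoderCaps) -> Optional[ParameterSetSampleGroup]:
    """Pick the most demanding group that the decoder can still handle."""
    fitting = [groups[g] for g in group_ids
               if groups[g].level <= caps.max_level
               and groups[g].width <= caps.max_width
               and groups[g].height <= caps.max_height]
    if not fitting:
        return None
    return max(fitting, key=lambda g: (g.level, g.width * g.height))


def reconstruct(samples: List[Sample],
                groups: Dict[int, ParameterSetSampleGroup],
                caps: DecoderCaps) -> List[bytes]:
    """Build the reconstructed bitstream as a list of units."""
    out: List[bytes] = []
    active: Optional[bytes] = None
    for sample in samples:
        chosen = select_group(sample.group_ids, groups, caps)
        # A group parameter set replaces the in-band one if present,
        # and is inserted if the sample has no in-band one at all.
        # Assumption: fall back to the in-band set when no group fits.
        ps = chosen.parameter_set if chosen else sample.inband_parameter_set
        if ps is not None and ps != active:
            out.append(ps)  # emit only when the active set changes
            active = ps
        out.append(sample.payload)
    return out


groups = {
    1: ParameterSetSampleGroup(4, 1920, 1080, b"SPS-1080"),
    2: ParameterSetSampleGroup(5, 3840, 2160, b"SPS-2160"),
}
samples = [
    Sample(b"AU0", (1, 2), b"SPS-inband"),  # in-band set gets replaced
    Sample(b"AU1", (1, 2)),                 # no in-band set: group set applies
    Sample(b"AU2"),                         # not in any Sample Group
]
hd_stream = reconstruct(samples, groups, DecoderCaps(4, 1920, 1080))
uhd_stream = reconstruct(samples, groups, DecoderCaps(5, 3840, 2160))
```

In this sketch, the same samples yield a bitstream that starts with the 1080-line parameter set for the smaller decoder and with the 2160-line parameter set for the larger one, while the in-band set carried with the first sample is never emitted.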
Description
The present invention is concerned with video coding. Embodiments of the present disclosure relate to a file parser, a file generator, an encoder, a decoder, a client, a server and corresponding methods for selecting parameter sets for coded video sequences. Some embodiments concern a sample entry selection for stream switching.
In the field of video coding, parameter sets may be used, for instance, for initializing a decoding behaviour of a decoder. These parameter sets may comprise decoding parameter information to be used at the start of a coded video sequence, in order to properly decode the pictures contained in said coded video sequence. In some cases, the change of a parameter set may trigger the start of a new coded video sequence.
ISOBMFF (the ISO Base Media File Format) allows storage of parameter sets in two different ways. The first one is the so-called out-of-band parameter set integration, which means that the parameter sets are not stored together with the other non-VCL and VCL NAL units of the access units (AUs, called samples in terms of ISOBMFF) but in the sample entry of a track. Basically, the sample entry gives detailed information about the coding type used, and any initialization information needed for that coding. This information includes parameter sets.
A particular track may have several sample entries that apply to different parts of the bitstream. For instance, if a bitstream consists of two Coded Video Sequences (CVSs) that each refer to a different SPS (with different content) having the same ID, two sample entries are required, each storing the SPS that carries that ID for the corresponding CVS. Samples within the track point to the applicable sample entry in one of two ways: through the SampleToChunkBox ('stsc'), or, when the samples come in a fragmented track (e.g. when MPEG-DASH segments are used), through the track fragment header ('tfhd') carried in e.g. the DASH segment, which contains the sample_description_index identifying the sample entry in use.
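The mapping from a sample to its sample entry described above can be sketched as follows. This is a minimal illustration, not a full ISOBMFF parser: the names StscEntry, sample_description_index_for and fragment_sample_description_index are hypothetical, the boxes are assumed to be already parsed into plain Python values, and the fallback to a track-level default index for fragments that do not signal their own index is stated here as an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StscEntry:
    """One row of an already parsed SampleToChunkBox ('stsc')."""
    first_chunk: int               # 1-based number of the first chunk of the run
    samples_per_chunk: int         # samples in every chunk of the run
    sample_description_index: int  # 1-based index of the sample entry


def sample_description_index_for(sample_number: int,
                                 stsc: List[StscEntry],
                                 chunk_count: int) -> int:
    """Return the sample entry index for a 1-based sample number."""
    if sample_number < 1:
        raise ValueError("sample numbers are 1-based")
    first_sample = 1  # number of the first sample of the current run
    for i, entry in enumerate(stsc):
        # A run lasts until the next row's first_chunk, or the last chunk.
        if i + 1 < len(stsc):
            next_first_chunk = stsc[i + 1].first_chunk
        else:
            next_first_chunk = chunk_count + 1
        run_samples = (next_first_chunk - entry.first_chunk) * entry.samples_per_chunk
        if sample_number < first_sample + run_samples:
            return entry.sample_description_index
        first_sample += run_samples
    raise ValueError("sample number beyond the last chunk")


def fragment_sample_description_index(tfhd_index: Optional[int],
                                      default_index: int) -> int:
    """For a track fragment: use the index from 'tfhd' when it is
    present, otherwise a track-level default (assumed in this sketch)."""
    return tfhd_index if tfhd_index is not None else default_index


# Two coded video sequences: chunks 1-3 use sample entry 1 and
# chunks 4-6 use sample entry 2, with 10 samples per chunk.
table = [StscEntry(1, 10, 1), StscEntry(4, 10, 2)]
last_of_first_cvs = sample_description_index_for(30, table, chunk_count=6)
first_of_second_cvs = sample_description_index_for(31, table, chunk_count=6)
```

With this example table, sample 30 resolves to sample entry 1 and sample 31 to sample entry 2, so a parser would hand the decoder the parameter sets of the second sample entry before decoding sample 31.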
Thus, the correct out-of-band parameters can be used for decoding the samples as identified by the proper sample description index. The second way is to convey the parameter sets together with the samples as so-called in-band parameter sets, so that they are already present in the bitstream within the AUs as required.
However, particularly for track switching or representation switching in DASH, some issues may arise if not handled properly. In particular, VVC allows for open GOP resolution switching, which is advantageous because a higher coding efficiency can be achieved compared to closed GOP encoding, but a couple of issues need to be taken into account in an adaptive HTTP streaming environment.
Thus, it is an object of the present invention to provide solutions for parameter selection in video coding environments, for example in cases where switching between bitstreams, tracks or representations occurs. According to the invention, these solutions are provided by means of the file parser, file generator, encoder, decoder, client, server and corresponding methods according to the independent claims. Advantageous implementations and embodiments are the subject of the dependent claims.
A first aspect concerns a hierarchical track grouping for initialization parameter selection (e.g. in sample entries), as well as a hierarchical entity grouping for initialization parameter selection (e.g. in sample entries), and track grouping with additional/alternative initialization parameters (e.g. in sample entries). In this first aspect, the initialization parameters may be conveyed out-of-band, e.g. in sample entries.
According to this first aspect, a file parser is suggested for reconstructing a video bitstream from a video file, wherein the video file may comprise different tracks having at least two input video bitstreams embedded therein, wherein a video content may be coded differently in said at least two input video bitstreams, and wherein each input video bitstream may comprise one or more coded video sequences (CVS) with random access points. The file parser is configured to retrieve from the video file (e.g. for each representation) switching information that indicates whether the different tracks comprise, and which of the different tracks are, one or more switch-to candidates for the respective track. The term "for each representation" means each DASH representation that may be currently processed by the file parser or a client. That means, the file parser may perform the herein described steps for each representation that is currently processed by the file parser, wherein the file parser may only process one single representation at a time. For example, if representation switching is performed, it may be switched from a first representation to a second representation. Accordingly, the file parser processes the first representation at a first time instance and performs all the steps as described herein on said first representation. After the switch, the file parser processes the second representation at