CN-121985137-A - Video coding related to sub-images
Abstract
The present application relates to video coding associated with sub-images. Concepts are described that include encoding, processing, and decoding a data stream having video encoded therein, wherein the video includes a plurality of pictures, wherein the data stream includes a plurality of pictures in at least two layers, wherein a picture of at least one layer is partitioned into a predetermined layer-specific number of sub-pictures, one or more of the sub-pictures or pictures of one layer correspond to the sub-pictures or one picture in one or more other layers, and at least one of the sub-pictures includes boundaries for boundary extension for motion compensation, and an indication that at least one of the boundaries of the corresponding pictures or corresponding sub-pictures in different layers are aligned with each other.
Inventors
- Y. Sanchez de la Fuente
- K. Sühring
- C. Hellge
- T. Schierl
- R. Skupin
- T. Wiegand
Assignees
- Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. (Fraunhofer Society for the Advancement of Applied Research)
Dates
- Publication Date
- 20260505
- Application Date
- 20201218
- Priority Date
- 20191220
Claims (20)
- 1. A data encoding method, comprising: dividing a particular image into a plurality of sub-images; determining a bin constraint such that the number of bins for encoding of the sub-image is less than or equal to (32 ÷ 3) × NumBytesInVclNalUnits + (RawMinCuBits × PicSizeInMinCbsY) ÷ 32; encoding the particular image into a bitstream via context-adaptive binary arithmetic coding (CABAC), the encoding including inserting one or more zero words in the bitstream such that the bin constraint corresponding to one of the plurality of sub-images is satisfied; and providing an indication to treat the sub-image, into which the one or more zero words have been inserted by the CABAC encoding, as an image.
- 2. The method of claim 1, wherein the indication comprises: (a) sps_subpic_treated_as_pic_flag being set to 1; or (b) sps_subpic_treated_as_pic_flag being inferred to be set to 1 when sps_subpic_treated_as_pic_flag is omitted from the bitstream.
- 3. The method of claim 1, wherein the indication to treat the sub-image as an image comprises an indication of a boundary extension to apply motion compensation to the sub-image.
- 4. The method of claim 1, wherein the one or more zero words are inserted at the end of one or more slices of the sub-image.
- 5. The method according to claim 1, wherein the plurality of images comprises at least two layers; wherein the method further comprises segmenting the image of at least one layer into a predetermined layer-specific number of sub-images; wherein one or more of the images or sub-images of one of the layers corresponds to one of the images or sub-images of one or more other layers; wherein at least one of the sub-images comprises a boundary for motion-compensated boundary extension; and wherein the method further comprises encoding the indication such that at least one of the boundaries of the corresponding sub-images or the corresponding images in the different layers are aligned with each other.
- 6. A non-transitory computer-readable medium having instructions that, when executed, cause at least one processor to perform the method of claim 1.
- 7. An encoder, comprising: at least one processor configured to perform operations comprising: dividing a particular image into a plurality of sub-images; determining a bin constraint such that the number of bins for encoding of the sub-image is less than or equal to (32 ÷ 3) × NumBytesInVclNalUnits + (RawMinCuBits × PicSizeInMinCbsY) ÷ 32; encoding the particular image into a bitstream via context-adaptive binary arithmetic coding (CABAC), the encoding including inserting one or more zero words in the bitstream such that the bin constraint corresponding to one of the plurality of sub-images is satisfied; and providing an indication to treat the sub-image, into which the one or more zero words have been inserted by the CABAC encoding, as an image.
- 8. The encoder of claim 7, wherein the indication comprises: (a) sps_subpic_treated_as_pic_flag being set to 1; or (b) sps_subpic_treated_as_pic_flag being inferred to be set to 1 when sps_subpic_treated_as_pic_flag is omitted from the bitstream.
- 9. The encoder of claim 7, wherein the indication to treat the sub-picture as a picture comprises an indication of a boundary extension to apply motion compensation to the sub-picture.
- 10. The encoder of claim 7, wherein the zero words are inserted at the end of one or more slices of the sub-image.
- 11. The encoder according to claim 7, wherein the plurality of images comprises at least two layers; wherein the image of at least one layer is segmented into a predetermined layer-specific number of sub-images; wherein one or more of the images or sub-images of one of the layers corresponds to one of the images or sub-images of one or more other layers; wherein at least one of the sub-images comprises a boundary for motion-compensated boundary extension; and wherein the at least one processor is further configured to encode the indication such that at least one of the boundaries of the corresponding sub-images or the corresponding images in the different layers are aligned with each other.
- 12. A data decoding method, comprising: receiving a bitstream comprising a plurality of images, wherein a particular image of the plurality of images is divided into a plurality of sub-images; decoding or inferring, based on the bitstream, an indication to treat a sub-image, into which one or more zero words have been inserted in the bitstream by context-adaptive binary arithmetic coding (CABAC), as an image, wherein the sub-image corresponds to one of the plurality of sub-images; determining a bin constraint such that the number of bins for encoding of said sub-image is less than or equal to (32 ÷ 3) × NumBytesInVclNalUnits + (RawMinCuBits × PicSizeInMinCbsY) ÷ 32; and decoding the particular image from the bitstream via CABAC, the decoding comprising parsing one or more zero words from the bitstream such that the bin constraint corresponding to the sub-image is satisfied.
- 13. The method of claim 12, wherein the indication comprises: sps_subpic_treated_as_pic_flag being set to 1.
- 14. The method of claim 12, wherein the indication to treat the sub-image as an image comprises an indication of a boundary extension to apply motion compensation to the sub-image.
- 15. The method of claim 12, wherein the one or more zero words are parsed from the end of one or more slices of the sub-image.
- 16. The method according to claim 12, wherein the plurality of images comprises at least two layers; wherein the image of at least one layer is segmented into a predetermined layer-specific number of sub-images; wherein one or more of the images or sub-images of one of the layers corresponds to one of the images or sub-images of one or more other layers; wherein at least one of the sub-images comprises a boundary for motion-compensated boundary extension; and wherein the method further comprises interpreting the indication such that at least one of the boundaries of the corresponding sub-images or the corresponding images in the different layers are aligned with each other.
- 17. A non-transitory computer-readable medium having instructions that, when executed, cause at least one processor to perform the method of claim 12.
- 18. A decoder, comprising: at least one processor configured to perform operations comprising: receiving a bitstream comprising a plurality of images, wherein a particular image of the plurality of images is divided into a plurality of sub-images; decoding or inferring, based on the bitstream, an indication to treat a sub-image, into which one or more zero words have been inserted in the bitstream by context-adaptive binary arithmetic coding (CABAC), as an image, wherein the sub-image corresponds to one of the plurality of sub-images; determining a bin constraint such that the number of bins for encoding of said sub-image is less than or equal to (32 ÷ 3) × NumBytesInVclNalUnits + (RawMinCuBits × PicSizeInMinCbsY) ÷ 32; and decoding the particular image from the bitstream via CABAC, the decoding comprising parsing one or more zero words from the bitstream such that the bin constraint corresponding to the sub-image is satisfied.
- 19. The decoder of claim 18, wherein the indication comprises: sps_subpic_treated_as_pic_flag being set to 1.
- 20. The decoder according to claim 18, wherein the indication to treat the sub-image as an image comprises an indication of a boundary extension to apply motion compensation to the sub-image.
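The bin constraint recited in the independent claims bounds the number of CABAC bins by the coded size of the VCL NAL units, so an encoder can satisfy it by appending 3-byte zero words to a slice. A minimal sketch of computing the smallest number of such zero words under this reading of the constraint (the function name and the treatment of the sub-image's data as standalone NAL units are illustrative assumptions, not text from the claims):

```python
import math

def min_cabac_zero_words(bin_count: int, num_bytes_vcl: int,
                         raw_min_cu_bits: int, pic_size_in_min_cbs_y: int) -> int:
    """Smallest number of 3-byte zero words to append so that
    bin_count <= (32/3) * bytes + (raw_min_cu_bits * pic_size_in_min_cbs_y) / 32,
    where `bytes` is num_bytes_vcl plus 3 bytes per appended zero word."""
    fixed_term = (raw_min_cu_bits * pic_size_in_min_cbs_y) / 32
    if bin_count <= (32 / 3) * num_bytes_vcl + fixed_term:
        return 0  # constraint already satisfied, no padding needed
    # Total bytes needed so that (32/3) * bytes covers the remaining bins.
    needed_bytes = math.ceil((bin_count - fixed_term) * 3 / 32)
    # Each zero word contributes 3 bytes.
    return math.ceil((needed_bytes - num_bytes_vcl) / 3)

# Illustrative numbers only: 10000 bins against 500 coded bytes.
print(min_cabac_zero_words(10000, 500, 0, 0))  # → 146
```

With 146 zero words the byte count rises to 500 + 3 × 146 = 938, and (32 ÷ 3) × 938 ≈ 10005 ≥ 10000, while 145 words would leave the bound below the bin count.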
Description
Video coding related to sub-images

Technical Field

The present application is a divisional application of application No. 202080088665.6, entitled "Video coding related to sub-images". The present application relates to video coding concepts, and in particular to sub-pictures.

Background

There are certain video-based applications in which multiple encoded video bitstreams or data streams are to be jointly decoded (e.g., combined into a joint bitstream and fed into a single decoder), such as multiparty conferencing, in which encoded video streams from multiple participants are processed on a single endpoint, or tile-based streaming for video playback, e.g., 360-degree tiles in VR (virtual reality) applications.

For High Efficiency Video Coding (HEVC), a motion-constrained tile set (MCTS) is defined in which motion vectors are constrained so as not to reference tile sets (or tiles) other than the current tile set (or tile). Thus, the tile set (or tile) in question may be extracted from the bitstream or incorporated into another bitstream without affecting the outcome of decoding; e.g., the decoded samples are exact matches whether the tile set (or tile) in question is decoded alone or as part of a bitstream with more tile sets (or tiles) per image.

For the latter, 360-degree video is an example where such techniques are useful. The video is spatially segmented, and each spatial segment is provided to the streaming client in multiple representations of varying spatial resolutions, as illustrated in fig. 1. The figure shows a cube map of a 360-degree video projection divided into 6 x 4 spatial segments at two resolutions. For simplicity, these independently decodable spatial segments are referred to as tiles in this description.

When using a state-of-the-art head-mounted display as illustrated at the left-hand side of fig. 2, a user typically views only a subset of the tiles that make up the entire 360-degree video, through a viewport (solid blue boundary) representing a 90 x 90 degree field of view. The corresponding tiles, shaded green in fig. 2, are downloaded at the highest resolution. However, the client application also has to download and decode representations of other tiles outside the current viewport (shaded red in fig. 2) in order to handle abrupt changes in the orientation of the user. A client in such an application therefore downloads tiles that cover its current viewport at the highest resolution and tiles outside its current viewport at a relatively lower resolution, while the choice of tile resolution is constantly adapted to the orientation of the user. After client-side downloading, merging the downloaded tiles into a single bitstream to be processed with a single decoder is a necessity for typical mobile devices with limited computing and power resources. Fig. 3 illustrates a possible tile arrangement in a joint bitstream for the above example. The merging operation for generating the joint bitstream has to be performed by compressed-domain processing, i.e., avoiding pixel-domain processing such as transcoding. Because the HEVC bitstreams are encoded following certain constraints that mainly involve inter-prediction (inter) coding tools, such as the constrained motion vectors described above, such a merging process can be performed.

The emerging codec VVC provides another, more efficient means for achieving the same goal, namely sub-pictures. By means of sub-pictures, regions smaller than the full picture can be treated similarly to a picture in the sense that their boundaries are treated as if they were picture boundaries, e.g., by applying a boundary extension for motion compensation: if a motion vector points outside the region, the last sample of the region (at the crossed boundary) is repeated to generate the samples in the reference block used for prediction, as would be done at a picture boundary. Thus, the motion vectors are not constrained at the encoder, avoiding the corresponding loss of efficiency incurred with HEVC MCTS.

The emerging VVC coding standard also contemplates providing scalable coding tools for multi-layer support in its main profiles. Thus, by encoding the entire low-resolution content with less frequent RAPs (random access points), a further efficient configuration for the above application scenario can be achieved. However, this may require the use of a layered coding structure that always contains the low-resolution content in the base layer and some high-resolution content in the enhancement layer. The layered coding structure is shown in fig. 1. However, some use cases may still be of interest that allow extraction of a single region of the bitstream. For example, a user with a higher end-to-end delay would download the entire 360-degree video (all tiles) at low resolution, while a user with a lower end-to-end delay would download fewer tiles of the low-resolution content, e.g., the same number as the downloaded high-resolution tiles. Thus, the extraction of layered sub-pictures should be properly handled by the video
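The boundary extension described above, where the last sample of a sub-picture is repeated when a motion vector points outside it, amounts to clamping the reference-sample coordinates to the sub-picture area. A minimal sketch under that reading (the function and parameter names are illustrative, not taken from the VVC specification):

```python
def clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(v, hi))

def fetch_reference_sample(ref, x, y, sub_x, sub_y, sub_w, sub_h):
    """Treat the sub-picture boundary like a picture boundary: coordinates
    outside [sub_x, sub_x + sub_w) x [sub_y, sub_y + sub_h) are clamped,
    so the boundary row/column of samples is repeated into the extension."""
    xc = clamp(x, sub_x, sub_x + sub_w - 1)
    yc = clamp(y, sub_y, sub_y + sub_h - 1)
    return ref[yc][xc]

# A 4x4 reference region treated as a sub-picture at position (0, 0):
ref = [[10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]]
print(fetch_reference_sample(ref, -2, 1, 0, 0, 4, 4))  # → 20 (left column repeated)
print(fetch_reference_sample(ref, 5, 7, 0, 0, 4, 4))   # → 43 (bottom-right corner repeated)
```

Because every out-of-bounds coordinate maps to a valid in-bounds sample, the encoder never has to restrict motion vectors at sub-picture edges, which is the efficiency advantage over HEVC MCTS noted above.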