CN-115606181-B - Video processing method, device and recording medium


Abstract

Several techniques for video encoding and video decoding are described. An example method includes performing a conversion between a sub-picture in a video picture of a video and a bitstream of the video according to a rule. The rule specifies that, in the case where the sub-picture is considered a video picture for the conversion, cross-layer alignment restrictions are applied to fewer than all of the multiple layers, namely a current layer including the sub-picture and a subset of the layers associated with the current layer.

Inventors

WANG YEKUI
ZHANG LI
ZHANG KAI
DENG ZHIPIN

Assignees

Douyin Vision Co., Ltd. (抖音视界有限公司)
ByteDance Inc. (字节跳动有限公司)

Dates

Publication Date: 20260505
Application Date: 20210322
Priority Date: 20200321

Claims (20)

1. A video processing method, comprising: performing a conversion between a video including a plurality of layers and a bitstream of the video according to a rule, wherein the rule specifies that, in a case where a first syntax element included in a sequence parameter set indicates that a number of sub-pictures in a video picture is greater than 1 and a sub-picture having a first sub-picture index is considered a video picture for the conversion, a cross-layer alignment restriction is applied to a current layer including the sub-picture and a subset of layers associated with the current layer, wherein the subset of layers associated with the current layer includes one or more higher layers that are dependent on the current layer, wherein the cross-layer alignment restriction includes restricting a current picture including the sub-picture and a first picture in the subset of layers to have the same value of at least one of the following syntax elements: a value of the first syntax element; a value of a second syntax element indicating a dimension of the video picture; a value of a third syntax element indicating a dimension of an i-th sub-picture; a value of a fourth syntax element indicating a position of an i-th sub-picture; or a value of the first sub-picture index.
2. The method of claim 1, wherein a subset of the layers associated with the current layer excludes all higher layers that are not dependent on the current layer.
3. The method of claim 1, wherein the subset of layers associated with the current layer excludes all lower layers of the current layer.
4. The method of claim 1, wherein the subset of layers associated with the current layer is a subset of a dependency tree associated with the current layer, Wherein the dependency tree associated with the current layer includes the current layer, all layers having the current layer as a reference layer, and all reference layers of the current layer.
5. The method of claim 4, wherein the subset of layers associated with the current layer is the subset of the dependency tree regardless of whether any of the subset of the dependency tree is an output layer in an output layer set.
6. The method of claim 1, wherein the cross-layer alignment restriction further comprises a restriction on a value of a fifth syntax element, such that the current layer and the subset of layers associated with the current layer have the same value of the fifth syntax element, wherein the fifth syntax element specifies whether the sub-picture of each coded picture in a coded layer video sequence (CLVS) is considered a picture in a decoding process that excludes a loop filtering operation.
7. The method of claim 1, wherein the cross-layer alignment restriction excludes a value of a sixth syntax element, wherein the sixth syntax element specifies whether loop filtering operations are enabled across sub-picture boundaries.
8. The method of claim 1, wherein the rule further specifies that the cross-layer alignment restriction is not to be applied if the first syntax element indicates that the video picture comprises a single sub-picture.
9. The method of claim 1, wherein the rule further specifies that the cross-layer alignment restriction is not applied in the event that a seventh syntax element included in the sequence parameter set indicates that sub-picture information is not present.
10. The method of claim 1, wherein the rule further specifies that the cross-layer alignment restriction is applied to pictures in a target set of access units.
11. The method of claim 10, wherein, for each CLVS of the current layer that refers to the sequence parameter set, the target set of access units includes, in decoding order, all access units starting from a first access unit including a first picture of the CLVS to a second access unit including a last picture of the CLVS.
12. The method of claim 1, wherein the conversion comprises encoding the video into the bitstream.
13. The method of claim 1, wherein the conversion comprises decoding the bitstream to generate the video.
14. A device for processing video data, comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to: perform a conversion between a video including a plurality of layers and a bitstream of the video according to a rule, wherein the rule specifies that, in a case where a first syntax element included in a sequence parameter set indicates that a number of sub-pictures in a video picture is greater than 1 and a sub-picture having a first sub-picture index is considered a video picture for the conversion, a cross-layer alignment restriction is applied to a current layer including the sub-picture and a subset of layers associated with the current layer, wherein the subset of layers associated with the current layer includes one or more higher layers that are dependent on the current layer, wherein the cross-layer alignment restriction includes restricting a current picture including the sub-picture and a first picture in the subset of layers to have the same value of at least one of the following syntax elements: a value of the first syntax element; a value of a second syntax element indicating a dimension of the video picture; a value of a third syntax element indicating a dimension of an i-th sub-picture; a value of a fourth syntax element indicating a position of an i-th sub-picture; or a value of the first sub-picture index.
15. The device of claim 14, wherein the subset of layers associated with the current layer excludes all higher layers that are independent of the current layer.
16. The device of claim 14, wherein the subset of layers associated with the current layer excludes all lower layers of the current layer.
17. A non-transitory computer-readable storage medium storing instructions that cause a processor to: perform a conversion between a video including a plurality of layers and a bitstream of the video according to a rule, wherein the rule specifies that, in a case where a first syntax element included in a sequence parameter set indicates that a number of sub-pictures in a video picture is greater than 1 and a sub-picture having a first sub-picture index is considered a video picture for the conversion, a cross-layer alignment restriction is applied to a current layer including the sub-picture and a subset of layers associated with the current layer, wherein the subset of layers associated with the current layer includes one or more higher layers that are dependent on the current layer, wherein the cross-layer alignment restriction includes restricting a current picture including the sub-picture and a first picture in the subset of layers to have the same value of at least one of the following syntax elements: a value of the first syntax element; a value of a second syntax element indicating a dimension of the video picture; a value of a third syntax element indicating a dimension of an i-th sub-picture; a value of a fourth syntax element indicating a position of an i-th sub-picture; or a value of the first sub-picture index.
18. A non-transitory computer-readable recording medium storing a bitstream of video and instructions generated by a method performed by a video processing device, wherein the instructions, when executed by the video processing device, implement the method, wherein the method comprises: generating the bitstream of the video comprising a plurality of layers according to a rule, wherein the rule specifies that, in a case where a first syntax element included in a sequence parameter set indicates that a number of sub-pictures in a video picture is greater than 1 and a sub-picture having a first sub-picture index is considered a video picture for the generating, a cross-layer alignment restriction is applied to a current layer including the sub-picture and a subset of layers associated with the current layer, wherein the subset of layers associated with the current layer includes one or more higher layers that are dependent on the current layer, wherein the cross-layer alignment restriction includes restricting a current picture including the sub-picture and a first picture in the subset of layers to have the same value of at least one of the following syntax elements: a value of the first syntax element; a value of a second syntax element indicating a dimension of the video picture; a value of a third syntax element indicating a dimension of an i-th sub-picture; a value of a fourth syntax element indicating a position of an i-th sub-picture; or a value of the first sub-picture index.
19. A method for storing a bitstream of video, comprising: generating a bitstream of video comprising a plurality of layers according to a rule; and storing the bitstream in a non-transitory computer-readable recording medium, wherein the rule specifies that, in a case where a first syntax element included in a sequence parameter set indicates that a number of sub-pictures in a video picture is greater than 1 and a sub-picture having a first sub-picture index is considered a video picture for the generating, a cross-layer alignment restriction is applied to a current layer including the sub-picture and a subset of layers associated with the current layer, wherein the subset of layers associated with the current layer includes one or more higher layers that are dependent on the current layer, wherein the cross-layer alignment restriction includes restricting a current picture including the sub-picture and a first picture in the subset of layers to have the same value of at least one of the following syntax elements: a value of the first syntax element; a value of a second syntax element indicating a dimension of the video picture; a value of a third syntax element indicating a dimension of an i-th sub-picture; a value of a fourth syntax element indicating a position of an i-th sub-picture; or a value of the first sub-picture index.
20. A video processing method, comprising: performing a conversion between a video including a plurality of layers and a bitstream of the video according to a rule, wherein the rule specifies that, in a case where a sub-picture is considered a video picture for the conversion, a cross-layer alignment restriction is applied to fewer than all of the plurality of layers, including a current layer including the sub-picture and a subset of layers associated with the current layer, the cross-layer alignment restriction including a restriction on at least one of a dimension of the video picture, a number of sub-pictures within the video picture, a position of at least one sub-picture, or an identification of a sub-picture.
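
The restriction recited in the claims can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `SubpicLayout` record, the `ref_layers` map (each layer mapped to the layers it references directly), and both helper functions. The sketch also assumes that indirect dependence counts as dependence and that a layer only references lower layers, and it compares all listed values, which is stricter than the "at least one of" wording in claim 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubpicLayout:
    """Hypothetical per-layer summary of the values named in claim 1."""
    num_subpics: int        # first syntax element: number of sub-pictures
    pic_size: tuple         # second: (width, height) of the picture
    subpic_sizes: tuple     # third: (width, height) of the i-th sub-picture
    subpic_positions: tuple # fourth: (x, y) of the i-th sub-picture
    subpic_ids: tuple       # identification of each sub-picture


def dependent_higher_layers(current, ref_layers):
    """Collect layers that reference `current` directly or indirectly.

    Assumes a layer only references lower layers, so every layer
    found here is a higher layer.
    """
    found = set()
    frontier = [current]
    while frontier:
        base = frontier.pop()
        for layer, refs in ref_layers.items():
            if base in refs and layer not in found:
                found.add(layer)
                frontier.append(layer)
    return found


def misaligned_layers(current, layouts, ref_layers):
    """Return dependent higher layers whose layout differs from the current layer's."""
    if layouts[current].num_subpics <= 1:
        return []  # claim 8: no restriction for a single sub-picture
    return sorted(
        layer
        for layer in dependent_higher_layers(current, ref_layers)
        if layouts[layer] != layouts[current]
    )


# Layer 1 references layer 0, layer 2 references layer 1, layer 3 is independent.
ref_layers = {0: set(), 1: {0}, 2: {1}, 3: set()}
base = SubpicLayout(2, (1920, 1080), ((960, 1080), (960, 1080)), ((0, 0), (960, 0)), (0, 1))
other = SubpicLayout(2, (1920, 1080), ((960, 1080), (960, 1080)), ((0, 0), (960, 0)), (0, 2))
layouts = {0: base, 1: base, 2: other, 3: other}
violations = misaligned_layers(0, layouts, ref_layers)
```

In this example layer 3 carries a different layout but is never checked, because it does not depend on layer 0; this mirrors claim 2, under which independent higher layers fall outside the subset.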

Description

Video processing method, device and recording medium

Cross Reference to Related Applications

The present application is based on International Patent Application No. PCT/CN2021/081029, filed on March 22, 2021, which claims priority to and the benefit of International Patent Application No. PCT/CN2020/080533, filed on March 21, 2020. All of the above patent applications are incorporated herein by reference in their entirety.

Technical Field

This patent document relates to image and video encoding and decoding.

Background

Digital video accounts for the largest share of bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, the bandwidth demand for digital video usage is expected to continue to grow.

Disclosure of Invention

Techniques are disclosed herein that may be used by video encoders and decoders to process a coded representation of video using control information useful for decoding the coded representation.

In one example aspect, a video processing method is disclosed. The method includes performing a conversion between a current picture of the video and a bitstream of the video according to a rule. The rule specifies that a plurality of syntax elements are used to specify use of a reference picture resampling tool that, for the conversion, resamples a reference picture having a different resolution than the current picture.

In another example aspect, a video processing method is disclosed. The method includes performing a conversion between a current picture of the video and a bitstream of the video according to a rule. The rule specifies that syntax elements having non-binary values are used to specify use of (1) a reference picture resampling tool that resamples a reference picture having a different resolution than the current picture, and (2) a change in picture resolution within a coded layer video sequence (CLVS).
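
As a purely illustrative sketch of how one non-binary syntax element could convey both pieces of information, the Python fragment below uses a three-valued element. The name `RprMode`, the value assignment and the two helper functions are hypothetical; the text above states only that non-binary values are used, not how the values are assigned.

```python
from enum import IntEnum


class RprMode(IntEnum):
    """Hypothetical three-valued syntax element; the value assignment is an assumption."""
    DISABLED = 0               # reference picture resampling not used
    RPR_FIXED_RESOLUTION = 1   # resampling used, resolution constant within a CLVS
    RPR_RESOLUTION_CHANGE = 2  # resampling used, resolution may change within a CLVS


def rpr_enabled(mode):
    """True when the reference picture resampling tool may be used."""
    return mode != RprMode.DISABLED


def resolution_change_allowed(mode):
    """True when picture resolution may change within a CLVS."""
    return mode == RprMode.RPR_RESOLUTION_CHANGE


mode = RprMode(1)
flags = (rpr_enabled(mode), resolution_change_allowed(mode))
```

A single element of this kind rules out the combination "resolution change allowed but resampling disabled", which two independent flags would leave expressible.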
In another example aspect, a video processing method is disclosed. The method includes performing a conversion between a video including a plurality of layers and a bitstream of the video according to a rule. The rule specifies that, in the case where a sub-picture is considered a video picture for the conversion, a cross-layer alignment restriction is applied to fewer than all of the plurality of layers, including a current layer including the sub-picture and a subset of layers associated with the current layer, wherein the cross-layer alignment restriction includes a restriction on at least one of a dimension of the video picture, a number of sub-pictures within the video picture, a position of at least one sub-picture, or an identification of a sub-picture.

In another example aspect, a video processing method is disclosed. The method includes performing a conversion between a current layer of the video and a bitstream of the video according to a rule. The rule specifies that a cross-layer alignment restriction is applied to all layers in a dependency tree associated with the current layer, regardless of whether any of those layers is an output layer in an output layer set. The cross-layer alignment restriction includes a restriction on at least one of a dimension of the video picture, a number of sub-pictures within the video picture, a position of at least one sub-picture, or an identification of a sub-picture. The layers in the dependency tree include the current layer, all layers having the current layer as a reference layer, and all reference layers of the current layer.

In another example aspect, a video processing method is disclosed. The method includes performing a conversion between a current picture of the video and a bitstream of the video according to a rule.
The rule specifies that resampling of a reference picture in the same layer as the current picture is enabled regardless of the value of a syntax element specifying whether a change in picture resolution is allowed in a coded layer video sequence (CLVS).

In another example aspect, a video processing method is disclosed. The method includes performing a conversion between a current picture of a video including a plurality of layers and a bitstream of the video according to a rule. The rule specifies one of (1) not allowing the reference picture of the current picture to be collocated, or (2) if the reference picture of the current picture is collocated, using a motion vector pointing to the reference picture during the conversion without scaling.

In another example aspect, a video processing method is disclosed. The method includes performing a conversion between the video and the bitstream of the video according to a rule. The rule specifies that the scaling window offset parameter is the same for any two video pictures in the same coded layer video sequence (CLVS) or coded video sequence (CVS) that have the same dimension expressed in terms of the number of luma samples.

In another example aspect, a video processing m