US-12621487-B2 - Matrix-based intra prediction using filtering


Abstract

Devices, systems and methods for digital video coding, which includes matrix-based intra prediction methods for video coding, are described. In a representative aspect, a method for video processing includes performing a conversion between a current video block of a video and a bitstream representation of the current video block using a matrix based intra prediction (MIP) mode in which a prediction block of the current video block is determined by performing, on reference boundary samples located to the left of and above the current video block, a boundary downsampling operation, followed by a matrix vector multiplication operation, and selectively followed by an upsampling operation, where the reference boundary samples, rather than the reduced boundary samples calculated from them in the boundary downsampling operation, are directly used for a prediction process in the upsampling operation.
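The three-step pipeline in the abstract can be sketched in Python on integer samples. This is a minimal sketch under stated assumptions: power-of-two block sizes, a caller-supplied weight matrix (the actual MIP matrices are not reproduced here), and bilinear upsampling. All function names and parameters are hypothetical. The step that reflects the abstract is 3b, where vertical interpolation starts from the original top reference samples rather than from the averaged samples produced in step 1.

```python
def downsample_boundary(ref, reduced_size):
    """One-stage downsampling: average groups of `factor` samples directly."""
    size = len(ref)
    if reduced_size >= size:
        return list(ref)  # boundary already small enough; copy as-is
    factor = size // reduced_size  # computed once for this direction
    log2_factor = factor.bit_length() - 1
    if factor != 1 << log2_factor:
        raise ValueError("sketch assumes power-of-two sizes")
    rounding = 1 << (log2_factor - 1)
    return [
        (sum(ref[i * factor:(i + 1) * factor]) + rounding) >> log2_factor
        for i in range(reduced_size)
    ]


def matrix_vector(matrix, vector, shift):
    """Integer matrix-vector product followed by a rounding right shift."""
    rounding = (1 << (shift - 1)) if shift else 0
    return [
        (sum(w * v for w, v in zip(row, vector)) + rounding) >> shift
        for row in matrix
    ]


def interpolate(prev, values, up):
    """Linear interpolation along one line; `prev` is the boundary sample."""
    log2_up = up.bit_length() - 1
    rounding = (1 << (log2_up - 1)) if log2_up else 0
    out = []
    for cur in values:
        for k in range(1, up + 1):
            out.append(((up - k) * prev + k * cur + rounding) >> log2_up)
        prev = cur
    return out


def mip_predict(top, left, matrix, bdry_size, pred_w, pred_h, shift):
    """Predict a block that is len(top) samples wide and len(left) high."""
    width, height = len(top), len(left)
    # 1. Boundary downsampling: reduced top followed by reduced left.
    bdry = downsample_boundary(top, bdry_size) + downsample_boundary(left, bdry_size)
    # 2. Matrix-vector multiplication gives a pred_w x pred_h reduced block.
    flat = matrix_vector(matrix, bdry, shift)
    reduced = [flat[y * pred_w:(y + 1) * pred_w] for y in range(pred_h)]
    up_h, up_v = width // pred_w, height // pred_h  # 1 means no upsampling
    # 3a. Horizontal upsampling, seeded by the full-resolution left sample
    #     on the row that each reduced row maps to.
    rows = [interpolate(left[(y + 1) * up_v - 1], reduced[y], up_h)
            for y in range(pred_h)]
    # 3b. Vertical upsampling, seeded by the top reference samples copied
    #     directly (not by the averaged reduced boundary from step 1).
    cols = [interpolate(top[x], [r[x] for r in rows], up_v) for x in range(width)]
    return [[cols[x][y] for x in range(width)] for y in range(height)]
```

With flat reference samples and a matrix whose rows average the reduced boundary, for example `mip_predict([100] * 8, [100] * 8, [[8] * 8 for _ in range(16)], 4, 4, 4, 6)`, every sample of the predicted 8x8 block equals 100.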

Inventors

Zhipin Deng
Kai Zhang
Li Zhang
Hongbin Liu
Jizheng Xu

Assignees

BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.
BYTEDANCE INC.

Dates

Publication Date: 20260505
Application Date: 20220824
Priority Date: 20190501

Claims (20)

1 . A method of processing video data, comprising: determining, for a conversion between a video block of a video and a bitstream of the video, that a first intra mode is applied on the video block of the video, wherein the process in the first intra mode includes a one-stage boundary downsampling operation, followed by a matrix vector multiplication operation and selectively followed by an upsampling operation to generate prediction samples for the video block of the video; and performing the conversion based on the prediction samples; wherein, in the one-stage boundary downsampling operation, when a downsampled boundary size of the video block is less than a size of the video block, reduced boundary samples are generated from reference boundary samples of the video block of the video by the one-stage boundary downsampling operation and are used to generate inputs to the matrix vector multiplication operation, and the reduced boundary samples are generated directly from the reference boundary samples of the video block and a downscaling factor without deriving intermediate samples, wherein the downscaling factor is calculated only once for a horizontal direction and a vertical direction, respectively, wherein the reference boundary samples include left and above reference boundary samples of the video block, wherein a first syntax element indicating whether to apply the first intra mode is included in the bitstream, wherein at least one bin of the first syntax element is context coded, and an increasement value of the context is determined based on characteristics of a neighboring block of the video block, wherein the increasement value of the context is determined further based on a size of the video block, wherein in response to a width-height ratio of the video block being greater than 2, a context with a first predefined increasement value is used for coding the at least one bin of the first syntax element, and wherein in response to a width-height ratio of the video block being smaller than or equal to 2, a context with a second increasement value is used for coding the at least one bin of the first syntax element, wherein the second increasement value is not identical to the first predefined increasement value.
2 . The method of claim 1 , wherein the left and above reference boundary samples are derived without an intra reference sample filtering process.
3 . The method of claim 1 , wherein the downscaling factor is calculated based on a size of the video block.
4 . The method of claim 1 , wherein input samples to the upsampling operation include upsampling boundary samples of the video block, and the upsampling boundary samples are not computed by averaging the reference boundary samples of the video block.
5 . The method of claim 4 , wherein at least one of the upsampling boundary samples is copied from the reference boundary samples.
6 . The method of claim 1 , wherein the matrix vector multiplication operation is followed by a transposing operation prior to the upsampling operation.
7 . The method of claim 6 , wherein the transposing operation converts a block having a width of a first value and a height of a second value to a block having the width of the second value and a height of the first value.
8 . The method of claim 1 , wherein the conversion includes encoding the video block into the bitstream.
9 . The method of claim 1 , wherein the conversion includes decoding the video block from the bitstream.
10 . An apparatus for processing video data comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, upon execution by the processor, cause the processor to: determine, for a conversion between a video block of a video and a bitstream of the video, that a first intra mode is applied on the video block of the video, wherein the process in the first intra mode includes a one-stage boundary downsampling operation, followed by a matrix vector multiplication operation and selectively followed by an upsampling operation to generate prediction samples for the video block of the video; and perform the conversion based on the prediction samples; wherein, in the one-stage boundary downsampling operation, when a downsampled boundary size of the video block is less than a size of the video block, reduced boundary samples are generated from reference boundary samples of the video block of the video by the one-stage boundary downsampling operation and are used to generate inputs to the matrix vector multiplication operation, and the reduced boundary samples are generated directly from the reference boundary samples of the video block and a downscaling factor without deriving intermediate samples, wherein the downscaling factor is calculated only once for a horizontal direction and a vertical direction, respectively, wherein the reference boundary samples include left and above reference boundary samples of the video block, wherein a first syntax element indicating whether to apply the first intra mode is included in the bitstream, wherein at least one bin of the first syntax element is context coded, and an increasement value of the context is determined based on characteristics of a neighboring block of the video block, wherein the increasement value of the context is determined further based on a size of the video block, wherein in response to a width-height ratio of the video block being greater than 2, a context with a first predefined increasement value is used for coding the at least one bin of the first syntax element, and wherein in response to a width-height ratio of the video block being smaller than or equal to 2, a context with a second increasement value is used for coding the at least one bin of the first syntax element, wherein the second increasement value is not identical to the first predefined increasement value.
11 . The apparatus of claim 10 , wherein the left and above reference boundary samples of the video block are derived without an intra reference sample filtering process.
12 . The apparatus of claim 10 , wherein the downscaling factor is calculated based on a size of the video block.
13 . The apparatus of claim 10 , wherein input samples to the upsampling operation include upsampling boundary samples of the video block, wherein at least one of the upsampling boundary samples is copied from the reference boundary samples.
14 . The apparatus of claim 10 , wherein input samples to the upsampling operation include upsampling boundary samples of the video block, and the upsampling boundary samples are not computed by averaging the reference boundary samples of the video block.
15 . The apparatus of claim 10 , wherein the matrix vector multiplication operation is followed by a transposing operation prior to the upsampling operation, and wherein the transposing operation converts a block having a width of a first value and a height of a second value to a block having the width of the second value and a height of the first value.
16 . A non-transitory computer-readable storage medium storing instructions that cause a processor to: determine, for a conversion between a video block of a video and a bitstream of the video, that a first intra mode is applied on the video block of the video, wherein the process in the first intra mode includes a one-stage boundary downsampling operation, followed by a matrix vector multiplication operation and selectively followed by an upsampling operation to generate prediction samples for the video block of the video; and perform the conversion based on the prediction samples; wherein, in the one-stage boundary downsampling operation, when a downsampled boundary size of the video block is less than a size of the video block, reduced samples are generated from reference boundary samples of the video block of the video by the one-stage boundary downsampling operation and are used to generate inputs to the matrix vector multiplication operation, and the reduced boundary samples are generated directly from the reference boundary samples of the video block and a downscaling factor without deriving intermediate samples, wherein the downscaling factor is calculated only once for a horizontal direction and a vertical direction, respectively, wherein the reference boundary samples include left and above reference boundary samples of the video block, wherein a first syntax element indicating whether to apply the first intra mode is included in the bitstream, wherein at least one bin of the first syntax element is context coded, and an increasement value of the context is determined based on characteristics of a neighboring block of the video block, wherein the increasement value of the context is determined further based on a size of the video block, wherein in response to a width-height ratio of the video block being greater than 2, a context with a first predefined increasement value is used for coding the at least one bin of the first syntax element, and wherein in response to a width-height ratio of the video block being smaller than or equal to 2, a context with a second increasement value is used for coding the at least one bin of the first syntax element, wherein the second increasement value is not identical to the first predefined increasement value.
17 . The non-transitory computer-readable storage medium of claim 16 , wherein the left and above reference boundary samples are derived without an intra reference sample filtering process.
18 . The non-transitory computer-readable storage medium of claim 16 , wherein the matrix vector multiplication operation is followed by a transposing operation prior to the upsampling operation, and wherein the transposing operation converts a block having a width of a first value and a height of a second value to a block having the width of the second value and a height of the first value.
19 . A method for storing a bitstream of a video, comprising: determining that a first intra mode is applied on a video block of the video, wherein the process in the first intra mode includes a one-stage boundary downsampling operation, followed by a matrix vector multiplication operation and selectively followed by an upsampling operation to generate prediction samples for the video block of the video; and generating the bitstream based on the determining; wherein, in the one-stage boundary downsampling operation, when a downsampled boundary size of the video block is less than a size of the video block, reduced samples are generated from reference boundary samples of the video block of the video by the one-stage boundary downsampling operation and are used to generate inputs to the matrix vector multiplication operation, and the reduced boundary samples are generated directly from the reference boundary samples of the video block and a downscaling factor without deriving intermediate samples, wherein the downscaling factor is calculated only once for a horizontal direction and a vertical direction, respectively, wherein the reference boundary samples include left and above reference boundary samples of the video block, wherein a first syntax element indicating whether to apply the first intra mode is included in the bitstream, wherein at least one bin of the first syntax element is context coded, and an increasement value of the context is determined based on characteristics of a neighboring block of the video block, wherein the increasement value of the context is determined further based on a size of the video block, wherein in response to a width-height ratio of the video block being greater than 2, a context with a first predefined increasement value is used for coding the at least one bin of the first syntax element, and wherein in response to a width-height ratio of the video block being smaller than or equal to 2, a context with a second increasement value is used for coding the at least one bin of the first syntax element, wherein the second increasement value is not identical to the first predefined increasement value.
20 . The method of claim 19 , wherein the matrix vector multiplication operation is followed by a transposing operation prior to the upsampling operation, and wherein the transposing operation converts a block having a width of a first value and a height of a second value to a block having the width of the second value and a height of the first value.
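Two recurring claim details are easy to state in code: the context-increment choice for the context-coded bin of the first syntax element (claims 1, 10, 16 and 19) and the transposing operation (claims 6-7, 15, 18 and 20). The Python sketch below rests on assumptions that the claims do not fix. It reads "width-height ratio" symmetrically as max(W, H) / min(W, H), treats "characteristics of a neighboring block" as whether the left and above neighbors use the first intra mode, and picks 3 as the first predefined increment. All names are hypothetical.

```python
FIRST_PREDEFINED_CTX_INC = 3  # hypothetical first predefined increment


def mip_flag_ctx_inc(width, height, left_uses_mip, above_uses_mip):
    """Context increment for the context-coded bin of the MIP flag.

    Assumption: 'width-height ratio' is read symmetrically as
    max(W, H) / min(W, H), and the neighbor characteristic is whether
    the left/above neighbor itself uses the matrix-based mode.
    """
    if max(width, height) > 2 * min(width, height):  # ratio greater than 2
        return FIRST_PREDEFINED_CTX_INC
    # Ratio <= 2: increment from the neighbors, always in {0, 1, 2}, so it
    # can never equal the first predefined increment.
    return int(left_uses_mip) + int(above_uses_mip)


def transpose(block):
    """Turn a block of width W and height H (H rows of W samples) into
    a block of width H and height W."""
    return [list(column) for column in zip(*block)]
```

For instance, `mip_flag_ctx_inc(16, 4, True, True)` returns 3 because 16:4 exceeds 2:1, while `mip_flag_ctx_inc(8, 8, True, False)` returns 1. Keeping the neighbor-derived increments in a range disjoint from the predefined value is one simple way to satisfy the requirement that the two increments differ.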

Description

CROSS REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. application Ser. No. 17/478,511, filed on Sep. 17, 2021, which is a continuation of International Patent Application No. PCT/CN2020/088584, filed on May 5, 2020, which claims priority to and the benefits of International Patent Application No. PCT/CN2019/085399, filed on May 1, 2019, and International Patent Application No. PCT/CN2019/087047, filed on May 15, 2019. All the aforementioned patent applications are hereby incorporated by reference in their entireties.
TECHNICAL FIELD
This patent document relates to video coding techniques, devices and systems.
BACKGROUND
In spite of the advances in video compression, digital video still accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.
SUMMARY
Devices, systems and methods related to digital video coding, and specifically, matrix-based intra prediction methods for video coding are described. The described methods may be applied to both the existing video coding standards (e.g., High Efficiency Video Coding (HEVC)) and future video coding standards (e.g., Versatile Video Coding (VVC)) or codecs.
A first example method for video processing includes performing a conversion between a current video block of a video and a bitstream representation of the current video block using a matrix based intra prediction (MIP) mode in which a prediction block of the current video block is determined by performing, on reference boundary samples located to the left of and above the current video block, a boundary downsampling operation, followed by a matrix vector multiplication operation, and selectively followed by an upsampling operation, where the reference boundary samples, rather than the reduced boundary samples calculated from them in the boundary downsampling operation, are directly used for a prediction process in the upsampling operation.
A second example method for video processing includes performing, during a conversion between a current video block of a video and a bitstream representation of the current video block, at least two filtering stages on samples of the current video block in an upsampling operation associated with a matrix based intra prediction (MIP) mode in which a prediction block of the current video block is determined by performing, on previously coded samples of the video, a boundary downsampling operation, followed by a matrix vector multiplication operation, and selectively followed by the upsampling operation, where a first precision of the samples in a first filtering stage of the at least two filtering stages is different from a second precision of the samples in a second filtering stage of the at least two filtering stages; and performing the conversion between the current video block and the bitstream representation of the current video block.
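The second example method runs the upsampling filter stages at different precisions. A minimal Python sketch, assuming two bilinear stages with power-of-two factors and the horizontal stage first: the first stage skips its rounding shift, so its outputs carry log2(up_h) extra bits; the top reference samples are left-shifted to the same scale; and the second stage removes both scale factors with one rounding shift. The function name, argument layout, and the one-left-sample-per-reduced-row convention are illustrative assumptions, not the procedure specified in the patent.

```python
def two_stage_upsample(reduced, left_ref, top_ref, up_h, up_v):
    """Bilinear MIP-style upsampling with a higher-precision first stage.

    reduced:  reduced prediction, list of rows (power-of-two sizes assumed)
    left_ref: one left reference sample per reduced row
    top_ref:  one top reference sample per full-resolution column
    """
    log2_h = up_h.bit_length() - 1
    log2_v = up_v.bit_length() - 1

    # Stage 1 (horizontal): weights sum to up_h and no shift is applied,
    # so intermediate samples carry log2_h extra bits of precision.
    stage1 = []
    for y, row in enumerate(reduced):
        prev, out = left_ref[y], []
        for cur in row:
            for k in range(1, up_h + 1):
                out.append((up_h - k) * prev + k * cur)
            prev = cur
        stage1.append(out)

    # Stage 2 (vertical): bring the top reference to the same precision,
    # then remove both scale factors with a single rounding shift.
    total_shift = log2_h + log2_v
    rounding = (1 << (total_shift - 1)) if total_shift else 0
    width = len(stage1[0])
    result = [[0] * width for _ in range(len(stage1) * up_v)]
    for x in range(width):
        prev = top_ref[x] << log2_h
        for y, row in enumerate(stage1):
            cur = row[x]
            for k in range(1, up_v + 1):
                value = (up_v - k) * prev + k * cur
                result[y * up_v + k - 1][x] = (value + rounding) >> total_shift
            prev = cur
    return result
```

For a single reduced sample of 8 with zero-valued references, `two_stage_upsample([[8]], [0], [0, 0], 2, 2)` yields `[[2, 4], [4, 8]]`. Deferring the rounding to the end means only one truncation is applied per output sample, instead of one per stage.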
A third example video encoding method includes encoding a current video block of a video using a matrix intra prediction (MIP) mode in which a prediction block of the current video block is determined by performing, on previously coded samples of the video, a boundary downsampling operation, followed by a matrix vector multiplication operation, and selectively followed by an upsampling operation; and adding, to a coded representation of the current video block, a syntax element indicative of applicability of the MIP mode to the current video block using arithmetic coding in which a context for the syntax element is derived based on a rule.
A fourth example video decoding method includes parsing a coded representation of a video comprising a current video block for a syntax element indicating whether the current video block is coded using a matrix intra prediction (MIP) mode, wherein the syntax element is coded using arithmetic coding in which a context for the syntax element is derived based on a rule; and decoding the coded representation of the current video block to generate a decoded current video block, wherein in a case that the current video block is coded using the MIP mode, the decoding includes determining a prediction block of the current video block by performing, on previously coded samples of the video, a boundary downsampling operation, followed by a matrix vector multiplication operation, and selectively followed by an upsampling operation.
In one representative aspect, the disclosed technology may be used to provide a method for video processing. This exemplary method includes determining that a current video block is coded using an affine linear weighted intra prediction (ALWIP) mode, constructing, based on the determining, at least a portion of a most probable mode (MPM) list for the ALWIP mode b