WO-2026016986-A9 - METHOD, APPARATUS, AND MEDIUM FOR VIDEO PROCESSING
Abstract
Embodiments of the present disclosure provide a solution for video processing. In a method for video processing, for a conversion between a current video block of a video and a bitstream of the video, a plurality of predictions for the current video block are determined. A regression blending is applied to the plurality of predictions to obtain a blended prediction. The conversion is performed based on the blended prediction.
Inventors
- DENG, Zhipin
- ZHANG, Kai
- ZHANG, Li
- ZHAO, Lei
Assignees
- Douyin Vision Co., Ltd.
- BYTEDANCE INC.
Dates
- Publication Date
- 2026-05-07
- Application Date
- 2025-07-11
- Priority Date
- 2024-07-13
Claims (20)
- A method for video processing, comprising: determining, for a conversion between a current video block of a video and a bitstream of the video, a plurality of predictions for the current video block; applying a regression blending to the plurality of predictions to obtain a blended prediction; and performing the conversion based on the blended prediction.
- The method of claim 1, wherein the plurality of predictions comprise an intra prediction and an intra block copy (IBC) prediction for the current video block.
- The method of claim 2, wherein the blended prediction comprises at least one of: a combined inter and intra prediction (CIIP)-intra-IBC prediction, a geometric partitioning mode (GPM)-intra-IBC prediction, or a spatial GPM (SGPM)-intra-IBC prediction.
- The method of claim 1, wherein the plurality of predictions comprise an intra prediction and an inter prediction.
- The method of claim 4, wherein the blended prediction comprises at least one of: a CIIP-intra-inter prediction, a CIIP-position dependent intra prediction combination (PDPC)-inter Merge prediction, a CIIP-template-based intra mode derivation (TIMD)-template matching (TM) Merge prediction, a CIIP-intra-affine prediction, a CIIP-intra-subblock-based temporal motion vector prediction (SbTMVP) prediction, or a GPM-intra-inter prediction.
- The method of claim 1, wherein the plurality of predictions comprise an IBC prediction and an inter prediction.
- The method of claim 6, wherein the blended prediction comprises at least one of: a CIIP-IBC-inter prediction, or a GPM-IBC-inter prediction.
- The method of claim 1, wherein the plurality of predictions comprise at least two IBC predictions.
- The method of claim 8, wherein the blended prediction comprises at least one of: a bi-IBC prediction, or a GPM-IBC-IBC prediction.
- The method of claim 1, wherein the plurality of predictions comprise at least two intra predictions.
- The method of claim 10, wherein the blended prediction comprises at least one of: an intra template matching prediction (intraTMP) fusion, a decoder-side intra mode derivation (DIMD) fusion, a template-based intra mode derivation (TIMD) fusion, an intra luma fusion, an intra chroma fusion, or an SGPM intra-intra prediction.
- The method of claim 1, wherein the plurality of predictions comprise at least two inter predictions.
- The method of claim 12, wherein the blended prediction comprises at least one of: a bi-predictive inter prediction, a bi-prediction with coding unit level weight (BCW), a GPM-inter-inter prediction, or a multi-hypothesis prediction (MHP).
- The method of claim 1, wherein the plurality of predictions comprise a cross-component prediction (CCP) and at least one of: an intra prediction, an inter prediction, or an IBC prediction.
- The method of claim 14, wherein the blended prediction comprises at least one of: an inter convolutional cross-component model (CCCM) blending inter and CCP, an inter CCM Merge blending intra and CCP, an intra CCCM fusion blending intra and CCP, or an intra chroma fusion blending intra and CCP.
- The method of claim 1, wherein the plurality of predictions comprise a CCP and a further CCP.
- The method of claim 16, wherein the blended prediction comprises an intra CCCM fusion blending the CCP and the further CCP.
- The method of any of claims 1 to 17, wherein applying the regression blending to the plurality of predictions comprises: determining a plurality of weights based on a model, the model being linear or non-linear; and applying the regression blending to the plurality of predictions based on the plurality of weights.
- The method of claim 18, wherein an input of the model comprises at least one of: sample values of reference samples, or sample coordinates of the reference samples, the reference samples being determined based on at least one of: a motion vector or a block vector of the current video block, and wherein an output of the model comprises the plurality of weights.
- The method of claim 18, wherein an input of the model comprises at least one of: sample values of candidate samples, or sample coordinates of the candidate samples, the candidate samples comprising at least one of: adjacent neighboring reconstructed samples, non-adjacent neighboring reconstructed samples, adjacent neighboring prediction samples, non-adjacent neighboring prediction samples, adjacent neighboring residual samples of the current video block, or non-adjacent neighboring residual samples of the current video block, and wherein an output of the model comprises the plurality of weights.
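Note on claims 18 to 20: the regression blending derives per-prediction weights by fitting a model (linear or non-linear) whose inputs are sample values of reference or candidate samples and whose outputs are the weights. The following is an illustrative sketch only, not the claimed method: it assumes two prediction hypotheses and a linear model fit by least squares, where the model inputs are the template samples of each prediction and the fitting target is the co-located reconstructed samples (function names are hypothetical).

```python
def regression_blend_weights(t0, t1, rec):
    """Least-squares weights w0, w1 minimizing ||w0*t0 + w1*t1 - rec||^2.

    t0, t1: template samples of the two prediction hypotheses.
    rec:    co-located reconstructed samples (the regression target).
    """
    # Accumulate the 2x2 normal equations A @ w = b.
    a00 = sum(x * x for x in t0)
    a01 = sum(x * y for x, y in zip(t0, t1))
    a11 = sum(y * y for y in t1)
    b0 = sum(x * r for x, r in zip(t0, rec))
    b1 = sum(y * r for y, r in zip(t1, rec))
    det = a00 * a11 - a01 * a01
    if det == 0:
        # Degenerate template: fall back to equal weighting.
        return 0.5, 0.5
    # Closed-form solution of the 2x2 system by Cramer's rule.
    w0 = (b0 * a11 - b1 * a01) / det
    w1 = (a00 * b1 - a01 * b0) / det
    return w0, w1


def blend(p0, p1, w0, w1):
    """Apply the derived weights sample-wise to the two predictions."""
    return [w0 * a + w1 * b for a, b in zip(p0, p1)]
```

For example, if the reconstructed template happens to equal 0.3 times the first hypothesis plus 0.7 times the second, the fit recovers exactly those weights; a codec implementation would additionally quantize the weights to fixed-point precision.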
Description
METHOD, APPARATUS, AND MEDIUM FOR VIDEO PROCESSING

FIELDS

Embodiments of the present disclosure relate generally to video processing techniques, and more particularly, to prediction fusion for video coding.

BACKGROUND

Nowadays, digital video capabilities are being applied in various aspects of people's lives. Multiple types of video compression technologies, such as motion picture experts group (MPEG)-2, MPEG-4, international telecommunication union-telecommunication standardization sector (ITU-T) H.263, ITU-T H.264/MPEG-4 Part 10 advanced video coding (AVC), the ITU-T H.265 high efficiency video coding (HEVC) standard, and the versatile video coding (VVC) standard, have been proposed for video encoding/decoding. However, the coding efficiency of video coding techniques is generally expected to be further improved.

SUMMARY

Embodiments of the present disclosure provide a solution for video processing.

In a first aspect, a method for video processing is proposed. The method comprises: determining, for a conversion between a current video block of a video and a bitstream of the video, a plurality of predictions for the current video block; applying a regression blending to the plurality of predictions to obtain a blended prediction; and performing the conversion based on the blended prediction. The method in accordance with the first aspect of the present disclosure enables applying a regression blending to predictions to obtain a final prediction. The coding performance can thus be improved.

In a second aspect, another method for video processing is proposed. The method comprises: determining, for a conversion between a current video block of a video and a bitstream of the video, a plurality of predictions for the current video block; updating a plurality of weights based on a value range; blending the plurality of predictions based on the plurality of updated weights to obtain a blended prediction; and performing the conversion based on the blended prediction.
The method in accordance with the second aspect of the present disclosure updates the weights within a value range. The coding performance can thus be improved.

In a third aspect, another method for video processing is proposed. The method comprises: determining, for a conversion between a current video block of a video and a bitstream of the video, that a universal merge mode is enabled for the current video block; determining a final prediction for the current video block by blending a plurality of prediction candidates of the current video block; and performing the conversion based on the final prediction. The method in accordance with the third aspect of the present disclosure enables fusing prediction candidates for universal merge mode. The coding performance can thus be improved.

In a fourth aspect, an apparatus for video processing is proposed. The apparatus comprises a processor and a non-transitory memory with instructions thereon. The instructions, upon execution by the processor, cause the processor to perform a method in accordance with the first, second, or third aspect of the present disclosure.

In a fifth aspect, a non-transitory computer-readable storage medium is proposed. The non-transitory computer-readable storage medium stores instructions that cause a processor to perform a method in accordance with the first, second, or third aspect of the present disclosure.

In a sixth aspect, another non-transitory computer-readable recording medium is proposed. The non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by an apparatus for video processing. The method comprises: determining a plurality of predictions for a current video block of the video; applying a regression blending to the plurality of predictions to obtain a blended prediction; and generating the bitstream based on the blended prediction.
In a seventh aspect, another non-transitory computer-readable recording medium is proposed. The non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by an apparatus for video processing. The method comprises: determining a plurality of predictions for a current video block of the video; updating a plurality of weights based on a value range; blending the plurality of predictions based on the plurality of updated weights to obtain a blended prediction; and generating the bitstream based on the blended prediction.

In an eighth aspect, another non-transitory computer-readable recording medium is proposed. The non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by an apparatus for video processing. The method comprises: determining that a universal merge mode is enabled for a current video block of the video; determining a final prediction for the current video block by blending a plurality of prediction candidates of the current video block; and generating the bitstream based on the final prediction.
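The second and seventh aspects update blending weights "based on a value range" before applying them. One plausible reading, sketched below purely as an illustration and not as the disclosed method, is to clamp each weight to the permitted range and then renormalize so the weights still sum to one (the function name and the [0, 1] default range are assumptions):

```python
def update_weights(weights, lo=0.0, hi=1.0):
    """Clamp each blending weight to [lo, hi], then renormalize to sum to 1.

    This is a hypothetical interpretation of updating weights based on a
    value range; an actual codec would likely use fixed-point arithmetic.
    """
    # Restrict every weight to the allowed value range.
    clamped = [min(max(w, lo), hi) for w in weights]
    total = sum(clamped)
    if total == 0:
        # All weights clamped to zero: fall back to equal weighting.
        return [1.0 / len(weights)] * len(weights)
    # Renormalize so the blended prediction stays correctly scaled.
    return [w / total for w in clamped]
```

For example, an out-of-range weight pair such as (1.3, -0.3) would be constrained to (1.0, 0.0), keeping the blended prediction within the dynamic range of the input predictions.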