JP-2026076350-A - Picture/video coding that supports varying resolution and/or efficiently handles region-based packing.

Abstract

[Problem] To provide a video encoding/decoding method, an electronic device, and a non-transitory computer-readable medium that support varying resolution and/or efficiently handle region-based packing. [Solution] The video decoding method includes: decoding, from a bitstream of encoded video data, a flag indicating whether reference picture resampling is possible and whether a reference picture has parameters that may differ from the corresponding parameters of a current picture; determining, based on the flag, that reference picture resampling is possible and that the parameters of the reference picture differ from the corresponding parameters of the current picture; resampling a portion of the reference picture to generate a resampled portion of the reference picture; and predicting the current picture using the resampled portion of the reference picture. [Selection Diagram] Figure 4a
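The decoding flow summarized in the abstract can be illustrated with a minimal sketch. It is a hypothetical illustration, not the method of the application: `Picture` and `predict_block` are invented names, the flag is taken as already decoded rather than parsed from a bitstream, prediction is zero-motion, and nearest-neighbour sample mapping stands in for the interpolation filters a real codec would use. Only the portion of the reference picture co-located with the predicted block is resampled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Picture:
    """Hypothetical container: a luma plane stored as rows of samples."""
    width: int   # picture width in luma samples
    height: int  # picture height in luma samples
    luma: List[List[int]]


def predict_block(rpr_flag: bool, ref: Picture, cur_w: int, cur_h: int,
                  x0: int, y0: int, bw: int, bh: int) -> List[List[int]]:
    """Zero-motion prediction of a bw x bh block at (x0, y0) of a cur_w x cur_h
    current picture, resampling only the co-located portion of the reference.

    Nearest-neighbour mapping is an illustrative stand-in for a real filter.
    """
    sizes_differ = (ref.width, ref.height) != (cur_w, cur_h)
    if sizes_differ and not rpr_flag:
        # Without reference picture resampling, the sizes must match.
        raise ValueError("reference size differs but resampling is not enabled")
    block = []
    for y in range(y0, y0 + bh):
        ry = y * ref.height // cur_h  # map current row to reference row
        block.append([ref.luma[ry][x * ref.width // cur_w] for x in range(x0, x0 + bw)])
    return block


ref = Picture(4, 4, [[10 * r + c for c in range(4)] for r in range(4)])
print(predict_block(True, ref, 8, 8, 0, 0, 4, 2))  # upsampled: [[0, 0, 1, 1], [0, 0, 1, 1]]
print(predict_block(True, ref, 2, 2, 0, 0, 2, 2))  # downsampled: [[0, 2], [20, 22]]
```

When the sizes match, the same mapping reduces to a plain copy, so one code path covers both the resampled and the non-resampled case.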

Inventors

Robert Skupin
Cornelius Hellge
Valeri George
Thomas Schierl
Yago Sánchez de la Fuente
Karsten Sühring
Thomas Wiegand

Assignees

Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.

Dates

Publication Date: 20260511
Application Date: 20260218
Priority Date: 20180220

Claims (20)

A method for video decoding, the method comprising: decoding, from a bitstream of encoded video data, a flag indicating whether reference picture resampling is possible and whether a reference picture has parameters that may differ from corresponding parameters of a current picture, wherein a portion of the current picture refers to a portion of the reference picture, and the parameters include at least one of a picture height in luma samples and a picture width in luma samples; determining, based on the flag, that reference picture resampling is possible, that the parameters of the reference picture may differ from the corresponding parameters of the current picture, and that the spatial resolution is permitted to change within the encoded video sequence; resampling the portion of the reference picture to generate a resampled portion of the reference picture; and predicting the current picture using the resampled portion of the reference picture, wherein a distance between a random access frame and the current picture is greater than a distance between the reference picture and the current picture.
The method according to claim 1, further comprising: determining that the parameters of the reference picture differ from the corresponding parameters of the current picture; and resampling the portion of the reference picture in accordance with the determination that the parameters of the reference picture differ from the corresponding parameters of the current picture.
The method according to claim 1, wherein the flag is a first flag, and the method further comprises decoding, from the bitstream, a second flag indicating whether the spatial resolution changes.
The method according to claim 1, further comprising: determining whether a size of the reference picture differs from a size of the current picture; deriving, depending on whether the size of the reference picture and the size of the current picture differ, a ratio between the size of the reference picture and the size of the current picture; and resampling the portion of the reference picture based on the ratio.
The method according to claim 1, wherein resampling the portion of the reference picture includes: upsampling the portion of the reference picture to match the spatial resolution of the current picture when the spatial resolution of the current picture is greater than the spatial resolution of the reference picture; and downsampling the portion of the reference picture to match the spatial resolution of the current picture when the spatial resolution of the current picture is smaller than the spatial resolution of the reference picture.
The method according to claim 1, further comprising determining that a temporal distance between the reference picture and the current picture is less than a random access distance, such that no random access point is placed between the reference picture and the current picture.
An electronic device for video decoding, the electronic device comprising a processor configured to: decode, from a bitstream of encoded video data, a flag indicating whether reference picture resampling is possible and whether a reference picture has parameters that may differ from corresponding parameters of a current picture, wherein a portion of the current picture refers to a portion of the reference picture, and the parameters include at least one of a picture height in luma samples and a picture width in luma samples; determine, based on the flag, that reference picture resampling is possible, that the parameters of the reference picture may differ from the corresponding parameters of the current picture, and that the spatial resolution is permitted to change within the encoded video sequence; resample the portion of the reference picture to generate a resampled portion of the reference picture; and predict the current picture using the resampled portion of the reference picture, wherein a distance between a random access frame and the current picture is greater than a distance between the reference picture and the current picture.
The electronic device according to claim 7, wherein the processor is further configured to: determine that the parameters of the reference picture differ from the corresponding parameters of the current picture; and resample the portion of the reference picture in accordance with the determination that the parameters of the reference picture differ from the corresponding parameters of the current picture.
The electronic device according to claim 7, wherein the flag is a first flag, and the processor is further configured to decode a second flag indicating whether the spatial resolution changes.
The electronic device according to claim 7, wherein the processor is further configured to: determine whether a size of the reference picture differs from a size of the current picture; derive, depending on whether the size of the reference picture and the size of the current picture differ, a ratio between the size of the reference picture and the size of the current picture; and resample the portion of the reference picture based on the ratio.
The electronic device according to claim 7, wherein, to resample the portion of the reference picture, the processor is further configured to: upsample the portion of the reference picture to match the spatial resolution of the current picture when the spatial resolution of the current picture is greater than the spatial resolution of the reference picture; and downsample the portion of the reference picture to match the spatial resolution of the current picture when the spatial resolution of the current picture is smaller than the spatial resolution of the reference picture.
The electronic device according to claim 7, wherein the processor is further configured to determine that a temporal distance between the reference picture and the current picture is less than a random access distance, such that no random access point is placed between the reference picture and the current picture.
A non-transitory computer-readable medium storing instructions that, when executed by at least one processor of an electronic device, cause the at least one processor to: decode, from a bitstream of encoded video data, a flag indicating whether reference picture resampling is possible and whether a reference picture has parameters that may differ from corresponding parameters of a current picture, wherein a portion of the current picture refers to a portion of the reference picture, and the parameters include at least one of a picture height in luma samples and a picture width in luma samples; determine, based on the flag, that reference picture resampling is possible, that the parameters of the reference picture may differ from the corresponding parameters of the current picture, and that the spatial resolution is permitted to change within the encoded video sequence; resample the portion of the reference picture to generate a resampled portion of the reference picture; and predict the current picture using the resampled portion of the reference picture, wherein a distance between a random access frame and the current picture is greater than a distance between the reference picture and the current picture.
A method for video encoding, the method comprising: generating a flag having a value indicating that reference picture resampling is possible, that a reference picture has parameters that may differ from corresponding parameters of a current picture, wherein a portion of the current picture refers to a portion of the reference picture and the parameters include at least one of a picture height in luma samples and a picture width in luma samples, and that the spatial resolution is permitted to change within the encoded video sequence; resampling the portion of the reference picture to generate a resampled portion of the reference picture; and encoding the current picture using the resampled portion of the reference picture, wherein a distance between a random access frame and the current picture is greater than a distance between the reference picture and the current picture.
The method according to claim 14, further comprising determining that a temporal distance between the reference picture and the current picture is less than a random access distance, such that no random access point is placed between the reference picture and the current picture.
The method according to claim 14, wherein the flag is a first flag, and the method further comprises generating a second flag indicating whether the pictures associated with the sequence parameter set are allowed to differ in their respective spatial resolutions.
An electronic device for video coding, the electronic device comprising a processor configured to: generate a flag having a value indicating that reference picture resampling is possible, that a reference picture has parameters that may differ from corresponding parameters of a current picture, wherein a portion of the current picture refers to a portion of the reference picture and the parameters include at least one of a picture height in luma samples and a picture width in luma samples, and that the spatial resolution is permitted to change within the encoded video sequence; resample the portion of the reference picture to generate a resampled portion of the reference picture; and encode the current picture using the resampled portion of the reference picture, wherein a distance between a random access frame and the current picture is greater than a distance between the reference picture and the current picture.
The electronic device according to claim 17, wherein the processor is further configured to determine that a temporal distance between the reference picture and the current picture is less than a random access distance, such that no random access point is placed between the reference picture and the current picture.
The electronic device according to claim 17, wherein the flag is a first flag, and the processor is further configured to generate a second flag indicating whether the pictures associated with the sequence parameter set are allowed to differ in their respective spatial resolutions.
A non-transitory computer-readable medium storing instructions that, when executed by at least one processor of an electronic device, cause the at least one processor to: generate a flag having a value indicating that reference picture resampling is possible, that a reference picture has parameters that may differ from corresponding parameters of a current picture, wherein a portion of the current picture refers to a portion of the reference picture and the parameters include at least one of a picture height in luma samples and a picture width in luma samples, and that the spatial resolution is permitted to change within the encoded video sequence; resample the portion of the reference picture to generate a resampled portion of the reference picture; and encode the current picture using the resampled portion of the reference picture, wherein a distance between a random access frame and the current picture is greater than a distance between the reference picture and the current picture.
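The dependent claims add three checks: deriving a size ratio only when the reference and current sizes differ, choosing between upsampling and downsampling, and ensuring that no random access point lies between the reference picture and the current picture. A minimal sketch under stated assumptions: the function names are hypothetical, exact rational arithmetic (`fractions.Fraction`) replaces the fixed-point arithmetic a real codec would more likely use, the direction is decided per axis, and picture order counts (POCs) serve as the measure of temporal distance.

```python
from fractions import Fraction
from typing import Iterable, Optional, Tuple


def scaling_ratio(ref_w: int, ref_h: int, cur_w: int, cur_h: int) -> Optional[Tuple[Fraction, Fraction]]:
    """Derive (horizontal, vertical) reference/current ratios only when sizes differ."""
    if (ref_w, ref_h) == (cur_w, cur_h):
        return None  # same size: no resampling needed
    return Fraction(ref_w, cur_w), Fraction(ref_h, cur_h)


def resampling_direction(ratio: Fraction) -> str:
    """A ratio below 1 means the current picture is larger than the reference."""
    if ratio < 1:
        return "upsample"
    if ratio > 1:
        return "downsample"
    return "none"


def no_rap_between(ref_poc: int, cur_poc: int, rap_pocs: Iterable[int]) -> bool:
    """True if no random access point lies strictly between reference and current POC."""
    lo, hi = sorted((ref_poc, cur_poc))
    return not any(lo < p < hi for p in rap_pocs)


ratios = scaling_ratio(1920, 1080, 960, 540)
print(ratios, [resampling_direction(r) for r in ratios])
print(no_rap_between(14, 18, [0, 16]))  # False: the RAP at POC 16 separates them
```

Keeping the ratio as an exact fraction makes the upsample/downsample decision a simple comparison against 1; a fixed-point variant would compare against the scaled representation of 1 instead.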

Description

This application relates to video/picture coding with improved coding efficiency, such as support for varying resolution and/or efficient handling of region-based packing. The single-layer version of HEVC does not allow the picture resolution to change within a coded video sequence. Such a change is only permitted at a random access point (RAP), for example an IDR RAP that completely resets the decoded picture buffer, where the resolution may change at the start of a new coded video sequence, which involves flushing the decoded picture buffer (DPB). However, flushing the DPB significantly reduces the achievable coding efficiency, because previously coded pictures can no longer be used as references. There is therefore a need for improved video/picture codecs that exploit resolution variation more efficiently to improve coding efficiency.
There are also tools that enable the transport of a scene, such as a panoramic scene, by way of coded pictures or coded video. In this technique, called region-based packing, the picture/video codec is agnostic of how the scene is mapped, region by region, onto the one or more pictures. The mapping is merely a pre-processing step on the encoder side and a post-processing step on the decoder side, while the codec operates on the packed picture or pictures without being aware of the region-based packing. MPEG OMAF, for example, provides a framework for such transport of a scene by way of one or more packed pictures. There, the scene partitioning/mapping, which divides the coded picture or pictures into picture regions that are each mapped onto a respective scene region, is signaled as side information in SEI messages in order to control the decoder-side post-processing that remaps the coded picture or pictures onto the scene.
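The decoder-side post-processing of region-based packing, which remaps picture regions of a packed picture onto their scene regions, can be illustrated with a small sketch. It assumes purely translational mappings of rectangular regions, without the scaling, rotation, or mirroring that OMAF region-wise packing can also signal; `RegionMapping` and `unpack` are hypothetical names, not OMAF or SEI syntax.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionMapping:
    # Hypothetical side information for one region (translation only).
    packed_x: int  # top-left corner of the region in the packed picture
    packed_y: int
    width: int
    height: int
    scene_x: int   # top-left corner of the region in the scene
    scene_y: int


def unpack(packed: List[List[int]], mappings: List[RegionMapping],
           scene_w: int, scene_h: int, fill: int = 0) -> List[List[int]]:
    """Copy each packed region to its scene position; uncovered samples keep `fill`."""
    scene = [[fill] * scene_w for _ in range(scene_h)]
    for m in mappings:
        for dy in range(m.height):
            for dx in range(m.width):
                scene[m.scene_y + dy][m.scene_x + dx] = packed[m.packed_y + dy][m.packed_x + dx]
    return scene


# Two 2x2 regions whose order is swapped between the packed picture and the scene.
packed = [[1, 2, 3, 4], [5, 6, 7, 8]]
maps = [RegionMapping(0, 0, 2, 2, 2, 0), RegionMapping(2, 0, 2, 2, 0, 0)]
print(unpack(packed, maps, 4, 2))  # [[3, 4, 1, 2], [7, 8, 5, 6]]
```

The codec itself never sees `maps`; it only codes `packed`, which is exactly why such a scheme can reuse an unmodified codec.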
Handling region-based packing outside the codec in this way is efficient in that it reuses existing coding tools, i.e., existing picture/video codecs, but it would nevertheless be advantageous to have a concept that handles region-based packing more efficiently.
This is a schematic block diagram of a video encoder according to an embodiment that uses video coding depending on a temporal variation of the target spatial resolution.
This is a block diagram of a decoder, according to an embodiment, that fits the video encoder shown in Figure 1a.
This is a schematic diagram illustrating the variation of the target spatial resolution according to the embodiments of Figures 1a and 1b.
This is a schematic diagram of the reconstructed video output by the video decoder shown in Figure 1b.
This is a schematic diagram of coded video illustrating the time scale at which the target spatial resolution varies compared to the random access distance/pitch.
This is a schematic diagram of the decoder of Figure 1b operating according to one of several variants that differ in the resolution at which pictures are buffered in the DPB.
This is a schematic diagram of the decoder of Figure 1b operating according to another of these variants.
This is a block diagram of an encoder that forms one possible implementation of the encoder of Figure 1a and operates according to a variant in which pictures are buffered in the decoded picture buffer at the reference spatial resolution, thereby fitting the decoder of Figure 4b; the encoder variants differ in the side or domain in which the prediction correction is performed, i.e., at the varying target spatial resolution or at the reference spatial resolution.
This is a block diagram of another such encoder variant that likewise buffers pictures at the reference spatial resolution, differing from the preceding one in the domain in which the prediction correction is performed.
This figure shows possible syntaxes for signaling the varying target spatial resolution.
This figure shows an example of SPS syntax including a flag that switches the possibility of varying the target spatial resolution on and off.
This figure shows an example of SPS syntax for transmitting different target spatial resolution settings at a larger scale, such as for a sequence of pictures, so that a picture can refer to one of these settings in the data stream by an index.
This figure shows an example of slice segment header syntax in which a default target spatial resolution setting, such as one signaled as shown in Figure 8, is referenced by an index syntax element.
This figure shows an example of syntax for signaling the output aspect ratio for one resolution, such as the reference spatial resolution or one of the target spatial resolutions.
This figure shows an example of syntax for transmitting output aspect ratios for all possible target spatial resolution settings.
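The SPS and slice-header signaling summarized in the figure captions above amounts to a lookup: the SPS carries an enable flag and a list of target spatial resolution settings, and each slice header selects one setting by index. The sketch below is hypothetical throughout: the field and function names are not HEVC syntax elements, and falling back to the reference spatial resolution when the flag is off is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SequenceParameterSet:
    # Hypothetical fields, not actual HEVC/VVC syntax element names.
    resolution_change_enabled: bool  # flag toggling varying target resolution on/off
    ref_width: int                   # reference spatial resolution in luma samples
    ref_height: int
    target_resolutions: List[Tuple[int, int]] = field(default_factory=list)


@dataclass
class SliceHeader:
    resolution_idx: int = 0  # index into SequenceParameterSet.target_resolutions


def target_resolution(sps: SequenceParameterSet, sh: SliceHeader) -> Tuple[int, int]:
    """Resolve the (width, height) at which a slice's picture is coded."""
    if not sps.resolution_change_enabled:
        # Assumption: with the flag off, every picture uses the reference
        # resolution and the slice-header index is ignored.
        return (sps.ref_width, sps.ref_height)
    if not 0 <= sh.resolution_idx < len(sps.target_resolutions):
        raise IndexError("slice header references an undefined resolution setting")
    return sps.target_resolutions[sh.resolution_idx]


sps = SequenceParameterSet(True, 1920, 1080, [(1920, 1080), (1280, 720), (960, 540)])
print(target_resolution(sps, SliceHeader(resolution_idx=1)))  # (1280, 720)
```

Sending the settings once in the SPS and only a small index per slice keeps the per-picture signaling cost low, which is the motivation the captions suggest for the indexed design.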