EP-4740467-A1 - ENCODING A BLOCK OF A PICTURE

Abstract

There is provided a method for encoding a picture. The method comprises obtaining the picture. The method comprises determining a reference cost of encoding a block of the picture without splitting the block, wherein the block of the picture is associated with a certain region of the picture. The method comprises, based on the determined reference cost, determining whether to evaluate a group of split modes, wherein the evaluation of the group of split modes is for selecting a split mode to use for encoding the block of the picture. The method comprises encoding the picture based on the determination of whether to evaluate the group of split modes.

Inventors

AHMAD, WAQAS
WENNERSTEN, PER
ANDERSSON, KENNETH

Assignees

Telefonaktiebolaget LM Ericsson (publ)

Dates

Publication Date: 20260513
Application Date: 20240521

Claims (20)

1. A method (500) for encoding a picture, the method comprising: obtaining (s502) the picture; determining (s504) a reference cost of encoding a block of the picture without splitting the block, wherein the block of the picture is associated with a certain region of the picture; based on the determined reference cost, determining (s506) whether to evaluate a group of split modes, wherein the evaluation of the group of split modes is for selecting a split mode to use for encoding the block of the picture; and encoding (s508) the picture based on the determination of whether to evaluate the group of split modes.
2. The method of claim 1, wherein, in each split mode included in the group of split modes, a block splitting is performed based on a multi-type tree, MTT, structure.
3. The method of claim 2, wherein the block splitting based on the MTT structure is any one of: a vertical binary splitting, BTV, a horizontal binary splitting, BTH, a vertical ternary splitting, TTV, and a horizontal ternary splitting, TTH.
4. The method of any one of claims 1-3, wherein determining the reference cost of encoding the block of the picture without splitting the block comprises: calculating a cost of encoding the block of the picture without splitting the block when each of a plurality of prediction modes is used for encoding; and selecting one of the calculated costs as the reference cost.
5. The method of claim 4, wherein the selected cost is the minimum cost among the calculated costs.
6. The method of any one of claims 1-3, wherein determining the reference cost of encoding the block of the picture without splitting the block comprises: calculating a cost of encoding the block of the picture without splitting the block when an intra prediction mode is used for encoding; and selecting the calculated cost as the reference cost.
7. The method of any one of claims 4-6, wherein calculating the cost of encoding the block of the picture without splitting the block comprises: (i) encoding the block of the picture without splitting the block, thereby obtaining an encoded block of the picture; and (ii) calculating the cost based on one or more of: a number of bits corresponding to the encoded block of the picture, a difference between the block of the picture and the encoded block of the picture, and/or an encoder constant.
8. The method of claim 7, wherein the cost of encoding the block of the picture is calculated based on $b \times \lambda + d$, where $b$ is the number of bits corresponding to the encoded block of the picture, $\lambda$ is the encoder constant, and $d$ is the difference between the block of the picture and the encoded block of the picture.
9. The method of any one of claims 1-8, wherein determining whether to evaluate the group of split modes comprises: comparing the reference cost to a threshold value; and determining whether to evaluate the group of split modes based on a result of the comparison.
10. The method of claim 9, wherein the threshold value is determined based on one or more of: a type of the picture, and/or a value of a quantization parameter used for determining the reference cost of encoding the block of the picture without splitting the block.
11. The method of claim 10, wherein $T_{QP} = (120 - (QP - 22) \times 4) \times 10^{6}$, where $T_{QP}$ is the threshold value, and $QP$ is the value of the quantization parameter.
12. The method of any one of claims 9-11, the method comprising: determining that the reference cost is less than the threshold value; based on determining that the reference cost is less than the threshold value, evaluating the group of split modes; and based on the evaluation, selecting from the group of split modes a split mode to use for encoding the picture data, wherein the picture data is encoded using the selected split mode.
13. The method of any one of claims 9-11, the method comprising: determining that the reference cost is greater than the threshold value; and based on determining that the reference cost is greater than the threshold value, encoding the block of the picture (i) without splitting the block or (ii) by splitting the block using a quadtree, QT, structure, wherein the picture data is encoded using the encoded block.
14. The method of any one of claims 9-11, the method comprising: determining whether the reference cost is less than the threshold value; and based on determining that the reference cost is less than the threshold value, encoding the block of the picture (i) without splitting the block or (ii) by splitting the block using a quad-tree, QT, structure, wherein the picture is encoded using the encoded block.
15. The method of any one of claims 9-11, the method comprising: determining whether the reference cost is less than the threshold value; based on determining that the reference cost is greater than the threshold value, evaluating the group of split modes; and based on the evaluation, selecting from the group of split modes a split mode to use for encoding the picture data, wherein the picture is encoded using the encoded block.
16. The method of any one of claims 1-15, wherein determining whether to evaluate the group of split modes is for encoding the block of the picture when the block of the picture corresponds to a luma channel coding unit, CU, belonging to an intra slice having a certain size or an inter slice having a certain size.
17. The method of any one of claims 1-15, wherein determining whether to evaluate the group of split modes is for encoding the block of the picture when the block of the picture corresponds to a chroma channel coding unit, CU, belonging to an intra slice having a certain size or an inter slice having a certain size.
18. The method of any one of claims 1-17, the method comprising: determining whether the block of the picture is obtained as a result of splitting a parent block using a split mode included in the group of split modes, wherein determining whether to evaluate the group of split modes is performed only when the block of the picture is not obtained as a result of splitting the parent block using a split mode included in the group of split modes.
19. The method of claim 18, wherein determining whether to evaluate the group of split modes is performed only when the block of the picture is obtained as a result of (i) splitting the parent block using a split mode based on a quad tree, QT, structure.
20. The method of any one of claims 1-19, the method comprising: determining whether a split mode that is not included in the group of split modes is evaluated as a candidate split mode for encoding the block of the picture, wherein determining whether to evaluate the group of split modes is performed only after the split mode that is not included in the group of split modes is evaluated as a candidate split mode for encoding the block of the picture.
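
The cost, threshold, and decision steps recited in claims 5, 8, 9, and 11-15 can be sketched as follows. This is a minimal illustration, not the applicant's implementation. It reads the claim 8 cost as bits times the encoder constant plus distortion, and the claim 11 threshold as (120 - (QP - 22) * 4) * 10^6. All function and parameter names (`rd_cost`, `qp_threshold`, `should_evaluate_mtt`, `evaluate_when_below`) are hypothetical. The claims do not say what happens when the reference cost equals the threshold; the sketch assumes the split group is skipped in that case.

```python
# Names used for the group of MTT split modes listed in claim 3.
MTT_SPLITS = ("BTV", "BTH", "TTV", "TTH")


def rd_cost(bits: int, distortion: float, lam: float) -> float:
    """Cost in the form of claim 8: bits * encoder constant + distortion."""
    return bits * lam + distortion


def reference_cost(costs_by_mode: dict) -> float:
    """Claim 5 variant: the minimum no-split cost over the tried prediction modes."""
    return min(costs_by_mode.values())


def qp_threshold(qp: int) -> int:
    """Threshold in the form of claim 11: (120 - (QP - 22) * 4) * 10^6."""
    return (120 - (qp - 22) * 4) * 10**6


def should_evaluate_mtt(ref_cost: float, threshold: float,
                        evaluate_when_below: bool = True) -> bool:
    """Decide whether to evaluate the MTT split group.

    evaluate_when_below=True follows claims 12-13 (evaluate if cost < threshold);
    evaluate_when_below=False follows claims 14-15 (evaluate if cost > threshold).
    Equality skips the group in both variants (an assumption of this sketch).
    """
    if evaluate_when_below:
        return ref_cost < threshold
    return ref_cost > threshold


# Example with made-up numbers: two no-split costs at QP 32.
costs = {
    "intra": rd_cost(bits=1200, distortion=9.0e7, lam=50.0),
    "inter": rd_cost(bits=800, distortion=7.0e7, lam=50.0),
}
ref = reference_cost(costs)
candidates = MTT_SPLITS if should_evaluate_mtt(ref, qp_threshold(32)) else ()
print(ref, qp_threshold(32), candidates)
```

Keeping the comparison direction as a parameter reflects that the claims cover both variants; an encoder would fix one of them.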

Description

ENCODING A BLOCK OF A PICTURE

TECHNICAL FIELD

[0001] This disclosure relates to methods and apparatus for encoding a block of a picture.

BACKGROUND

[0002] Versatile Video Coding (VVC) and Enhanced Coding Model (ECM)

[0003] VVC is a block-based video codec standardized by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and the Moving Picture Experts Group (MPEG). ECM is an exploratory codec which is currently under development. The aim of ECM is to demonstrate, and provide evidence of, video coding capabilities beyond VVC. The current ECM version is ECM-9.0.

[0004] In ECM, spatial prediction is achieved using intra (I) prediction from within a current picture, while temporal prediction is achieved using uni-directional (P) or bi-directional (B) inter prediction on a block level from previously decoded reference pictures. In the encoder, the difference between the original sample data and the predicted sample data, referred to as the residual, is transformed into the frequency domain, quantized, and then entropy coded before being transmitted together with necessary prediction parameters such as prediction mode and motion vectors, which are also entropy coded. The decoder performs entropy decoding, inverse quantization, and inverse transformation to obtain the residual and then adds the residual to intra or inter predicted data to reconstruct a picture.

[0005] Video and Picture

[0006] A video (a.k.a., “video sequence”) comprises a series of pictures. In VVC, each picture is identified with a picture order count (POC) value. The POC value also represents the display order of the picture. A picture with a smaller POC value is displayed before another picture with a larger POC value. Each picture may consist of one or more components.
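
The POC ordering rule in paragraph [0006] can be illustrated with a short sketch. The `Picture` type, its `label` field, and the decode order shown are hypothetical and only serve the example; the one rule taken from the text is that a smaller POC value is displayed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    poc: int    # picture order count, as described in paragraph [0006]
    label: str  # hypothetical tag used only in this example


def display_order(decoded):
    """Return pictures sorted so that smaller POC values are displayed first."""
    return sorted(decoded, key=lambda p: p.poc)


# Hypothetical decode order in which reference pictures arrive before the
# pictures that are displayed between them.
decode_order = [
    Picture(0, "I"),
    Picture(4, "P"),
    Picture(2, "B"),
    Picture(1, "B"),
    Picture(3, "B"),
]

print([p.poc for p in display_order(decode_order)])
```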
[0007] Components

[0008] It is common that a picture consists of three components: one luma component Y, where the sample values are luma values, and two chroma components Cb and Cr, where the sample values are chroma values. Each component can be described as a two-dimensional rectangular array of sample values. It is also common that the dimensions of the chroma components are smaller than those of the luma component by a factor of two in each dimension. For example, the size of the luma component of an HD picture would be 1920x1080 and the chroma components would each have the dimension of 960x540. Components are sometimes referred to as color components.

[0009] Coding Unit and Coding Block

[0010] A block is a two-dimensional (2D) matrix of sample values (or “samples” for short). In video coding, each component of a picture is split into blocks and the coded video bitstream consists of a series of coded blocks. It is common in video coding that pictures are split into units that cover a specific area of the picture.

[0011] Each unit consists of all blocks from all components that make up that specific area of the picture, and each block belongs fully to one unit. The Coding Unit (CU) in VVC is an example of a unit. In VVC, the CUs may be split recursively into smaller CUs. The CU at the top level is referred to as the coding tree unit (CTU). A CU usually contains three coding blocks, i.e., one coding block for luma and two coding blocks for chroma. The size of the luma coding block is the same as the size of the coding unit. In VVC, the CUs can have sizes from 4x4 up to 128x128. In current ECM, the CUs can have sizes from 4x4 up to 256x256. In this disclosure, a coding unit (CU) may refer to a CU itself or may refer to a block (e.g., a luma channel CU, a chroma channel CU, etc.) included in the CU.
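
The component and coding-block sizes described in paragraphs [0008] and [0011] can be worked through in a few lines. This is a sketch under the "factor of two in each dimension" case mentioned in the text; the function names `chroma_size` and `coding_blocks` are hypothetical, and any codec-specific limits on minimum chroma block size are not modeled.

```python
def chroma_size(luma_width: int, luma_height: int, factor: int = 2):
    """Chroma dimensions when chroma is smaller by `factor` in each dimension."""
    return luma_width // factor, luma_height // factor


def coding_blocks(cu_width: int, cu_height: int):
    """Coding blocks of a CU: the luma block matches the CU size and the two
    chroma blocks are subsampled (assuming the factor-of-two case)."""
    cw, ch = chroma_size(cu_width, cu_height)
    return {"Y": (cu_width, cu_height), "Cb": (cw, ch), "Cr": (cw, ch)}


# HD example from paragraph [0008]: 1920x1080 luma, 960x540 chroma.
print(chroma_size(1920, 1080))

# A 128x128 CU (the largest VVC size given in paragraph [0011]).
print(coding_blocks(128, 128))
```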
[0012] Parameter sets, slice headers, and picture headers

[0013] VVC specifies three types of parameter sets: the picture parameter set (PPS), the sequence parameter set (SPS), and the video parameter set (VPS). The PPS contains data that is common for a whole picture, the SPS contains data that is common for a coded layer video sequence (CLVS), and the VPS contains data that is common for multiple CLVSs, e.g., data for multiple layers in the bitstream.

[0014] The concept of slices divides the picture into independently coded slices, where decoding of one slice in a picture is independent of other slices of the same picture. Each slice has a slice header comprising syntax elements. Decoded slice header values from these syntax elements are used when decoding the slice. In VVC, a coded picture contains a picture header. The picture header contains parameters that are common for all slices of the coded picture.

[0015] Intra prediction

[0016] In intra prediction, also known as spatial prediction, a current block is predicted using previously decoded blocks within the same picture. The samples from the previously decoded blocks within the same picture are used to predict the samples inside the current block. A picture consisting of only intra-predicted blocks is referred to as an intra picture.

[0017] Inter prediction and motion compensation

[0018] In inter prediction,