CN-121985133-A - Signaling decoding using partition information


Abstract

The invention relates to a method and an apparatus for decoding data for still image or video processing from a bitstream. Two or more sets of segmentation information elements are obtained from the bitstream. Each of these sets is then input into a respective one of two or more segmentation information processing layers of a plurality of cascaded layers, and each of these layers processes its respective set of segmentation information. The decoded data for image or video processing is obtained based on the segmentation information processed by the plurality of cascaded layers. In this way, data can be decoded efficiently from the bitstream using a hierarchical structure.

Inventors

Sergei Yuriyevich Ikonen
Mikhail Viacheslavovich Sosovnikov
Alexander Alexandrovic Calabutov
Timofi Mikhailovich Soloviev
Wang Biao
Irene Alexandrovna Alsina

Assignees

Huawei Technologies Co., Ltd. (华为技术有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2020-12-24

Claims (15)

1. A method of decoding data for image or video processing from a bitstream, the method comprising: obtaining two or more sets of segmentation information elements from the bitstream; inputting each of the two or more sets of segmentation information elements into a respective one of two or more segmentation information processing layers of a plurality of cascaded layers; and processing the respective set of segmentation information in each of the two or more segmentation information processing layers; wherein the decoded data for image or video processing is obtained based on the segmentation information processed by the plurality of cascaded layers; wherein the segmentation information processed in the two or more segmentation information processing layers has a different resolution in each of these layers; and wherein the processing of the segmentation information in the two or more segmentation information processing layers includes upsampling, the upsampling of the segmentation information including nearest-neighbor upsampling.
2. The method of claim 1, wherein the obtaining of the set of segmentation information elements is based on segmentation information processed by at least one segmentation information processing layer of the plurality of cascaded layers.
3. The method of claim 1 or 2, wherein the inputting of the set of segmentation information elements is based on processed segmentation information output by at least one of the plurality of cascaded layers.
4. The method of claim 1, wherein the upsampling of the segmentation information comprises a transposed convolution.
5. The method of claim 1, wherein, for each segmentation information processing layer j of N segmentation information processing layers of the plurality of cascaded layers: the inputting comprises inputting initial segmentation information from the bitstream if j = 1, and otherwise inputting the segmentation information processed by the (j-1)-th segmentation information processing layer; and the processed segmentation information is output.
6. The method of claim 5, wherein the processing of the input segmentation information by each segmentation information processing layer j < N further comprises: parsing a segmentation information element from the bitstream and associating the parsed segmentation information element with the segmentation information output by the preceding layer, wherein the position of the parsed segmentation information element within the associated segmentation information is determined based on the segmentation information output by the preceding layer.
7. The method of claim 6, wherein the number of segmentation information elements parsed from the bitstream is determined based on the segmentation information output by the preceding layer.
8. The method of claim 6 or 7, wherein the parsed segmentation information elements are represented by a set of binarized flags.
9. The method of any one of claims 1, 2, and 4 to 7, wherein obtaining the decoded data for image or video processing comprises determining, based on the segmentation information, at least one of the following: intra or inter prediction mode; picture reference index; single-reference or multi-reference prediction; presence or absence of prediction residual information; quantization step size; motion information prediction type; motion vector length; motion vector resolution; motion vector prediction index; motion vector difference magnitude; motion vector difference resolution; motion interpolation filter; in-loop filter parameters; and post-filter parameters.
10. The method of any one of claims 1, 2, and 4 to 7, further comprising: obtaining sets of feature map elements from the bitstream and inputting the sets of feature map elements into respective feature map processing layers of the plurality of cascaded layers according to the segmentation information processed by a segmentation information processing layer; and obtaining the decoded data for image or video processing based on the feature maps processed by the plurality of cascaded layers.
11. The method of claim 10, wherein at least one of the plurality of cascaded layers is both a segmentation information processing layer and a feature map processing layer.
12. The method of claim 10, wherein each of the plurality of cascaded layers is either a segmentation information processing layer or a feature map processing layer.
13. A computer program product stored on a non-transitory medium which, when executed on one or more processors, performs a method of decoding data for image or video processing from a bitstream, the method comprising: obtaining two or more sets of segmentation information elements from the bitstream; inputting each of the two or more sets of segmentation information elements into a respective one of two or more segmentation information processing layers of a plurality of cascaded layers; and processing the respective set of segmentation information in each of the two or more segmentation information processing layers; wherein the decoded data for image or video processing is obtained based on the segmentation information processed by the plurality of cascaded layers; wherein the segmentation information processed in the two or more segmentation information processing layers has a different resolution in each of these layers; and wherein the processing of the segmentation information in the two or more segmentation information processing layers includes upsampling, the upsampling of the segmentation information including nearest-neighbor upsampling.
14. An apparatus for decoding an image or video, comprising processing circuitry configured to perform a method of decoding data for image or video processing from a bitstream, the method comprising: obtaining two or more sets of segmentation information elements from the bitstream; inputting each of the two or more sets of segmentation information elements into a respective one of two or more segmentation information processing layers of a plurality of cascaded layers; and processing the respective set of segmentation information in each of the two or more segmentation information processing layers; wherein the decoded data for image or video processing is obtained based on the segmentation information processed by the plurality of cascaded layers; wherein the segmentation information processed in the two or more segmentation information processing layers has a different resolution in each of these layers; and wherein the processing of the segmentation information in the two or more segmentation information processing layers includes upsampling, the upsampling of the segmentation information including nearest-neighbor upsampling.
15. An apparatus for decoding data for image or video processing from a bitstream, the apparatus comprising: an acquisition unit configured to obtain two or more sets of segmentation information elements from the bitstream; an input unit configured to input each of the two or more sets of segmentation information elements into a respective one of two or more segmentation information processing layers of a plurality of cascaded layers; a processing unit configured to process, in each of the two or more segmentation information processing layers, the respective set of segmentation information; and a decoded data acquisition unit configured to obtain the decoded data for image or video processing based on the segmentation information processed in the plurality of cascaded layers; wherein the segmentation information processed in the two or more segmentation information processing layers has a different resolution in each of these layers; and wherein the processing of the segmentation information in the two or more segmentation information processing layers includes upsampling, the upsampling of the segmentation information including nearest-neighbor upsampling.
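To make the cascaded parsing of claims 1 and 5 to 8 more concrete, the following Python sketch decodes a hierarchical segmentation map from binarized flags. It is a minimal illustration under assumptions that are not taken from the claims: the resolution doubles from one layer to the next, a cell value k > 0 means that the segment is finalized at layer k, a value of 0 means that it is refined at the next layer, and the last layer N parses no flags. The functions `upsample_nearest` and `decode_segmentation` are hypothetical names.

```python
from typing import Iterator, List

Grid = List[List[int]]

def upsample_nearest(grid: Grid, factor: int = 2) -> Grid:
    """Nearest-neighbor upsampling: each cell is repeated factor x factor times."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def decode_segmentation(bits: Iterator[int], h: int, w: int, num_layers: int) -> Grid:
    """Decode a hierarchical segmentation map from binarized flags.

    Hypothetical semantics: value k > 0 = "segment finalized at layer k",
    value 0 = "refined at the next (finer) layer".
    """
    # Layer 1: initial segmentation information, one flag per low-resolution cell.
    seg = [[1 if next(bits) else 0 for _ in range(w)] for _ in range(h)]
    for j in range(2, num_layers + 1):
        seg = upsample_nearest(seg)  # resolution doubles at each layer
        for row in seg:
            for x, v in enumerate(row):
                if v != 0:
                    continue  # decided by a coarser layer: no flag is sent
                if j < num_layers:
                    # One flag per undecided cell, so both the number and the
                    # positions of parsed flags follow from the previous layer.
                    row[x] = j if next(bits) else 0
                else:
                    row[x] = j  # finest layer: remaining cells decided implicitly
    return seg

bits = iter([0, 1, 0, 1, 1])  # 1 initial flag + 4 refinement flags
seg = decode_segmentation(bits, h=1, w=1, num_layers=3)
```

Because a flag is parsed only for cells that the previous layer left undecided, a region decided at a coarse layer costs no further side information; for instance, a single initial flag of 1 decides the whole map with one bit.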

Description

The present application is a divisional application of the original application No. 202080108181.3, filed on December 24, 2020, the entire content of which is incorporated herein by reference.

Technical Field

Embodiments of the present invention relate generally to the field of decoding data for image or video processing from a bitstream using a plurality of processing layers. In particular, some embodiments relate to methods and apparatuses for such decoding.

Background

Hybrid image and video codecs have been used for decades to compress image and video data. In such codecs, a signal is typically encoded block by block: each block is predicted, and the difference between the original block and its prediction is coded. Such coding typically includes transformation, quantization, and generation of the bitstream, usually involving some entropy coding. The three components of hybrid coding, namely transformation, quantization, and entropy coding, are typically optimized separately. Modern video compression standards such as High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and Essential Video Coding (EVC) also use transformed representations to code the prediction residual signal.

Recently, machine learning has been applied to image and video coding in a variety of ways. For example, some end-to-end optimized image or video coding schemes have been discussed. Machine learning has also been used to determine or optimize certain parts of the coding chain, such as the selection or compression of prediction parameters. Common to these applications is that some feature map data is generated and has to be conveyed between the encoder and the decoder.
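As a rough illustration of the block-based hybrid coding loop described above, the following Python sketch predicts a block, quantizes the residual with a uniform step, and reconstructs it. It is a simplified sketch, not the processing of any particular standard: the transform and entropy-coding stages are omitted, the block is flattened into a list, and `encode_block`, `decode_block`, and `qstep` are hypothetical names.

```python
from typing import List

def encode_block(original: List[int], prediction: List[int], qstep: int) -> List[int]:
    """Form the prediction residual and quantize it (transform omitted)."""
    residual = [o - p for o, p in zip(original, prediction)]
    # Uniform scalar quantization; Python's round() rounds halves to even.
    return [round(r / qstep) for r in residual]

def decode_block(levels: List[int], prediction: List[int], qstep: int) -> List[int]:
    """Dequantize the residual and add it back to the prediction."""
    return [p + lvl * qstep for p, lvl in zip(prediction, levels)]

original = [100, 104, 98, 97]
prediction = [100, 100, 100, 100]  # e.g. a flat intra prediction
levels = encode_block(original, prediction, qstep=2)
reconstructed = decode_block(levels, prediction, qstep=2)
```

Only the quantized levels would be entropy coded and sent; the per-sample reconstruction error is at most qstep / 2, which is the loss introduced by quantization.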
An efficient bitstream structure can contribute greatly to reducing the number of bits needed to encode the image or video source signal. Neural networks typically include two or more layers, and a feature map is the output of a layer. In a neural network split between devices (e.g., between an encoder and a decoder, between a device and the cloud, or between different devices), the feature map at the output of the split point (e.g., in a first device) is compressed and transmitted to the remaining layers of the neural network (e.g., in a second device). Encoding and decoding using trained network architectures may need to be improved further.

Disclosure of Invention

Some embodiments of the present invention provide methods and apparatuses for decoding images efficiently while enabling some scalability to adapt to desired parameters and to the content. The above and other objects are achieved by the subject matter of the independent claims. Further implementations are apparent from the dependent claims, the description, and the drawings.

According to one aspect, there is provided a method for decoding data for image or video processing from a bitstream, the method comprising: obtaining two or more sets of segmentation information elements from the bitstream; inputting each of the two or more sets of segmentation information elements into a respective one of two or more segmentation information processing layers of a plurality of cascaded layers; and processing the respective set of segmentation information in each of the two or more segmentation information processing layers, wherein the decoded data for image or video processing is obtained based on the segmentation information processed in the plurality of cascaded layers. This approach may improve efficiency, because it enables decoding of data in various segments that can be configured hierarchically, layer by layer.
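The split-network arrangement mentioned in the background can be sketched in Python as follows: the layers before the split point run on a first device, the feature map at the split point is compressed, and the remaining layers run on a second device. This is a toy illustration under stated assumptions: layers are plain Python functions on lists of floats, and compression is simple uniform quantization to signed 8-bit values, a stand-in for a real feature-map codec. `split_inference` and the other function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[List[float]], List[float]]

def run_layers(layers: Sequence[Layer], x: List[float]) -> List[float]:
    """Apply the layers in order; each layer's output is its feature map."""
    for layer in layers:
        x = layer(x)
    return x

def compress_feature_map(fmap: List[float], scale: float) -> bytes:
    """Uniform quantization to signed 8-bit values, stored as bytes."""
    levels = [max(-128, min(127, round(v / scale))) for v in fmap]
    return bytes(v & 0xFF for v in levels)  # two's-complement byte per element

def decompress_feature_map(payload: bytes, scale: float) -> List[float]:
    """Undo the byte packing and scale back to floats."""
    return [(b - 256 if b > 127 else b) * scale for b in payload]

def split_inference(layers: Sequence[Layer], x: List[float],
                    split: int, scale: float) -> Tuple[List[float], int]:
    """Run layers[:split] on 'device 1' and layers[split:] on 'device 2'."""
    fmap = run_layers(layers[:split], x)               # first device
    payload = compress_feature_map(fmap, scale)        # what would be transmitted
    received = decompress_feature_map(payload, scale)  # second device
    return run_layers(layers[split:], received), len(payload)

# Toy network: scale by 2, ReLU, add 1.
layers: List[Layer] = [
    lambda x: [2.0 * v for v in x],
    lambda x: [max(0.0, v) for v in x],
    lambda x: [v + 1.0 for v in x],
]
output, nbytes = split_inference(layers, [1.0, -2.0, 0.5], split=2, scale=0.5)
```

In this toy case the quantized feature map happens to be reconstructed exactly, so the output equals that of running all layers on one device; with other inputs or a coarser scale, quantization error would propagate into the remaining layers.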
The segmentation may be provided so as to take into account the characteristics of the decoded data. For example, the obtaining of the set of segmentation information elements is based on segmentation information processed by at least one segmentation information processing layer of the plurality of cascaded layers. In some exemplary embodiments, the inputting of the set of segmentation information elements is based on processed segmentation information output by at least one of the plurality of cascaded layers. Cascaded processing of the segmentation information enables efficient parsing of the segmentation information. For example, the segmentation information processed in the two or more segmentation information processing layers has a different resolution in each of these layers. In some embodiments and examples, the processing of the segmentation information in the two or more segmentation information processing layers includes upsampling. A hierarchical structure of the segmentation information may require only a small amount of side information to be inserted into the bitstream, thereby improving efficiency and/