CN-121986496-A - Video data processing apparatus and video data processing method
Abstract
To align the entropy-encoding processing order when video encoding based on predictive encoding in coding units is combined with video encoding based on a neural network, a video data processing apparatus includes a neural network, quantization means, and entropy encoding means, and further includes reordering means for reordering the quantized values output by the quantization means.
Inventors
- Chono Keiichi
- Iida Kenta
- Deman Kenta
- Moriyoshi Kinji
Assignees
- NEC Corporation
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-08-19
- Priority Date: 2023-09-21
Claims (20)
- 1. A video data processing apparatus comprising a neural network, quantization means, and entropy encoding means, the video data processing apparatus further comprising: reordering means for reordering quantized values output from the quantization means.
- 2. The video data processing apparatus according to claim 1, further comprising: first control means for preventing a neural-network-based encoding process from crossing slice boundaries.
- 3. The video data processing apparatus according to claim 1 or 2, further comprising: predictive encoding means for performing predictive encoding in units of coding units.
- 4. A video data processing apparatus comprising a neural network, inverse quantization means, and entropy decoding means, the video data processing apparatus further comprising: inverse reordering means for reordering quantized values output from the entropy decoding means.
- 5. The video data processing apparatus according to claim 4, further comprising: first control means for preventing a neural-network-based decoding process from crossing slice boundaries.
- 6. The video data processing apparatus according to claim 4 or 5, further comprising: predictive decoding means for performing predictive decoding in units of coding units.
- 7. A video data processing method for performing a neural-network-based encoding process, a quantization process, and entropy encoding, the method comprising: reordering quantized values created by the quantization process.
- 8. The video data processing method according to claim 7, further comprising: performing control so that the encoding process does not cross slice boundaries.
- 9. The video data processing method according to claim 7 or 8, further comprising: performing a predictive encoding process in units of coding units.
- 10. A video data processing method for performing a neural-network-based decoding process, an inverse quantization process, and an entropy decoding process, the method comprising: reordering quantized values obtained by the entropy decoding process.
- 11. The video data processing method according to claim 10, further comprising: performing control so that the decoding process does not cross slice boundaries.
- 12. The video data processing method according to claim 10 or 11, further comprising: performing a predictive decoding process in units of coding units.
- 13. A video data processing program for causing a computer to: perform a neural-network-based encoding process, a quantization process, and entropy encoding; and perform a process of reordering quantized values created by the quantization process.
- 14. The video data processing program according to claim 13, wherein the program further causes the computer to: perform control so that the encoding process does not cross slice boundaries.
- 15. The video data processing program according to claim 13 or 14, wherein the program further causes the computer to: perform a predictive encoding process in units of coding units.
- 16. A video data processing program for causing a computer to: perform a neural-network-based decoding process, an inverse quantization process, and an entropy decoding process; and perform a process of reordering quantized values obtained by the entropy decoding process.
- 17. The video data processing program according to claim 16, wherein the program further causes the computer to: perform control so that the decoding process does not cross slice boundaries.
- 18. The video data processing program according to claim 16 or 17, wherein the program further causes the computer to: perform a predictive decoding process in units of coding units.
- 19. A storage medium storing a bitstream generated by a video data processing apparatus comprising a neural network, quantization means and entropy encoding means, the video data processing apparatus comprising reordering means for reordering quantized values output from the quantization means.
- 20. A storage medium storing a bitstream generated by a video data processing method for performing a neural network-based encoding process, a quantization process, and entropy encoding, the video data processing method reordering quantized values created by the quantization process.
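The claims center on reordering quantized values so that the entropy-coding order of the neural-network branch can be aligned with the block-based order of the predictive-coding branch. The following is a minimal illustrative sketch only, not the patented method: the block-scan order, block size, and function names are assumptions made for the example, with the inverse reordering corresponding to the decoder-side means of claim 4.

```python
import numpy as np

def reorder_to_blocks(q: np.ndarray, block: int) -> np.ndarray:
    """Reorder a 2-D array of quantized values from row-major raster order
    into a block (CTU-like) scan order: blocks left-to-right, top-to-bottom,
    raster order inside each block. Assumes dimensions divisible by `block`."""
    h, w = q.shape
    out = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            out.append(q[by:by + block, bx:bx + block].ravel())
    return np.concatenate(out)

def inverse_reorder(flat: np.ndarray, h: int, w: int, block: int) -> np.ndarray:
    """Decoder-side inverse: rebuild the 2-D array of quantized values
    from the block-scan-ordered stream."""
    q = np.empty((h, w), dtype=flat.dtype)
    i = 0
    for by in range(0, h, block):
        for bx in range(0, w, block):
            n = block * block
            q[by:by + block, bx:bx + block] = flat[i:i + n].reshape(block, block)
            i += n
    return q

# Example: a 4x4 tensor of quantized values, reordered with 2x2 blocks.
q = np.arange(16).reshape(4, 4)
stream = reorder_to_blocks(q, 2)
# stream: [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
restored = inverse_reorder(stream, 4, 4, 2)
```

Applying `inverse_reorder` to the output of `reorder_to_blocks` recovers the original array, which is the round-trip property a matched reordering/inverse-reordering pair must satisfy.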
Description
Video data processing apparatus and video data processing method

Technical Field
The present invention relates to a video data processing apparatus and a video data processing method.

Background
In order to transmit or record video efficiently, a video encoding apparatus that encodes an input video into an encoded representation (hereinafter referred to as a bitstream) and a video decoding apparatus that decodes the bitstream to generate a decoded video are used.

[Video coding based on predictive coding in coding units]
Examples of video coding standards include H.264/AVC (Advanced Video Coding), H.265/HEVC (High Efficiency Video Coding), and H.266/VVC (Versatile Video Coding), which were standardized by ITU-T SG16 and ISO/IEC JTC 1/SC 29. A recent video encoding technique is described in NPL 1.

In these video coding schemes, video data is managed, encoded, and decoded in a hierarchical structure. The hierarchy comprises, for example, the pictures constituting the video data, slices (or tiles) obtained by dividing a picture, Coding Tree Units (CTUs) obtained by dividing a slice, and Coding Units (CUs) obtained by dividing a coding tree unit.

The input image of a processing-target CU is predictively encoded based on a prediction image generated from images decoded before the processing-target CU. That is, a prediction error image, obtained by subtracting the prediction image from the input image, is encoded and decoded. Predictive encoding includes intra prediction, which uses a decoded image included in a picture at the same display time as the processing-target CU, and inter prediction, which uses a decoded image included in a picture at a different display time. The prediction error image is encoded based on frequency transform, quantization, and entropy coding.
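The encode/decode pipeline around the prediction error image can be sketched as follows. This is a toy illustration only, not the H.26x process: the frequency transform and entropy coding stages are omitted, and the block values and quantization step are invented for the example.

```python
import numpy as np

qstep = 2  # assumed quantization step size for this toy example

# A 2x2 toy CU: its input samples and a prediction image for it.
input_block = np.array([[52, 57], [60, 58]], dtype=np.int32)
prediction  = np.array([[50, 53], [54, 56]], dtype=np.int32)

# Encoder side: subtract the prediction, then quantize the error
# (the frequency transform that would sit in between is omitted here).
error = input_block - prediction                       # [[2, 4], [6, 2]]
quantized = np.round(error / qstep).astype(np.int32)   # small symbols for entropy coding

# Decoder side: inverse quantization, then add the prediction back.
reconstructed = prediction + quantized * qstep
```

Because the chosen error values are exact multiples of `qstep`, the reconstruction here is lossless; in general, quantization introduces a bounded distortion per sample.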
The prediction error image is decoded based on entropy decoding, inverse quantization, and inverse frequency transform. The frequency-transformed and quantized value of the prediction error image is called a quantized value.

The functions provided by slices include independent decoding and data partitioning. Independent decoding is a function for decoding a slice without using the decoding result of another slice in the same picture. Data partitioning is a function used to partition a bitstream into arbitrary sizes.

[Video coding based on neural networks]
NPL 2 discloses a video coding technique that combines an autoencoder, which is one kind of neural network, with quantization and entropy coding. The autoencoder compresses input data into a low-dimensional feature vector so as to retain only the important features. The autoencoder then generates reconstruction data by restoring the low-dimensional feature vector to the original dimension.

Fig. 1 is an explanatory diagram showing the algorithm of an autoencoder. In Fig. 1, the circles are called nodes and the arrows are called edges. The first half of the process, which produces the low-dimensional feature vector, is called encoding. The second half, which generates the reconstruction data, is called decoding. The autoencoder is trained so as to minimize the reconstruction error (the difference between the input data and the reconstruction data). To obtain meaningful features, the autoencoder is designed by adding constraints to the encoder structure or by adding regularization terms to the loss function of the network.

Citation List
Non-patent literature
NPL 1: "Algorithm description of Enhanced Compression Model 9 (ECM 9)", JVET-AD2025, JVET of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 30th meeting, Antalya, TR, 21-28 April 2023.
NPL 2: J. Ballé, V. Laparra, and E. P. Simoncelli, "End-to-end Optimized Image Compression", published as a conference paper at ICLR 2017.

Disclosure of Invention
Technical Problem
When video encoding based on predictive encoding in coding units is combined with video encoding based on a neural network, there is a problem in that the processing orders of entropy encoding of quantized values cannot be aligned. Fig. 2 is an explanatory diagram for explaining this problem. The left side of Fig. 2 shows the processing order of entropy encoding in video encoding based on a neural network. The right side of Fig. 2 shows the processing order of entropy encoding in video encoding based on predictive encoding in coding units. In Fig. 2, W_tensor indicates the size of the feature vector in the row direction, and H_tensor indicates the size of the feature vector in the column direction. W_img indicates the width (horizontal size) of the input picture. H_img indicates the height (vertical