CN-116668723-B - 3D-HEVC depth map intra-frame coding unit division method and device based on convolutional neural network


Abstract

The invention discloses a method and a device for dividing 3D-HEVC depth map intra-frame coding units based on a convolutional neural network. A coding unit division prediction model is constructed and trained; a 3D-HEVC encoder encodes the current block to be encoded, and the current size and the current coding quantization parameter of the coding unit are determined during encoding. According to the current size and/or the current coding quantization parameter of the coding unit, either a speed mode or a performance mode is adopted during encoding: in the speed mode, the predicted value of the model is used as the division result of the current block to be encoded; in the performance mode, the 3D-HEVC encoder predicts the division result of the current block to be encoded. It is then judged whether the current size of the coding unit is larger than a fourth size; if so, the size of the current block to be encoded is reduced by one level and the above steps are repeated, until all division results of the current block to be encoded are obtained.
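
The mode decision and the size descent described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the thresholds are those stated in claim 1 and the sizes those in claim 2, while the function names `choose_mode` and `plan_modes`, and the choice to pass a single integer as the quantization parameter, are assumptions made for the example.

```python
from typing import List, Tuple

# First to fourth size from claim 2, largest first.
SIZES: Tuple[int, ...] = (64, 32, 16, 8)


def choose_mode(cu_size: int, qp: int) -> str:
    """Pick 'speed' or 'performance' using the thresholds in claim 1."""
    if cu_size in (64, 32) or qp in (25, 40):
        return "speed"  # the CNN prediction is used as the division result
    if cu_size in (16, 8) and qp in (30, 35):
        return "performance"  # the 3D-HEVC encoder decides the division
    raise ValueError(f"unsupported size/QP combination: {cu_size}, {qp}")


def plan_modes(qp: int) -> List[Tuple[int, str]]:
    """Walk from the first size down to the fourth size (steps S3-S4)."""
    plan = []
    size = SIZES[0]
    while True:
        plan.append((size, choose_mode(size, qp)))
        if size <= SIZES[-1]:  # not larger than the fourth size: done
            break
        size //= 2  # reduce the block size by one level
    return plan
```

With these rules, `plan_modes(30)` selects the speed mode at 64×64 and 32×32 and the performance mode at 16×16 and 8×8, whereas `plan_modes(25)` stays in the speed mode at every size.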

Inventors

CHEN JING
Zhou Tingkai
ZENG HUANQIANG
ZHU JIANQING
SHI YIFAN
LIN QI

Assignees

Huaqiao University (华侨大学)

Dates

Publication Date: 20260512
Application Date: 20230516

Claims (9)

1. A 3D-HEVC depth map intra-frame coding unit division method based on a convolutional neural network, characterized by comprising the following steps: S1, constructing a coding unit division prediction model based on a convolutional neural network and training it to obtain a trained coding unit division prediction model; S2, obtaining a depth map sequence to be encoded, dividing the depth map sequence to be encoded to obtain a plurality of current blocks to be encoded at a first size, and inputting the current blocks to be encoded into the trained coding unit division prediction model, wherein the output predicted values are a plurality of label values indicating whether coding units of different sizes in the current block to be encoded need to be further divided, during encoding, into coding units one size level smaller; encoding the current block to be encoded with a 3D-HEVC encoder, and determining the current size and the current coding quantization parameter of the coding unit during encoding; S3, determining, according to the current size and/or the current coding quantization parameter of the coding unit, whether a speed mode or a performance mode is adopted during encoding, which specifically comprises: in response to determining that the current size of the coding unit is the first size or the second size, or that the current coding quantization parameter is 25 or 40, determining to adopt the speed mode during encoding; in response to determining that the current size of the coding unit is the third size or the fourth size and the current coding quantization parameter is 30 or 35, determining to adopt the performance mode during encoding; wherein, in the speed mode, the predicted value of the trained coding unit division prediction model is taken as the division result of the current block to be encoded; and S4, judging whether the current size of the coding unit is larger than the fourth size; if so, reducing the size of the current block to be encoded by one level and repeating steps S3-S4; otherwise, obtaining all division results of the current block to be encoded.
2. The 3D-HEVC depth map intra-frame coding unit division method based on a convolutional neural network of claim 1, wherein the different sizes include a first size, a second size, a third size, and a fourth size, each one level smaller than the preceding one: the first size is 64×64, the second size is 32×32, the third size is 16×16, and the fourth size is 8×8; and the current coding quantization parameter includes QP = (25, 34), (30, 39), (35, 42), (40, 45).
3. The 3D-HEVC depth map intra-frame coding unit division method based on a convolutional neural network of claim 2, wherein the coding unit division prediction model includes a first branch, a second branch, and a third branch, which correspond to the predicted values for the first size, the second size, and the third size, respectively; each of the first branch, the second branch, and the third branch includes a first mean-removal layer, a first pooling layer, a first convolutional layer, a first ReLU activation layer, a second convolutional layer, a second ReLU activation layer, a third convolutional layer, a third ReLU activation layer, and a fully connected layer, connected in sequence; the convolution kernel size of the first convolutional layer is 4×4, with a stride of 4 and a padding of 0; the convolution kernel size of the second convolutional layer is 2×2, with a padding of 0; the convolution kernel size of the third convolutional layer is 2×2; the step size of the first pooling layer of the first branch is 4×4, the step size of the first branch is 2×2, the first pooling layer is 1×2, and the first branching pool size is 1×2.
4. The 3D-HEVC depth map intra-frame coding unit division method based on a convolutional neural network of claim 3, wherein the training process of the coding unit division prediction model is as follows: acquiring training data; and training the coding unit division prediction model with the training data, wherein, during training, the total number of samples in the training data is T and t denotes a single sample; the true value of a sample consists of three components [symbols missing in source], corresponding to the outputs of the first branch, the second branch, and the third branch of the coding unit division prediction model, respectively, and l denotes the label value of each branch [relation missing in source]; the loss of a single sample is obtained by accumulating the cross entropy of all elements in the sample [formula missing in source], where the cross entropy is taken between the true values and the predicted values of the three branches obtained by passing the sample through the coding unit division prediction model; and the total loss value over all T samples is denoted L [formula missing in source].
5. The 3D-HEVC depth map intra-frame coding unit division method based on a convolutional neural network of claim 4, wherein the input of the coding unit division prediction model is the current block to be encoded, and the output is a plurality of label values indicating whether coding units of different sizes in the current block to be encoded need to be further divided, during encoding, into coding units one size level smaller; the label values are flags [flag definitions missing in source], wherein i denotes the 1st label value of the predicted value, indicating whether the coding unit of the first size is divided; j denotes the 2nd to 5th label values of the predicted value, indicating whether the 4 coding units of the second size are divided; and k denotes the 6th to 21st label values of the predicted value, indicating whether the 16 coding units of the third size are divided.
6. The 3D-HEVC depth map intra-frame coding unit division method based on a convolutional neural network of claim 4, wherein acquiring the training data specifically includes: obtaining a depth map sequence and applying data augmentation to it, the data augmentation comprising flipping, mirroring, and mirroring after flipping, to obtain an augmented depth map sequence; and encoding the augmented depth map sequence with a 3D-HEVC encoder under the all-intra configuration to obtain, as label values, the division results of coding units of different sizes under the coding quantization parameters QP = (25, 34), (30, 39), (35, 42), (40, 45), and dividing the depth map sequence into a plurality of coding units of the first size and associating the label values with the corresponding coding units to obtain the training data.
7. A 3D-HEVC depth map intra-frame coding unit division device based on a convolutional neural network, comprising: a model construction module configured to construct a coding unit division prediction model based on a convolutional neural network and train it to obtain a trained coding unit division prediction model; a prediction module configured to obtain a depth map sequence to be encoded, divide the depth map sequence to be encoded to obtain a plurality of current blocks to be encoded at a first size, and input the current blocks to be encoded into the trained coding unit division prediction model, wherein the output predicted values are a plurality of label values indicating whether coding units of different sizes in the current block to be encoded need to be further divided, during encoding, into coding units one size level smaller, and to encode the current block to be encoded with a 3D-HEVC encoder and determine the current size and the current coding quantization parameter of the coding unit during encoding; a mode determining module configured to determine, according to the current size and/or the current coding quantization parameter of the coding unit, whether a speed mode or a performance mode is adopted during encoding, which specifically comprises: in response to determining that the current size of the coding unit is the first size or the second size, or that the current coding quantization parameter is 25 or 40, determining to adopt the speed mode during encoding; in response to determining that the current size of the coding unit is the third size or the fourth size and the current coding quantization parameter is 30 or 35, determining to adopt the performance mode during encoding; wherein, in the speed mode, the predicted value of the trained coding unit division prediction model is taken as the division result of the current block to be encoded; and a judging module configured to judge whether the current size of the coding unit is larger than the fourth size; if so, the size of the current block to be encoded is reduced by one level and the mode determining module through the judging module are executed again; otherwise, all division results of the current block to be encoded are obtained.
8. An electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-6.
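
The 21 label values described in claim 5 (1 for the first size, 4 for the second size, 16 for the third size) can be turned into a concrete division of one 64×64 block. The sketch below is only an illustration: the names `split_labels` and `to_partition` are hypothetical, and the ordering of the labels (raster order within each parent, with the 16 third-size labels grouped by their parent 32×32 unit) is an assumption that the claims do not state.

```python
from typing import List, Sequence, Tuple

CU = Tuple[int, int, int]  # (x, y, size) inside one 64x64 block


def split_labels(pred: Sequence[int]) -> Tuple[int, List[int], List[int]]:
    """Split the 21 label values into the 1 + 4 + 16 groups of claim 5."""
    if len(pred) != 21:
        raise ValueError("expected 21 label values")
    return pred[0], list(pred[1:5]), list(pred[5:21])


def to_partition(pred: Sequence[int]) -> List[CU]:
    """Turn the label values into leaf coding units.

    Assumption: children are listed in raster order within their parent,
    and the 16 third-size labels are grouped by parent 32x32 unit.
    """
    split64, split32, split16 = split_labels(pred)
    if not split64:
        return [(0, 0, 64)]
    leaves: List[CU] = []
    for j in range(4):
        x32, y32 = (j % 2) * 32, (j // 2) * 32
        if not split32[j]:
            leaves.append((x32, y32, 32))
            continue
        for m in range(4):
            x16, y16 = x32 + (m % 2) * 16, y32 + (m // 2) * 16
            if split16[4 * j + m]:
                # A split 16x16 unit yields four 8x8 units (fourth size).
                leaves.extend(
                    (x16 + (n % 2) * 8, y16 + (n // 2) * 8, 8) for n in range(4)
                )
            else:
                leaves.append((x16, y16, 16))
    return leaves
```

A lower-level label is consulted only when its parent is split, so whatever the label values are, the returned units tile the 64×64 block exactly once.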

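The loss formulas of claim 4 are not legible in this text. One reading consistent with the claim's wording (cross entropy accumulated over all elements of a sample, then combined over all T samples) is given below. All of the notation is an assumption: y and ŷ for true and predicted values, H for the cross entropy, the binary form of H, and the index ranges 4 and 16, which are borrowed from the label layout of claim 5.

```latex
\begin{aligned}
L_t &= H\!\left(y_1^{t}, \hat{y}_1^{t}\right)
     + \sum_{j=1}^{4} H\!\left(y_{2,j}^{t}, \hat{y}_{2,j}^{t}\right)
     + \sum_{k=1}^{16} H\!\left(y_{3,k}^{t}, \hat{y}_{3,k}^{t}\right), \\
H(y, \hat{y}) &= -\,y \log \hat{y} - (1 - y)\log\left(1 - \hat{y}\right), \\
L &= \sum_{t=1}^{T} L_t .
\end{aligned}
```

Whether the total is a plain sum, as written here, or is normalized by T cannot be determined from the surviving text; either choice leaves the minimizer unchanged.
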
Description

3D-HEVC depth map intra-frame coding unit division method and device based on convolutional neural network

Technical Field

The invention relates to the field of video coding, and in particular to a method and a device for dividing 3D-HEVC depth map intra-frame coding units based on a convolutional neural network.

Background

As multimedia technology matures, demand is growing for 3D video, which conveys more of the real scene. However, 3D video requires multiple views to be coded, which results in a very large data volume. To compress video data effectively while maintaining coding quality, the international video organization proposed the 3D-HEVC (3D High Efficiency Video Coding) standard for 3D video. It adds a large number of new coding tools so that high-quality synthesized views can be obtained while coding only 3 views, but these added algorithms greatly increase the time complexity of coding. How to accelerate depth map coding while maintaining coding quality is therefore a problem that urgently needs to be solved.

Traditional fast algorithms are limited by the choice of features: different video sequences exhibit different features, and manually extracting a particular feature can bias the final result. Existing deep learning-based methods can significantly improve performance in the video coding field, but they depend on a large amount of rich training data. In texture video coding, much work has been combined with deep learning, because pictures can be assembled into a sequence and coded to obtain abundant training data. In 3D-HEVC, however, coding a depth map requires a configuration file of camera parameters and relies on the official test sequences, so related research is scarce.
Disclosure of Invention

To address the technical problems mentioned above, the embodiments of the application aim to provide a method and a device for dividing 3D-HEVC depth map intra-frame coding units based on a convolutional neural network, which solve the technical problems mentioned in the Background section.

In a first aspect, the present invention provides a method for dividing 3D-HEVC depth map intra-frame coding units based on a convolutional neural network, including the following steps:

S1, constructing a coding unit division prediction model based on a convolutional neural network and training it to obtain a trained coding unit division prediction model;

S2, obtaining a depth map sequence to be encoded, dividing the depth map sequence to be encoded to obtain a plurality of current blocks to be encoded at a first size, and inputting the current blocks to be encoded into the trained coding unit division prediction model, wherein the output predicted values are a plurality of label values indicating whether coding units of different sizes in the current block to be encoded need to be further divided, during encoding, into coding units one size level smaller; encoding the current block to be encoded with a 3D-HEVC encoder, and determining the current size and the current coding quantization parameter of the coding unit during encoding;

S3, determining, according to the current size and/or the current coding quantization parameter of the coding unit, whether a speed mode or a performance mode is adopted during encoding, wherein, in the speed mode, the predicted value of the trained coding unit division prediction model is used as the division result of the current block to be encoded;

S4, judging whether the current size of the coding unit is larger than the fourth size; if so, reducing the size of the current block to be encoded by one level and repeating steps S3-S4; otherwise, obtaining all division results of the current block to be encoded.

Preferably, the different sizes include a first size, a second size, a third size, and a fourth size, each one level smaller than the preceding one: the first size is 64×64, the second size is 32×32, the third size is 16×16, and the fourth size is 8×8; and the current coding quantization parameter includes QP = (25, 34), (30, 39), (35, 42), (40, 45).

Preferably, in step S3, determining whether the speed mode or the performance mode is adopted during encoding according to the current size and/or the current coding quantization parameter of the coding unit specifically includes: in response to determining that the current size of the coding unit is the first size or the second size, or that the current coding quantization parameter is 25 or 40, determining to adopt the speed mode during encoding; In response to determining that the current size of the coding unit is the third size or the fourth size and the current coding quantization parameter is 30 or 35, it is determined that a performance