JP-2026514486-A - Picture encoding method and apparatus, and picture decoding method and apparatus


Abstract

Picture encoding methods and apparatuses, as well as picture decoding methods and apparatuses, are provided, offering encoding and decoding schemes for the fields of artificial intelligence and picture compression, thereby meeting the requirements of different application scenarios. According to the encoding and decoding methods provided herein, the encoder and decoder network used may be determined based on profile information (or identification information). That is, the codec may select corresponding profile information based on the capabilities of the decoding device in order to select or indicate different encoder and decoder networks. In this way, the network may have the ability to adapt not only to terminals with low computing power, but also to terminals with higher computing power.
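The capability-based selection described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the names `DeviceCapability`, `select_profile`, the GFLOPS metric, and the threshold value are hypothetical and do not come from the application, which says only that profile information is chosen according to the capabilities of the decoding device.

```python
from dataclasses import dataclass

# Hypothetical profile identifiers. The application only distinguishes a
# "first value" and a "second value"; the concrete numbers are assumed.
PROFILE_FIRST = 0   # indicates the first (more demanding) decoder network
PROFILE_SECOND = 1  # indicates the second (lighter) decoder network


@dataclass(frozen=True)
class DeviceCapability:
    """Assumed description of a decoding device (not defined in the source)."""
    compute_gflops: float


def select_profile(capability: DeviceCapability,
                   threshold_gflops: float = 1000.0) -> int:
    """Pick the profile whose decoder network the device can afford.

    The threshold is an arbitrary illustrative value.
    """
    if capability.compute_gflops >= threshold_gflops:
        return PROFILE_FIRST
    return PROFILE_SECOND


if __name__ == "__main__":
    phone = DeviceCapability(compute_gflops=200.0)
    workstation = DeviceCapability(compute_gflops=20000.0)
    print(select_profile(phone), select_profile(workstation))
```

A single scalar threshold is the simplest possible policy; a real system might also weigh memory, latency targets, or battery state.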

Inventors

ユイ，ドーァチュエン
ジャオ，イン
アルシナ，エレナアレクサンドロブナ

Assignees

華為技術有限公司

Dates

Publication Date: 20260511
Application Date: 20240423
Priority Date: 20230424

Claims (20)

A picture encoding method, comprising: encoding, into a bitstream, identification information indicating the decoder network to be used, wherein the identification information is a first value indicating that the decoder network used to decode the bitstream to obtain the picture to be processed is a first decoder network, or the identification information is a second value indicating that the decoder network used to decode the bitstream to obtain the picture to be processed is a second decoder network, and the processing resources required by the first decoder network are higher than those required by the second decoder network; and transmitting the bitstream.
The method according to claim 1, wherein the first decoder network and the second decoder network are entirely different decoder networks, or the first decoder network and the second decoder network share a portion of a subnet, or the second decoder network is a subnet of the first decoder network.
The method according to claim 1 or 2, further comprising: obtaining the identification information; and when the identification information is the first value, encoding, into the bitstream, residual information obtained by encoding the picture to be processed based on a first encoder network; or when the identification information is the second value, encoding, into the bitstream, residual information obtained by encoding the picture to be processed based on a second encoder network, wherein the processing resources required by the first encoder network are higher than those required by the second encoder network.
The method according to claim 3, wherein the first encoder network and the second encoder network are two different encoder networks, or the first encoder network and the second encoder network share a portion of a subnet, or the second encoder network is a subnet of the first encoder network.
The method according to claim 4, wherein the first encoder network includes a first feature extraction network, an autoregressive network, a side information extraction network, and a probability estimation network, and obtaining the residual information by encoding the picture to be processed using the first encoder network comprises: extracting a three-dimensional feature map of the picture to be processed by using the first feature extraction network, wherein the three-dimensional feature map includes a plurality of feature elements; extracting side information of a feature element to be encoded from the three-dimensional feature map by using the side information extraction network; estimating, based on the side information, a first probability distribution mean of the feature element to be encoded by using the probability estimation network; inputting the encoded feature elements and the first probability distribution mean into the autoregressive network to obtain a second probability distribution mean of the feature element to be encoded; and obtaining residual information of the feature element to be encoded based on the feature element to be encoded and the second probability distribution mean of the feature element to be encoded.
The method according to claim 5, wherein the second encoder network includes a second feature extraction network, a side information extraction network, and a probability estimation network, and obtaining the residual information by encoding the picture to be processed using the second encoder network comprises: extracting a three-dimensional feature map of the picture to be processed by using the second feature extraction network, wherein the three-dimensional feature map includes a plurality of feature elements; extracting side information of a feature element to be encoded from the three-dimensional feature map by using the side information extraction network; estimating, based on the side information, a probability distribution mean of the feature element to be encoded by using the probability estimation network; and obtaining residual information of the feature element to be encoded based on the feature element to be encoded and the probability distribution mean.
The method according to claim 6, wherein the second feature extraction network is a subnet of the first feature extraction network, or the second feature extraction network and the first feature extraction network are two completely different subnets.
The method according to any one of claims 5 to 7, further comprising: encoding the side information into the bitstream.
The method according to any one of claims 1 to 8, wherein the identification information is located in the header of the bitstream.
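As a rough illustration of carrying the identification information in the bitstream header, the sketch below writes it as a single leading byte and reads it back to choose a decoder network. The one-byte layout, the numeric values, and all function names are assumptions made for this example; the claims do not specify a header syntax.

```python
from typing import Tuple

# Assumed numeric encodings of the "first value" and "second value".
FIRST_VALUE = 0   # first decoder network (higher processing resources)
SECOND_VALUE = 1  # second decoder network (lower processing resources)


def write_bitstream(identification: int, payload: bytes) -> bytes:
    """Prefix the coded payload with a one-byte header (assumed layout)."""
    if identification not in (FIRST_VALUE, SECOND_VALUE):
        raise ValueError("unknown identification value")
    return bytes([identification]) + payload


def read_identification(bitstream: bytes) -> Tuple[int, bytes]:
    """Split a bitstream into its identification value and payload."""
    if not bitstream:
        raise ValueError("empty bitstream")
    identification = bitstream[0]
    if identification not in (FIRST_VALUE, SECOND_VALUE):
        raise ValueError("unknown identification value")
    return identification, bitstream[1:]


def decoder_network_for(identification: int) -> str:
    """Return a label for the decoder network the header indicates."""
    return "first_decoder_network" if identification == FIRST_VALUE \
        else "second_decoder_network"


if __name__ == "__main__":
    stream = write_bitstream(SECOND_VALUE, b"\x10\x20")
    ident, payload = read_identification(stream)
    print(decoder_network_for(ident), payload)
```

Placing the value at the very start lets a decoder decide which network to load before it touches any entropy-coded data.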
A picture decoding method, comprising: receiving a bitstream; decoding the bitstream to obtain identification information indicating the decoder network to be used; and when the identification information is a first value, decoding the bitstream by using a first decoder network to obtain a picture to be processed; or when the identification information is a second value, decoding the bitstream by using a second decoder network to obtain a picture to be processed, wherein the processing resources required by the first decoder network are higher than the processing resources required by the second decoder network.
The method according to claim 10, wherein the first decoder network and the second decoder network are completely different decoder networks, or the first decoder network and the second decoder network share a portion of a subnet, or the second decoder network is a subnet of the first decoder network.
The method according to claim 10 or 11, wherein the first decoder network includes an entropy decoder network, a probability estimation network, an autoregressive network, and a first picture reconstruction network, and decoding the bitstream by using the first decoder network to obtain the picture to be processed comprises: decoding the bitstream by using the entropy decoder network to obtain side information of a three-dimensional feature map of the picture to be processed, wherein the three-dimensional feature map includes a plurality of feature elements; estimating, based on the side information, a first probability distribution mean of a feature element to be decoded by using the probability estimation network; determining a second probability distribution mean of the feature element to be decoded by using the autoregressive network based on the first probability distribution mean and the decoded feature elements; decoding the bitstream by using the entropy decoder network based on the second probability distribution mean to obtain residual information of the feature element to be decoded, and obtaining the feature element to be decoded based on the residual information and the second probability distribution mean; and reconstructing the picture to be processed by using the first picture reconstruction network based on the three-dimensional feature map obtained through decoding.
The method according to claim 12, wherein the second decoder network includes the entropy decoder network, the probability estimation network, and a second picture reconstruction network, and decoding the bitstream by using the second decoder network to obtain the picture to be processed comprises: decoding the bitstream by using the entropy decoder network to obtain side information of a three-dimensional feature map of the picture to be processed, wherein the three-dimensional feature map includes a plurality of feature elements; estimating, based on the side information, a first probability distribution mean of a feature element to be decoded by using the probability estimation network; decoding the bitstream by using the entropy decoder network based on the first probability distribution mean to obtain residual information of the feature element to be decoded, and obtaining the feature element to be decoded based on the residual information and the first probability distribution mean; and reconstructing the picture to be processed by using the second picture reconstruction network based on the three-dimensional feature map obtained through decoding.
The method according to claim 13, wherein the second picture reconstruction network is a subnet of the first picture reconstruction network, or the second picture reconstruction network and the first picture reconstruction network share a portion of a subnet, or the second picture reconstruction network and the first picture reconstruction network are two different networks.
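The two decoding paths in the claims above differ in whether an autoregressive network refines the probability distribution mean. The toy sketch below illustrates that difference on a flat list of feature elements. Several points are assumptions, not statements from the application: the residual is taken as the rounded difference between a feature element and its mean, `refine_mean` is a trivial stand-in for the autoregressive network, and the encoder conditions on reconstructed elements so that encoder and decoder derive identical means. Entropy coding and the neural networks themselves are omitted.

```python
from typing import List, Sequence


def refine_mean(first_mean: float, decoded: Sequence[float]) -> float:
    """Placeholder for the autoregressive network (assumed behaviour).

    Blends the first mean with the most recently decoded element.
    """
    if not decoded:
        return first_mean
    return 0.5 * first_mean + 0.5 * decoded[-1]


def encode_residuals(features: Sequence[float],
                     first_means: Sequence[float],
                     use_autoregressive: bool) -> List[int]:
    """Compute integer residuals element by element."""
    residuals: List[int] = []
    decoded: List[float] = []  # what the decoder will be able to reproduce
    for value, mu1 in zip(features, first_means):
        mu = refine_mean(mu1, decoded) if use_autoregressive else mu1
        residual = round(value - mu)  # assumed residual definition
        residuals.append(residual)
        decoded.append(residual + mu)
    return residuals


def decode_features(residuals: Sequence[int],
                    first_means: Sequence[float],
                    use_autoregressive: bool) -> List[float]:
    """Rebuild feature elements from residuals and means."""
    decoded: List[float] = []
    for residual, mu1 in zip(residuals, first_means):
        mu = refine_mean(mu1, decoded) if use_autoregressive else mu1
        decoded.append(residual + mu)
    return decoded


if __name__ == "__main__":
    feats = [3.2, 4.9, 5.1, 1.7]
    means = [3.0, 4.0, 5.0, 2.0]
    for flag in (True, False):  # first-network path, then second-network path
        res = encode_residuals(feats, means, flag)
        print(flag, res, decode_features(res, means, flag))
```

Because the refinement depends on previously decoded elements, the autoregressive path must run sequentially, which is one plausible reason it would demand more processing resources than the path that uses the first mean directly.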
A picture encoding device comprising a memory and a video encoder, wherein the memory is configured to store video data, the video data including a picture to be processed; and the video encoder is configured to encode, into a bitstream, identification information indicating the decoder network to be used, wherein the identification information is a first value indicating that the decoder network used to decode the bitstream to obtain the picture to be processed is a first decoder network, or the identification information is a second value indicating that the decoder network used to decode the bitstream to obtain the picture to be processed is a second decoder network, and the processing resources required by the first decoder network are higher than those required by the second decoder network.
A picture decoding device comprising a memory and a video decoder, wherein the memory is configured to store video data in the form of a bitstream, the video data including a picture to be processed; and the video decoder is configured to: decode the bitstream to obtain identification information indicating the decoder network to be used; and when the identification information is a first value, decode the bitstream by using a first decoder network to obtain the picture to be processed, or, when the identification information is a second value, decode the bitstream by using a second decoder network to obtain the picture to be processed, wherein the processing resources required by the first decoder network are higher than those required by the second decoder network.
A video decoding device having coupled non-volatile memory and a processor, wherein the processor invokes program code stored in the memory to perform the method according to any one of claims 10 to 14.
A video encoding device having coupled non-volatile memory and a processor, wherein the processor invokes program code stored in the memory to perform the method according to any one of claims 1 to 14.
A computer-readable storage medium, wherein the computer-readable storage medium stores program code, and when the computer program is executed on the computer, the computer is enabled to perform the method according to any one of claims 10 to 14.
A computer-readable storage medium, wherein the computer-readable storage medium stores program code, and when the computer program is executed on the computer, the computer is enabled to perform the method according to any one of claims 1 to 9.

Description

Cross-reference to Related Applications

This application claims priority to Chinese Patent Application No. 202310476967.6 entitled “Picture Encoding and Decoding Method and Apparatus,” filed with the China National Intellectual Property Administration on 24 April 2023, and Chinese Patent Application No. 202310956879.6 entitled “Picture Encoding and Decoding Method and Apparatus,” filed with the China National Intellectual Property Administration on 28 July 2023. Both applications are incorporated herein by reference in their entirety.

Technical Field

This application relates to the field of picture compression technology and the field of artificial intelligence technology, and more particularly to methods and apparatus for picture encoding and decoding.

Many consumer applications (such as news, social networking, and shopping applications) require picture decoding to be completed on terminal devices with low computing power (such as mobile phones, personal computers, and televisions). In some industrial applications, picture decoding may instead be completed on terminal devices with higher computing power (such as GPU workstations with dedicated graphics cards), which imposes higher requirements on picture compression ratios. Current neural network-based picture encoding and decoding methods typically have a fixed network structure and cannot meet the requirements of these different application scenarios.

This is an illustrative block diagram of a coding system according to one embodiment of the present invention.
This is a diagram showing the structure of a convolutional neural network according to one embodiment of the present invention.
This is a diagram of a deep learning-based video encoder and decoder network according to one embodiment of the present invention.
This is a diagram showing the structure of a deep learning-based end-to-end video encoder and decoder network according to one embodiment of the present invention.
This is a schematic flowchart of an encoding and decoding method according to one embodiment of the present invention.
This is a diagram showing the structure of a first encoder network according to one embodiment of the present invention.
This is a diagram showing the structure of a second encoder network according to one embodiment of the present invention.
This is a diagram of an encoding process according to one embodiment of the present invention.
This is a diagram of another encoding process according to one embodiment of the present invention.
This is a diagram showing the structure of a decoder network according to one embodiment of the present invention.
This is a diagram of a possible decoding process using a first decoder network according to one embodiment of the present invention.
This is a diagram of a possible decoding process using a second decoder network according to one embodiment of the present invention.
This is a diagram showing the execution process of an encoder network according to one embodiment of the present invention.
This is a diagram showing the execution process of a decoder network according to one embodiment of the present invention.
Figures 11A and 11B show the structure of an encoder network according to one embodiment of the present invention.
This is a diagram showing the structure of a ResAU 3x3 network without tanh according to one embodiment of the present invention.
This is a diagram of an RNAB structure according to one embodiment of the present invention.
This is a diagram showing the structure of the residual block layer according to one embodiment of the present invention.
This is a diagram showing the network structure of a hyperdecoder network according to one embodiment of the present invention.
This is a diagram showing the network structure of a hyperscale decoder network according to one embodiment of the present invention.
Figures 17A and 17B show the execution process of a decoder network according to one embodiment of the present invention.
This is a diagram showing the network structure of a LightResBlock according to one embodiment of the present invention.
This is a diagram showing the structure of a decoder network according to Example 2, based on one embodiment of the present invention.
This is a diagram showing the execution process of an encoder network according to Example 3, based on one embodiment of the present invention.
This is a diagram showing the structure of an encoder and decoder network according to Example 3, based on one embodiment of the present application.
Figures 21A and 21B show the structure of an encoder network according to Example 3, in one embodiment of the p