US-12626415-B2 - Video frame compression method, video frame decompression method, and apparatus
Abstract
A video frame compression method includes determining a target neural network from a plurality of neural networks according to a network selection policy; and generating, by using the target neural network, compression information corresponding to a current video frame. If the compression information is obtained by using a first neural network, the compression information includes first compression information of a first feature of the current video frame, and a reference frame of the current video frame is used for a compression process of the first feature of the current video frame. If the compression information is obtained by using a second neural network, the compression information includes second compression information of a second feature of the current video frame, and a reference frame of the current video frame is used for a generation process of the second feature of the current video frame.
Inventors
- Yibo Shi
- Jing Wang
- Yunying Ge
Assignees
- HUAWEI TECHNOLOGIES CO., LTD.
Dates
- Publication Date: 2026-05-12
- Application Date: 2023-05-12
- Priority Date: 2020-11-13
Claims (20)
- 1 . An encoder, comprising: a non-transitory computer-readable storage medium configured to store programming instructions; and one or more processors coupled to the non-transitory computer-readable storage medium and configured to execute the programming instructions to cause the encoder to: select a target neural network from a plurality of neural networks according to a network selection policy, wherein the plurality of neural networks comprises a first neural network and a second neural network, wherein the second neural network comprises a convolutional network and a second entropy encoding layer, and wherein the convolutional network comprises a plurality of convolutional layers and an activation rectified linear unit (ReLU) layer; and perform compression encoding on a current video frame using the target neural network in order to obtain compression information corresponding to the current video frame, wherein when the first neural network is the target neural network, the compression information comprises first compression information of a first feature of the current video frame, and the compression encoding uses a reference frame of the current video frame for a compression process of the first feature, and wherein when the second neural network is the target neural network, the compression information comprises second compression information of a residual of the current video frame, the compression encoding uses the reference frame of the current video frame for a generation process of the residual, the convolutional network obtains the residual based on the reference frame of the current video frame, and the second entropy encoding layer performs entropy encoding on the residual of the current video frame in order to output the second compression information.
- 2 . The encoder of claim 1 , wherein the first neural network comprises an encoding network and a first entropy encoding layer, the encoding network obtains the first feature from the current video frame, and the first entropy encoding layer performs entropy encoding on the first feature based on the reference frame of the current video frame in order to output the first compression information.
- 3 . The encoder of claim 2 , wherein when the target neural network is the first neural network, the one or more processors are further configured to execute the programming instructions to cause the encoder to: obtain, using the encoding network, the first feature from the current video frame; predict, using the first entropy encoding layer, the first feature based on the reference frame of the current video frame in order to generate a predicted result of the first feature; generate, using the first entropy encoding layer, a probability distribution of the first feature based on the predicted result of the first feature; and perform entropy encoding, using the first entropy encoding layer, on the first feature based on the probability distribution of the first feature in order to obtain the first compression information.
- 4 . The encoder of claim 1 , wherein the network selection policy is related to one or more of location information of the current video frame or a data amount carried in the current video frame.
- 5 . The encoder of claim 4 , wherein the one or more processors are configured to execute the programming instructions to cause the encoder to further select the target neural network from the plurality of neural networks according to the network selection policy by selecting the target neural network from the plurality of neural networks based on the location information of the current video frame in a current video sequence, wherein the location information indicates that the current video frame is an X th frame in the current video sequence.
- 6 . The encoder of claim 4 , wherein the one or more processors are configured to execute the programming instructions to cause the encoder to further select the target neural network from the plurality of neural networks according to the network selection policy by selecting the target neural network from the plurality of neural networks based on an attribute of the current video frame, wherein the attribute of the current video frame indicates the data amount carried in the current video frame, and wherein the attribute of the current video frame comprises any one or more of an entropy, a contrast, or a saturation of the current video frame.
- 7 . The encoder of claim 1 , wherein the one or more processors are further configured to execute the programming instructions to cause the encoder to: generate indication information corresponding to the compression information, wherein the indication information indicates that the compression information is obtained using the target neural network; and send the indication information to a decoder.
- 8 . An encoder, comprising: a non-transitory computer-readable storage medium configured to store programming instructions; and one or more processors coupled to the non-transitory computer-readable storage medium and configured to execute the programming instructions to cause the encoder to: perform compression encoding on a current video frame using a first neural network and a reference frame of the current video frame in order to obtain first compression information of a first feature of the current video frame; generate a first video frame using the first neural network, wherein the first video frame is a first reconstructed frame of the current video frame; perform compression encoding on the current video frame using a second neural network and the reference frame of the current video frame in order to obtain second compression information of a second feature of the current video frame, wherein the second neural network comprises a convolutional network and a second entropy encoding layer, wherein the convolutional network comprises a plurality of convolutional layers and an activation rectified linear unit (ReLU) layer, wherein the convolutional network obtains a residual of the current video frame based on the reference frame of the current video frame, and wherein the second entropy encoding layer performs entropy encoding on the residual of the current video frame in order to output the second compression information; generate a second video frame using the second neural network, wherein the second video frame is a second reconstructed frame of the current video frame; and determine, using either the first neural network or the second neural network and based on the first compression information, the first video frame, the second compression information, and the second video frame, compression information corresponding to the current video frame, wherein the compression information is the first compression information when using the first neural network, and wherein the compression information is the second compression information when using the second neural network.
- 9 . The encoder of claim 8 , wherein the first neural network comprises an encoding network and a first entropy encoding layer, wherein the encoding network obtains the first feature from the current video frame, and wherein the first entropy encoding layer performs entropy encoding on the first feature in order to output the first compression information.
- 10 . The encoder of claim 8 , wherein the one or more processors are further configured to execute the programming instructions to cause the encoder to: generate indication information corresponding to the compression information, wherein the indication information indicates that the compression information is obtained using the first neural network or the second neural network; and send the indication information to a decoder.
- 11 . An encoder, comprising: a non-transitory computer-readable storage medium configured to store programming instructions; and one or more processors coupled to the non-transitory computer-readable storage medium and configured to execute the programming instructions to cause the encoder to: perform compression encoding on a third video frame using a first neural network and a first reference frame of the third video frame in order to obtain first compression information of a first feature of the third video frame; and perform compression encoding on a fourth video frame using a second neural network and a second reference frame of the fourth video frame in order to obtain second compression information of a second feature of the fourth video frame, wherein the second neural network comprises a convolutional network and a second entropy encoding layer, wherein the convolutional network comprises a plurality of convolutional layers and an activation rectified linear unit (ReLU) layer, wherein the convolutional network obtains a residual of a current video frame based on a third reference frame of the current video frame, and wherein the second entropy encoding layer performs entropy encoding on the residual of the current video frame in order to output the second compression information, and wherein the third video frame and the fourth video frame are different video frames in a same video sequence.
- 12 . The encoder of claim 11 , wherein the first neural network comprises an encoding network and a first entropy encoding layer, wherein the encoding network obtains the first feature from the current video frame, and wherein the first entropy encoding layer performs entropy encoding on the first feature in order to output the first compression information.
- 13 . The encoder of claim 11 , wherein the one or more processors are further configured to execute the programming instructions to cause the encoder to: generate indication information corresponding to the compression information, wherein the indication information indicates that the compression information is obtained using the first neural network or the second neural network; and send the indication information to a decoder.
- 14 . A decoder, comprising: a non-transitory computer-readable storage medium configured to store programming instructions; and one or more processors coupled to the non-transitory computer-readable storage medium and configured to execute the programming instructions to cause the decoder to: obtain compression information of a current video frame; select, from a plurality of neural networks, a target neural network corresponding to the current video frame, wherein the plurality of neural networks comprises a third neural network and a fourth neural network, wherein the fourth neural network comprises a convolutional network and a first entropy decoding layer, and wherein the convolutional network comprises a plurality of convolutional layers and an activation rectified linear unit (ReLU) layer; perform decompression, using the target neural network and based on the compression information, to obtain a reconstructed frame of the current video frame, wherein when the target neural network is the third neural network, the compression information comprises first compression information of a first feature of the current video frame, the decompression is on the first compression information and uses a reference frame of the current video frame for a decompression process of the first compression information in order to obtain the first feature; perform, using the first feature, a generation process of the reconstructed frame of the current video frame, wherein when the target neural network is the fourth neural network, the compression information comprises second compression information of a second feature of the current video frame, the decoder performs the decompression using the second compression information in order to obtain the second feature; and perform, using the reference frame of the current video frame and the second feature, the generation process of the reconstructed frame of the current video frame.
- 15 . The decoder of claim 14 , wherein the third neural network comprises a second entropy decoding layer and a decoding network, wherein the second entropy decoding layer performs entropy decoding on the first compression information of the current video frame based on the reference frame of the current video frame, and wherein the decoding network generates the reconstructed frame of the current video frame based on the first feature.
- 16 . The decoder of claim 14 , wherein the one or more processors are configured to execute the programming instructions to cause the decoder to further select the target neural network corresponding to the current video frame by: obtaining indication information corresponding to the compression information; and selecting the target neural network from the plurality of neural networks based on the indication information.
- 17 . The decoder of claim 14 , wherein the first entropy decoding layer performs entropy decoding on the second compression information, and the convolutional network performs the generation process of a third reconstructed frame of the current video frame based on the reference frame of the current video frame and the second feature.
- 18 . A decoder, comprising: a non-transitory computer-readable storage medium configured to store programming instructions; and one or more processors coupled to the non-transitory computer-readable storage medium and configured to execute the programming instructions to cause the decoder to: decompress first compression information of a third video frame using a third neural network and a first reference frame of the third video frame in order to obtain a first reconstructed frame of the third video frame, wherein the first compression information comprises third compression information of a first feature of the third video frame; perform, using the first feature, a generation process of the first reconstructed frame; decompress second compression information of a fourth video frame using a fourth neural network and the second compression information in order to obtain a second reconstructed frame of the fourth video frame, wherein the second compression information comprises fourth compression information of a second feature of the fourth video frame, wherein the fourth neural network comprises a convolutional network and a fourth entropy decoding layer, and wherein the convolutional network comprises a plurality of convolutional layers and an activation rectified linear unit (ReLU) layer; and perform, using a second reference frame of the fourth video frame and the second feature, the generation process of the second reconstructed frame.
- 19 . The decoder of claim 18 , wherein the third neural network comprises a third entropy decoding layer and a decoding network, wherein the third entropy decoding layer performs entropy decoding on the first compression information of a current video frame based on a third reference frame of the current video frame, and wherein the decoding network generates a third reconstructed frame of the current video frame based on the first feature.
- 20 . The decoder of claim 18 , wherein the fourth entropy decoding layer performs entropy decoding on the second compression information, and wherein the convolutional network performs the generation process of a third reconstructed frame of a current video frame based on a third reference frame of the current video frame and the second feature.
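The claims above recite two alternative compression paths plus a policy for choosing between them based on the frame's position in the sequence (claim 5). As a rough illustration only — the period value and the periodic-refresh reading are assumptions, not claim language — one might schedule the networks like this:

```python
def select_network(frame_index: int, period: int = 4) -> str:
    """Pick which network compresses a given frame.

    Sketch of one possible network selection policy based on frame
    position (cf. claim 5). `period` is an illustrative parameter,
    not something the patent specifies.
    """
    # The first network's compression quality does not depend on the
    # quality of the reconstructed reference frame, so selecting it
    # periodically stops error accumulation (cf. the Summary below).
    return "first" if frame_index % period == 0 else "second"

# Which network handles each of the first eight frames:
schedule = [select_network(i) for i in range(8)]
```

With `period = 4`, every fourth frame is refreshed through the first network and the intervening frames use the residual-based second network.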
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This is a continuation of International Patent Application No. PCT/CN2021/112077 filed on Aug. 11, 2021, which claims priority to Chinese Patent Application No. 202011271217.8 filed on Nov. 13, 2020. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
TECHNICAL FIELD
This application relates to the field of artificial intelligence, and in particular, to a video frame compression method, a video frame decompression method, and an apparatus.
BACKGROUND
Artificial intelligence (AI) is a theory, a method, a technology, or an application system that simulates, extends, and expands human intelligence by using a digital computer or a machine controlled by a digital computer, to perceive an environment, obtain knowledge, and achieve an optimal result by using the knowledge. In other words, artificial intelligence is a branch of computer science that is intended to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have perceiving, inference, and decision-making functions. Currently, a common application of artificial intelligence is to compress video frames by using a neural network based on deep learning. Specifically, an encoder uses the neural network to calculate an optical flow of an original current video frame relative to a reference frame of the current video frame. The encoder then performs compression encoding on the optical flow to obtain a compressed optical flow.
Both the reference frame of the current video frame and the current video frame belong to a current video sequence, and the reference frame is the video frame that must be referred to when compression encoding is performed on the current video frame. The compressed optical flow is decompressed to obtain a decompressed optical flow, and a predicted current video frame is generated based on the decompressed optical flow and the reference frame. The neural network is then used to calculate a residual between the original current video frame and the predicted current video frame, and compression encoding is performed on the residual. The compressed optical flow and the compressed residual are sent to a decoder, which can then use the neural network to obtain a decompressed current video frame based on a decompressed reference frame, the decompressed optical flow, and a decompressed residual. This process of obtaining the decompressed video frame depends excessively on the quality of the decompressed reference frame, so errors accumulate frame by frame. A solution for improving the quality of a reconstructed frame of a video frame is therefore urgently required.
SUMMARY
This application provides a video frame compression method, a video frame decompression method, and an apparatus. When compression information is obtained by using a first neural network, the quality of a reconstructed frame of a current video frame does not depend on the quality of a reconstructed frame of a reference frame of the current video frame, which prevents errors from accumulating frame by frame and improves the quality of the reconstructed frame of the video frame. In addition, the advantages of the first neural network and a second neural network are combined to minimize the data amount that needs to be transmitted while improving the quality of the reconstructed frame of the video frame.
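The optical-flow pipeline summarized in the Background can be sketched end to end. This is a toy illustration only: `estimate_flow`, `compress`, and `decompress` are hypothetical stand-ins (simple array arithmetic and uniform quantization) for the learned flow network and entropy codecs the text describes, and the warping step is reduced to an addition.

```python
import numpy as np

def estimate_flow(current, reference):
    # Toy "optical flow": the per-pixel difference stands in for a
    # learned flow-estimation network.
    return current - reference

def compress(x, step=0.5):
    # Lossy compression sketch: uniform quantization to integers.
    return np.round(x / step).astype(np.int32)

def decompress(q, step=0.5):
    return q.astype(np.float32) * step

def encode_frame(current, reference):
    flow = estimate_flow(current, reference)
    flow_bits = compress(flow)
    # Encoder mirrors the decoder: predict from the *decompressed* flow.
    predicted = reference + decompress(flow_bits)
    residual = current - predicted
    return flow_bits, compress(residual)

def decode_frame(reference, flow_bits, residual_bits):
    # Reconstruction is built on top of the reference frame, which is
    # why quality depends on the reference and errors can accumulate.
    predicted = reference + decompress(flow_bits)
    return predicted + decompress(residual_bits)

ref = np.zeros((4, 4), dtype=np.float32)
cur = np.full((4, 4), 3.0, dtype=np.float32)
flow_bits, residual_bits = encode_frame(cur, ref)
recon = decode_frame(ref, flow_bits, residual_bits)
```

Because `decode_frame` always starts from `reference`, any error in the reconstructed reference propagates into every later frame — the accumulation problem the first neural network is designed to avoid.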
To resolve the foregoing technical problem, this application provides the following technical solutions. According to a first aspect, this application provides a video frame compression method. In the method, an artificial intelligence technology may be applied to the field of video frame encoding/decoding. The method may include: An encoder determines a target neural network from a plurality of neural networks according to a network selection policy, where the plurality of neural networks include a first neural network and a second neural network; and performs compression encoding on a current video frame by using the target neural network, to obtain compression information corresponding to the current video frame. If the compression information is obtained by using the first neural network, the compression information includes first compression information of a first feature of the current video frame, a reference frame of the current video frame is used for a compression process of the first feature of the current video frame, and the reference frame of the current video frame is not used for a generation process of the first feature of the current video frame. In other words, the first feature of the current video frame can be obtained only