US-12621494-B2 - SEI message for generative face video

Abstract

Methods and apparatuses are provided for processing video data by using generative face video supplemental enhancement information (SEI) messages. An exemplary method for generating a face picture includes: receiving a bitstream; decoding coded information of the bitstream to obtain a base picture and an SEI message; determining whether the SEI message applies to a neural network for generating a face picture; in response to the SEI message applying to the neural network for generating the face picture, determining, based on the SEI message, a mode used to code the face picture and a corresponding face information parameter; and generating, by the neural network, the face picture based on the base picture and the face information parameter.

Inventors

Bolin Chen
Jie Chen
Yan Ye
Shiqi Wang

Assignees

ALIBABA INNOVATION PRIVATE LIMITED

Dates

Publication Date: 20260505
Application Date: 20240329

Claims (20)

1. A method for generating a face picture, comprising: receiving a bitstream; decoding coded information of the bitstream to obtain a base picture and a supplemental enhancement information (SEI) message; determining whether the SEI message applies to a neural network for generating a face picture; in response to the SEI message applying to the neural network for generating the face picture, determining a mode used to code the face picture and a corresponding face information parameter based on the SEI message; and generating, using the neural network, the face picture based on the base picture and the face information parameter.
2. The method according to claim 1, wherein the SEI message comprises an identifying number indicator for indicating whether the SEI message is used to code the face picture; and determining whether the SEI message applies to the neural network for generating the face picture comprises: determining, based on the identifying number indicator, whether the SEI message is used to code the face picture.
3. The method according to claim 2, wherein the identifying number indicator indicates a generative face video filter.
4. The method according to claim 1, wherein the SEI message comprises a mode indicator, and wherein determining the mode used to code the face picture comprises: determining the mode based on the mode indicator.
5. The method according to claim 4, wherein the SEI message further comprises a parameter indicator corresponding to the mode indicator, and the face information parameter is determined based on the parameter indicator.
6. The method according to claim 4, wherein the mode indicator indicates at least one of the following as the mode: 2D facial landmarks, 2D keypoints, consistent regions, 3D keypoints, compact features, or facial semantics.
7. The method according to claim 4, wherein the mode indicator comprises at least one of a coordinate indicator for indicating whether the SEI message carries coordinate parameters for coding the face picture, a matrix indicator for indicating whether the SEI message carries matrix parameters for coding the face picture, or a semantic indicator for indicating whether the SEI message carries semantic parameters for coding the face picture.
8. The method according to claim 7, wherein the SEI message further comprises a parameter indicator for conveying the coordinate parameters, in response to the coordinate indicator indicating that the SEI message carries coordinate parameters; and wherein the coordinate parameters are determined as the face information parameters based on the parameter indicator.
9. The method according to claim 7, wherein the SEI message further comprises a parameter indicator for conveying the matrix parameters, in response to the matrix indicator indicating that the SEI message carries matrix parameters; and wherein the matrix parameters are determined as the face information parameters based on the parameter indicator.
10. A method of encoding a video sequence into a bitstream, comprising: receiving a video sequence; and encoding one or more pictures of the video sequence to generate a bitstream, comprising: encoding a base picture from the one or more pictures and a supplemental enhancement information (SEI) message, the SEI message indicating a mode and a corresponding face information parameter used to code a face picture, and wherein the bitstream is used for generating the face picture by a neural network based on the base picture and the face information parameter.
11. The method according to claim 10, wherein the SEI message comprises an identifying number indicator for indicating whether the SEI message is used to code the face picture.
12. The method according to claim 11, wherein the identifying number indicator indicates a generative face video filter.
13. The method according to claim 10, wherein the SEI message comprises a mode indicator for indicating the mode used to code the face picture.
14. The method according to claim 13, wherein the SEI message further comprises a parameter indicator corresponding to the mode indicator for conveying the face information parameter.
15. The method according to claim 14, wherein the mode indicator indicates at least one of the following as the mode: 2D facial landmarks, 2D keypoints, consistent regions, 3D keypoints, compact features, or facial semantics.
16. The method according to claim 13, wherein the mode indicator comprises at least one of a coordinate indicator for indicating whether the SEI message carries coordinate parameters for coding the face picture, a matrix indicator for indicating whether the SEI message carries matrix parameters for coding the face picture, or a semantic indicator for indicating whether the SEI message carries semantic parameters for coding the face picture.
17. The method according to claim 16, wherein the SEI message further comprises a parameter indicator for conveying the coordinate parameters, in response to the coordinate indicator indicating that the SEI message carries coordinate parameters.
18. The method according to claim 16, wherein the SEI message further comprises a parameter indicator for conveying the matrix parameters, in response to the matrix indicator indicating that the SEI message carries matrix parameters.
19. A non-transitory computer readable storage medium storing a bitstream of a video, the bitstream comprising: a base picture and a supplemental enhancement information (SEI) message, the SEI message indicating a mode and a corresponding face information parameter used to code a face picture, and wherein the bitstream is used for generating the face picture by a neural network based on the base picture and the face information parameter.
20. The non-transitory computer readable storage medium according to claim 19, wherein the SEI message comprises an identifying number indicator for indicating whether the SEI message is used to code the face picture.
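
The flow recited in claims 1 and 10 can be illustrated with a minimal Python sketch. It is not the patented implementation and does not reproduce any SEI syntax. Every name in it (FaceMode, SeiMessage, GENERATIVE_FACE_VIDEO_FILTER_ID, build_sei, generate_face_picture, stub_generator) is hypothetical, as are the numeric codes, and the neural network is replaced by a stub that only records its inputs. The sketch assumes the bitstream has already been decoded into a base picture and an SEI message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Iterable, Optional, Tuple


class FaceMode(Enum):
    # Modes named in claims 6 and 15; the numeric codes are assumptions.
    FACIAL_LANDMARKS_2D = 0
    KEYPOINTS_2D = 1
    CONSISTENT_REGIONS = 2
    KEYPOINTS_3D = 3
    COMPACT_FEATURES = 4
    FACIAL_SEMANTICS = 5


# Assumed value of the identifying number that marks an SEI message as a
# generative face video filter; the real value is not given in the text.
GENERATIVE_FACE_VIDEO_FILTER_ID = 1


@dataclass(frozen=True)
class SeiMessage:
    id_number: int                 # identifying number indicator
    mode: FaceMode                 # mode indicator
    parameters: Tuple[float, ...]  # face information parameter(s)


Generator = Callable[[Any, FaceMode, Tuple[float, ...]], Any]


def build_sei(mode: FaceMode, parameters: Iterable[float]) -> SeiMessage:
    """Encoder side (claim 10): signal the mode and its parameters."""
    return SeiMessage(GENERATIVE_FACE_VIDEO_FILTER_ID, mode, tuple(parameters))


def generate_face_picture(
    base_picture: Any, sei: SeiMessage, generator: Generator
) -> Optional[Any]:
    """Decoder side (claim 1), applied to already decoded inputs."""
    # Does the SEI message apply to the face-generating neural network?
    if sei.id_number != GENERATIVE_FACE_VIDEO_FILTER_ID:
        return None
    # The mode and parameters come from the SEI message; the generator
    # combines them with the base picture.
    return generator(base_picture, sei.mode, sei.parameters)


def stub_generator(base_picture: Any, mode: FaceMode,
                   parameters: Tuple[float, ...]) -> dict:
    # Stand-in for the neural network: it only records what it received.
    return {"base": base_picture, "mode": mode.name, "n_params": len(parameters)}


sei = build_sei(FaceMode.KEYPOINTS_2D, [0.25, 0.5, 0.75, 1.0])
face = generate_face_picture("base-picture", sei, stub_generator)
unrelated = generate_face_picture(
    "base-picture", SeiMessage(99, FaceMode.KEYPOINTS_3D, ()), stub_generator
)
```

In this sketch an SEI message with a different identifying number is ignored, which mirrors the conditional step of claim 1: the face picture is generated only when the message applies to the face-generating network.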

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

The disclosure claims the benefits of priority to U.S. Provisional Application No. 63/494,493, filed on Apr. 6, 2023, U.S. Provisional Application No. 63/511,200, filed on Jun. 30, 2023, U.S. Provisional Application No. 63/587,763, filed on Oct. 4, 2023, and U.S. Provisional Application No. 63/618,387, filed on Jan. 8, 2024, all of which are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The present disclosure generally relates to video processing, and more particularly, to methods and apparatuses for using supplemental enhancement information (SEI) messages to perform face video generative compression.

BACKGROUND

A video is a set of static pictures (or “frames”) capturing visual information. To reduce the storage memory and the transmission bandwidth, a video can be compressed before storage or transmission and decompressed before display. The compression process is usually referred to as encoding and the decompression process is usually referred to as decoding. There are various video coding formats which use standardized video coding technologies, most commonly based on prediction, transform, quantization, entropy coding and in-loop filtering. Video coding standards that specify these formats, such as the High Efficiency Video Coding (HEVC/H.265) standard, the Versatile Video Coding (VVC/H.266) standard, and the AVS standards, are developed by standardization organizations. As more and more advanced video coding technologies are adopted in the video standards, the coding efficiency of the new video coding standards keeps increasing.

SUMMARY OF THE DISCLOSURE

Embodiments of the present disclosure provide methods and apparatuses for processing video data by using generative face video supplemental enhancement information (SEI) messages.
According to some exemplary embodiments, there is provided a method for generating a face picture, the method including: receiving a bitstream; decoding coded information of the bitstream to obtain a base picture and a supplemental enhancement information (SEI) message; determining whether the SEI message applies to a neural network for generating a face picture; in response to the SEI message applying to the neural network for generating the face picture, determining a mode and a corresponding face information parameter used to code the face picture based on the SEI message; and generating the face picture based on the base picture and the face information parameter by the neural network.

According to some exemplary embodiments, there is provided a method for encoding a video sequence into a bitstream, the method including: receiving a video sequence; and encoding one or more pictures of the video sequence to generate a bitstream, including: encoding a base picture of the one or more pictures and a supplemental enhancement information (SEI) message, the SEI message indicating a mode and a corresponding face information parameter used to code a face picture, and wherein the bitstream is used for generating the face picture by a neural network based on the base picture and the face information parameter.

According to some exemplary embodiments, there is provided a non-transitory computer readable storage medium storing a bitstream of a video. The bitstream includes: a base picture and a supplemental enhancement information (SEI) message, the SEI message indicating a mode and a corresponding face information parameter used to code a face picture, and wherein the bitstream is used for generating the face picture by a neural network based on the base picture and the face information parameter.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments and various aspects of the present disclosure are illustrated in the following detailed description and the accompanying figures.
Various features shown in the figures are not drawn to scale.

FIG. 1 is a schematic diagram illustrating an exemplary system for coding image data, according to some embodiments of the present disclosure.
FIG. 2A is a schematic diagram illustrating an exemplary encoding process of a hybrid video coding system, consistent with embodiments of the disclosure.
FIG. 2B is a schematic diagram illustrating another exemplary encoding process of a hybrid video coding system, consistent with embodiments of the disclosure.
FIG. 3A is a schematic diagram illustrating an exemplary decoding process of a hybrid video coding system, consistent with embodiments of the disclosure.
FIG. 3B is a schematic diagram illustrating another exemplary decoding process of a hybrid video coding system, consistent with embodiments of the disclosure.
FIG. 4 is a block diagram of an exemplary apparatus for coding image data, according to some embodiments of the present disclosure.
FIG. 5 is a flowchart of an exemplary method for processing video based on generative face video SEI messages, according to some embodiments of the present disclosure.
FIG. 6 is anot