EP-4742672-A1 - VIDEO PROCESSING METHOD AND RELATED APPARATUS


Abstract

This application discloses a video processing method and a related apparatus. The method is applied to a video processing apparatus. The apparatus and a cloud application are disposed in a data center. A communication connection is established between the data center and a terminal device. The method includes: performing rendering to generate raw video data, where the raw video data is obtained by rendering raw model data corresponding to the raw video data, the raw model data is generated by the cloud application according to an application operation instruction sent by the terminal device through the communication connection, and the raw video data includes M rendering layers; and encoding the M rendering layers in the raw video data into i encoding layers, to generate a target video bitstream, where a first part of rendering layers in the M rendering layers are encoded into a first encoding layer, ..., and an i th part of rendering layers in the M rendering layers are encoded into an i th encoding layer; and M≥i≥1. In the foregoing method, layer encoding is performed based on a rendering result that has a visual attribute and that is obtained through layer rendering, so that mutual impact between layers is reduced and controllability of an image effect is enhanced.

Inventors

CAI, Changli
MA, Haichuan
LI, Gang

Assignees

Huawei Cloud Computing Technologies Co., Ltd.

Dates

Publication Date: 2026-05-13
Application Date: 2024-02-18

Claims (20)

A video processing method, applied to a video processing apparatus, wherein the video processing apparatus and a cloud application are disposed in at least one data center, a communication connection is established between the at least one data center and a terminal device, and the method comprises: performing rendering to generate raw video data, wherein the raw video data is obtained by rendering raw model data corresponding to the raw video data, the raw model data is generated by the cloud application according to an application operation instruction sent by the terminal device through the communication connection, the raw video data comprises M rendering layers, and M≥2; encoding the M rendering layers in the raw video data into i encoding layers, to generate a target video bitstream, wherein a first part of rendering layers in the M rendering layers are encoded into a first encoding layer, a second part of rendering layers in the M rendering layers are encoded into a second encoding layer, ..., and an i th part of rendering layers in the M rendering layers are encoded into an i th encoding layer; and M≥i≥1; and sending the target video bitstream to the terminal device.
The method according to claim 1, wherein the first part of rendering layers are a first rendering layer in the M rendering layers, the second part of rendering layers are a second rendering layer in the M rendering layers, ..., and the i th part of rendering layers are an i th rendering layer in the M rendering layers; and the method further comprises: receiving an image effect configuration instruction, wherein the image effect configuration instruction is used to enable/disable an effect of a target rendering layer in the M rendering layers; and the performing rendering to generate the raw video data comprises: starting/stopping, according to the image effect configuration instruction, rendering the target rendering layer, to start/stop encoding an encoding layer corresponding to the target rendering layer.
The method according to claim 1, wherein the method further comprises: receiving an image effect configuration instruction, wherein the image effect configuration instruction is used to enhance/weaken an effect of a target rendering layer in the M rendering layers; and the performing rendering to generate the raw video data comprises: improving/reducing, according to the image effect configuration instruction, quality of rendering the target rendering layer.
The method according to any one of claims 1 to 3, wherein the first encoding layer in the i encoding layers is an original base layer, another encoding layer in the i encoding layers is an original enhance layer, and the method further comprises: obtaining resource status information, wherein the resource status information indicates a status of one or more of the following: a rendering pipeline, an encoder, the communication connection, or the terminal device; determining, based on the resource status information, that a resource status can satisfy sending of at least m original enhance layers, wherein m≤i-1; and reconstructing the original base layer and the m original enhance layers into a target base layer, wherein a first bitstream comprises a bitstream corresponding to the target base layer.
The method according to claim 4, wherein after the reconstructing the original base layer and the m original enhance layers into the target base layer, the method further comprises: determining that a resource status cannot satisfy sending of the at least m original enhance layers but can satisfy sending of at least n original enhance layers, wherein n<m; reconstructing the original base layer and the n original enhance layers into the target base layer; and stopping rendering an (n+2) th rendering layer, an (n+3) th rendering layer, ..., and an (m+1) th rendering layer.
The method according to any one of claims 2 to 5, wherein the method further comprises: estimating resource occupation information and/or image effect information according to the image effect configuration instruction, wherein the resource occupation information is used to describe a resource occupation status of the terminal device or the communication connection after the raw video data is encoded according to the image effect configuration instruction, and the image effect information is used to describe an image effect displayed on the terminal device after the raw video data is encoded according to the image effect configuration instruction; and sending the resource occupation information and/or the image effect information to the terminal device.
A video processing method, applied to a video processing apparatus, wherein the method comprises: receiving a target video bitstream, wherein the target video bitstream is generated by performing encoding on raw video data, the raw video data comprises M rendering layers, the target video bitstream comprises at least one encoding layer, the at least one encoding layer comprises a base layer and/or at least one enhance layer, an i th encoding layer in the target video bitstream is obtained by encoding a part of rendering layers in the M rendering layers, M≥2, and M≥i≥1; and decoding the i th encoding layer into the part of rendering layers corresponding to the i th encoding layer, to decode the target video bitstream.
The method according to claim 7, wherein the method further comprises: sending resource status information, wherein the resource status information is used to determine that a resource status can satisfy sending of at least m original enhance layers, m≤M-1, the resource status information comprises status information of a terminal device, and the status information of the terminal device comprises one or more of the following: CPU usage, layer decoding overheads, and layer decoding latency.
A video processing apparatus, wherein the video processing apparatus and a cloud application are disposed in at least one data center, a communication connection is established between the at least one data center and a terminal device, and the video processing apparatus comprises: a rendering module, configured to perform rendering to generate raw video data, wherein the raw video data is obtained by rendering raw model data corresponding to the raw video data, the raw model data is generated by the cloud application according to an application operation instruction sent by the terminal device through the communication connection, the raw video data comprises M rendering layers, and M≥2; an encoding module, configured to encode the M rendering layers in the raw video data into i encoding layers, to generate a target video bitstream, wherein a first part of rendering layers in the M rendering layers are encoded into a first encoding layer, a second part of rendering layers in the M rendering layers are encoded into a second encoding layer, ..., and an i th part of rendering layers in the M rendering layers are encoded into an i th encoding layer; and M≥i≥1; and a transmission module, configured to send the target video bitstream to the terminal device.
The apparatus according to claim 9, wherein the first part of rendering layers are a first rendering layer in the M rendering layers, the second part of rendering layers are a second rendering layer in the M rendering layers, ..., and the i th part of rendering layers are an i th rendering layer in the M rendering layers; and the apparatus further comprises: an instruction receiving module, configured to receive an image effect configuration instruction, wherein the image effect configuration instruction is used to enable/disable an effect of a target rendering layer in the M rendering layers; and the rendering module is further configured to: start/stop, according to the image effect configuration instruction, rendering the target rendering layer, to start/stop encoding an encoding layer corresponding to the target rendering layer.
The apparatus according to claim 9, wherein the apparatus further comprises: an instruction receiving module, configured to receive an image effect configuration instruction, wherein the image effect configuration instruction is used to enhance/weaken an effect of a target rendering layer in the M rendering layers; and the rendering module is further configured to: improve/reduce, according to the image effect configuration instruction, quality of rendering the target rendering layer.
The apparatus according to any one of claims 9 to 11, wherein the first encoding layer in the i encoding layers is an original base layer, another encoding layer in the i encoding layers is an original enhance layer, and the apparatus further comprises: a resource status obtaining module, configured to: obtain resource status information, wherein the resource status information indicates a status of one or more of the following: a rendering pipeline, an encoder, the communication connection, and the terminal device; and determine, based on the resource status information, that a resource status can satisfy sending of at least m original enhance layers, wherein m≤i-1; and the encoding module is further configured to: reconstruct the original base layer and the m original enhance layers into a target base layer, wherein a first bitstream comprises a bitstream corresponding to the target base layer.
The apparatus according to claim 12, wherein: the resource status obtaining module is further configured to determine, after the encoding module reconstructs the original base layer and the m original enhance layers into the target base layer, that a resource status cannot satisfy sending of the at least m original enhance layers but can satisfy sending of at least n original enhance layers, wherein n<m; the encoding module is further configured to: reconstruct the original base layer and the n original enhance layers into the target base layer; and the rendering module is further configured to: stop rendering an (n+2) th rendering layer, an (n+3) th rendering layer, ..., and an (m+1) th rendering layer.
The apparatus according to any one of claims 10 to 13, wherein the apparatus further comprises: an estimation module, configured to estimate resource occupation information and/or image effect information according to the image effect configuration instruction, wherein the resource occupation information is used to describe a resource occupation status of the terminal device or the communication connection after the raw video data is encoded according to the image effect configuration instruction, and the image effect information is used to describe an image effect displayed on the terminal device after the raw video data is encoded according to the image effect configuration instruction; and the transmission module is further configured to: send the resource occupation information and/or the image effect information to the terminal device.
A video processing apparatus, wherein the apparatus comprises: a receiving module, configured to receive a target video bitstream, wherein the target video bitstream is generated by performing encoding on raw video data, the raw video data comprises M rendering layers, the target video bitstream comprises at least one encoding layer, the at least one encoding layer comprises a base layer and/or at least one enhance layer, an i th encoding layer in the target video bitstream is obtained by encoding a part of rendering layers in the M rendering layers, M≥2, and M≥i≥1; and a decoding module, configured to decode the i th encoding layer into the part of rendering layers corresponding to the i th encoding layer, to decode the target video bitstream.
The apparatus according to claim 15, wherein the apparatus further comprises: a sending module, configured to send resource status information, wherein the resource status information is used to determine that a resource status can satisfy sending of at least m original enhance layers, m≤M-1, the resource status information comprises status information of a terminal device, and the status information of the terminal device comprises one or more of the following: CPU usage, layer decoding overheads, and layer decoding latency.
A cloud application system, wherein the cloud application system comprises a cloud application, a rendering apparatus, and an encoding apparatus, and the cloud application, the rendering apparatus, and the encoding apparatus are disposed in at least one data center, wherein the cloud application is configured to generate model data; the rendering apparatus is configured to render the model data to generate raw video data, wherein the raw video data comprises M rendering layers, and M≥2; and the encoding apparatus is configured to encode the M rendering layers in the raw video data into i encoding layers, wherein M≥i≥1, to generate a target video bitstream.
A computing device cluster, wherein the computing device cluster comprises at least one computing device, and each computing device comprises a processor and a memory, wherein a processor of the at least one computing device is configured to execute instructions stored in a memory of the at least one computing device, to enable the computing device cluster to perform the video processing method according to any one of claims 1 to 6 or the video processing method according to claim 7 or 8.
A computer program product comprising instructions, wherein when the instructions are run by a computing device cluster, the computing device cluster is enabled to perform the video processing method according to any one of claims 1 to 6 or the video processing method according to claim 7 or 8.
A computer-readable storage medium, comprising computer program instructions, wherein when the computer program instructions are executed by a computing device cluster, the computing device cluster performs the video processing method according to any one of claims 1 to 6 or the video processing method according to claim 7 or 8.
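
The resource-driven adaptation in claims 4 and 5 can be illustrated with a minimal sketch. It assumes a one-to-one mapping in which rendering layer k is encoded into encoding layer k, so that encoding layer 1 is the original base layer and layers 2 to i are original enhance layers. The names `LayerPlan`, `plan_layers`, and `affordable_enhance` are hypothetical. How the affordable number of enhance layers is derived from the resource status information is not specified in the claims and is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerPlan:
    # 1-based encoding-layer numbers merged into the target base layer
    merged_into_target_base: List[int]
    # 1-based rendering-layer numbers whose rendering is stopped
    rendering_layers_to_stop: List[int]


def plan_layers(i: int, affordable_enhance: int,
                previous_m: Optional[int] = None) -> LayerPlan:
    """Decide which layers form the target base layer (hypothetical helper).

    i                  -- number of encoding layers (layer 1 = original base layer)
    affordable_enhance -- enhance layers the resource status can sustain
                          (assumed to be computed elsewhere)
    previous_m         -- enhance-layer count used in the previous decision, if any
    """
    if i < 1:
        raise ValueError("at least one encoding layer is required")
    # Claim 4: m <= i - 1, because only i - 1 enhance layers exist.
    m = max(0, min(affordable_enhance, i - 1))
    # Original base layer (1) plus enhance layers 2 .. m+1.
    merged = list(range(1, m + 2))
    stop: List[int] = []
    if previous_m is not None and m < previous_m:
        # Claim 5: with n = m and the earlier count as m, stop rendering
        # layers n+2 .. m+1, which no longer reach the terminal device.
        stop = list(range(m + 2, previous_m + 2))
    return LayerPlan(merged, stop)
```

For example, with i = 5 and resources for three enhance layers, layers 1 to 4 are merged into the target base layer. If the resources later allow only one enhance layer, layers 1 and 2 are merged and rendering of layers 3 and 4 is stopped.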

Description

TECHNICAL FIELD

This application relates to the field of network technologies, and in particular, to a video processing method and a related apparatus.

BACKGROUND

Layer coding, namely scalable video coding (SVC), encodes a video signal into a layered form and outputs a multi-layer bitstream including a base layer and an enhance layer. When bandwidth resources are insufficient, only the base layer bitstream is transmitted and decoded. When bandwidth resources are sufficient, the enhance layer bitstream may also be transmitted and decoded to improve video decoding quality.

In related technologies, an encoder for layer encoding divides a video in terms of time, space, and quality, encodes raw video data obtained at the render side to obtain a base layer and an enhance layer, and sends, based on the network transmission capability, a video bitstream including the base layer and the enhance layer to a terminal device. A decoding module installed on the terminal device decodes the video bitstream and reconstructs a video frame, which is then displayed. However, due to losses caused by data compression in the rendering, encoding, and transmission processes, layers affect each other in this video processing method. Consequently, users cannot intuitively control the rendering content, and user experience is affected.

SUMMARY

This application provides a video processing method and a related apparatus, to perform layer encoding by using information obtained through layer rendering. In this way, mutual impact between layers of an image is reduced, a user can flexibly control an image rendering result, and the benefits of layer rendering are maximized. Technical solutions are as follows.

According to a first aspect, a video processing method is provided, and is applied to a video processing apparatus. The video processing apparatus and a cloud application are disposed in at least one data center.
A communication connection is established between the at least one data center and a terminal device. The method includes: performing rendering to generate raw video data, where the raw video data is obtained by rendering raw model data corresponding to the raw video data, the raw model data is generated by the cloud application according to an application operation instruction sent by the terminal device through the communication connection, the raw video data includes M rendering layers, and M≥2; encoding the M rendering layers in the raw video data into i encoding layers, to generate a target video bitstream, where a first part of rendering layers in the M rendering layers are encoded into a first encoding layer, a second part of rendering layers in the M rendering layers are encoded into a second encoding layer, ..., and an ith part of rendering layers in the M rendering layers are encoded into an ith encoding layer; and M≥i≥1; and sending the target video bitstream to the terminal device.

As can be seen, when rendering and encoding the raw model data to generate the video bitstream, the video processing apparatus first renders the raw model data into rendered data including the M rendering layers, and then encodes each part of rendering layers in the M rendering layers into one corresponding encoding layer, so that a total of i encoding layers are produced and layer encoding is completed. The part of rendering layers are one or more rendering layers in the M rendering layers, but are not all rendering layers in the M rendering layers. Optionally, one rendering layer in the M rendering layers may be correspondingly encoded into one encoding layer, or a plurality of rendering layers in the M rendering layers may be correspondingly encoded into one encoding layer.
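
The mapping from M rendering layers to i encoding layers described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `group_rendering_layers`, the `groups` parameter, and the example layer names are all hypothetical, and the sketch assumes that the parts are non-empty and do not overlap.

```python
from typing import Dict, List, Sequence


def group_rendering_layers(
    rendering_layers: Sequence[str],
    groups: Sequence[Sequence[int]],
) -> Dict[int, List[str]]:
    """Assign parts of the M rendering layers to i encoding layers.

    groups[k] holds the 0-based indices of the rendering layers that are
    encoded into encoding layer k+1 (hypothetical representation).
    """
    m = len(rendering_layers)
    i = len(groups)
    if m < 2:
        raise ValueError("the method requires M >= 2 rendering layers")
    if not 1 <= i <= m:
        raise ValueError("the encoding-layer count must satisfy M >= i >= 1")
    seen = set()
    for group in groups:
        if len(group) == 0:
            raise ValueError("each part must contain at least one rendering layer")
        for idx in group:
            if not 0 <= idx < m:
                raise ValueError(f"rendering layer index {idx} is out of range")
            if idx in seen:
                # Assumption of this sketch: parts are disjoint.
                raise ValueError(f"rendering layer {idx} is assigned twice")
            seen.add(idx)
    # Encoding layers are numbered from 1, as in the text.
    return {
        k + 1: [rendering_layers[idx] for idx in group]
        for k, group in enumerate(groups)
    }
```

With four hypothetical rendering layers `["base_color", "lighting", "shadows", "particles"]` and `groups = [[0], [1, 2], [3]]`, the first encoding layer carries one rendering layer, the second carries two, and the third carries one, which gives M = 4 and i = 3.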
In this way, layer encoding is performed with the rendering layers obtained through layer rendering as the dimension, which reduces mutual impact between layers of an image and avoids losses of overall image details caused by encoding and transmission. In addition, the user's control over the image is directly reflected in the finally presented image through rendering, encoding, and transmission, so that controllability of the image effect is enhanced and user experience is improved.

According to a possible implementation of the first aspect, the first part of rendering layers are a first rendering layer in the M rendering layers, the second part of rendering layers are a second rendering layer in the M rendering layers, ..., and the ith part of rendering layers are an ith rendering layer in the M rendering layers. The method further includes: receiving an image effect configuration instruction, where the image effect configuration instruction is used to enable/disable an effect of a target rendering layer in the M rendering layers; and the performing rendering to generate the raw video data includes: starting/stopping, according to the image effect configuration instruction, rendering the target render