CN-122029814-A - Method, apparatus and medium for visual data processing
Abstract
Embodiments of the present disclosure provide a solution for visual data processing. A method for visual data processing is presented. The method includes performing a conversion between visual data and a bitstream of the visual data using a Neural Network (NN) based model, wherein a color transformation between an internal color format associated with the conversion and an output color format is allowed to be disabled.
Inventors
- S. Eisenleck
- WU YAOJUN
- ZHANG ZHAOBIN
- WANG MENG
- ZHANG KAI
- ZHANG LI
Assignees
- Douyin Vision Co., Ltd. (抖音视界有限公司)
- ByteDance Ltd. (字节跳动有限公司)
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-10-09
- Priority Date: 2023-10-10
Claims (20)
- 1. A method for visual data processing, comprising: performing a conversion between visual data and a bitstream of the visual data using a Neural Network (NN) based model, wherein a color transformation between an internal color format and an output color format associated with the conversion is allowed to be disabled.
- 2. The method of claim 1, wherein the bitstream includes an indication indicating whether the color transformation is disabled.
- 3. The method of any of claims 1 to 2, wherein the internal color format indicates a color space used for encoding and decoding the visual data, and the output color format indicates a color space of the visual data output from the conversion.
- 4. The method of any of claims 1 to 3, wherein whether the color transformation is disabled depends on the sizes of the components of the input visual data of the color transformation or the sizes of the components of the output visual data of the color transformation.
- 5. The method of claim 4, wherein the color transformation is disabled if a size of a first component of the input visual data is different from a size of a second component of the input visual data, or the color transformation is disabled if a size of a first component of the output visual data is different from a size of a second component of the output visual data.
- 6. The method of any of claims 4 to 5, wherein, if a size of a first component of the input visual data is the same as a size of a second component of the input visual data, whether the color transformation is enabled is determined based on an indication in the bitstream, or, if a size of a first component of the output visual data is the same as a size of a second component of the output visual data, whether the color transformation is enabled is determined based on the indication in the bitstream.
- 7. The method of any of claims 4 to 6, wherein the color transformation is disabled if the input visual data is in a 4:2:2 format or a 4:2:0 format, or the color transformation is disabled if the output visual data is in a 4:2:2 format or a 4:2:0 format.
- 8. The method of any of claims 4 to 7, wherein, if the input visual data is in a 4:4:4 format, whether the color transformation is enabled is determined based on an indication in the bitstream, or, if the output visual data is in a 4:4:4 format, whether the color transformation is enabled is determined based on an indication in the bitstream.
- 9. The method of claim 3, wherein whether the color transformation is disabled depends on the output color format.
- 10. The method of claim 9, wherein the color transformation is enabled if the output color format indicates that the color space of the visual data output from the conversion is RGB or standard RGB (sRGB), or the color transformation is disabled if the output color format indicates that the color space of the visual data output from the conversion is YUV or YCbCr.
- 11. The method of any of claims 1 to 10, wherein the color transformation is allowed to be performed with at least one predetermined parameter.
- 12. The method of claim 11, wherein the at least one predetermined parameter comprises a matrix, an offset, a shift, or a coefficient.
- 13. The method of any of claims 11 to 12, wherein the bitstream comprises an indication indicating whether the color transformation is performed with the at least one predetermined parameter.
- 14. The method of any of claims 11 to 12, wherein, if the color transformation is enabled, the bitstream comprises an indication indicating whether the color transformation is performed with the at least one predetermined parameter.
- 15. The method of any of claims 11 to 14, wherein the bitstream comprises an indication indicating the at least one predetermined parameter.
- 16. The method of any of claims 11 to 14, wherein the at least one predetermined parameter is not indicated in the bitstream.
- 17. The method of any of claims 1 to 16, wherein the color transformation is an integer transformation.
- 18. The method of any of claims 1 to 17, wherein the color transformation follows a de-normalization process or an integerization process.
- 19. The method of claim 18, wherein the de-normalization process comprises a bit-depth conversion process, or the integerization process comprises a rounding operation, a floor operation, or a ceiling operation.
- 20. The method of any of claims 17 to 19, wherein all multiplication and addition coefficients used in the color transformation are integers.
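Claims 11 to 20 describe a color transformation that uses predetermined parameters (matrix, offset, shift) and integer-only arithmetic after a de-normalization step. The following is a minimal Python sketch of that idea under stated assumptions, not the disclosed implementation: the internal color format is assumed to be full-range BT.709 YCbCr and the output format RGB; the 14-bit fixed-point precision (`SHIFT`), the `ColorTransformParams` container and the functions `predetermined_params`, `select_params`, `denormalize` and `color_transform_int` are all hypothetical; and the de-normalization is assumed to map NN outputs in [0, 1] to integer sample codes.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical fixed-point precision: real coefficients are scaled by 2**SHIFT.
SHIFT = 14


def _fixed(k: float) -> int:
    # Converted once, ahead of time, so the transform itself uses only
    # integer multiplications, additions and shifts (cf. claim 20).
    return round(k * (1 << SHIFT))


@dataclass(frozen=True)
class ColorTransformParams:
    matrix: Tuple[Tuple[int, int, int], ...]  # integer coefficients scaled by 2**shift
    offsets: Tuple[int, int, int]             # subtracted from (Y, Cb, Cr) first
    shift: int                                # right shift undoing the scaling


def predetermined_params(bit_depth: int) -> ColorTransformParams:
    """Assumed predetermined parameters: full-range BT.709 YCbCr -> RGB."""
    half = 1 << (bit_depth - 1)  # chroma offset, e.g. 128 for 8-bit data
    return ColorTransformParams(
        matrix=(
            (_fixed(1.0), 0, _fixed(1.5748)),                 # R
            (_fixed(1.0), _fixed(-0.1873), _fixed(-0.4681)),  # G
            (_fixed(1.0), _fixed(1.8556), 0),                 # B
        ),
        offsets=(0, half, half),
        shift=SHIFT,
    )


def select_params(
    use_predetermined: bool,
    bit_depth: int,
    signaled: Optional[ColorTransformParams] = None,
) -> ColorTransformParams:
    """Claims 13 to 16: an indication selects predetermined or signaled parameters."""
    if use_predetermined:
        return predetermined_params(bit_depth)
    if signaled is None:
        raise ValueError("explicit parameters are required when predetermined ones are not used")
    return signaled


def denormalize(sample: float, bit_depth: int) -> int:
    """Assumed bit-depth conversion of a normalized NN output in [0, 1] (claim 19)."""
    max_val = (1 << bit_depth) - 1
    code = math.floor(sample * max_val + 0.5)  # round half up
    return min(max(code, 0), max_val)


def color_transform_int(
    ycbcr: Tuple[int, int, int], bit_depth: int, params: ColorTransformParams
) -> List[int]:
    """Integer-only 3x3 color transformation of one (Y, Cb, Cr) sample triple."""
    max_val = (1 << bit_depth) - 1
    centered = [c - o for c, o in zip(ycbcr, params.offsets)]
    rounding = 1 << (params.shift - 1)  # adding half before >> gives round half up
    out = []
    for row in params.matrix:
        acc = sum(k * c for k, c in zip(row, centered))
        # Python's >> on negative ints is an arithmetic (floor) shift.
        out.append(min(max((acc + rounding) >> params.shift, 0), max_val))
    return out
```

For example, `color_transform_int((100, 128, 200), 8, select_params(True, 8))` returns `[213, 66, 100]`, which matches rounding the floating-point BT.709 result. One motivation for keeping the matrix in fixed point is that an encoder and a decoder then reproduce bit-identical output regardless of platform floating-point behavior.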
Description
Method, apparatus and medium for visual data processing

Technical Field

Embodiments of the present disclosure relate generally to visual data processing technology and, more particularly, to neural network-based visual data coding.

Background

Deep learning has evolved rapidly over the last decade in various fields, especially computer vision and image processing. Neural networks originated from interdisciplinary research in neuroscience and mathematics and have shown strong capabilities in nonlinear transformation and classification. In the last five years, image/video compression techniques based on neural networks have made significant progress. The latest neural network-based image compression algorithms are reported to achieve rate-distortion (R-D) performance comparable to that of Versatile Video Coding (VVC). As the compression performance of neural image codecs continues to improve, neural network-based video compression has become an active research field. However, the coding efficiency of neural network-based image/video codecs generally needs to be further improved.

Disclosure of Invention

Embodiments of the present disclosure provide a solution for visual data processing.

In a first aspect, a method for visual data processing is presented. The method includes performing a conversion between visual data and a bitstream of the visual data using a Neural Network (NN) based model, wherein a color transformation between an internal color format associated with the conversion and an output color format is allowed to be disabled. With the method according to the first aspect of the present disclosure, the color transformation between the internal color format associated with the conversion and the output color format can therefore be disabled.
The proposed method may advantageously enable a more flexible use of the color transformation than conventional solutions, in which the color transformation is always applied. This allows the visual data output from the conversion to be in a format other than the 4:4:4 format, such as the 4:2:0 format or the 4:2:2 format. In this way, coding flexibility and coding efficiency can be improved.

In a second aspect, an apparatus for visual data processing is presented. The apparatus includes a processor and a non-transitory memory having instructions thereon. The instructions, when executed by the processor, cause the processor to perform a method according to the first aspect of the present disclosure.

In a third aspect, a non-transitory computer-readable storage medium is presented. The non-transitory computer-readable storage medium stores instructions that cause a processor to perform a method according to the first aspect of the present disclosure.

In a fourth aspect, another non-transitory computer-readable recording medium is presented. The non-transitory computer-readable recording medium stores a bitstream of visual data generated by a method performed by an apparatus for visual data processing. The method includes performing a conversion between the visual data and the bitstream using a Neural Network (NN) based model, wherein a color transformation between an internal color format associated with the conversion and an output color format is allowed to be disabled.

In a fifth aspect, a method for storing a bitstream of visual data is presented. The method includes performing a conversion between the visual data and the bitstream using a Neural Network (NN) based model, wherein a color transformation between an internal color format associated with the conversion and an output color format is allowed to be disabled; and storing the bitstream in a non-transitory computer-readable recording medium.
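Whether the color transformation is switched off can be decided from the component sizes (claims 5 to 8) or from the output color format (claim 10), which is what lets 4:2:0 or 4:2:2 data leave the decoder unconverted. Below is a minimal, hypothetical Python sketch of both rules: the function names, the `(height, width)` shape convention and the Boolean `flag` standing in for the bitstream indication are assumptions for illustration, not syntax defined by this disclosure. The two rules are kept separate because the claims present them as alternative embodiments.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, int]  # assumed (height, width) of one color component


def color_transform_enabled(component_shapes: Sequence[Shape], flag: bool) -> bool:
    """Size-based rule in the spirit of claims 5 to 8 (hypothetical helper)."""
    # Claims 5 and 7: components of different sizes (e.g. 4:2:0 or 4:2:2
    # chroma subsampling) mean the color transformation is disabled,
    # whatever the bitstream indication says.
    if len(set(component_shapes)) > 1:
        return False
    # Claims 6 and 8: equal-size components (4:4:4) leave the decision to
    # an indication in the bitstream, modeled here as the Boolean `flag`.
    return flag


def enabled_by_output_color_space(output_color_space: str) -> bool:
    """Output-format rule in the spirit of claim 10 (hypothetical helper)."""
    space = output_color_space.upper()
    if space in ("RGB", "SRGB"):
        return True
    if space in ("YUV", "YCBCR"):
        return False
    raise ValueError(f"unsupported output color space: {output_color_space}")
```

For instance, 4:2:0 components shaped `[(64, 64), (32, 32), (32, 32)]` disable the transformation even when `flag` is `True`, while three `(64, 64)` components simply return the flag.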
This summary is intended to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Drawings

The foregoing and other objects, features and advantages of exemplary embodiments of the disclosure will become more apparent from the following detailed description of exemplary embodiments of the disclosure, in which like reference numbers generally refer to the same parts.

FIG. 1A illustrates a block diagram of an example visual data coding system according to some embodiments of the present disclosure;
FIG. 1B is a schematic diagram illustrating an example transform coding scheme;
FIG. 2 illustrates an example latent representation of an image;
FIG. 3 is a schematic diagram illustrating an example autoencoder implementing a hyperprior model;
FIG. 4 is a schematic diagram illustrating an example combined model configured to jointly optimize a context model with a hyperprior and an autoencoder;
FIG. 5 illustrates an example encoding proc