CN-115733976-B - Method, medium and apparatus for augmented reality video encoding


Abstract

The present disclosure relates to an adaptive quantization matrix for extended reality video coding. Encoding an extended reality (XR) video frame may include: obtaining an XR video frame that includes a background image and a virtual object; obtaining, from an image renderer, a first region of the background image over which the virtual object is overlaid; dividing the XR video frame into a virtual region, which includes the first region of the background image and the virtual object, and a real region, which includes a second region of the background image; determining a corresponding first quantization parameter for the virtual region based on an initial quantization parameter associated with the virtual region; determining a corresponding second quantization parameter for the real region based on an initial quantization parameter associated with the real region; and encoding the virtual region based on the corresponding first quantization parameter and the real region based on the corresponding second quantization parameter.
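The two-region pipeline summarized in the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the disclosed implementation: it assumes the image renderer supplies a per-pixel boolean mask of the first region, assumes a block counts as virtual if any of its pixels lies in that mask, and uses made-up initial quantization parameters (QPs) and threshold limits (the clamp mirrors the upper and lower threshold limits recited in claims 2 and 3). The names `RegionParams`, `classify_blocks`, `region_qp`, `assign_qps`, and the rate-control `deltas` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionParams:
    """Per-region-type settings. The values used below are illustrative assumptions."""
    initial_qp: int  # initial quantization parameter for this region type
    lower: int       # lower threshold limit on the final QP
    upper: int       # upper threshold limit on the final QP


# Hypothetical values: virtual content starts from a lower QP (higher quality) than real content.
VIRTUAL = RegionParams(initial_qp=22, lower=18, upper=28)
REAL = RegionParams(initial_qp=32, lower=26, upper=40)


def classify_blocks(renderer_mask, block=8):
    """Label each block 'virtual' if it touches the renderer-supplied first region, else 'real'.

    renderer_mask[y][x] is True where a virtual object is overlaid on the background.
    """
    height, width = len(renderer_mask), len(renderer_mask[0])
    labels = {}
    for by in range(0, height, block):
        for bx in range(0, width, block):
            covered = any(
                renderer_mask[y][x]
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            )
            labels[(by, bx)] = "virtual" if covered else "real"
    return labels


def region_qp(params, delta=0):
    """Start from the initial QP, apply an optional rate-control delta, clamp to the limits."""
    return max(params.lower, min(params.upper, params.initial_qp + delta))


def assign_qps(labels, deltas=None):
    """Map each block position to the QP that its region type yields."""
    deltas = deltas or {}
    table = {"virtual": VIRTUAL, "real": REAL}
    return {pos: region_qp(table[kind], deltas.get(pos, 0)) for pos, kind in labels.items()}


# Usage: a 16x16 frame whose top-left 8x8 block is covered by a virtual object.
mask = [[x < 8 and y < 8 for x in range(16)] for y in range(16)]
labels = classify_blocks(mask)
qps = assign_qps(labels, deltas={(8, 8): 20})  # a large hypothetical delta on one real block
print(labels[(0, 0)], qps[(0, 0)])  # virtual 22
print(labels[(8, 8)], qps[(8, 8)])  # real 40 (32 + 20, clamped to the upper limit)
```

Clamping after the delta keeps any per-block rate-control adjustment inside the band chosen for that region type, so virtual content can never be pushed to the coarse QPs reserved for real content.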

Inventors

Y.ZHOU

Assignees

Apple Inc.

Dates

Publication Date: 20260512
Application Date: 20220826
Priority Date: 20210827

Claims (20)

1. A method for encoding an extended reality (XR) video frame, comprising: obtaining an XR video frame, the XR video frame comprising a background image and a virtual object overlaying at least a portion of the background image; dividing the XR video frame into a first region comprising virtual content, a second region comprising real content within a physical environment, and a third region comprising virtual content and real content, wherein the first region comprises at least a portion of the virtual object, the second region comprises a region of the background image separate from the first region, and the third region comprises a portion of the virtual object and a portion of the background image; determining a corresponding first quantization parameter for the first region based on an initial quantization parameter associated with the virtual region; determining a corresponding second quantization parameter for the second region based on an initial quantization parameter associated with the real region; determining a corresponding third quantization parameter for the third region based on an initial quantization parameter associated with the intermediate region; and encoding the first region based on the corresponding first quantization parameter, the second region based on the corresponding second quantization parameter, and the third region based on the corresponding third quantization parameter.
2. The method of claim 1, wherein determining the corresponding first quantization parameter for the first region is further based on an upper threshold limit associated with a virtual region and a lower threshold limit associated with a virtual region.
3. The method of claim 1, wherein determining the corresponding second quantization parameter for the second region is further based on an upper threshold limit associated with a real region and a lower threshold limit associated with a real region.
4. The method of claim 1, wherein: the first region includes at least a portion of the virtual object that meets a first complexity criterion; the second region includes a first portion of the region of the background image separate from the first region, wherein the first portion does not satisfy a second complexity criterion; and the third region includes at least one of (i) a portion of the virtual object that does not satisfy the first complexity criterion, and (ii) a second portion of the region of the background image separate from the first region, wherein the second portion satisfies the second complexity criterion.
5. The method of claim 1, wherein determining the corresponding third quantization parameter for the third region is further based on an upper threshold limit associated with an intermediate region and a lower threshold limit associated with an intermediate region.
6. The method of claim 1, further comprising obtaining, via a gaze tracking user interface, an input indicating a focus area, wherein dividing the XR video frame is based at least in part on the focus area.
7. A non-transitory computer-readable medium comprising computer-readable code executable by at least one processor to: obtain an extended reality (XR) video frame, the XR video frame comprising a background image and a virtual object overlaying at least a portion of the background image; divide the XR video frame into a first region comprising virtual content, a second region comprising real content within a physical environment, and a third region comprising virtual content and real content, wherein the first region comprises at least a portion of the virtual object, the second region comprises a region of the background image separate from the first region, and the third region comprises a portion of the virtual object and a portion of the background image; determine a corresponding first quantization parameter for the first region based on an initial quantization parameter associated with the virtual region; determine a corresponding second quantization parameter for the second region based on an initial quantization parameter associated with the real region; determine a corresponding third quantization parameter for the third region based on an initial quantization parameter associated with the intermediate region; and encode the first region based on the corresponding first quantization parameter, the second region based on the corresponding second quantization parameter, and the third region based on the corresponding third quantization parameter.
8. The non-transitory computer-readable medium of claim 7, wherein the computer-readable code for determining the corresponding first quantization parameter for the first region further comprises computer-readable code for determining the corresponding first quantization parameter further based on an upper threshold limit associated with a virtual region and a lower threshold limit associated with a virtual region.
9. The non-transitory computer-readable medium of claim 7, wherein the computer-readable code for determining the corresponding second quantization parameter for the second region further comprises computer-readable code for determining the corresponding second quantization parameter further based on an upper threshold limit associated with a real region and a lower threshold limit associated with a real region.
10. The non-transitory computer-readable medium of claim 7, wherein an initial quantization parameter associated with the virtual region is less than an initial quantization parameter associated with the real region.
11. The non-transitory computer-readable medium of claim 7, wherein the computer-readable medium further comprises computer-readable code executable by the at least one processor to: for the first region, determine a corresponding first region size based on an initial region size associated with the virtual region, and divide the first region into one or more additional virtual regions based on the corresponding first region size; and for the second region, determine a corresponding second region size based on an initial region size associated with the real region, and divide the second region into one or more additional real regions based on the corresponding second region size.
12. The non-transitory computer-readable medium of claim 11, wherein the initial region size associated with a virtual region is smaller than the initial region size associated with a real region.
13. The non-transitory computer-readable medium of claim 7, wherein: the first region includes at least a portion of the virtual object that meets a first complexity criterion; the second region includes a first portion of the region of the background image separate from the first region, wherein the first portion does not satisfy a second complexity criterion; and the third region includes at least one of (i) a portion of the virtual object that does not satisfy the first complexity criterion, and (ii) a second portion of the region of the background image separate from the first region, wherein the second portion satisfies the second complexity criterion.
14. The non-transitory computer-readable medium of claim 7, wherein the computer-readable code for determining the corresponding third quantization parameter for the third region further comprises computer-readable code for determining the corresponding third quantization parameter further based on an upper threshold limit associated with an intermediate region and a lower threshold limit associated with an intermediate region.
15. The non-transitory computer-readable medium of claim 13, wherein an initial quantization parameter associated with the intermediate region is less than an initial quantization parameter associated with the real region and greater than an initial quantization parameter associated with the virtual region.
16. The non-transitory computer-readable medium of claim 13, wherein the computer-readable medium further comprises computer-readable code executable by the at least one processor to: for the first region, determine a corresponding first region size based on an initial region size associated with the virtual region, and divide the first region into one or more additional virtual regions based on the corresponding first region size; for the second region, determine a corresponding second region size based on an initial region size associated with the real region, and divide the second region into one or more additional real regions based on the corresponding second region size; and for the third region, determine a corresponding third region size based on an initial region size associated with the intermediate region, and divide the third region into one or more additional intermediate regions based on the corresponding third region size.
17. The non-transitory computer-readable medium of claim 16, wherein an initial region size associated with the intermediate region is greater than an initial region size associated with the virtual region and less than an initial region size associated with the real region.
18. The non-transitory computer-readable medium of claim 7, wherein the computer-readable medium further comprises computer-readable code executable by the at least one processor to obtain, via a gaze tracking user interface, an input indicating a focus area, wherein the computer-readable code for dividing the XR video frame further comprises computer-readable code for dividing the XR video frame based at least in part on the focus area.
19. An apparatus for video encoding, comprising: an image capturing device configured to capture a background image; at least one processor; and at least one computer-readable medium comprising computer-readable code executable by the at least one processor to: obtain an extended reality (XR) video frame, the XR video frame comprising the background image and a virtual object overlaying at least a portion of the background image; divide the XR video frame into a first region comprising virtual content, a second region comprising real content within a physical environment, and a third region comprising virtual content and real content, wherein the first region comprises at least a portion of the virtual object, the second region comprises a region of the background image separate from the first region, and the third region comprises a portion of the virtual object and a portion of the background image; determine a corresponding first quantization parameter for the first region based on an initial quantization parameter associated with the virtual region; determine a corresponding second quantization parameter for the second region based on an initial quantization parameter associated with the real region; determine a corresponding third quantization parameter for the third region based on an initial quantization parameter associated with the intermediate region; and encode the first region based on the corresponding first quantization parameter, the second region based on the corresponding second quantization parameter, and the third region based on the corresponding third quantization parameter.
20. The apparatus of claim 19, wherein the computer-readable code for determining the corresponding first quantization parameter for the first region further comprises computer-readable code for determining the corresponding first quantization parameter further based on an upper threshold limit associated with a virtual region and a lower threshold limit associated with a virtual region.
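The three-way division of claims 4 and 13, together with the threshold limits of claims 2, 3 and 5 and the ordering of initial quantization parameters and region sizes in claims 10, 12, 15 and 17, can be sketched as follows. The claims do not define the complexity measure or the criteria, so this sketch assumes block variance as the measure and treats "meeting" a criterion as reaching a fixed threshold; all numeric values and function names are hypothetical, and only the virtual < intermediate < real ordering comes from the claims.

```python
from statistics import pvariance

# Hypothetical settings per region type. Claims 10, 12, 15 and 17 fix only the ordering
# (virtual < intermediate < real) of initial QPs and initial region sizes; the numbers are assumptions.
PARAMS = {
    # kind:          (initial QP, lower limit, upper limit, initial region size in pixels)
    "virtual":      (22, 18, 28, 8),
    "intermediate": (27, 22, 34, 16),
    "real":         (32, 26, 40, 32),
}


def complexity(samples):
    """Assumed complexity measure: population variance of a block's luma samples."""
    return pvariance(samples)


def classify(is_virtual, c, first_threshold=100.0, second_threshold=400.0):
    """Three-way split in the manner of claims 4 and 13.

    Meeting a complexity criterion is assumed to mean c >= that criterion's threshold.
    """
    if is_virtual:
        # Virtual content meeting the first criterion -> first (virtual) region.
        return "virtual" if c >= first_threshold else "intermediate"
    # Background meeting the second criterion -> third (intermediate) region.
    return "intermediate" if c >= second_threshold else "real"


def region_qp(kind, delta=0):
    """Initial QP for the region type plus an optional delta, clamped to its threshold limits."""
    initial, lower, upper, _ = PARAMS[kind]
    return max(lower, min(upper, initial + delta))


def region_size(kind):
    """Initial region size used to subdivide a region of this type."""
    return PARAMS[kind][3]


flat = [50] * 64        # uniform 8x8 block, variance 0
busy = [0, 255] * 32    # alternating samples, variance 16256.25
for is_virtual in (True, False):
    for name, block in (("busy", busy), ("flat", flat)):
        kind = classify(is_virtual, complexity(block))
        print(is_virtual, name, kind, region_qp(kind), region_size(kind))
```

Run as-is, the loop labels busy virtual content "virtual", flat virtual content and busy background "intermediate", and flat background "real", so the intermediate region absorbs both simple virtual content and detailed real content, as claims 4 and 13 describe.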

Description

Technical Field

The present disclosure relates generally to image processing. More particularly, but not by way of limitation, the present disclosure relates to techniques and systems for video coding.

Background

Some video coding systems use bit rate control algorithms to determine how many bits to allocate to each region of a video frame, both to keep picture quality uniform under a given video coding standard and to reduce the bandwidth required to transmit the encoded video frame. Some bit rate control algorithms use frame-level and macroblock-level content statistics (e.g., complexity and contrast) to determine quantization parameters and the corresponding bit allocations. A quantization parameter is an integer that maps to a quantization step size and controls how strongly each region of the video frame is compressed. For example, the transform coefficients of an 8×8 pixel block may each be divided by the corresponding entry of a quantization matrix scaled by the step size that the quantization parameter maps to, and each result is then rounded to the nearest integer. A large quantization parameter corresponds to coarser quantization, more compression, and lower image quality, whereas a small quantization parameter corresponds to finer quantization, less compression, and higher image quality. A bit rate control algorithm may use a constant or a varying quantization parameter to meet a target average bit rate, maintain constant image quality, and so on. However, many bit rate control algorithms are objective, so there is no guarantee that more bits are allocated to the region of interest than to the background. Some bit rate control algorithms can identify a region of interest and allocate more bits to it than to the background, but they are typically computationally expensive and slow. There is therefore a need for an improved technique for encoding video frames.

Drawings

Fig. 1 shows an example diagram of an extended reality (XR) video frame.
Fig. 2 illustrates, in flow chart form, an exemplary process for encoding an XR video frame based on an adaptive quantization matrix.
Fig. 3 shows an example diagram of an XR video frame divided into a virtual region and a real region.
Fig. 4 illustrates, in flow chart form, an exemplary process for encoding an XR video frame based on an adaptive quantization matrix and input from a gaze tracking user interface.
Figs. 5A-5C illustrate, in flow chart form, an exemplary process for encoding an XR video frame based on an adaptive quantization matrix and first and second complexity criteria.
Fig. 6 illustrates an example diagram of an XR video frame divided into regions based on a first complexity criterion and a second complexity criterion.
Figs. 7A-7C illustrate, in flow chart form, an exemplary process for encoding an XR video frame based on an adaptive quantization matrix, first and second complexity criteria, and an adjusted region size.
Fig. 8 shows an example diagram of an intermediate region of an XR video frame divided into regions based on first and second complexity criteria and an adjusted region size.
Fig. 9 illustrates, in block diagram form, an exemplary system for encoding an XR video stream.
Fig. 10 illustrates an exemplary system for use in various video encoding systems, including for encoding an XR video stream.

Detailed Description

The present disclosure relates to systems, methods, and computer-readable media for encoding an extended reality (XR) video stream. In particular, an XR video frame comprising a background image and at least one virtual object may be obtained. A first region of the background image, over which the at least one virtual object is to be overlaid, may be obtained from an image renderer.
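The Background's account of how a quantization parameter controls compression, which is what ultimately distinguishes encoding one region "based on" a lower quantization parameter from encoding another region based on a higher one, can be illustrated with a simplified Python sketch. It assumes the approximate H.264/HEVC relationship in which the quantization step size doubles for every increase of 6 in the quantization parameter (step 1.0 at QP 4), and an HEVC-style scaling matrix in which an entry of 16 applies no extra weighting. Real encoders use integer arithmetic and rounding offsets that are omitted here; `qp_to_step`, `quantize`, and `count_nonzero` are illustrative names.

```python
import math


def qp_to_step(qp):
    """Approximate H.264/HEVC mapping: the step size doubles every 6 QP, with step(4) == 1."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qmatrix, qp):
    """Divide each transform coefficient by its matrix-weighted step size and round to nearest.

    A matrix entry of 16 applies no extra weighting (as in a flat HEVC scaling list).
    Ties round away from zero.
    """
    step = qp_to_step(qp)
    levels = []
    for coeff_row, weight_row in zip(coeffs, qmatrix):
        levels.append([
            int(math.copysign(math.floor(abs(c) / (step * w / 16.0) + 0.5), c))
            for c, w in zip(coeff_row, weight_row)
        ])
    return levels


def count_nonzero(levels):
    """Number of coefficients that survive quantization (a rough proxy for bits spent)."""
    return sum(1 for row in levels for v in row if v != 0)


# A synthetic 8x8 block whose coefficient magnitude falls off with frequency.
coeffs = [[200.0 / (1 + i + j) for j in range(8)] for i in range(8)]
flat = [[16] * 8 for _ in range(8)]

fine = quantize(coeffs, flat, qp=22)    # step size 8
coarse = quantize(coeffs, flat, qp=34)  # step size 32
print(count_nonzero(fine), count_nonzero(coarse))  # 64 58
```

Raising the QP from 22 to 34 quadruples the step size and zeroes six of the highest-frequency coefficients in this block, which is the compression-versus-quality trade-off the Background describes.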
The XR video frame may be divided into at least one virtual region and at least one real region. The at least one virtual region includes the first region of the background image and the at least one virtual object. The at least one real region includes a second region of the background image. For each of the at least one virtual region, a corresponding first quantization parameter may be determined based on an initial quantization parameter associated with the virtual region. For each of the at least one real region, a corresponding second quantization parameter may be determined based on an initial quantization parameter associated with the real region. Each of the at least one virtual region may be encoded based on the corresponding first quantization parameter, and each of the at least one real region may be encoded based on the corresponding second quantization parameter. Various examples of electronic systems, and of techniques for using such systems to encode an extended reality video stream, are described. A physical environment refers to a phys