JP-2026514474-A - Companion device-assisted multi-view video coding
Abstract
The device is configured to: obtain a first set of multiview pictures of video data, the first set of multiview pictures comprising a first picture and a second picture, the first picture being from a first viewpoint and the second picture being from a second viewpoint; transmit first encoded video data, based on the first set of multiview pictures, to a receiving device; receive a multiview encoding cue from the receiving device; obtain a second set of multiview pictures of the video data, the second set of multiview pictures comprising a third picture and a fourth picture, the third picture being from the first viewpoint and the fourth picture being from the second viewpoint; perform a multiview encoding process on the second set of multiview pictures, based on the multiview encoding cue, to generate second encoded video data; and transmit the second encoded video data to the receiving device.
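As a rough illustration of the round trip described above, the sketch below models the multiview encoding cue as a single global disparity that the receiving device estimates and returns, and that the transmitting device then uses for inter-view prediction. The function names and the choice of a global-disparity cue are illustrative assumptions, not details from the patent.

```python
import numpy as np

def derive_cue(left: np.ndarray, right: np.ndarray, max_d: int = 32) -> int:
    """Receiving device: estimate the horizontal shift that best aligns the
    two views; this single disparity value stands in for the coding cue."""
    width = left.shape[1]
    errs = [float(np.mean((left[:, d:].astype(np.float64)
                           - right[:, :width - d]) ** 2))
            for d in range(max_d)]
    return int(np.argmin(errs))

def encode_with_cue(left: np.ndarray, right: np.ndarray, disparity: int):
    """Transmitting device: predict the second view from the first using the
    cue, then code only the (typically small) inter-view residual."""
    pred = np.roll(left, -disparity, axis=1)   # shift left view toward right
    residual = right.astype(np.int16) - pred   # reduced inter-view redundancy
    return left, residual, disparity
```

A real cue would be richer than one scalar; the claims below enumerate block-level motion data, depth maps, and illumination compensation coefficients as possible contents.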
Inventors
- Yehonatan Dalal
- Gideon Shlomo Katz
- Shay Landis
- Idan Michael Horn
Assignees
- Qualcomm, Incorporated
Dates
- Publication Date: 2026-05-11
- Application Date: 2024-03-27
- Priority Date: 2023-04-24
Claims (20)
- A device comprising: a memory configured to store video data; and one or more processors implemented in circuitry and coupled to the memory, the one or more processors configured to: obtain a first set of multiview pictures of the video data, the first set of multiview pictures including a first picture and a second picture, the first picture being from a first viewpoint and the second picture being from a second viewpoint; transmit first encoded video data to a receiving device, the first encoded video data being based on the first set of multiview pictures; receive a multiview encoding cue from the receiving device; obtain a second set of multiview pictures of the video data, the second set of multiview pictures including a third picture and a fourth picture, the third picture being from the first viewpoint and the fourth picture being from the second viewpoint; perform a multiview encoding process on the second set of multiview pictures, based on the multiview encoding cue received from the receiving device, to generate second encoded video data, the multiview encoding process reducing inter-view redundancy between the third picture and the fourth picture; and transmit the second encoded video data to the receiving device.
- The device of claim 1, wherein the one or more processors are further configured to: after transmitting the second encoded video data to the receiving device, receive an updated multiview encoding cue from the receiving device; obtain a third set of multiview pictures of the video data, the third set of multiview pictures including a fifth picture and a sixth picture, the fifth picture being from the first viewpoint and the sixth picture being from the second viewpoint; encode the third set of multiview pictures, based on the updated multiview encoding cue received from the receiving device, to generate third encoded video data; and transmit the third encoded video data to the receiving device.
- The device of claim 1, wherein the multiview encoding cue comprises motion data for one or more of: a relative shift between a block of the first picture and a block of the second picture; a brightness correction between the first picture and the second picture; an inter-block shift between an anchor block and a reconstructed block; or a reference shift.
- The device of claim 1, wherein the device is an extended reality (XR) headset, and wherein the one or more processors are further configured to: receive, from the receiving device, virtual element data generated based on the first set of multiview pictures and the second set of multiview pictures; and output the virtual element data for display in an XR scene.
- A device comprising: a memory configured to store video data; and one or more processors implemented in circuitry and coupled to the memory, the one or more processors configured to: obtain first encoded video data from a transmitting device, the first encoded video data being based on a first set of multiview pictures of the video data, the first set of multiview pictures including a first picture and a second picture, the first picture being from a first viewpoint and the second picture being from a second viewpoint; determine a multiview encoding cue based on the first encoded video data; transmit the multiview encoding cue to the transmitting device; and receive second encoded video data from the transmitting device, the second encoded video data being encoded based on a second set of multiview pictures including a third picture and a fourth picture, using a multiview encoding process that reduces inter-view redundancy between the third picture and the fourth picture based on the multiview encoding cue.
- The device of claim 5, wherein the one or more processors are further configured to decode the second encoded video data.
- The device of claim 5, wherein the multiview encoding cue is a first multiview encoding cue, and wherein the one or more processors are further configured to: determine a second multiview encoding cue based on the second encoded video data; transmit the second multiview encoding cue to the transmitting device; and obtain third encoded video data from the transmitting device, the third encoded video data being encoded based on a third set of multiview pictures including a fifth picture and a sixth picture, using a multiview encoding process that reduces inter-view redundancy between the fifth picture and the sixth picture based on the second multiview encoding cue.
- The device of claim 5, wherein the multiview encoding cue includes a depth map indicating depths of objects represented in the first picture and the second picture, and wherein the one or more processors are configured to determine the depth map based on the first picture and the second picture as part of determining the multiview encoding cue.
- The device of claim 5, wherein the multiview encoding cue includes one or more illumination compensation coefficients, and wherein the one or more processors are configured to determine the one or more illumination compensation coefficients based on the first picture and the second picture as part of determining the multiview encoding cue (see the illustrative sketch following the claims).
- The device of claim 5, wherein the transmitting device is an extended reality (XR) headset, and wherein the one or more processors are further configured to: process the second set of pictures to generate virtual element data; and transmit the virtual element data to the XR headset.
- A method of processing video data, the method comprising: obtaining a first set of multiview pictures of the video data, the first set of multiview pictures including a first picture and a second picture, the first picture being from a first viewpoint and the second picture being from a second viewpoint; transmitting first encoded video data to a receiving device, the first encoded video data being based on the first set of multiview pictures; receiving a multiview encoding cue from the receiving device; obtaining a second set of multiview pictures of the video data, the second set of multiview pictures including a third picture and a fourth picture, the third picture being from the first viewpoint and the fourth picture being from the second viewpoint; performing a multiview encoding process on the second set of multiview pictures, based on the multiview encoding cue received from the receiving device, to generate second encoded video data, the multiview encoding process reducing inter-view redundancy between the third picture and the fourth picture; and transmitting the second encoded video data to the receiving device.
- The method of claim 11, further comprising: after transmitting the second encoded video data to the receiving device, receiving an updated multiview encoding cue from the receiving device; obtaining a third set of multiview pictures of the video data, the third set of multiview pictures including a fifth picture and a sixth picture, the fifth picture being from the first viewpoint and the sixth picture being from the second viewpoint; encoding the third set of multiview pictures, based on the updated multiview encoding cue received from the receiving device, to generate third encoded video data; and transmitting the third encoded video data to the receiving device.
- The method of claim 11, wherein the multiview encoding cue comprises motion data for one or more of: a relative shift between a block of the first picture and a block of the second picture; a brightness correction between the first picture and the second picture; an inter-block shift between an anchor block and a reconstructed block; or a reference shift.
- The method of claim 11, further comprising: receiving, from the receiving device, virtual element data generated based on the first set of multiview pictures and the second set of multiview pictures; and outputting the virtual element data for display in an extended reality (XR) scene.
- A method of processing video data, the method comprising: obtaining first encoded video data from a transmitting device, the first encoded video data being based on a first set of multiview pictures of the video data, the first set of multiview pictures including a first picture and a second picture, the first picture being from a first viewpoint and the second picture being from a second viewpoint; determining a multiview encoding cue based on the first encoded video data; transmitting the multiview encoding cue to the transmitting device; and obtaining second encoded video data from the transmitting device, the second encoded video data being encoded based on a second set of multiview pictures including a third picture and a fourth picture, using a multiview encoding process that reduces inter-view redundancy between the third picture and the fourth picture based on the multiview encoding cue.
- The method of claim 15, further comprising decoding the second encoded video data.
- The method of claim 15, wherein the multiview encoding cue is a first multiview encoding cue, the method further comprising: determining a second multiview encoding cue based on the second encoded video data; transmitting the second multiview encoding cue to the transmitting device; and obtaining third encoded video data from the transmitting device, the third encoded video data being encoded based on a third set of multiview pictures including a fifth picture and a sixth picture, using a multiview encoding process that reduces inter-view redundancy between the fifth picture and the sixth picture based on the second multiview encoding cue.
- The method of claim 15, wherein the multiview encoding cue includes a depth map indicating depths of objects represented in the first picture and the second picture, and wherein determining the multiview encoding cue includes determining the depth map based on the first picture and the second picture.
- The method of claim 15, wherein the multiview encoding cue includes one or more illumination compensation coefficients, and wherein determining the multiview encoding cue includes determining the one or more illumination compensation coefficients based on the first picture and the second picture.
- The method of claim 15, wherein the transmitting device is an extended reality (XR) headset, the method further comprising: processing the second set of pictures to generate virtual element data; and transmitting the virtual element data to the XR headset.
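Claims 9 and 19 name illumination compensation coefficients as one possible content of the multiview encoding cue. The sketch below (referenced from claim 9 above) shows one plausible way a receiving device might fit such coefficients: a per-picture linear model solved by least squares. The function names and the linear-model form are assumptions for illustration, not the claimed procedure.

```python
import numpy as np

def illumination_cue(view0: np.ndarray, view1: np.ndarray) -> tuple[float, float]:
    """Fit view1 ≈ scale * view0 + offset over co-located samples; the pair
    (scale, offset) would be sent back as part of the encoding cue."""
    x = view0.astype(np.float64).ravel()
    y = view1.astype(np.float64).ravel()
    a = np.vstack([x, np.ones_like(x)]).T          # design matrix [x, 1]
    scale, offset = np.linalg.lstsq(a, y, rcond=None)[0]
    return float(scale), float(offset)

def compensate(view0: np.ndarray, scale: float, offset: float) -> np.ndarray:
    """Sender side: brightness-align view0 before inter-view prediction,
    so the residual against view1 carries less redundancy."""
    return np.clip(scale * view0 + offset, 0, 255).astype(np.uint8)
```

A practical codec would likely fit such coefficients per block or per region rather than per picture; the per-picture model is the simplest instance of the idea.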
Description
This application claims priority to U.S. Patent Application No. 18/306,136, filed on 24 April 2023, which is incorporated herein by reference in its entirety. This disclosure relates to video encoding and video decoding.

The popularity of virtual reality (VR), augmented reality (AR), and mixed reality (MR) technologies is growing rapidly, and these technologies are expected to be widely adopted in non-gaming applications such as healthcare, education, social media, and retail, among many others. VR, AR, and MR are sometimes collectively referred to as extended reality (XR). This growing popularity is driving demand for XR devices, such as XR goggles, that offer high-quality 3D graphics, higher video resolution, and low-latency response.

This disclosure describes techniques for processing video data in a transmitting device and a receiving device. The transmitting device may be an XR device or another type of device. The receiving device may be a user equipment (UE) device, such as a smartphone or tablet. The transmitting device may perform a limited video encoding process to generate encoded video data, apply channel coding to the encoded video data to generate error correction data, and transmit the error correction data and at least a portion of the encoded video data to the receiving device. The receiving device may estimate the video data based on one or more previously reconstructed pictures and then encode the estimated video data, using one or more coding tools that the transmitting device did not use when performing the limited video encoding process. The receiving device may use the error correction data and the encoded estimated video data to reconstruct the portion of the encoded video data that the transmitting device did not send, avoiding the need to transmit that portion.

In one example, this disclosure describes a method of decoding video data, the method comprising: receiving, at a receiving device, error correction data from a transmitting device, wherein the error correction data provides error correction information and is generated based on encoded video data of one or more blocks of a picture of the video data; generating, at the receiving device, prediction data for the picture using one or more coding tools not used to generate the encoded video data of the one or more blocks, wherein the prediction data for the picture includes predictions of blocks of the picture based at least in part on blocks of one or more previously reconstructed pictures of the video data; generating, at the receiving device, encoded video data based on the prediction data for the picture; performing, at the receiving device, an error correction process on the encoded video data using the error correction data to generate error-corrected encoded video data; and performing, at the receiving device, a reconstruction operation to reconstruct the blocks of the picture based on the error-corrected encoded video data, wherein the reconstruction operation is controlled by values of one or more parameters.
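The flow above resembles distributed source coding: the receiver predicts the sender's encoded data and uses only transmitted parity to fix what it mispredicted. The toy sketch below uses a single XOR parity over fixed-size chunks purely for illustration; the disclosure does not specify the channel code, and a practical system would use a stronger code such as LDPC.

```python
CHUNK = 16  # bytes per chunk; data length assumed to be a multiple of CHUNK

def xor_parity(data: bytes) -> bytes:
    """Sender side: one XOR parity chunk over all chunks of the encoded data."""
    parity = bytearray(CHUNK)
    for i in range(0, len(data), CHUNK):
        for j in range(CHUNK):
            parity[j] ^= data[i + j]
    return bytes(parity)

def repair_chunk(estimate: bytes, parity: bytes, bad: int) -> bytes:
    """Receiver side: rebuild the one chunk the receiver failed to predict by
    XOR-ing the parity with every correctly predicted chunk."""
    fixed = bytearray(parity)
    for i in range(0, len(estimate), CHUNK):
        if i // CHUNK != bad:
            for j in range(CHUNK):
                fixed[j] ^= estimate[i + j]
    out = bytearray(estimate)
    out[bad * CHUNK:(bad + 1) * CHUNK] = fixed
    return bytes(out)
```

Note this toy assumes the receiver knows which chunk is unreliable; a real channel code corrects errors without that side information.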
In another example, this disclosure describes a method of encoding video data, the method comprising: obtaining, at a transmitting device, video data from a video source; generating, at the transmitting device, encoded video data of a first picture and encoded video data of a second picture of the video data based on a set of parameters; performing, at the transmitting device, channel coding on the encoded video data of the first picture and the encoded video data of the second picture to generate error correction data for the first picture and error correction data for the second picture; and transmitting, from the transmitting device, the encoded video data of the first picture, the error correction data for the first picture, and the error correction data for the second picture.

In another example, this disclosure describes a method of encoding video data, the method comprising: obtaining, at a transmitting device, video data from a video source; generating, at the transmitting device, transform blocks based on the video data; determining, at the transmitting device, which of the transform blocks are anchor transform blocks; calculating, at the transmitting device, a correlation matrix for the set of transform blocks; generating, at the transmitting device, bit-reduced non-anchor transform blocks; and transmitting, from the transmitting device, the anchor transform blocks, the non-anchor transform blocks, and the correlation matrix to a receiving device.

In another example, this disclosure describes a device comprising a memory configured to store video data, a communication interface, and one or more processors implemented in circuitry and coupled to the memory, wherein the one or more processors are configured to perform the method according to any one of claims
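For the anchor-block method in the preceding paragraph, the sketch below makes concrete assumptions that the disclosure leaves open: anchors are chosen as the highest-energy transform blocks, the correlation matrix is the ordinary Pearson correlation between flattened blocks, and "bit reduction" is modeled as coarse uniform quantization.

```python
import numpy as np

def split_anchors(tblocks: np.ndarray, n_anchors: int):
    """tblocks: (N, K) array, one flattened transform block per row.
    Select the n_anchors highest-energy blocks as anchors (an assumed rule)."""
    energy = (tblocks.astype(np.float64) ** 2).sum(axis=1)
    mask = np.zeros(len(tblocks), dtype=bool)
    mask[np.argsort(energy)[-n_anchors:]] = True
    return tblocks[mask], tblocks[~mask], mask

def correlation_matrix(tblocks: np.ndarray) -> np.ndarray:
    """Pairwise correlation between transform blocks, transmitted so the
    receiver can re-predict non-anchor blocks from the anchors."""
    return np.corrcoef(tblocks.astype(np.float64))

def reduce_bits(nonanchor: np.ndarray, step: float = 8.0) -> np.ndarray:
    """Coarse quantization standing in for the 'bit-reduced' non-anchor data."""
    return np.round(nonanchor / step) * step
```

The receiver would then combine the full-precision anchors, the coarse non-anchor blocks, and the correlation matrix to refine its reconstruction; the exact refinement rule is outside what this record specifies.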