US-20260127839-A1 - METHOD AND APPARATUS FOR RENDERING HYBRID MEDIA BY COMBINING HETEROGENEOUS MEDIA FOR SPATIAL VIDEO REPRODUCTION SERVICE

US 20260127839 A1

Abstract

A method and apparatus for rendering hybrid media by combining heterogeneous media for spatial video reproduction service. An aspect of the present disclosure provides an apparatus for rendering hybrid media by combining heterogeneous media for a spatial video reproduction service, the apparatus including a bitstream receiver configured to receive a bitstream containing multiple target attribute data items from a communication network, a target scene acquisition unit configured to obtain multiple target scene data items from the multiple target attribute data items, a selection information acquisition unit configured to obtain a user's selection information from a user, an input data selection unit configured to select one or more target scene data items based on the user's selection information from among the multiple target scene data items, and a scene reproduction unit configured to reproduce the one or more target scene data items.

Inventors

  • Hong Chang SHIN
  • Sang Woon Kwak
  • Gwang Soon Lee

Assignees

  • ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Dates

Publication Date
2026-05-07
Application Date
2025-11-07
Priority Date
2024-11-07

Claims (13)

  1. An apparatus for hybrid reproduction, the apparatus comprising: at least one memory storing commands; and at least one processor, wherein, by executing the commands, the at least one processor is configured to implement: a bitstream receiver configured to receive a bitstream containing multiple target attribute data items from a communication network; a target scene acquisition unit configured to obtain multiple target scene data items from the multiple target attribute data items; a selection information acquisition unit configured to obtain a user's selection information from a user; an input data selection unit configured to select one or more target scene data items based on the user's selection information from among the multiple target scene data items; and a scene reproduction unit configured to reproduce the one or more target scene data items.
  2. The apparatus of claim 1, further comprising: a spatial division unit configured to divide, based on a predetermined criterion, a global reproduction space for reproducing the multiple target scene data items into a plurality of local unit spaces, wherein the multiple target scene data items correspond to the plurality of local unit spaces, respectively.
  3. The apparatus of claim 2, wherein the input data selection unit is configured to select one or more local unit spaces based on the user's selection information from among the multiple target scene data items corresponding respectively to the plurality of local unit spaces, and to select the one or more target scene data items corresponding respectively to the one or more local unit spaces.
  4. The apparatus of claim 1, wherein the selection information acquisition unit is configured to obtain a gaze of the user as the user's selection information, and wherein the input data selection unit is configured to select the one or more target scene data items based on the gaze.
  5. The apparatus of claim 1, wherein the multiple target scene data items comprise: multiple target scene data items that correspond respectively to multiple time axes, and wherein the input data selection unit is configured to select the one or more target scene data items based on the user's selection information from among the multiple target scene data items corresponding respectively to the multiple time axes.
  6. The apparatus of claim 1, wherein the multiple target scene data items comprise: multiple target scene data items that correspond respectively to multiple scales, and wherein the input data selection unit is configured to select the one or more target scene data items based on the user's selection information from among the multiple target scene data items corresponding respectively to the multiple scales.
  7. The apparatus of claim 1, wherein the multiple target attribute data items include respective encoded data items, and wherein the target scene acquisition unit is configured to decode the respective encoded data items to generate multiple decoded data items, and to obtain the multiple decoded data items as the multiple target scene data items.
  8. A method of performing a hybrid reproduction, the method comprising: receiving a bitstream containing multiple target attribute data items from a communication network; obtaining multiple target scene data items from the multiple target attribute data items; obtaining a user's selection information from a user; performing an input data selection by selecting one or more target scene data items based on the user's selection information from among the multiple target scene data items; and performing a scene reproduction by reproducing the one or more target scene data items.
  9. The method of claim 8, further comprising: performing a spatial division by dividing, based on a predetermined criterion, a global reproduction space for reproducing the multiple target scene data items into a plurality of local unit spaces, wherein the multiple target scene data items correspond to the plurality of local unit spaces, respectively.
  10. The method of claim 9, wherein the performing of the input data selection comprises: selecting one or more local unit spaces based on the user's selection information from among the multiple target scene data items corresponding respectively to the plurality of local unit spaces; and selecting the one or more target scene data items corresponding respectively to the one or more local unit spaces.
  11. The method of claim 8, wherein the obtaining of the selection information comprises obtaining a gaze of the user as the user's selection information, and wherein the selecting of input data comprises selecting the one or more target scene data items based on the gaze.
  12. The method of claim 8, wherein the multiple target scene data items comprise: multiple target scene data items that correspond respectively to multiple time axes, and wherein the performing of the input data selection comprises selecting the one or more target scene data items based on the user's selection information from among the multiple target scene data items corresponding respectively to the multiple time axes.
  13. The method of claim 8, wherein the multiple target scene data items comprise: multiple target scene data items that correspond respectively to multiple scales, and wherein the performing of the input data selection comprises selecting the one or more target scene data items based on the user's selection information from among the multiple target scene data items corresponding respectively to the multiple scales.
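The method of claim 8 can be sketched as a minimal selection-and-reproduction pipeline. This is an illustrative Python sketch only; the class and function names (`TargetScene`, `hybrid_reproduction`) and the scene-identifier scheme are assumptions for demonstration, not part of the claims.

```python
from dataclasses import dataclass

@dataclass
class TargetScene:
    """One reproducible scene obtained from a target attribute data item."""
    scene_id: str
    payload: bytes

def hybrid_reproduction(attribute_items: list[bytes], selection: set[str]) -> list[str]:
    """Sketch of the claimed steps: obtain scenes, select by user input, reproduce."""
    # 1) Obtain target scene data items from the received target attribute data items.
    scenes = [TargetScene(scene_id=f"scene-{i}", payload=item)
              for i, item in enumerate(attribute_items)]
    # 2) Perform input data selection based on the user's selection information.
    chosen = [s for s in scenes if s.scene_id in selection]
    # 3) Perform scene reproduction (here: report which scenes would be rendered).
    return [s.scene_id for s in chosen]
```

In a real renderer, step 3 would hand the selected scene data to the display pipeline rather than return identifiers.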

Description

CROSS-REFERENCE TO RELATED APPLICATION

The present application claims priority to Korean Patent Application No. 10-2024-0157043, filed on November 7, 2024, and Korean Patent Application No. 10-2025-0111251, filed on August 12, 2025, the disclosures of which are incorporated by reference herein in their entireties.

TECHNICAL FIELD

The present disclosure relates to a method and apparatus for rendering hybrid media by combining heterogeneous media for a spatial video reproduction service.

BACKGROUND

The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art. Immersive media is a media service that enhances user immersion on various displays, such as virtual reality (VR) devices like head-mounted displays (HMDs), a single television, or a multi-TV setup. From the perspective of immersive video, providing full six Degrees of Freedom (6DoF) for the user's unrestricted motion is considered a fundamental requirement for delivering complete immersion, and related technologies are being developed. The Moving Picture Experts Group (MPEG), under the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC), is standardizing MPEG Immersive Video (MIV), a multi-viewpoint immersive content compression technology for providing 6DoF-supported immersive media services; Visual Volumetric Video Coding (V3C), a method of efficiently storing and transmitting compressed data; and Gaussian Splat Coding (GSC), a compression method for compressing and representing 3D Gaussian primitives for real-time rendering of immersive scenes. V3C is designed to enable the storage and transmission of not only content compressed by using MIV, but also content compressed by using other standard technologies such as Video-based Point Cloud Compression (V-PCC) for high-density point cloud objects.
By using the V3C standard technology and compression standard technologies like MIV, V-PCC, or GSC, corresponding media services can be provided. In such cases, heterogeneous media services may be simultaneously delivered to real or virtual spatial media. Here, 'heterogeneous' may refer to cases where the underlying technologies for providing immersive media services differ, or where the same underlying technology is provided via different media or in different forms depending on the media service scenario. When such heterogeneous immersive media services are provided simultaneously to the real or virtual spatial media and interact with each other, efficient and seamless media service delivery needs to be ensured, taking these interactions into account.

SUMMARY

According to at least one aspect, the present disclosure provides an apparatus for hybrid reproduction, the apparatus including a bitstream receiver, a target scene acquisition unit, a selection information acquisition unit, an input data selection unit, and a scene reproduction unit. The bitstream receiver is configured to receive a bitstream containing multiple target attribute data items from a communication network. The target scene acquisition unit is configured to obtain multiple target scene data items from the multiple target attribute data items. The selection information acquisition unit is configured to obtain a user's selection information from a user. The input data selection unit is configured to select one or more target scene data items based on the user's selection information from among the multiple target scene data items. The scene reproduction unit is configured to reproduce the one or more target scene data items.
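Since V3C can carry content compressed with different codecs (MIV, V-PCC, GSC), a hybrid player must route each component to the matching decoder. The following Python sketch illustrates that dispatch pattern under stated assumptions: the function name `decode_v3c_component`, the string codec labels, and the stub decoders are illustrative placeholders, not an actual V3C API.

```python
def decode_v3c_component(codec: str, data: bytes) -> str:
    """Dispatch a V3C-carried component to a stub decoder for its codec.

    Real implementations would invoke full MIV / V-PCC / GSC decoders;
    here each stub just describes what it would decode.
    """
    decoders = {
        "MIV": lambda d: f"multiview scene ({len(d)} bytes)",
        "V-PCC": lambda d: f"point cloud ({len(d)} bytes)",
        "GSC": lambda d: f"gaussian splats ({len(d)} bytes)",
    }
    if codec not in decoders:
        raise ValueError(f"unsupported codec: {codec}")
    return decoders[codec](data)
```

This table-driven dispatch keeps the heterogeneous decoders behind one entry point, which is what allows the reproduced scenes to be combined into a single spatial presentation.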
According to another aspect, the present disclosure provides a method of performing a hybrid reproduction, including receiving a bitstream containing multiple target attribute data items from a communication network, obtaining multiple target scene data items from the multiple target attribute data items, obtaining a user's selection information from a user, performing an input data selection by selecting one or more target scene data items based on the user's selection information from among the multiple target scene data items, and performing a scene reproduction by reproducing the one or more target scene data items.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram illustrating a hybrid reproduction apparatus according to at least one embodiment of the present disclosure.
FIG. 2 is a diagram illustrating a user experiencing a metaverse or immersive virtual reality (VR) service.
FIG. 3 is a diagram illustrating the global reproduction space divided into a plurality of local unit spaces.
FIG. 4 is a flowchart of a hybrid reproduction method according to at least one embodiment of the present disclosure.
FIG. 5 is a schematic block diagram of an illustrative configuration of a computing device that may be used to implement the methods or apparatuses according to the present disclosure.
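The spatial division of claims 2 and 9 (and the gaze-based selection of claims 4 and 11) can be sketched in a few lines of Python. This is a hedged illustration only: a 2D grid stands in for the global reproduction space of FIG. 3, and the names `divide_space` and `select_by_gaze` are invented for this example.

```python
def divide_space(width: float, height: float, nx: int, ny: int):
    """Divide a global reproduction space into nx*ny local unit spaces,
    each an axis-aligned box (x0, y0, x1, y1), row-major from the origin."""
    dx, dy = width / nx, height / ny
    return [(i * dx, j * dy, (i + 1) * dx, (j + 1) * dy)
            for j in range(ny) for i in range(nx)]

def select_by_gaze(units, gaze_xy):
    """Return indices of the local unit spaces containing the gaze point,
    i.e., the spaces whose target scene data should be reproduced."""
    gx, gy = gaze_xy
    return [k for k, (x0, y0, x1, y1) in enumerate(units)
            if x0 <= gx < x1 and y0 <= gy < y1]
```

A renderer would then fetch and reproduce only the target scene data items mapped to the returned unit-space indices, rather than the whole global space.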