KR-102961719-B1 - METHODS, APPARATUS AND SYSTEMS FOR A PRE-RENDERED SIGNAL FOR AUDIO RENDERING

KR 102961719 B1

Abstract

The present disclosure relates to a method for decoding audio scene content from a bitstream by a decoder comprising an audio renderer with one or more rendering tools. The method comprises: receiving a bitstream; decoding a description of an audio scene from the bitstream; determining one or more effective audio elements from the description of the audio scene; determining, from the description of the audio scene, effective audio element information indicating the positions of the effective audio elements; and decoding a rendering mode indication from the bitstream. The rendering mode indication indicates whether the one or more effective audio elements represent a sound field obtained from pre-rendered audio elements and should be rendered using a predetermined rendering mode. In response to the rendering mode indication indicating that this is the case, the method renders the one or more effective audio elements using the predetermined rendering mode, taking the effective audio element information into account. The predetermined rendering mode defines a predetermined configuration of the rendering tools that controls the influence of the acoustic environment of the audio scene on the rendering output. The disclosure also relates to a method for generating audio scene content and a method for encoding audio scene content into a bitstream.

Inventors

  • Terentiv, Leon
  • Fersch, Christof
  • Fischer, Daniel

Assignees

  • Dolby International AB

Dates

Publication Date
2026-05-08
Application Date
2019-04-08
Priority Date
2018-04-11

Claims (7)

  1. A method for decoding audio scene content by a decoder, the method comprising: receiving, by the decoder, a bitstream comprising an effective audio element of an audio scene, effective audio element information, and listener location area information, wherein the effective audio element information indicates the effective audio element position of the effective audio element, and the listener location area information indicates a listener location area in an acoustic environment; and rendering the effective audio element based on the effective audio element information and the listener location area information.
  2. The method of claim 1, further comprising: receiving a rendering mode indication; determining, based on the rendering mode indication, that the effective audio element represents a sound field obtained from a pre-rendered audio element; determining that the effective audio element should be rendered using a predetermined rendering mode; and rendering the effective audio element using the predetermined rendering mode within the listener location area.
  3. The method of claim 2, wherein rendering the effective audio element using the predetermined rendering mode comprises applying sound attenuation modeling according to the respective distance between the listener position and the effective audio element position of the effective audio element.
  4. The method of claim 2, wherein the predetermined rendering mode depends on the listener location area.
  5. The method of claim 1, wherein the acoustic environment is a virtual reality/augmented reality/mixed reality (VR/AR/MR) acoustic environment.
  6. A non-transitory computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to perform the method of claim 1.
  7. An apparatus for audio decoding, comprising: an audio decoder configured to receive a bitstream comprising an effective audio element of an audio scene, effective audio element information, and listener location area information, wherein the effective audio element information indicates the effective audio element position of the effective audio element, and the listener location area information indicates a listener location area in an acoustic environment; and a renderer configured to render the effective audio element based on the effective audio element information and the listener location area information.
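The distance-dependent sound attenuation modeling of claim 3 can be illustrated with a minimal sketch. The inverse-distance gain law and all names and parameters (`distance_gain`, `ref_dist`, `min_dist`) are illustrative assumptions, not details taken from the patent:

```python
import math

def distance_gain(listener_pos, element_pos, ref_dist=1.0, min_dist=0.1):
    """Attenuation gain from the distance between the listener position
    and the effective audio element position (inverse-distance law,
    clamped to min_dist to avoid division by zero)."""
    d = math.dist(listener_pos, element_pos)
    return ref_dist / max(d, min_dist)

# A source 2 m from the listener is attenuated to half amplitude.
gain = distance_gain((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))  # 0.5
```

The clamp at `min_dist` keeps the gain finite when the listener walks through the element position; a real renderer might instead cross-fade to a near-field model.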

Description

Methods, apparatus and systems for a pre-rendered signal for audio rendering

Cross-reference to related applications

This application claims priority to the following applications: U.S. Provisional Application No. 62/656,163, filed April 11, 2018 (Ref. D18040USP1), and U.S. Provisional Application No. 62/755,957, filed November 5, 2018 (Ref. D18040USP2), which are incorporated herein by reference.

Technical field

The present disclosure relates to apparatus, systems, and methods for audio rendering. FIG. 1 illustrates an exemplary encoder configured to process metadata and audio renderer extensions.

In some cases, 6DoF renderers cannot reproduce the content creator's desired sound field at certain locations (regions, paths) of a virtual reality/augmented reality/mixed reality (VR/AR/MR) space. This is due to: 1. insufficient metadata describing the sound sources and the VR/AR/MR environment; and 2. the limited capabilities and resources of the 6DoF renderer. Certain 6DoF renderers (which generate a sound field based solely on the original audio source signals and the description of the VR/AR/MR environment) may fail to reproduce the intended signal at the desired location(s) for the following reasons: 1.1) bitrate limits for the parametric information (metadata) describing the VR/AR/MR environment and the corresponding audio signals; 1.2) unavailability of data for inverse 6DoF rendering (e.g., reference recordings at one or several points of interest are available, but it is unknown how the 6DoF renderer would recreate these signals and what input data would be required); 2.1) artistic intent that may differ from the default output of the 6DoF renderer (e.g., output consistent with physical laws), similar to the concept of an "artistic downmix"; and 2.2) capability limitations of the decoder (6DoF renderer) implementation (e.g., constraints on bitrate, complexity, latency, etc.).
At the same time, high-quality audio reproduction (and/or fidelity to a predefined reference signal) of the 6DoF renderer output may be required for given location(s) in the VR/AR/MR space. For example, this may be required by 3DoF/3DoF+ compatibility constraints, or by compatibility requirements between different processing modes of the 6DoF renderer (e.g., between a "low power" mode that does not account for VR/AR/MR geometry effects and a "baseline" mode). Therefore, there is a need for encoding/decoding methods, and corresponding encoders/decoders, that improve the reproduction of the sound field intended by the content creator in VR/AR/MR spaces.

One aspect of the disclosure relates to a method for decoding audio scene content from a bitstream by a decoder comprising an audio renderer having one or more rendering tools. The method may include receiving a bitstream. The method may further include decoding a description of the audio scene from the bitstream. The audio scene may include an acoustic environment, for example a VR/AR/MR acoustic environment. The method may further include determining one or more effective audio elements from the description of the audio scene. The method may further include determining, from the description of the audio scene, effective audio element information indicating the positions of the one or more effective audio elements. The method may further include decoding a rendering mode indication from the bitstream. The rendering mode indication may indicate whether the one or more effective audio elements represent a sound field obtained from pre-rendered audio elements and should be rendered using a predetermined rendering mode.
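The decoding steps above can be sketched as a small mode-selection routine. The container type, field names, and mode labels here are assumptions for illustration; the patent specifies only the logical steps, not an API:

```python
from dataclasses import dataclass

@dataclass
class SceneDescription:
    """Hypothetical decoded description of the audio scene."""
    effective_elements: list   # effective audio elements with their positions
    prerendered: bool          # rendering mode indication decoded from the bitstream

def select_rendering_mode(scene):
    """Choose the rendering mode from the decoded rendering mode indication."""
    if scene.prerendered:
        # The effective audio elements represent a pre-rendered sound
        # field, so the simplified, predetermined rendering mode is used.
        return "predetermined"
    return "default"
```

For example, a scene whose bitstream carried the pre-rendered indication would be routed to the predetermined mode, while all other scenes fall through to the renderer's default behavior.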
The method may further include rendering the one or more effective audio elements using the predetermined rendering mode, in response to the rendering mode indication indicating that the one or more effective audio elements represent a sound field obtained from pre-rendered audio elements and should be rendered using the predetermined rendering mode. Rendering the one or more effective audio elements using the predetermined rendering mode may take the effective audio element information into account. The predetermined rendering mode may define a predetermined configuration of the rendering tools that controls the influence of the acoustic environment of the audio scene on the rendering output. The effective audio elements may be rendered, for example, relative to a reference position. The predetermined rendering mode may enable or disable specific rendering tools. Additionally, the predetermined rendering mode may enhance the acoustics of the one or more effective audio elements (e.g., add artificial sounds). The one or more effective audio elements encapsulate, so to speak, the effects of the acoustic environment, such as echo, reverberation, and acoustic occlusion. This enables the use of a simple rendering mode (i.e., the predetermined rendering mode) in the decoder. At the same time, artistic intent can be preserved, and the user (l
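The idea of a predetermined configuration that enables or disables rendering tools can be sketched as follows. The tool names and the choice of which tools the predetermined mode disables are assumptions made for illustration; the patent only states that the configuration controls the acoustic environment's influence on the rendering output:

```python
# Hypothetical rendering tools; in the predetermined mode the
# environment-modelling tools are switched off, since their effects
# are already encapsulated in the pre-rendered effective audio elements.
DEFAULT_TOOLS = {
    "early_reflections": True,
    "late_reverb": True,
    "occlusion": True,
    "distance_attenuation": True,
}

def configure_tools(mode):
    """Return a rendering-tool configuration for the given rendering mode."""
    tools = dict(DEFAULT_TOOLS)
    if mode == "predetermined":
        # Disable acoustic-environment modelling; keep distance
        # attenuation, which claim 3 applies relative to the
        # effective audio element position.
        for name in ("early_reflections", "late_reverb", "occlusion"):
            tools[name] = False
    return tools
```

Under this sketch, the predetermined mode reduces decoder-side work to little more than per-element gain control, which matches the stated goal of a simple rendering mode that still preserves artistic intent.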