EP-4738347-A2 - METHODS AND DEVICES FOR CODING OR DECODING OF SCENE-BASED IMMERSIVE AUDIO CONTENT
Abstract
The present document describes a method (500) for encoding an Ambisonics input audio signal (101). The method (500) comprises providing (501) the input audio signal (101) to a SPAR encoder (110, 130) and to a DirAC analyzer and parameter encoder (120). Furthermore, the method (500) comprises generating (502) an encoder bit stream (106) based on output (102, 105) of the SPAR encoder (110, 130) and based on output (104) of the DirAC analyzer and parameter encoder (120).
Inventors
- BRUHN, STEFAN
Assignees
- Dolby International AB
Dates
- Publication Date
- 20260506
- Application Date
- 20221130
Claims (15)
- A method (510) for decoding an encoder bit stream (106) which is indicative of an Ambisonics input audio signal (101); the method (510) comprising, - generating (511) an intermediate Ambisonics signal (201) using a SPAR decoder (210, 230) based on the encoder bit stream (106); and - processing (512) the intermediate Ambisonics signal (201) using a DirAC synthesizer (220) to provide an output audio signal (211) for rendering.
- The method (510) of claim 1, wherein the method (510) comprises, - extracting a SPAR metadata bit stream (102) and an audio bit stream (105) from the encoder bit stream (106); and - generating the intermediate Ambisonics signal (201) from the SPAR metadata bit stream (102) and the audio bit stream (105) using the SPAR decoder (210, 230).
- The method (510) of claim 2, wherein the method (510) comprises, - generating a set of reconstructed downmix channel signals (205) from the audio bit stream (105) using an audio decoder (230); and - upmixing the set of reconstructed downmix channel signals (205) to the intermediate Ambisonics signal (201) based on the SPAR metadata bit stream (102) using an upmix unit (210).
- The method (510) of any one of claims 1-3, wherein the method (510) comprises, - extracting a DirAC metadata bit stream (104) from the encoder bit stream (106); and - processing (512) the intermediate Ambisonics signal (201) in dependence on the DirAC metadata bit stream (104) using the DirAC synthesizer (220) to provide the output audio signal (211).
- The method (510) of any one of claims 1-4, wherein the method (510) comprises, - processing the intermediate Ambisonics signal (201) within a DirAC analyzer (250) to generate auxiliary DirAC metadata (204); and - processing (512) the intermediate Ambisonics signal (201) in dependence on the auxiliary DirAC metadata (204) using the DirAC synthesizer (220) to provide the output audio signal (211).
- The method (510) of claim 5, wherein the method (510) comprises - generating subband data within a plurality of frequency bands and/or a plurality of time/frequency tiles, which represents the intermediate Ambisonics signal (201); - selecting a subset of the plurality of frequency bands and/or the plurality of time/frequency tiles; and - determining, based on the subband data, the auxiliary DirAC metadata (204) for the selected subset of frequency bands and/or time/frequency tiles, in particular for the selected subset of frequency bands and/or time/frequency tiles only.
- The method (510) of claim 6, wherein the method (510) comprises, - determining property information regarding a property of the input audio signal (101) and/or of the intermediate Ambisonics signal (201), in particular a property with regards to a noise-like or a tonal character of the input audio signal (101) and/or of the intermediate Ambisonics signal (201); and - selecting the subset of frequency bands and/or time/frequency tiles based on the property information.
- The method (510) of claim 6 or 7, wherein the subset of frequency bands and/or time/frequency tiles corresponds to a frequency range of frequencies at or below a pre-determined threshold frequency.
- The method (510) of any one of claims 1-8, wherein the method (510) comprises generating, using the DirAC synthesizer (220), an Ambisonics output signal (211) from the intermediate Ambisonics signal (201), the Ambisonics output signal (211) having an Ambisonics order which is greater than an Ambisonics order of the input audio signal (101) and/or of the intermediate Ambisonics signal (201).
- The method (510) of any one of claims 1-9, wherein the output signal (211) comprises at least one of an Ambisonics output signal, a binaural output signal, a stereo or a multi-loudspeaker output signal.
- The method (510) of any one of claims 1-10, wherein - the intermediate Ambisonics signal (201) comprises fewer channels than the Ambisonics input audio signal (101); and/or - the SPAR decoder (210, 230) is used to perform a partial upmixing operation to generate an intermediate Ambisonics signal (201) which comprises fewer channels than the Ambisonics input audio signal (101).
- The method (510) of claim 11, wherein - the partial upmixing operation is performed in a filter bank domain with a plurality of subbands and/or a plurality of time/frequency tiles; and - the intermediate Ambisonics signal (201) comprises fewer channels than the Ambisonics input audio signal (101) for all of the plurality of subbands and/or for all of the plurality of time/frequency tiles; or - the intermediate Ambisonics signal (201) comprises fewer channels than the Ambisonics input audio signal (101) for only a subset of the plurality of subbands and/or the plurality of time/frequency tiles.
- The method (510) of any one of claims 1-11, wherein the method (510) comprises, - extracting an audio bit stream (105) from the encoder bit stream (106); - generating a set of reconstructed downmix channel signals (205) from the audio bit stream (105) using an audio decoder (230); - applying an analysis filter bank to the set of reconstructed downmix channel signals (205) to transform the set of reconstructed downmix channel signals (205) into a filter bank domain; - generating (511) an intermediate Ambisonics signal (201) which is represented in the filter bank domain, based on the set of reconstructed downmix channel signals (205) in the filter bank domain; and - processing (512) the intermediate Ambisonics signal (201) which is represented in the filter bank domain using the DirAC synthesizer (220).
- A non-transitory computer-readable medium storing instructions that, upon execution by one or more processors, cause the one or more processors to perform operations of any one of claims 1 to 13.
- A decoding device (200) for decoding an encoder bit stream (106) which is indicative of an Ambisonics input audio signal (101); wherein the decoding device (200) is configured to - generate an intermediate Ambisonics signal (201) using a SPAR decoder (210, 230) based on the encoder bit stream (106); and - process the intermediate Ambisonics signal (201) using a DirAC synthesizer (220) to provide an output audio signal (211) for rendering.
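The band-restricted computation of auxiliary DirAC metadata in claims 6 to 8 (determining metadata only for a selected subset of frequency bands, e.g. those at or below a pre-determined threshold frequency) can be illustrated with a minimal sketch. The function name, the band centers and the threshold value are hypothetical examples, not taken from the claims:

```python
def select_low_bands(band_centers_hz, threshold_hz):
    """Pick only the frequency bands at or below the threshold, mirroring
    the low-frequency subset for which auxiliary DirAC metadata may be
    determined. Names and values are illustrative, not from the claims."""
    return [i for i, f in enumerate(band_centers_hz) if f <= threshold_hz]

# Hypothetical filter-bank band centers in Hz and an example threshold.
bands_hz = [200.0, 600.0, 1200.0, 2400.0, 4800.0]
print(select_low_bands(bands_hz, threshold_hz=1500.0))  # [0, 1, 2]
```

Restricting the metadata to such a subset reduces the amount of DirAC parameter computation and, where the metadata is transmitted, the bit rate, at the cost of parametric control over the excluded bands.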
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority from U.S. Provisional Application No. 63/284,198, filed 30 November 2021, and U.S. Provisional Application No. 63/410,587, filed 27 September 2022. This application is a European divisional application of Euro-PCT patent application EP 22822564.5 (reference: D21140EP01), filed 30 November 2022.
TECHNICAL FIELD
The present document relates to methods and corresponding devices for processing audio, in particular for coding immersive audio content.
BACKGROUND
The sound or soundfield within the listening environment of a listener that is placed at a listening position may be described using an Ambisonics audio signal, in particular a first order Ambisonics (FOA) signal or a higher order Ambisonics (HOA) signal. The Ambisonics signal may be viewed as a multi-channel audio signal, with each channel corresponding to a particular directivity pattern of the soundfield at the listening position of the listener. An Ambisonics signal may be described using a three-dimensional (3D) Cartesian coordinate system, with the origin of the coordinate system corresponding to the listening position, the x-axis pointing to the front, the y-axis pointing to the left and the z-axis pointing up. The present document addresses the technical problem of enabling a particularly efficient and flexible coding of Ambisonics audio signals. The technical problem is solved by each one of the independent claims. Preferred examples are described in the dependent claims.
SUMMARY
According to an aspect, a method for encoding an Ambisonics input audio signal is described. The method comprises providing the input audio signal to a spatial reconstruction (SPAR) encoder and to a directional audio coding (DirAC) analyzer and parameter encoder. Furthermore, the method comprises generating an encoder bit stream based on an output of the SPAR encoder and based on an output of the DirAC analyzer and parameter encoder.
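The dual-path encoder of the summary above, which feeds the same Ambisonics input to a SPAR encoder and to a DirAC analyzer, can be sketched roughly as follows. All function names, the ACN channel ordering, the W-channel downmix and the broadband direction estimate are simplifying assumptions for illustration; actual SPAR and DirAC encoders operate per frequency band with considerably more elaborate parameter estimation and quantization:

```python
import numpy as np

def spar_encode(ambi):
    """Hypothetical SPAR path: keep the W channel as a mono downmix and
    derive simple least-squares prediction coefficients that relate the
    remaining channels to W (stand-in for SPAR metadata)."""
    w = ambi[0:1]                                   # (1, n_samples)
    denom = np.sum(w * w) + 1e-12
    coeffs = (ambi[1:] @ w.T / denom).ravel()       # one coefficient per channel
    return w, coeffs                                # (downmix, SPAR metadata)

def dirac_analyze(ambi):
    """Hypothetical DirAC path: a crude broadband azimuth estimate from
    FOA intensity-like products (real DirAC analysis is per band and
    also estimates elevation and diffuseness)."""
    w, y, z, x = ambi
    return np.arctan2(np.mean(w * y), np.mean(w * x))

# Toy FOA frame (ACN order W, Y, Z, X): a plane wave from the front-left.
ambi = np.vstack([np.ones(16), 0.7 * np.ones(16), np.zeros(16), 0.7 * np.ones(16)])
downmix, spar_md = spar_encode(ambi)
dirac_md = dirac_analyze(ambi)
# Both parameter sets are multiplexed into one encoder bit stream.
bitstream = {"audio": downmix, "spar": spar_md, "dirac": dirac_md}
```

The point of carrying both parameter sets in one bit stream is that the decoder can first reconstruct an intermediate Ambisonics signal via SPAR and then refine or render it parametrically via DirAC.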
According to another aspect, a method for decoding an encoder bit stream which is indicative of an Ambisonics input audio signal is described. The method comprises generating an intermediate Ambisonics signal using a spatial reconstruction (SPAR) decoder based on the encoder bit stream. Furthermore, the method comprises processing the intermediate Ambisonics signal using a directional audio coding (DirAC) synthesizer to provide an output audio signal for rendering.
It should be noted that the methods described herein can each be implemented, in whole or in part, in software and/or computer-readable code executed on one or more processors.
According to a further aspect, a software program is described. The software program may be adapted for execution on a processor and for performing the method steps outlined in the present document when carried out on the processor.
According to another aspect, a storage medium is described. The storage medium may comprise a software program adapted for execution on a processor and for performing the method steps outlined in the present document when carried out on the processor.
According to a further aspect, a computer program product is described. The computer program product may comprise executable instructions for performing the method steps outlined in the present document when executed on a computer.
According to another aspect, a system comprising one or more processors is described. The system further comprises a non-transitory computer-readable medium storing instructions that, upon execution by the one or more processors, cause the one or more processors to perform operations of one or more of the methods described herein.
According to a further aspect, a non-transitory computer-readable medium is described, which stores instructions that, upon execution by one or more processors, cause the one or more processors to perform operations of one or more of the methods described herein.
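The two-stage decoding method described above (SPAR reconstruction followed by DirAC synthesis) can be sketched as a pair of stub functions. The upmix-matrix formulation, the toy matrix values and the pass-through synthesis stage are illustrative assumptions; a real decoder applies per-band upmix matrices derived from the SPAR metadata bit stream and steers the synthesis with DirAC direction and diffuseness parameters:

```python
import numpy as np

def spar_upmix(downmix, upmix_matrix):
    """Recreate an intermediate Ambisonics signal from the reconstructed
    downmix channels using an upmix matrix decoded from the SPAR metadata
    bit stream (simplified: one broadband matrix instead of per-band ones)."""
    # downmix: (n_downmix, n_samples); upmix_matrix: (n_ambi, n_downmix)
    return upmix_matrix @ downmix

def dirac_synthesize(ambi, direct_gain=1.0):
    """Stand-in for DirAC synthesis; a real synthesizer would render the
    signal using per-band direction and diffuseness metadata."""
    return direct_gain * ambi

# Toy example: 2 reconstructed downmix channels upmixed to 4 FOA channels.
reconstructed_downmix = np.ones((2, 8))
upmix_matrix = np.array([[1.0, 0.0],   # W taken from the first downmix channel
                         [0.0, 0.5],   # Y predicted from the second
                         [0.0, 0.0],   # Z zeroed in this toy matrix
                         [0.5, 0.0]])  # X predicted from the first
intermediate = spar_upmix(reconstructed_downmix, upmix_matrix)
output = dirac_synthesize(intermediate)
print(output.shape)  # (4, 8)
```

The split mirrors the claimed structure: the SPAR stage restores an Ambisonics-domain signal (possibly with fewer channels than the original input), and the DirAC stage turns it into the final rendering-ready output.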
According to another aspect, an encoding device for encoding an Ambisonics input audio signal is described. The encoding device is configured to provide the input audio signal to a spatial reconstruction (SPAR) encoder and to a directional audio coding (DirAC) analyzer and parameter encoder. The encoding device is further configured to generate an encoder bit stream based on output of the SPAR encoder and based on output of the DirAC analyzer and parameter encoder.
According to a further aspect, a decoding device for decoding an encoder bit stream which is indicative of an Ambisonics input audio signal is described. The decoding device is configured to generate an intermediate Ambisonics signal using a spatial reconstruction (SPAR) decoder based on the encoder bit stream. Furthermore, the decoding device is configured to process the intermediate Ambisonics signal using a directional audio coding (DirAC) synthesizer to provide an output audio signal for rendering.
It should be noted that the methods and systems including its preferred emb