EP-4407436-B1 - BITSTREAM REPRESENTING AUDIO IN AN ENVIRONMENT

Inventors

KOPPENS, JEROEN GERARDUS HENRICUS

Dates

Publication Date: 20260513
Application Date: 20221020

Claims (12)

An apparatus for generating a bitstream, the apparatus comprising: a metadata generator (203) arranged to generate metadata for audio data for a plurality of audio elements representing audio sources in an environment, the metadata comprising acoustic environment data for the environment, the acoustic environment data describing properties affecting sound propagation for the audio sources in the environment, at least some of the acoustic environment data being applicable to a plurality of listening poses in the environment and the properties including both static and dynamic properties; and a bitstream generator (205) arranged to generate the bitstream to include the metadata; characterized in that the acoustic environment data comprises a first data field for first bits representing a value of a first property of the properties affecting sound propagation and a second data field for an indication of whether the acoustic environment data comprises an extension data field for second bits representing the value of a first property; and the indication is an indication that the extension field comprises bits for extending the range of a provided data value and an indication that the extension field comprises bits for increasing the resolution of the provided data value, and the second bits extend a range of possible values for the first property and the second bits increase a resolution of possible values for the first property.
The apparatus of claim 1 wherein the acoustic environment data comprises a data group describing a data format for at least part of a representation of property values for at least one property of the properties affecting sound propagation and a plurality of data groups each comprising data describing at least one property value using the representation.
The apparatus of any previous claim wherein the acoustic environment data comprises a data group describing a frequency grid and a plurality of data groups each comprising data describing a frequency dependent property of the properties using the frequency grid, and wherein the bitstream comprises an indicator for indicating whether the bitstream comprises the data group describing the frequency grid, and the data group comprises an indication of a format for data describing the frequency grid, the data group comprising at least one of: data providing an indication of a predetermined default grid; data indicating a start frequency and a frequency range for at least some subranges of the frequency grid; and data indicating individual frequencies.
The apparatus of any previous claim wherein the acoustic environment data comprises a data group describing an orientation representation format for representing orientation properties, and at least one data group comprising data describing an orientation property of the properties using the orientation representation format, the data group comprising at least one of: data providing an indication of a predetermined default orientation representation; data indicating a set of predetermined angles; and data indicating angles on a quantized grid.
The apparatus of any previous claim wherein the acoustic environment data includes an animation indication for at least a first audio element, the animation indication indicating if at least one property for the first audio element varies during a time interval; and the acoustic environment data for an animation indication that the first audio element has at least one varying property comprises data describing a variation of the at least one varying property.
The apparatus of any previous claim wherein the audio elements comprise a number of sound effect elements and the acoustic environment data comprises data linking a user controlled change to the environment with a first sound effect element of the number of sound effect elements.
The apparatus of any of the previous claims wherein the acoustic environment data is arranged in consecutive data sets, each data set comprising data for a time interval, and a first data set of the consecutive data sets comprises a first property value for at least one property of the properties affecting sound propagation and a time indication for the first property value, the time indication indicating a time within a time interval represented by the first data set.
The apparatus of any of the previous claims wherein the acoustic environment data is arranged in consecutive data sets, each data set comprising data for a time interval and the bitstream generator (205) is arranged to determine if a property value for a first property of the properties affecting sound propagation is provided for a default time within a time interval represented by a first data set; and to include the first property value in the first data set without a time indication if so and to otherwise include the first property value in the first data set with a time indication for the first property value.
The apparatus of any previous claim wherein the acoustic environment data for a first audio element comprises an indication of a first region of applicability and a second region of applicability for a first property value for a first property of the properties affecting sound propagation, the first region of applicability indicating a region for a position of the first audio element for which the first property value applies and the second region of applicability indicating a region for a listening position for which the first property value applies.
An apparatus for generating rendered audio, the apparatus comprising: a first receiver (303) arranged to receive audio data for a plurality of audio elements representing audio sources in an environment; a second receiver (305) arranged to receive a bitstream comprising metadata for the audio data for the plurality of audio elements representing audio sources in the environment, the metadata comprising acoustic environment data for the environment, the acoustic environment data describing properties affecting sound propagation for the audio sources in the environment, at least some of the acoustic environment data being applicable to a plurality of listening poses in the environment and the properties including both static and dynamic properties; a renderer (307) arranged to generate output audio data for the environment in response to the audio data and the acoustic environment data; characterized in that the acoustic environment data comprises a first data field for first bits representing a value of a first property of the properties affecting sound propagation and a second data field for an indication of whether the acoustic environment data comprises an extension data field for second bits representing the value of a first property; and the indication is an indication that the extension field comprises bits for extending the range of a provided data value and an indication that the extension field comprises bits for increasing the resolution of the provided data value, and the second bits extend a range of possible values for the first property and the second bits increase a resolution of possible values for the first property.
A method of generating a bitstream, the method comprising: generating metadata for audio data for a plurality of audio elements representing audio sources in an environment, the metadata comprising acoustic environment data for the environment, the acoustic environment data describing properties affecting sound propagation for the audio sources in the environment, at least some of the acoustic environment data being applicable to a plurality of listening poses in the environment and the properties including both static and dynamic properties; and generating the bitstream to include the metadata; characterized in that the acoustic environment data comprises a first data field for first bits representing a value of a first property of the properties affecting sound propagation and a second data field for an indication of whether the acoustic environment data comprises an extension data field for second bits representing the value of a first property; and the indication is an indication that the extension field comprises bits for extending the range of a provided data value and an indication that the extension field comprises bits for increasing the resolution of the provided data value, and the second bits extend a range of possible values for the first property and the second bits increase a resolution of possible values for the first property.
A method of generating rendered audio, the method comprising: receiving audio data for a plurality of audio elements representing audio sources in an environment; receiving a bitstream comprising metadata for the audio data for the plurality of audio elements representing audio sources in the environment, the metadata comprising acoustic environment data for the environment, the acoustic environment data describing properties affecting sound propagation for the audio sources in the environment, at least some of the acoustic environment data being applicable to a plurality of listening poses in the environment and the properties including both static and dynamic properties; and generating output audio data for the environment in response to the audio data and the acoustic environment data; characterized in that the acoustic environment data comprises a first data field for first bits representing a value of a first property of the properties affecting sound propagation and a second data field for an indication of whether the acoustic environment data comprises an extension data field for second bits representing the value of a first property; and the indication is an indication that the extension field comprises bits for extending the range of a provided data value and an indication that the extension field comprises bits for increasing the resolution of the provided data value, and the second bits extend a range of possible values for the first property and the second bits increase a resolution of possible values for the first property.
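
The extension mechanism recited in the independent claims can be illustrated with a minimal sketch. It is an illustration under stated assumptions and not the syntax defined by the patent: the field widths (`BASE_BITS`, `EXT_MSB_BITS`, `EXT_LSB_BITS`), the quantization step (`BASE_STEP`), the field order, and the names `BitWriter`, `BitReader`, `encode_property` and `decode_property` are all hypothetical. A first data field carries the base bits, a one-bit second data field indicates whether an extension field follows, and the extension field carries extra most significant bits (extending the range) and extra least significant bits (increasing the resolution).

```python
# Hypothetical layout, chosen only for illustration:
#   [base: 4 bits][extension flag: 1 bit][range ext: 2 bits][resolution ext: 2 bits]
BASE_BITS = 4      # assumed width of the first data field
EXT_MSB_BITS = 2   # assumed extension bits that extend the value range
EXT_LSB_BITS = 2   # assumed extension bits that refine the resolution
BASE_STEP = 1.0    # assumed quantization step of the base field
FINE_STEP = BASE_STEP / (1 << EXT_LSB_BITS)


class BitWriter:
    """Collects bits MSB-first in a plain list."""

    def __init__(self):
        self.bits = []

    def write(self, value, width):
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in the field")
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)


class BitReader:
    """Reads bits MSB-first from a list produced by BitWriter."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.pos = 0

    def read(self, width):
        value = 0
        for _ in range(width):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value


def encode_property(writer, value):
    """Write a property value, adding the extension field only when needed."""
    q = round(value / FINE_STEP)  # index on the fine (extended) grid
    if not 0 <= q < (1 << (EXT_MSB_BITS + BASE_BITS + EXT_LSB_BITS)):
        raise ValueError("value outside the extended range")
    msb = q >> (BASE_BITS + EXT_LSB_BITS)            # range-extending bits
    base = (q >> EXT_LSB_BITS) & ((1 << BASE_BITS) - 1)
    lsb = q & ((1 << EXT_LSB_BITS) - 1)              # resolution-refining bits
    has_extension = msb != 0 or lsb != 0
    writer.write(base, BASE_BITS)                    # first data field
    writer.write(1 if has_extension else 0, 1)       # second data field (flag)
    if has_extension:
        writer.write(msb, EXT_MSB_BITS)              # extension data field
        writer.write(lsb, EXT_LSB_BITS)


def decode_property(reader):
    """Read a property value written by encode_property."""
    q = reader.read(BASE_BITS) << EXT_LSB_BITS
    if reader.read(1):
        q |= reader.read(EXT_MSB_BITS) << (BASE_BITS + EXT_LSB_BITS)
        q |= reader.read(EXT_LSB_BITS)
    return q * FINE_STEP


if __name__ == "__main__":
    for v in (5.0, 37.25):
        w = BitWriter()
        encode_property(w, v)
        print(v, len(w.bits), decode_property(BitReader(w.bits)))
```

With these assumed widths, a value that fits the base field, such as 5.0, costs five bits (base field plus flag), whereas 37.25 needs both the wider range and the finer step and costs nine bits. This reflects the trade-off the claims target: common values stay cheap, and the extra bits are spent only when a value requires them.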

Description

FIELD OF THE INVENTION

The invention relates to a bitstream, as well as an apparatus for generating such a bitstream and an apparatus for processing such a bitstream, representing an audio environment, and in particular, but not exclusively, to a bitstream representing a virtual audio environment, such as for a Virtual Reality application.

BACKGROUND OF THE INVENTION

The variety and range of experiences based on audiovisual content have increased substantially in recent years with new services and ways of utilizing and consuming such content continuously being developed and introduced. In particular, many spatial and interactive services, applications and experiences are being developed to give users a more involved and immersive experience. Examples of such applications are Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) applications which are rapidly becoming mainstream, with a number of solutions being aimed at the consumer market. A number of standards are also under development by a number of standardization bodies. Such standardization activities are actively developing standards for the various aspects of VR/AR/MR systems including e.g. streaming, broadcasting, rendering, etc.
VR applications tend to provide user experiences corresponding to the user being in a different world/environment/scene whereas AR (including Mixed Reality, MR) applications tend to provide user experiences corresponding to the user being in the current environment but with additional information or virtual objects being added. Thus, VR applications tend to provide a fully immersive synthetically generated world/scene whereas AR applications tend to provide a partially synthetic world/scene which is overlaid on the real scene in which the user is physically present. However, the terms are often used interchangeably and have a high degree of overlap.
In the following, the term Virtual Reality/VR will be used to denote both Virtual Reality and Augmented/Mixed Reality.
Communication of audiovisual data, and specifically audio data, describing an environment, and specifically an audio environment, such that it can provide a flexible representation allowing user-end adaptation to provide e.g. a VR experience is a very challenging task. The communicated data should preferably describe the environment such that it can locally be used to render a dynamic experience that reflects changes in (virtual) listening positions and changes in the environment itself.
A large amount of research has been undertaken to seek to derive advantageous approaches for efficient communication of data describing such environments. Examples of suggested approaches for distributing and rendering data describing acoustic environments or properties may be found in US2021/092546A1 and US2021/287651A1. Various suggestions for suitable data streams and formats have been put forward with most of these including an individualized model where individual audio sources are presented separately and with associated metadata describing various properties, such as positions of the audio sources etc. In addition, some general data describing the audio environment may be provided, such as data describing reverberation, attenuation etc.
However, defining a bitstream format that provides efficient (e.g. reduced data rate) communication of such information is very difficult and many issues, characteristics, and trade-offs must be carefully considered and balanced to achieve an advantageous approach. The Moving Picture Experts Group (MPEG) has started a standardization approach for developing a standard known as MPEG-I for bitstreams suitable for VR and similar experiences.
Hence, an improved approach and data format/bitstream for supporting audio in immersive applications and services such as VR and AR would be advantageous.
In particular, an approach/bitstream/format that allows improved operation, increased flexibility, reduced complexity, facilitated implementation, an improved audio experience, reduced computational burden, improved audio quality, reduced data rate, improved trade-offs, and/or improved performance and/or operation would be advantageous.

SUMMARY OF THE INVENTION

Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
According to aspects and optional features of the invention, there is provided an apparatus for generating a bitstream in accordance with claim 1.
The approach may provide improved performance and operation for many applications including immersive, flexible, and varying audiovisual applications such as e.g. for many VR and AR applications. The approach may provide improved trade-offs between different desires in many scenarios such as between the desire to provide accurate, complete, and/or dynamic data for an environment and the desire to provide a bitstream with low data rate.