
CN-122029841-A - Generating an audio data signal


Abstract

An apparatus comprises a receiver (201) that receives a data signal comprising audio data for at least a first audio signal, together with first and second acoustic environment data for an acoustic environment, where the data size of an acoustic environment parameter set is larger for the first acoustic environment data and the update rate is higher for the second acoustic environment data. An acoustic data generator (203) selects between the first and second acoustic environment data and generates rendered acoustic environment data. For example, the second acoustic environment data may be selected only if the corresponding first acoustic environment data has not been received. A renderer (205) generates an audio output signal by rendering the audio signal based on the rendered acoustic environment data. This reduces the delay in rendering an acoustic environment without sacrificing long-term accuracy.
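As a hedged illustration (not part of the patent text), the selection described in the abstract can be sketched as follows: the renderer prefers the detailed first acoustic environment data and falls back to the compact, frequently updated second data until a full first set has arrived. All names and parameters (e.g. `rt60`) are hypothetical.

```python
# Illustrative sketch only: the patent does not prescribe an API or
# parameter names. EnvParams stands in for an acoustic environment
# parameter set; rt60/level are example acoustic parameters.

from dataclasses import dataclass
from typing import Optional

@dataclass
class EnvParams:
    rt60: float   # reverberation time in seconds (illustrative)
    level: float  # reverberation level (illustrative)

def select_rendered_params(first: Optional[EnvParams],
                           second: Optional[EnvParams]) -> Optional[EnvParams]:
    """Return the set to render with: the detailed first set when available,
    otherwise the low-latency second set."""
    if first is not None:
        return first
    return second

# Early in the stream only the small, fast set has been received:
print(select_rendered_params(None, EnvParams(0.5, 0.3)))
# Once the large set arrives, it takes precedence:
print(select_rendered_params(EnvParams(0.52, 0.31), EnvParams(0.5, 0.3)))
```

This mirrors the example given in the abstract, where the second data is used only while the corresponding first data has not yet been received.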

Inventors

  • M.Z. Shcherba
  • A.W.J. Oomen
  • E.G.P. Sihaiyeer Adams
  • P.H.A. Dylan

Assignees

  • Koninklijke Philips N.V.

Dates

Publication Date
2026-05-12
Application Date
2024-09-30
Priority Date
2023-10-12

Claims (14)

  1. A device for generating an output audio signal, the device comprising: a receiver (201) arranged to receive a data signal comprising audio data for at least a first audio signal and metadata, the metadata comprising: first acoustic environment data for an acoustic environment, the first acoustic environment data comprising repeated first sets of acoustic environment parameters, each first set of acoustic environment parameters providing a description of the acoustic environment; and second acoustic environment data for the acoustic environment, the second acoustic environment data comprising repeated second sets of acoustic environment parameters, each second set of acoustic environment parameters providing a description of the acoustic environment, a data size of the first sets of acoustic environment parameters exceeding a data size of the second sets of acoustic environment parameters, and an update rate of the second sets of acoustic environment parameters being higher than an update rate of the first sets of acoustic environment parameters; an acoustic data generator (203) arranged to select between the first acoustic environment data and the second acoustic environment data to generate rendered acoustic environment data; and a renderer (205) arranged to generate the audio output signal by rendering the audio signal based on the rendered acoustic environment data.
  2. The audio device of claim 1, wherein a quantization of at least one parameter of the second sets of acoustic environment parameters is coarser than a quantization of a corresponding parameter of the first sets of acoustic environment parameters.
  3. The audio device of claim 1 or 2, further comprising a listener pose processor (207) arranged to determine a listener pose, wherein the renderer (205) is arranged to render the audio signal in accordance with the listener pose.
  4. The audio device of any preceding claim, wherein the data of each set of the first sets of acoustic environment parameters is distributed over a plurality of non-contiguous data segments.
  5. The audio apparatus of claim 4, wherein at least one data segment of a first set of the first sets of acoustic environment parameters comprises an indication of a start position of the data of the first set, and the acoustic data generator is arranged to generate the rendered acoustic environment data representing the first set in dependence on the indication of the start position.
  6. The audio apparatus of claim 4 or 5, wherein at least one data segment of a first set of the first sets of acoustic environment parameters comprises an indication of a data size of the data of the first set, and the acoustic data generator is arranged to generate the rendered acoustic environment data representing the first set in dependence on the indication of the data size.
  7. The audio apparatus of claim 5 or 6, wherein the receiver (201) is arranged to store data from a data segment when received, and the acoustic data generator (203) is arranged to generate the rendered acoustic environment data representing the first set from the stored data from the data segment.
  8. The apparatus of any preceding claim, wherein the first acoustic environment data of a given set of the first sets of acoustic environment parameters comprises a data integrity verification value, and the acoustic data generator (203) is arranged to generate the rendered acoustic environment data from a previously received set of the first sets of acoustic environment parameters, and not from the given set, if the data integrity verification value matches a data integrity verification value generated from data of the previously received set.
  9. The apparatus of any preceding claim, wherein the second sets of acoustic environment parameters comprise fewer parameters than the first sets of acoustic environment parameters.
  10. The apparatus of any preceding claim, wherein at least one set of the first sets of acoustic environment parameters comprises at least one parameter differentially encoded relative to a parameter of a set of the second sets of acoustic environment parameters.
  11. A data signal comprising at least a first audio signal and metadata, the metadata comprising: first acoustic environment data for an acoustic environment, the first acoustic environment data comprising repeated first sets of acoustic environment parameters, each first set of acoustic environment parameters providing a description of the acoustic environment; and second acoustic environment data for the acoustic environment, the second acoustic environment data comprising repeated second sets of acoustic environment parameters, each second set of acoustic environment parameters providing a description of the acoustic environment, a data size of the first sets of acoustic environment parameters exceeding a data size of the second sets of acoustic environment parameters, and an update rate of the second sets of acoustic environment parameters being higher than an update rate of the first sets of acoustic environment parameters.
  12. An apparatus for generating the data signal of claim 11, the apparatus comprising: a transmitter (901) arranged to transmit the data signal over a communication channel; a determiner (903) arranged to determine a communication channel property of the communication channel; and a controller (905) arranged to adapt properties of the first acoustic environment data in dependence on the communication channel property.
  13. A method of generating an output audio signal, the method comprising: receiving a data signal comprising audio data for at least a first audio signal and metadata, the metadata comprising: first acoustic environment data for an acoustic environment, the first acoustic environment data comprising repeated first sets of acoustic environment parameters, each first set of acoustic environment parameters providing a description of the acoustic environment; and second acoustic environment data for the acoustic environment, the second acoustic environment data comprising repeated second sets of acoustic environment parameters, each second set of acoustic environment parameters providing a description of the acoustic environment, a data size of the first sets of acoustic environment parameters exceeding a data size of the second sets of acoustic environment parameters, and an update rate of the second sets of acoustic environment parameters being higher than an update rate of the first sets of acoustic environment parameters; selecting between the first acoustic environment data and the second acoustic environment data to generate rendered acoustic environment data; and generating the audio output signal by rendering the audio signal based on the rendered acoustic environment data.
  14. A computer program product comprising computer program code means adapted to perform all the steps of claim 13 when said program is run on a computer.
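As an illustrative sketch of claims 4-7 (not part of the patent text), a first parameter set whose data is distributed over non-contiguous data segments can be reassembled using the start-position and data-size indications carried with the segments; the function name and segment representation below are assumptions.

```python
# Hypothetical reassembly of a first acoustic environment parameter set
# from stored, possibly out-of-order data segments (cf. claims 4-7).

def reassemble(segments, total_size):
    """segments: iterable of (start_position, payload_bytes) as received.
    Returns the complete set once every byte is present, else None
    (in which case a renderer could fall back to the second set)."""
    buf = bytearray(total_size)
    have = [False] * total_size
    for start, payload in segments:
        buf[start:start + len(payload)] = payload
        for i in range(start, start + len(payload)):
            have[i] = True
    return bytes(buf) if all(have) else None

# Segments arriving out of order still reconstruct the set:
segs = [(3, b"DEF"), (0, b"ABC")]
print(reassemble(segs, 6))       # b'ABCDEF'
print(reassemble(segs[:1], 6))   # None: set still incomplete
```

The claims leave the storage and reassembly mechanism open; this sketch only shows why per-segment start-position (claim 5) and data-size (claim 6) indications suffice to rebuild a set from stored segments (claim 7).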

Description

Generating an audio data signal

Technical Field

The present invention relates to generating audio signals and/or audio data signals, and in particular, but not exclusively, to generating such signals to support, for example, extended reality (XR) applications.

Background

In recent years, with the continued development and introduction of new services and new ways of utilizing and consuming audiovisual content, the variety and scope of experiences based on audiovisual content have increased significantly. In particular, many spatial and interactive services, applications and experiences are being developed to give users a more engaging and immersive experience. Examples of such applications are virtual reality (VR), augmented reality (AR) and mixed reality (MR) applications, collectively referred to as extended reality (XR) applications, which are rapidly becoming mainstream, with many solutions aimed at the consumer market. A number of standardization bodies are actively developing standards for various aspects of VR/AR/MR/XR systems, including, for example, streaming, broadcasting and rendering. VR applications tend to provide user experiences corresponding to the user being in a different world/environment/scene, whereas AR (including mixed reality, MR) applications tend to provide user experiences corresponding to the user being in the current environment, but with additional information or virtual objects added. Thus, VR applications tend to provide a fully immersive, synthesized world/scene, while AR applications tend to provide a partially synthesized world/scene that is superimposed on the real scene in which the user is physically located. However, these terms are often used interchangeably and have a high degree of overlap; hereinafter they will be used interchangeably.
VR applications typically provide a virtual reality experience in which users can move (relatively) freely in the virtual environment and dynamically change their position and viewing direction. Typically, such virtual reality applications are based on a three-dimensional model of a scene, where the model is dynamically evaluated to provide a particular requested view. Such methods are well known from gaming applications for computers and consoles, for example first-person shooter games. In addition to visual rendering, most XR (and particularly VR) applications also provide a corresponding audio experience. In many applications, the audio preferably provides a spatial audio experience in which each audio source is perceived as coming from a position corresponding to the position of the corresponding object in the virtual scene, including objects that are currently visible and objects that are not (e.g., behind the user). Thus, the audio and video scenes preferably remain perceptually consistent, and both provide a full spatial experience. For audio, headphone reproduction using binaural audio rendering techniques is widely employed. In many scenarios, headphone reproduction provides a highly immersive, personalized experience for the user. Using head tracking, rendering can be performed in response to the user's head movements, which greatly increases the sense of immersion. Typically, audio data is provided along with metadata describing the acoustic environment (such as the acoustic properties of a room). This allows the rendering of the audio to be adapted to provide the perception of a more realistic environment. However, while such an approach may provide a suitable user experience in many practical applications, it does not provide an optimal user experience in all scenarios. In particular, in many cases, a sub-optimal audio quality/perception/user experience may result.
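As a hedged illustration of the head-tracked rendering mentioned above (not part of the patent text): a source position in the scene can be re-expressed relative to the listener's head orientation, so that its perceived direction stays fixed in the world as the head turns. The sketch below is a 2-D, yaw-only toy; real binaural renderers use the full 3-D listener pose and HRTF filtering.

```python
# Illustrative only: rotate a world-frame source position into the head
# frame (+y = forward, +x = right, yaw positive = head turning left/CCW).

import math

def source_relative_to_head(src_xy, yaw_rad):
    """Rotate world-frame source coordinates by -yaw into the head frame."""
    x, y = src_xy
    c, s = math.cos(-yaw_rad), math.sin(-yaw_rad)
    return (c * x - s * y, s * x + c * y)

# A source straight ahead; the listener turns the head 90 degrees left,
# so the source should now be perceived to the listener's right:
print(source_relative_to_head((0.0, 1.0), math.pi / 2))
# -> approximately (1.0, 0.0)
```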
For example, the representation of the acoustic environment may not be perceived as accurate or authentic, or there may be an undesirable delay before such a perception can be achieved. US 2021/287251 A1 discloses a method and apparatus for simulating reverberation in a rendering system. The decoder/renderer may refer to audio scene description format file information, other bitstream information and/or look-up tables regarding the virtual scene geometry to determine how to simulate reverberation. The late reverberation may be generated using a feedback delay network, based on reverberator parameters derived from the virtual scene geometry, including delay line lengths, attenuation filter coefficients and diffuse-to-direct ratio characteristics. Early reflections may be generated by determining which potentially reflective elements in the virtual scene geometry are reflection planes, which may involve simulation of specular reflection, potentially through beam tracing. Hence, an improved method for distribution
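The late-reverberation approach referenced above (a feedback delay network with several delay lines, per-line attenuation and a recirculating mixing matrix) can be sketched minimally as follows. This is an illustrative toy, not the implementation of US 2021/287251 A1; the delay lengths and gain are arbitrary example values, not parameters from that disclosure.

```python
# Minimal mono feedback delay network (FDN) sketch: four delay lines fed
# back through a Householder mixing matrix, with a single broadband gain
# standing in for the per-line attenuation filters. Illustrative only.

def fdn_reverb(x, delays=(149, 211, 263, 293), gain=0.7, n_out=None):
    """x: list of input samples. Returns n_out (default len(x)) output samples."""
    n_out = n_out or len(x)
    n_lines = len(delays)
    lines = [[0.0] * d for d in delays]   # circular delay-line buffers
    idx = [0] * n_lines
    y = []
    for n in range(n_out):
        outs = [lines[k][idx[k]] for k in range(n_lines)]
        wet = sum(outs)
        dry = x[n] if n < len(x) else 0.0
        for k in range(n_lines):
            # Householder feedback matrix row: out_k - (2/N) * sum(outs);
            # gain < 1 makes the recirculating energy decay over time.
            fb = outs[k] - (2.0 / n_lines) * wet
            lines[k][idx[k]] = gain * (dry + fb)
            idx[k] = (idx[k] + 1) % delays[k]
        y.append(dry + 0.25 * wet)        # fixed dry/wet mix (illustrative)
    return y
```

An impulse fed into this network re-emerges as a decaying tail of echoes at the delay-line periods, which is the basic mechanism the referenced reverberator parameterizes (delay line lengths, attenuation, mix ratios) from the virtual scene geometry.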