
EP-4062657-B1 - SOUNDFIELD ADAPTATION FOR VIRTUAL REALITY AUDIO


Inventors

  • OLIVIERI, FERDINANDO
  • SHAHBAZI MIRZAHASANLOO, TAHER
  • PETERS, NILS GÜNTHER

Dates

Publication Date
2026-05-06
Application Date
2020-11-19

Claims (15)

  1. A device configured to play one or more of a plurality of audio streams, the audio streams comprising at least one decomposed version of ambisonic coefficients, which includes at least one spatial component and at least one audio source, wherein the at least one spatial component describes spatial characteristics associated with the at least one audio source in a spherical harmonic domain representation, the device comprising: a memory configured to store (250) the at least one spatial component and the at least one audio source within the plurality of audio streams; and one or more processors coupled to the memory, and configured to: receive (252), from motion sensors, rotation information; rotate (254) the at least one spatial component based on the rotation information to form at least one rotated spatial component; and reconstruct (256) ambisonic signals from the at least one rotated spatial component and the at least one audio source.
  2. The device of claim 1, wherein the at least one spatial component comprises a V-vector that identifies spatial characteristics of a corresponding audio object, and the at least one audio source comprises a U-vector representative of the audio source, wherein the one or more processors are, optionally, further configured to reconstruct the U-vector by applying a projection matrix to a reference residual vector and dequantized energy signal, and wherein the projection matrix optionally comprises temporal and spatial rotation data.
  3. The device of claim 1, wherein the one or more processors are further configured to output a representation of the at least one audio source to one or more speakers; or the device of claim 1, wherein the one or more processors are further configured to combine at least two representations of the at least one audio source by at least one of mixing or interpolation.
  4. The device of claim 1, further comprising a display device.
  5. The device of claim 4, further comprising a microphone, wherein the one or more processors are further configured to receive a voice command from the microphone and control the display device based on the voice command.
  6. The device of claim 1, further comprising one or more speakers, and/or, wherein the device comprises a mobile handset.
  7. The device of claim 1, wherein the device comprises an extended reality headset, and wherein an acoustical space comprises a scene represented by video data captured by a camera, or where an acoustical space comprises a virtual world.
  8. The device of claim 1, further comprising a head-mounted device configured to present an acoustical space.
  9. The device of claim 1, further comprising a wireless transceiver, the wireless transceiver being coupled to the one or more processors and being configured to receive a wireless signal, the wireless signal comprising one or more of a signal conforming to a 5th generation cellular standard, a Bluetooth standard or a Wi-Fi standard.
  10. A method of playing one or more of a plurality of audio streams, the audio streams comprising at least one decomposed version of ambisonic coefficients, which includes at least one spatial component and at least one audio source, wherein the at least one spatial component describes spatial characteristics associated with the at least one audio source in a spherical harmonic domain representation, the method comprising: storing (250), by a memory, the at least one spatial component and the at least one audio source within the plurality of audio streams; receiving (252), by one or more processors from motion sensors, rotation information; rotating (254), by the one or more processors, the at least one spatial component based on the rotation information to form at least one rotated spatial component; and reconstructing (256), by the one or more processors, ambisonic signals from the at least one rotated spatial component and the at least one audio source.
  11. The method of claim 10, wherein the at least one spatial component comprises a V-vector that identifies spatial characteristics of a corresponding audio object, and the at least one audio source comprises a U-vector representative of the audio source, the method optionally further comprising reconstructing the U-vector by applying a projection matrix to a reference residual vector and dequantized energy signal, wherein the projection matrix optionally comprises temporal and spatial rotation data.
  12. The method of claim 10, further comprising outputting, by the one or more processors, a representation of the at least one audio source to one or more speakers; or the method of claim 10, further comprising combining, by the one or more processors, at least two representations of the at least one audio source by at least one of mixing or interpolation; or the method of claim 10, further comprising receiving a voice command from a microphone and controlling a display device based on the voice command.
  13. The method of claim 10, wherein the method is performed upon a mobile handset; or the method of claim 10, wherein the method is performed upon an extended reality headset, and wherein an acoustical space comprises a scene represented by video data captured by a camera, or wherein an acoustical space comprises a virtual world; or the method of claim 10, wherein the method is performed upon a head-mounted device configured to present an acoustical space.
  14. The method of claim 10, further comprising receiving a wireless signal, the wireless signal comprising one or more of a signal conforming to a 5th generation cellular standard, a Bluetooth standard or a Wi-Fi standard.
  15. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to perform the method of any of claims 10-14.
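For illustration only (not part of the claims), the following is a minimal NumPy sketch of the signal flow recited in claims 1 and 10: the spatial components (V-vectors) and audio sources (U-vectors) of the decomposed ambisonic representation are stored, the spatial components are rotated with a spherical-harmonic-domain rotation matrix derived from the motion sensors' rotation information, and ambisonic signals are reconstructed from the rotated spatial components and the audio sources. All function and variable names are illustrative assumptions, and the rotation matrix is taken as given; computing it for an arbitrary ambisonic order (e.g., by recursion) is outside this sketch.

```python
import numpy as np

def rotate_spatial_components(v_vectors: np.ndarray, shd_rotation: np.ndarray) -> np.ndarray:
    """Rotate each V-vector (spatial component) with a spherical-harmonic-domain
    rotation matrix derived from the device's rotation information.

    v_vectors:    (n_coeffs, n_sources) -- one column per audio source
    shd_rotation: (n_coeffs, n_coeffs)  -- SH-domain rotation matrix (assumed given)
    """
    return shd_rotation @ v_vectors

def reconstruct_ambisonics(u_sources: np.ndarray, v_vectors: np.ndarray) -> np.ndarray:
    """Reconstruct ambisonic coefficient signals from audio sources (U-vectors)
    and (possibly rotated) spatial components (V-vectors).

    u_sources: (n_samples, n_sources)
    returns:   (n_samples, n_coeffs) ambisonic signals
    """
    return u_sources @ v_vectors.T

# Example: a first-order scene (4 coefficients) decomposed into two sources.
n_samples, n_coeffs, n_sources = 1024, 4, 2
u = np.random.randn(n_samples, n_sources)   # audio sources (U-vectors)
v = np.random.randn(n_coeffs, n_sources)    # spatial components (V-vectors)
r_shd = np.eye(n_coeffs)                    # placeholder: identity = no head motion
v_rot = rotate_spatial_components(v, r_shd)
hoa = reconstruct_ambisonics(u, v_rot)      # (1024, 4) rotated ambisonic frame
```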

Description

TECHNICAL FIELD

This disclosure relates to processing of media data, such as audio data.

BACKGROUND

Computer-mediated reality systems are being developed to allow computing devices to augment or add to, remove or subtract from, or generally modify existing reality experienced by a user. Computer-mediated reality systems (which may also be referred to as "extended reality systems," or "XR systems") may include, as examples, virtual reality (VR) systems, augmented reality (AR) systems, and mixed reality (MR) systems. The perceived success of computer-mediated reality systems is generally related to the ability of such systems to provide a realistically immersive experience in terms of both the video and audio experience, where the video and audio experience align in ways expected by the user. Although the human visual system is more sensitive than the human auditory system (e.g., in terms of perceived localization of various objects within the scene), ensuring an adequate auditory experience is an increasingly important factor in ensuring a realistically immersive experience, particularly as the video experience improves to permit better localization of video objects, which enables the user to better identify sources of audio content.

US 2019/069110 A1 discloses a method of encoding sound objects using spherical harmonic symmetries. There are situations where the ambisonic sound field may be rotated, for example when multiple sound objects in an ambisonic sound field have already been encoded into the ambisonic representation using a microphone array. A transformation module includes a rotator that performs a first-order ambisonic rotation by applying a 3×3 rotation matrix to the velocity components of the sound field while keeping the pressure component unmodified, which is equivalent to a simple vector rotation. Alternatively, the transformation module includes a rotator that performs higher-order ambisonic rotation involving rotation of vectors with dimensionality higher than 3, such as by computing spherical harmonic rotation matrices by recursion.

US 2019/069118 A1 discloses a sound processing apparatus that comprises a head direction acquisition unit that acquires the head direction of a user listening to sound; a rotation matrix generation unit that selects two first rotation matrices on the basis of the head direction from a plurality of first rotation matrices for rotation in a first direction held in advance, selects one second rotation matrix on the basis of the head direction from a plurality of second rotation matrices for rotation in a second direction held in advance, and generates a third rotation matrix on the basis of the selected two first rotation matrices and the selected one second rotation matrix; and a head-related transfer function composition unit that composes an input signal in a spherical harmonic domain, a head-related transfer function in the spherical harmonic domain, and the third rotation matrix to generate a headphone drive signal in a time-frequency domain.

US 2016/241980 A1 discusses adaptive ambisonic binaural rendering.

SUMMARY

This disclosure relates generally to auditory aspects of the user experience of computer-mediated reality systems, including virtual reality (VR), mixed reality (MR), augmented reality (AR), computer vision, and graphics systems. Various aspects of the techniques may provide for adaptive audio capture and rendering of an acoustical space for extended reality systems.
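As an informal illustration of the first-order rotation described in the background above (a 3×3 rotation matrix applied to the velocity components while the pressure component is left unmodified), the sketch below rotates a first-order ambisonic frame. It assumes a [W, X, Y, Z] channel layout and a yaw-pitch-roll convention; actual channel orderings (e.g., ACN vs. FuMa) and rotation conventions vary, and this is not the implementation of the claimed device.

```python
import numpy as np

def rotate_foa_frame(foa: np.ndarray, yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Rotate a first-order ambisonic frame by applying a 3x3 rotation matrix to
    the velocity components (X, Y, Z) while keeping the pressure component (W)
    unmodified.

    foa: (4, n_samples) with channels assumed ordered [W, X, Y, Z].
    yaw/pitch/roll: head rotation in radians (e.g., from the motion sensors).
    """
    cy, sy = np.cos(yaw),   np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll),  np.sin(roll)
    # Rotation about z (yaw), then y (pitch), then x (roll); conventions vary.
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    r = rz @ ry @ rx

    rotated = foa.copy()
    rotated[1:4, :] = r @ foa[1:4, :]   # rotate X, Y, Z; W stays unchanged
    return rotated

# Example: rotate a short frame by 90 degrees of yaw.
frame = np.random.randn(4, 480)
rotated = rotate_foa_frame(frame, np.pi / 2, 0.0, 0.0)
```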
The invention is defined in the independent claims. Optional features are set out in the dependent claims. The details of one or more examples of this disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of various aspects of the techniques will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIGS. 1A-1C are diagrams illustrating systems that may perform various aspects of the techniques described in this disclosure.
FIG. 2 is a diagram illustrating an example of a VR device worn by a user.
FIG. 3 illustrates an example of a wireless communications system 100 that supports devices and methods in accordance with aspects of the present disclosure.
FIG. 4 is a block diagram illustrating an example audio playback system according to the techniques described in this disclosure.
FIG. 5 is a block diagram of an example audio playback system further illustrating various aspects of techniques of this disclosure.
FIG. 6 is a block diagram of an example audio playback system further illustrating various aspects of techniques of this disclosure.
FIG. 7 is a block diagram of an example audio playback system further illustrating various aspects of techniques of this disclosure.
FIG. 8 is a conceptual diagram illustrating an example concert with three or more audio receivers.
FIG. 9 is a flowchart illustrating an example of using rotation information