EP-4740490-A1 - BEAMFORMING CONTROL FOR 6-DEGREES OF FREEDOM AUDIO RENDERING

Abstract

A method for a rendering of a six degrees-of-freedom audio scene, the method comprising: providing at least two microphones located at respective position for audio capturing; identifying at least one known audio source being captured based on a beamforming by at least one microphone of the at least two microphones, the at least one audio source is located in the six degrees-of-freedom audio scene relative to the at least two microphones; identifying at least one unknown audio source in the six degrees-of-freedom audio scene, wherein the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone; controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones, the controlling of beamforming is such that the at least one alternative microphone starts capturing audio from the at least one known audio source while the at least one unknown audio source is at least partially located between the at least one microphone and the at least one known audio source.

Inventors

LEPPÄNEN, Jussi Artturi
MATE, SUJEET SHYAMSUNDAR
LEHTINIEMI, ARTO JUHANI

Assignees

Nokia Technologies Oy

Dates

Publication Date: 20260513
Application Date: 20240619

Claims (1)

1. A method for a rendering of a six degrees-of-freedom audio scene, the method comprising: providing at least two microphones located at respective position for audio capturing; identifying at least one known audio source being captured based on a beamforming by at least one microphone of the at least two microphones, the at least one audio source is located in the six degrees-of-freedom audio scene relative to the at least two microphones; identifying at least one unknown audio source in the six degrees-of-freedom audio scene, wherein the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone; and controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones, the controlling of beamforming is such that the at least one alternative microphone starts capturing audio from the at least one known audio source while the at least one unknown audio source is at least partially located between the at least one microphone and the at least one known audio source.
2. The method as claimed in claim 1, wherein the at least two microphones are higher-order Ambisonics microphones.
3. The method as claimed in any of claims 1 or 2, wherein controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones comprises determining the at least one unknown audio source is masking the at least one known audio source with respect to the beamforming by at least one microphone of the at least two microphones.
4. The method as claimed in any of claims 1 to 3, wherein controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones further comprises generating a signal, the signal indicating that the at least one alternative microphone starts capturing audio from the at least one known audio source while the at least one unknown audio source is at least partially located between the at least one microphone and the at least one known audio source.
5. The method as claimed in any of claims 1 to 3, wherein controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones further comprises generating a signal, the signal associated with the at least one known audio source and indicating which the at least one microphone of the two microphones and the at least one alternative microphone is to be selected.
6. The method as claimed in any of claims 1 to 3, wherein controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones further comprises switching from the one of the at least two microphones to the at least one alternative microphone.
7. The method as claimed in any of claims 1 to 6, wherein controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones comprises obtaining a signal associated with the at least one known audio source, the signal indicating where the at least one microphone of the two microphones and the at least one alternative microphone is to be selected.
8. The method as claimed in claim 7, further comprising receiving a six degrees-of-freedom audio scene description wherein the six degrees-of-freedom audio scene description comprises the signal associated with the at least one known audio source.
9. The method as claimed in any of claims 1 to 8, wherein controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones comprises activating one of: the at least one alternative microphone; or the at least one microphone, for capturing audio from the at least one known audio source and deactivating the other of the at least one alternative microphone or the at least one microphone.
10. The method as claimed in any of claims 1 to 9, wherein controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones comprises controlling a disabling or enabling of the at least one known source.
11. The method as claimed in any of claims 1 to 10, further comprising determining the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone.
12. The method as claimed in claim 11, wherein determining the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone comprises determining the position of the at least one unknown audio source is between or substantially between the at least one known audio source and the at least one of the at least two microphones.
13. The method as claimed in claim 11, wherein determining the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone comprises determining a distance between the at least one unknown audio source and the at least one known audio source is less than a threshold distance.
14. The method as claimed in claim 11, wherein determining the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone comprises determining an energy estimate of an energy of the at least one known audio source is lower than a threshold energy.
15. The method as claimed in any of claims 1 to 14, furthermore comprising determining at least one spatial metadata parameter associated with the at least one known audio source based on an analysis of the beamformed at least one known audio source.
16. An apparatus comprising means for performing the method of any of claims 1 to 15.
17. A computer program comprising instructions, which, when executed by an apparatus, cause the apparatus to perform the method of any of claims 1 to 15.
18. An apparatus for a rendering of a six degrees-of-freedom audio scene, the apparatus comprising means configured to: provide at least two microphones located at respective position for audio capturing; identify at least one known audio source being captured based on a beamforming by at least one microphone of the at least two microphones, the at least one audio source is located in the six degrees-of-freedom audio scene relative to the at least two microphones; identifying at least one unknown audio source in the six degrees-of-freedom audio scene, wherein the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone; and control the beamforming based on identifying at least one alternative microphone of the at least two microphones, the control of beamforming is such that the at least one alternative microphone starts capturing audio from the at least one known audio source while the at least one unknown audio source is at least partially located between the at least one microphone and the at least one known audio source.
19. An apparatus for a rendering of a six degrees-of-freedom audio scene, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: provide at least two microphones located at respective position for audio capturing; identify at least one known audio source being captured based on a beamforming by at least one microphone of the at least two microphones, the at least one audio source is located in the six degrees-of-freedom audio scene relative to the at least two microphones; identifying at least one unknown audio source in the six degrees-of-freedom audio scene, wherein the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone; and control the beamforming based on identifying at least one alternative microphone of the at least two microphones, the control of beamforming is such that the at least one alternative microphone starts capturing audio from the at least one known audio source while the at least one unknown audio source is at least partially located between the at least one microphone and the at least one known audio source.
20. A non-transitory computer readable medium comprising program instructions for causing an apparatus, for a rendering of a six degrees-of-freedom audio, at least to: provide at least two microphones located at respective position for audio capturing; identify at least one known audio source being captured based on a beamforming by at least one microphone of the at least two microphones, the at least one audio source is located in the six degrees-of-freedom audio scene relative to the at least two microphones; identifying at least one unknown audio source in the six degrees-of-freedom audio scene, wherein the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone; and control the beamforming based on identifying at least one alternative microphone of the at least two microphones, the control of beamforming is such that the at least one alternative microphone starts capturing audio from the at least one known audio source while the at least one unknown audio source is at least partially located between the at least one microphone and the at least one known audio source.
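The interference determinations of claims 12 to 14 and the selection of an alternative microphone can be illustrated with a minimal Python sketch. It is not taken from the application: the function names, the `radius`, `min_distance` and `min_energy` thresholds, the choice to combine the three criteria with a logical OR, and the rule of preferring the nearest unobstructed microphone are all assumptions made for illustration.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _dist(a: Vec3, b: Vec3) -> float:
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def is_between(mic: Vec3, known: Vec3, unknown: Vec3, radius: float = 0.5) -> bool:
    """Hypothetical 'substantially between' test (cf. claim 12).

    True when the unknown source lies within `radius` of the straight
    segment from the microphone to the known source.
    """
    seg = _sub(known, mic)
    seg_len_sq = _dot(seg, seg)
    if seg_len_sq == 0.0:
        return False
    # Normalised position of the unknown source's projection on the segment.
    t = _dot(_sub(unknown, mic), seg) / seg_len_sq
    if not 0.0 < t < 1.0:
        return False
    closest = (mic[0] + t * seg[0], mic[1] + t * seg[1], mic[2] + t * seg[2])
    return _dist(unknown, closest) <= radius


def interferes(mic: Vec3, known: Vec3, unknown: Vec3, known_energy: float,
               radius: float = 0.5, min_distance: float = 1.0,
               min_energy: float = 1e-3) -> bool:
    """Combine the three criteria with OR (an assumption; the claims list
    them as separate alternatives)."""
    too_close = _dist(unknown, known) < min_distance   # cf. claim 13
    too_quiet = known_energy < min_energy              # cf. claim 14
    return is_between(mic, known, unknown, radius) or too_close or too_quiet


def select_microphone(mics: dict[str, Vec3], current: str, known: Vec3,
                      unknown: Vec3, radius: float = 0.5) -> str:
    """Keep the current microphone unless its path to the known source is
    obstructed; otherwise pick the nearest unobstructed alternative."""
    if not is_between(mics[current], known, unknown, radius):
        return current
    candidates = [name for name, pos in mics.items()
                  if name != current and not is_between(pos, known, unknown, radius)]
    if not candidates:
        return current  # no better option available
    return min(candidates, key=lambda name: _dist(mics[name], known))
```

For example, with microphones at (0, 0, 0) and (4, 4, 0), a known source at (4, 0, 0) and an unknown source at (2, 0.1, 0), the first microphone's path is obstructed and the sketch selects the second.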

Description

BEAMFORMING CONTROL FOR 6-DEGREES OF FREEDOM AUDIO RENDERING

Field

The present application relates to apparatus and methods for beamforming control for 6-degrees of freedom audio rendering, and in particular, but not exclusively, for beamforming control for 6-degrees of freedom audio rendering in augmented reality and/or virtual reality apparatus.

Background

Current implementations of the MPEG-I Immersive audio standard (ISO/IEC 23090-4 WD3) renderer are configured to support 6 degrees-of-freedom (6DoF) rendering of audio scenes comprising multiple first-order or higher-order Ambisonics (FOA, HOA) microphone recordings or synthesized signals. The renderer implementations are able to provide a binaural signal ‘at’ the listener position (and orientation) based on the recorded FOA/HOA signals and their positions. That is, the renderer is able to provide a binaural signal at a non-sampled position in the scene, thus providing a 6DoF experience for the listener.

Figure 1 shows an example scene 100 with microphones m1 101, m2 103, m3 105, m4 107, and m5 109 at positions p1, p2, p3, p4 and p5 respectively and a listener 111 l1 at position pl.

Current MPEG-I audio renderer implementations are able to utilize information (e.g., position) about audio sources present in the scene (e.g., audio sources present during the recording or in synthetic Ambisonics sources) such as described in WO2022136725. The positions of any of the audio sources present in the recorded or synthetic audio scene may be given as input information to the renderer in addition to the audio signals and positions of the FOA or HOA microphones. Using this additional information about the audio sources allows the renderer to provide improved sound quality through improved localization and distance attenuation behavior for the known sources.

To achieve this, the renderer can be configured to perform beamforming (from an FOA/HOA signal) towards a known source to estimate the source properties (energy at different frequency bands). Based on this estimate and the position of the known source, the renderer can be configured to determine how much energy the known source contributes, in which frequency bands, and from which direction from the listener’s perspective.

However, when estimating audio source spectral properties in order to perform 6DoF rendering of scenes comprising multiple HOA sources, a problem can occur if the estimation does not consider the presence of other audio sources between an HOA source and an informed source (that is, the audio source whose position information is provided to the renderer and whose rendering is being improved selectively).

There is current research into rendering and improving the estimate of spectral properties of an audio source (for example with beamforming) where the audio source is ‘obscured’, ‘blocked’ or ‘masked’ by an intermediate audio source.

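The beamforming-based energy estimate described in the Background can be sketched in Python for the first-order case. This is an illustrative sketch only, not the MPEG-I renderer's method: it assumes an FOA signal in ACN channel order (W, Y, Z, X) with SN3D normalisation, uses a simple virtual cardioid as the beam, and estimates band energies from a single FFT. The function names and the `encode_plane_wave` helper are hypothetical.

```python
import numpy as np


def _unit(direction):
    d = np.asarray(direction, dtype=float)
    return d / np.linalg.norm(d)


def encode_plane_wave(signal, direction):
    """Hypothetical helper: encode a mono signal arriving from `direction`
    (x, y, z) into FOA, assuming ACN order (W, Y, Z, X) and SN3D gains."""
    d = _unit(direction)
    signal = np.asarray(signal, dtype=float)
    return np.stack([signal, d[1] * signal, d[2] * signal, d[0] * signal])


def steer_cardioid(foa, direction):
    """Virtual cardioid beam towards `direction` from a (4, n) FOA array.

    Gain is 1 for a plane wave from the look direction and 0 from the
    opposite direction (under the ACN/SN3D assumption above).
    """
    d = _unit(direction)
    w, y, z, x = foa
    return 0.5 * (w + d[0] * x + d[1] * y + d[2] * z)


def band_energies(signal, sample_rate, band_edges_hz):
    """Sum the power spectrum within each [low, high) frequency band."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.array([
        power[(freqs >= low) & (freqs < high)].sum()
        for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:])
    ])


# Usage: a 1 kHz tone arriving from the +x direction.
sample_rate = 8000
time = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * time)
foa = encode_plane_wave(tone, (1.0, 0.0, 0.0))
beam = steer_cardioid(foa, (1.0, 0.0, 0.0))
energies = band_energies(beam, sample_rate, [0.0, 500.0, 2000.0, 4000.0])
```

A first-order cardioid is very broad, so a second source near the look direction leaks into the estimate; this is the kind of masking that motivates switching to an alternative microphone.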
Summary

There is provided according to a first aspect a method for a rendering of a six degrees-of-freedom audio scene, the method comprising: providing at least two microphones located at respective position for audio capturing; identifying at least one known audio source being captured based on a beamforming by at least one microphone of the at least two microphones, the at least one audio source is located in the six degrees-of-freedom audio scene relative to the at least two microphones; identifying at least one unknown audio source in the six degrees-of-freedom audio scene, wherein the at least one unknown audio source at least partially interferes with the capturing of the at least one known audio source by the at least one microphone; and controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones, the controlling of beamforming is such that the at least one alternative microphone starts capturing audio from the at least one known audio source while the at least one unknown audio source is at least partially located between the at least one microphone and the at least one known audio source.

The at least two microphones may be higher-order Ambisonics microphones.

Controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones may comprise determining the at least one unknown audio source is masking the at least one known audio source with respect to the beamforming by at least one microphone of the at least two microphones.

Controlling the beamforming based on identifying at least one alternative microphone of the at least two microphones may further comprise generating a signal, the signal indicating that the at least one alternative microphone starts capturing audio from the at least one known audio source while the at least one unknown audio source is at least partially located between the at least one microphone and the at least one known audio source. Controlling the beamforming based on identifying at least one alternative microphone of the at least t