EP-4740488-A1 - AUDIO ENHANCEMENT AND OPTIMIZATION OF AN IMMERSIVE AUDIO EXPERIENCE

Abstract

Techniques are disclosed herein for providing audio enhancement and optimization of an immersive audio experience. Examples may include generating an audio feature set for a transduced audio stream captured in an environment, inputting the audio feature set to a neural network model configured to generate an audio isolation mask associated with the transduced audio stream, and generating isolated audio for the transduced audio stream based at least in part on the audio isolation mask.
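The three steps in the abstract (generate an audio feature set, obtain an audio isolation mask from a model, apply the mask to produce isolated audio) can be illustrated on a single frame. This is a minimal sketch under stated assumptions, not the disclosed implementation. The log-magnitude features, the naive single-frame DFT, and `threshold_model` (a hypothetical thresholding function standing in for the trained neural network model) are all assumptions introduced for illustration.

```python
import cmath
import math
from typing import Callable, List, Sequence


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT; keeps the real part, which is exact for conjugate-symmetric spectra."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def feature_set(spectrum: Sequence[complex]) -> List[float]:
    """Assumed feature choice: log10 magnitude of each frequency bin."""
    return [math.log10(abs(x) + 1e-12) for x in spectrum]


def threshold_model(features: Sequence[float], threshold: float = 0.5) -> List[float]:
    """Hypothetical stand-in for the trained neural network model.

    Returns a binary mask: 1.0 keeps a bin, 0.0 suppresses it.
    """
    return [1.0 if f > threshold else 0.0 for f in features]


def isolate(
    frame: Sequence[float],
    model: Callable[[Sequence[float]], List[float]] = threshold_model,
) -> List[float]:
    """Feature set -> isolation mask -> masked spectrum -> isolated time-domain frame."""
    spectrum = dft(frame)
    mask = model(feature_set(spectrum))
    return idft([m * x for m, x in zip(mask, spectrum)])
```

For example, a 16-sample frame holding a strong tone in bin 2 plus a weak tone in bin 5 comes back as the strong tone alone, because only bins 2 and 14 (the tone and its conjugate mirror) clear the threshold. Because the mask depends only on magnitude, it stays conjugate-symmetric and the output remains real. A practical system would more likely use an overlapping, windowed short-time Fourier transform and a trained model; the naive DFT only keeps the sketch dependency-free.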

Inventors

YOST, Christian
LESTER, MICHAEL
PROSINSKI, Michael
SCONZA, Justin

Assignees

Shure Acquisition Holdings, Inc.

Dates

Publication Date: 2026-05-13
Application Date: 2024-07-03

Claims (20)

1. An apparatus comprising at least one processor and a memory storing instructions that are operable, when executed by the at least one processor, to cause the apparatus to: generate an audio feature set for a transduced audio stream captured via at least one capture device positioned within an environment defining at least one audio capture area; receive, from a user device, one or more user audio isolation control parameters; input the audio feature set to a neural network model configured to generate an audio isolation mask associated with the transduced audio stream; generate isolated audio for the transduced audio stream based at least in part on (i) the audio isolation mask and (ii) the one or more user audio isolation control parameters; and generate output data for an output device based at least in part on the isolated audio.
2. The apparatus of claim 1, wherein the environment is an arena environment.
3. The apparatus of claim 2, wherein the arena environment defines a playing region, a spectator region, and a noise source region, and wherein the instructions are further operable to cause the apparatus to: generate the isolated audio for the playing region, the spectator region, or the noise source region based at least in part on (i) the audio isolation mask and (ii) the one or more user audio isolation control parameters.
4. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: receive the transduced audio stream from an audio mixer device.
5. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: generate mixed isolated audio based at least in part on the isolated audio and different isolated audio associated with the environment; and generate the output data for the output device based at least in part on the mixed isolated audio.
6. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: receive a first audio channel stream via a first capture device positioned within a first audio capture area of the environment; receive a second audio channel stream via a second capture device positioned within a second audio capture area of the environment; generate a first audio feature set for the first audio channel stream; generate a second audio feature set for the second audio channel stream; input the first audio feature set to a first neural network model to generate a first mixing control signal; input the second audio feature set to a second neural network model to generate a second mixing control signal; and select the transduced audio stream from a plurality of transduced audio streams based at least in part on the first mixing control signal and the second mixing control signal.
7. The apparatus of claim 6, wherein the instructions are further operable to cause the apparatus to: select the transduced audio stream via an audio mixer device.
8. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: input an audio signal sample associated with the transduced audio stream to a time-frequency domain transformation pipeline of a digital signal processing process for a transformation period; input the audio signal sample to a deep neural network (DNN) processing loop comprising the neural network model; and based on the audio isolation mask being determined prior to expiration of the transformation period, apply the audio isolation mask to a frequency domain version of the audio signal sample associated with the time-frequency domain transformation pipeline to generate the isolated audio.
9. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: generate a reference audio feature set for a reference microphone signal associated with the environment; and input the audio feature set and the reference audio feature set to the neural network model to generate the audio isolation mask.
10. The apparatus of claim 1, wherein the transduced audio stream comprises at least one microphone signal from a group comprising a first microphone signal and one or more microphone signals associated with one or more sounds in the environment.
11. The apparatus of claim 1, wherein the audio isolation mask comprises a denoiser mask, a speech removal mask, or a signal of interest mask.
12. The apparatus of claim 1, wherein the output data comprises broadcast audio.
13. The apparatus of claim 1, wherein the output data comprises speech reinforcement audio.
14. The apparatus of claim 1, wherein the output data comprises visual data configured to render via a display of the output device.
15. The apparatus of claim 1, wherein the output device is a haptic device, and wherein the output data comprises a control signal for the haptic device.
16. The apparatus of claim 1, wherein the output data comprises a video stream associated with the isolated audio.
17. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: perform beam steering associated with the at least one capture device based at least in part on the audio isolation mask.
18. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: initiate selection of an audio channel associated with desirable audio based at least in part on the audio isolation mask.
19. A computer-implemented method comprising: generating an audio feature set for a transduced audio stream captured via at least one capture device positioned within an environment defining at least one audio capture area; inputting the audio feature set to a neural network model configured to generate an audio isolation mask associated with the transduced audio stream; generating isolated audio for the transduced audio stream based at least in part on (i) the audio isolation mask and (ii) one or more user audio isolation control parameters; and generating output data for an output device based at least in part on the isolated audio.
20. A computer program product, stored on a computer readable medium, comprising instructions that, when executed by one or more processors of an apparatus, cause the one or more processors to: generate an audio feature set for a transduced audio stream captured via at least one capture device positioned within an environment defining at least one audio capture area; input the audio feature set to a neural network model configured to generate an audio isolation mask associated with the transduced audio stream; generate isolated audio for the transduced audio stream based at least in part on (i) the audio isolation mask and (ii) one or more user audio isolation control parameters; and generate output data for an output device based at least in part on the isolated audio.
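Claim 8 describes feeding the same audio signal sample to a time-frequency transformation pipeline and to a DNN processing loop, and applying the audio isolation mask only if it is determined before the transformation period expires. The sketch below illustrates that deadline logic together with one possible form of the user audio isolation control parameters recited in claim 1. The names `process_sample` and `blend_mask`, the `strength` parameter, the use of a Python thread pool, and the pass-through behaviour on a missed deadline are hypothetical; the claims do not specify them.

```python
import concurrent.futures as cf
import time
from typing import Callable, List, Sequence, Tuple


def blend_mask(mask: Sequence[float], strength: float) -> List[float]:
    """Hypothetical user control parameter: 0.0 bypasses the mask, 1.0 applies it fully."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be within [0, 1]")
    return [(1.0 - strength) + strength * m for m in mask]


def process_sample(
    sample: Sequence[float],
    transform: Callable[[Sequence[float]], List[complex]],
    model: Callable[[Sequence[float]], List[float]],
    period_s: float,
    strength: float,
    executor: cf.Executor,
) -> Tuple[List[complex], bool]:
    """Apply the mask only if the model finishes within the transformation period.

    Returns the (possibly masked) frequency-domain frame and a flag telling
    whether the mask was applied.
    """
    deadline = time.monotonic() + period_s
    future = executor.submit(model, sample)  # DNN processing loop on a worker thread
    spectrum = transform(sample)             # time-frequency pipeline on this thread
    remaining = max(0.0, deadline - time.monotonic())
    done, _ = cf.wait([future], timeout=remaining)
    if future in done:
        gains = blend_mask(future.result(), strength)
        return [g * x for g, x in zip(gains, spectrum)], True
    # Assumption: on a missed deadline the unmasked frame passes through so the
    # audio path never stalls; the claim leaves this case open.
    return list(spectrum), False
```

Passing the frame through unmasked on a miss is only one option; reusing the previous frame's mask would be an equally plausible policy. Either way, the transformation period acts as a latency budget that the model must meet for its output to be used.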

Description

AUDIO ENHANCEMENT AND OPTIMIZATION OF AN IMMERSIVE AUDIO EXPERIENCE

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/512,512, titled “AUDIO ENHANCEMENT AND OPTIMIZATION OF AN IMMERSIVE ARENA BASED AUDIO EXPERIENCE,” and filed on July 7, 2023, the entirety of which is hereby incorporated by reference.

TECHNICAL FIELD

[0002] Embodiments of the present disclosure relate generally to audio processing and, more particularly, to systems, methods, and computer program products for enhancing audio signals related to audio environments.

BACKGROUND

[0003] An audio processing system for an audio environment may utilize one or more microphones and digital signal processing to capture, process, and/or transmit audio data associated with the audio environment. However, noise, reverberation, acoustic feedback, and/or other undesirable sound is often introduced during audio capture by an audio processing system for an audio environment.

BRIEF SUMMARY

[0004] Various embodiments of the present disclosure are directed to apparatuses, systems, methods, and computer readable media for providing audio enhancement and optimization of an immersive audio experience. These characteristics as well as additional features, functions, and details of various embodiments are described below. The claims set forth herein further serve as a summary of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] Having thus described some embodiments in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
[0006] FIG. 1 illustrates an example audio isolation signal processing system in accordance with one or more embodiments disclosed herein;
[0007] FIG. 2 illustrates an example audio isolation signal processing apparatus configured in accordance with one or more embodiments disclosed herein;
[0008] FIG. 3 illustrates an example audio system in accordance with one or more embodiments disclosed herein;
[0009] FIG. 4 illustrates another example audio system in accordance with one or more embodiments disclosed herein;
[0010] FIG. 5 illustrates another example audio system in accordance with one or more embodiments disclosed herein;
[0011] FIG. 6 illustrates an example audio isolation signal processing subsystem in accordance with one or more embodiments disclosed herein;
[0012] FIG. 7 illustrates another example audio isolation signal processing subsystem in accordance with one or more embodiments disclosed herein;
[0013] FIG. 8 illustrates an example pre-processing subsystem in accordance with one or more embodiments disclosed herein;
[0014] FIG. 9 illustrates an example post-processing subsystem in accordance with one or more embodiments disclosed herein;
[0015] FIG. 10 illustrates an example arena environment associated with an audio isolation signal processing system in accordance with one or more embodiments disclosed herein;
[0016] FIG. 11 illustrates an example audio stream and isolated audio in accordance with one or more embodiments disclosed herein;
[0017] FIG. 12 illustrates an example system in accordance with one or more embodiments disclosed herein;
[0018] FIG. 13 illustrates an example audio processing control user interface in accordance with one or more embodiments disclosed herein;
[0019] FIG. 14 illustrates another example audio processing control user interface in accordance with one or more embodiments disclosed herein;
[0020] FIG. 15 illustrates another example audio processing control user interface in accordance with one or more embodiments disclosed herein; and
[0021] FIG. 16 illustrates an example method for providing audio enhancement and/or optimization of an immersive audio experience in accordance with one or more embodiments disclosed herein.

DETAILED DESCRIPTION

[0022] Various embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the present disclosure are shown. Indeed, the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements.

Overview

[0023] Various embodiments of the present disclosure address technical problems associated with accurately, efficiently, and/or reliably isolating audio associated with an audio environment such as, for example, an arena environment. Noise, reverberation, acoustic feedback, and/or other undesirable audio are often introduced during audio capture operations related to microphones located in an audio environment. For arena environments, such noise, reverberation, acoustic feedback, and/or other undesirable audio affect the quality of broadcast audio, broadcast video, and/or speech reinforcement associated with an arena environment, which may prod