US-12621621-B2 - Adaptive panner of audio objects
Abstract
An audio object including audio content and object metadata is received. The object metadata indicates an object spatial position of the audio object to be rendered by audio speakers in a playback environment. Based on the object spatial position and source spatial positions of the audio speakers, initial gain values for the audio speakers are determined. The initial gain values can be used to select a set of audio speakers from among the audio speakers. Based on the object spatial position and a set of source spatial positions at which the set of audio speakers are respectively located in the playback environment, a set of non-negative optimized gain values for the set of audio speakers is determined. The audio object at the object spatial position is rendered with the set of optimized gain values for the set of audio speakers.
Inventors
- Jun Wang
- Giulio CENGARLE
- Juan Felix TORRES
- Daniel Arteaga
Assignees
- DOLBY LABORATORIES LICENSING CORPORATION
- DOLBY INTERNATIONAL AB
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-12-11
- Priority Date: 2016-07-27
Claims (3)
- 1 . A computer-implemented method, comprising: receiving an audio object comprising audio content and object metadata, the object metadata of the audio object indicating an object spatial position of the audio object to be rendered by a plurality of audio speakers, each audio speaker in the plurality of audio speakers being located in a respective source spatial position in a plurality of source spatial positions, wherein the object spatial position is related to audio content in one or more audio frames, or one or more subdivisions of an audio frame; determining, based on the object spatial position of the audio object and the plurality of source spatial positions of the plurality of audio speakers, a plurality of initial gain values for the plurality of audio speakers, each audio speaker in the plurality of audio speakers being assigned with a respective initial gain value in the plurality of initial gain values; determining, for each of the plurality of audio speakers, whether a respective audio speaker is an active audio speaker or whether the respective audio speaker is not an active audio speaker, wherein an audio speaker is an active audio speaker if the initial gain value assigned to the audio speaker is above a threshold value, and wherein the audio speaker is not an active audio speaker if the initial gain value assigned to the audio speaker is below or at a threshold value; determining, based on the object spatial position of the audio object and a set of source spatial positions at which the set of active audio speakers are respectively located, a set of optimized non-negative gain values for the set of active audio speakers, wherein the set of optimized gain values are yielded using the initial gain values of the active speakers as input; and outputting, for each audio speaker in the set of active audio speakers a respective optimized gain value in the plurality of optimized gain values.
- 2 . A system comprising: one or more processors; and a non-transitory computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations of: receiving an audio object comprising audio content and object metadata, the object metadata of the audio object indicating an object spatial position of the audio object to be rendered by a plurality of audio speakers, each audio speaker in the plurality of audio speakers being located in a respective source spatial position in a plurality of source spatial positions, wherein the object spatial position is related to audio content in one or more audio frames, or one or more subdivisions of an audio frame; determining, based on the object spatial position of the audio object and the plurality of source spatial positions of the plurality of audio speakers, a plurality of initial gain values for the plurality of audio speakers, each audio speaker in the plurality of audio speakers being assigned with a respective initial gain value in the plurality of initial gain values; determining, for each of the plurality of audio speakers, whether a respective audio speaker is an active audio speaker or whether the respective audio speaker is not an active audio speaker, wherein an audio speaker is an active audio speaker if the initial gain value assigned to the audio speaker is above a threshold value, and wherein the audio speaker is not an active audio speaker if the initial gain value assigned to the audio speaker is below or at a threshold value; determining, based on the object spatial position of the audio object and a set of source spatial positions at which the set of active audio speakers are respectively located, a set of optimized non-negative gain values for the set of active audio speakers, wherein the set of optimized gain values are yielded using the initial gain values of the active speakers as input; and outputting, for each audio speaker in the set of active audio speakers a respective optimized gain value in the plurality of optimized gain values.
- 3 . A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform the operations of: receiving an audio object comprising audio content and object metadata, the object metadata of the audio object indicating an object spatial position of the audio object to be rendered by a plurality of audio speakers, each audio speaker in the plurality of audio speakers being located in a respective source spatial position in a plurality of source spatial positions, wherein the object spatial position is related to audio content in one or more audio frames, or one or more subdivisions of an audio frame; determining, based on the object spatial position of the audio object and the plurality of source spatial positions of the plurality of audio speakers, a plurality of initial gain values for the plurality of audio speakers, each audio speaker in the plurality of audio speakers being assigned with a respective initial gain value in the plurality of initial gain values; determining, for each of the plurality of audio speakers, whether a respective audio speaker is an active audio speaker or whether the respective audio speaker is not an active audio speaker, wherein an audio speaker is an active audio speaker if the initial gain value assigned to the audio speaker is above a threshold value, and wherein the audio speaker is not an active audio speaker if the initial gain value assigned to the audio speaker is below or at a threshold value; determining, based on the object spatial position of the audio object and a set of source spatial positions at which the set of active audio speakers are respectively located, a set of optimized non-negative gain values for the set of active audio speakers, wherein the set of optimized gain values are yielded using the initial gain values of the active speakers as input; and outputting, for each audio speaker in the set of active audio speakers a respective optimized gain value in the plurality of optimized gain values.
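The steps recited in the claims (initial gains, an activity threshold, then non-negative optimized gains seeded by the initial gains) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the inverse-distance first pass, the quadratic cost, the projected-gradient solver, the example threshold of 0.25, and every function name are hypothetical stand-ins chosen only to make the data flow concrete.

```python
import math

def initial_gains(obj, speakers):
    # ASSUMPTION: inverse-distance weights normalized to unit power; the
    # claims do not prescribe how the initial gains are computed.
    weights = [1.0 / (math.dist(obj, s) + 1e-9) for s in speakers]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

def select_active(gains, threshold):
    # Active only if strictly above the threshold; at or below is inactive.
    return [i for i, g in enumerate(gains) if g > threshold]

def optimize_gains(obj, positions, start, lam=1.0, steps=500):
    # ASSUMPTION: minimize ||sum_i g_i * p_i - obj||^2 + lam * (sum_i g_i - 1)^2
    # subject to g_i >= 0, by projected gradient descent started from `start`.
    if not positions:
        return []
    n, dim = len(positions), len(obj)
    # Step size from an upper bound on the gradient's Lipschitz constant.
    bound = 2.0 * (sum(c * c for p in positions for c in p) + lam * n)
    lr = 1.0 / bound
    g = list(start)
    for _ in range(steps):
        residual = [sum(g[i] * positions[i][d] for i in range(n)) - obj[d]
                    for d in range(dim)]
        sum_err = sum(g) - 1.0
        grad = [2.0 * sum(residual[d] * positions[i][d] for d in range(dim))
                + 2.0 * lam * sum_err for i in range(n)]
        # Projection onto the non-negative orthant keeps every gain >= 0.
        g = [max(0.0, gi - lr * gr) for gi, gr in zip(g, grad)]
    return g

def pan(obj, speakers, threshold=0.25):
    # Returns {speaker index: optimized gain} for the active speakers only.
    g0 = initial_gains(obj, speakers)
    active = select_active(g0, threshold)
    optimized = optimize_gains(obj,
                               [speakers[i] for i in active],
                               [g0[i] for i in active])
    return dict(zip(active, optimized))

if __name__ == "__main__":
    # Four speakers at the corners of a square room; object near front right.
    layout = [(-1.0, 1.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
    print(pan((0.5, 1.0), layout))
```

In this example only the two front speakers exceed the threshold, and under the assumed cost their gains should converge toward 0.25 (front left) and 0.75 (front right), since that pair reproduces the object position exactly. Restricting the second stage to the active subset is what keeps the optimization small; the two rear speakers receive no optimized gain at all.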
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 17/833,761, filed Jun. 6, 2022, which is a continuation of U.S. patent application Ser. No. 17/149,683, filed on Jan. 14, 2021, now U.S. Pat. No. 11,356,787, which is a continuation of U.S. patent application Ser. No. 16/555,126, filed on Aug. 29, 2019, now U.S. Pat. No. 10,897,682, which is a continuation of U.S. patent application Ser. No. 15/647,121, filed on Jul. 11, 2017, now U.S. Pat. No. 10,405,120, issued on Sep. 3, 2019, which is a continuation of U.S. patent application Ser. No. 15/451,241, filed on Mar. 6, 2017, now U.S. Pat. No. 9,949,052, issued on Apr. 17, 2018, which claims priority to U.S. Provisional Application No. 62/345,602, filed on Jun. 3, 2016, European Patent Application No. 16181436.3, filed on Jul. 27, 2016, and Spanish Patent Application No. P201630341, filed on Mar. 22, 2016, each of which is incorporated by reference in its entirety.
TECHNOLOGY
Example embodiments disclosed herein relate generally to processing audio data, and more specifically, to an adaptive panner of audio objects including dynamic audio objects and static audio objects.
BACKGROUND
Input audio content such as originally authored/produced audio content, and the like, may include a large number of audio objects individually represented in an object-based audio format such as Dolby ATMOS® to help create a spatially diverse, immersive and accurate audio experience. Audio playback systems such as those used by cinemas and home theaters are also becoming increasingly versatile and complex, evolving from 5.1 to 7.1, then from 5.1.2 to 7.1.4, then to 22.2 (e.g., as defined in ITU-R BS.2051-0, the content of which is incorporated herein by reference in its entirety), among others.
As audio source layouts (or audio speaker layouts) transition from planar two-dimensional (2D) arrays to three-dimensional (3D) arrays with elevated speakers and increasing audio channels, reproducing sounds in a playback environment is becoming increasingly complex. In content creation as well as end user content consumption, speaker positions might be presumed to be in compliance with a standard audio source layout's recommended specification. This presumption, however, can be incorrect in the real world. For example, in a home theater, speakers such as surround speakers are often located at non-standard positions despite the standard audio source layout's recommended specification. As a result, spatial distortion can occur in audio rendering if the audio rendering is based on a presumption that the speakers are located at the standard positions.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.
BRIEF DESCRIPTION OF DRAWINGS
The example embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
FIG. 1 and FIG. 2 illustrate one or more example system frameworks of one or more gain optimizers in accordance with example embodiments described herein;
FIG. 3 illustrates an example adaptive audio playback system that uses precomputed gain values for interpolation in accordance with example embodiments described herein;
FIG. 4 illustrates discrete object positions at which gain values can be pre-calculated in accordance with example embodiments described herein;
FIG. 5 illustrates an example adaptive audio playback system that determines initial gains based on a first gain optimization method and uses a second gain optimization method to refine a selected group of the initial gains in accordance with example embodiments described herein;
FIG. 6 illustrates an example memory-complexity curve with different sparseness settings in accordance with example embodiments described herein;
FIG. 7 illustrates an adaptive audio playback system in which gains are interpolated from precomputed gains and in which tradeoffs between memory and complexity can be adjusted with different sparseness settings for precomputed gain storage in accordance with example embodiments described herein;
FIG. 8 illustrates an example audio object that traverses similar diagonal spatial trajectories in two different playback environments in accordance with example embodiments described herein;
FIG. 9 illustrates example panning curves for an audio object with a diagonal trajectory across a room in accordance with example embodiments described herein;
FIG. 10 illustrates