EP-4740085-A1 - SCALING AUDIO SOURCES IN EXTENDED REALITY SYSTEMS WITHIN TOLERANCES
Abstract
In general, various aspects of the techniques are directed to rescaling audio elements for extended reality scene playback. A device comprising a memory and processing circuitry may be configured to perform the techniques. The memory may store an audio bitstream representative of an audio element in an extended reality scene. The processing circuitry may obtain a playback dimension associated with a physical space in which playback of the audio bitstream is to occur, and obtain a source dimension associated with a source space for the extended reality scene. The processing circuitry may modify, based on the playback dimension and the source dimension, a location of the audio element to obtain a modified location for the audio element, and render, based on the modified location for the audio element, the audio element to one or more speaker feeds. The processing circuitry may output the one or more speaker feeds.
Inventors
- MUNOZ, ISAAC GARCIA
- TUNG, ALEX
- DAVIS, GRAHAM BRADLEY
- GENOVESE, ANDREA FELICE
- SUME, TINSAYE YITBAREK
Assignees
- QUALCOMM INCORPORATED
Dates
- Publication Date
- 20260513
- Application Date
- 20240703
Claims (20)
- 1. A device configured to process an audio bitstream, the device comprising: a memory configured to store the audio bitstream representative of an audio element in an extended reality scene; and processing circuitry coupled to the memory and configured to: obtain a playback dimension associated with a physical space in which playback of the audio bitstream is to occur; obtain a source dimension associated with a source space for the extended reality scene; obtain a tolerance associated with the extended reality scene; modify, based on the playback dimension, the source dimension, and the tolerance, a location of the audio element to obtain a modified location for the audio element; render, based on the modified location for the audio element, the audio element to one or more speaker feeds; and output the one or more speaker feeds.
- 2. The device of claim 1, wherein the processing circuitry is, when configured to modify the location of the audio element, configured to: determine, based on the playback dimension and the source dimension, a rescale factor; and apply the rescale factor to the location of the audio element within the tolerance to obtain the modified location for the audio element.
- 3. The device of claim 2, wherein the processing circuitry is further configured to obtain, from the audio bitstream, a first syntax element indicating that auto rescale is to be performed for the audio element and a second syntax element indicating the tolerance, and wherein the processing circuitry is, when configured to apply the rescale factor, configured to automatically apply, for a duration in which the audio element is present for playback, the rescale factor to the location of the audio element within the tolerance to obtain the modified location for the audio element.
- 4. The device of claim 2, wherein the processing circuitry is, when configured to determine the rescale factor, configured to: determine the rescale factor as the playback dimension divided by the source dimension; and modify the rescale factor based on the tolerance to obtain a modified rescale factor, and wherein the processing circuitry is, when configured to apply the rescale factor, configured to apply the modified rescale factor to the location of the audio element to obtain the modified location for the audio element.
- 5. The device of claim 1, wherein the playback dimension includes one or more of a width of the physical space, a length of the physical space, and a height of the physical space, and wherein the source dimension includes one or more of a width of the source space, a length of the source space, and a height of the source space.
- 6. The device of claim 1, wherein the processing circuitry is configured to obtain a syntax element defining the tolerance from the audio bitstream.
- 7. The device of claim 1, wherein the tolerance includes a height tolerance, a width tolerance, and a depth tolerance.
- 8. The device of claim 7, wherein the tolerance includes a minimum and a maximum for each of the height tolerance, the width tolerance, and the depth tolerance.
- 9. The device of claim 1, wherein the processing circuitry is further configured to obtain a center alignment, wherein the center alignment indicates that a center of the source dimension is to be aligned with a center of the playback dimension, and wherein the processing circuitry is configured to modify, based on the playback dimension, the source dimension, the tolerance, and the center alignment, the location of the audio element to obtain the modified location for the audio element.
- 10. The device of claim 1, wherein the processing circuitry is further configured to obtain a rotation, wherein the rotation indicates that the source dimension is to be rotated to a front direction with respect to the playback dimension, and wherein the processing circuitry is configured to modify, based on the playback dimension, the source dimension, the tolerance, and the rotation, the location of the audio element to obtain the modified location for the audio element.
- 11. The device of claim 1, further comprising one or more speakers configured to reproduce, based on the one or more speaker feeds, a soundfield.
- 12. A method of processing an audio element, the method comprising: obtaining a playback dimension associated with a physical space in which playback of an audio bitstream is to occur, the audio bitstream representative of the audio element in an extended reality scene; obtaining a source dimension associated with a source space for the extended reality scene; obtaining a tolerance associated with the extended reality scene; modifying, based on the playback dimension, the source dimension, and the tolerance, a location of the audio element to obtain a modified location for the audio element; rendering, based on the modified location for the audio element, the audio element to one or more speaker feeds; and outputting the one or more speaker feeds.
- 13. The method of claim 12, wherein modifying the location of the audio element comprises: determining, based on the playback dimension and the source dimension, a rescale factor; and applying the rescale factor to the location of the audio element within the tolerance to obtain the modified location for the audio element.
- 14. The method of claim 13, further comprising obtaining, from the audio bitstream, a first syntax element indicating that auto rescale is to be performed for the audio element and a second syntax element indicating the tolerance, and wherein applying the rescale factor comprises automatically applying, for a duration in which the audio element is present for playback, the rescale factor to the location of the audio element within the tolerance to obtain the modified location for the audio element.
- 15. The method of claim 13, wherein determining the rescale factor comprises: determining the rescale factor as the playback dimension divided by the source dimension; and modifying the rescale factor based on the tolerance to obtain a modified rescale factor, and wherein applying the rescale factor comprises applying the modified rescale factor to the location of the audio element to obtain the modified location for the audio element.
- 16. The method of claim 12, wherein the playback dimension includes one or more of a width of the physical space, a length of the physical space, and a height of the physical space, and wherein the source dimension includes one or more of a width of the source space, a length of the source space, and a height of the source space.
- 17. The method of claim 12, wherein obtaining the tolerance comprises obtaining a syntax element defining the tolerance from the audio bitstream.
- 18. The method of claim 12, wherein the tolerance includes a height tolerance, a width tolerance, and a depth tolerance.
- 19. The method of claim 18, wherein the tolerance includes a minimum and a maximum for each of the height tolerance, the width tolerance, and the depth tolerance.
- 20. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to: obtain a playback dimension associated with a physical space in which playback of an audio bitstream is to occur, the audio bitstream representative of an audio element in an extended reality scene; obtain a source dimension associated with a source space for the extended reality scene; obtain a tolerance associated with the extended reality scene; modify, based on the playback dimension, the source dimension, and the tolerance, a location of the audio element to obtain a modified location for the audio element; render, based on the modified location for the audio element, the audio element to one or more speaker feeds; and output the one or more speaker feeds.
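The rescaling recited in claims 2 through 8 can be illustrated with a short sketch. The function below is not taken from the patent and all names are illustrative; it assumes per-axis dimensions (width, depth, height) and a per-axis minimum/maximum tolerance that bounds the raw rescale factor, consistent with claim 4 (factor as playback dimension divided by source dimension, then modified based on the tolerance) and claims 7 and 8 (three-dimensional tolerances with minima and maxima).

```python
def rescale_location(location, playback_dim, source_dim, tol_min, tol_max):
    """Map an audio element's source-space position into playback space.

    All arguments are per-axis (width, depth, height) tuples. The raw
    rescale factor per axis is playback / source; the tolerance clamps
    that factor to [tol_min, tol_max] so the rendered scene may extend
    slightly beyond, or stop short of, the physical playback space.
    """
    modified = []
    for pos, play, src, lo, hi in zip(location, playback_dim, source_dim,
                                      tol_min, tol_max):
        factor = play / src                  # raw rescale factor (claim 4)
        factor = max(lo, min(hi, factor))    # modify factor per tolerance
        modified.append(pos * factor)        # apply to the element location
    return tuple(modified)
```

For example, an element 10 m out in a 10 m-wide source space, played back in a 5 m-wide room with tolerances of [0.4, 1.2] on every axis, would be moved to 5 m, since the raw factor 0.5 falls inside the tolerance band.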
Description
SCALING AUDIO SOURCES IN EXTENDED REALITY SYSTEMS WITHIN TOLERANCES [0001] This application claims priority to U.S. Patent Application No. 18/762,424, filed July 2, 2024, and U.S. Provisional Patent Application No. 63/512,482, filed July 7, 2023, the entire contents of each of which are hereby incorporated by reference. U.S. Patent Application No. 18/762,424, filed July 2, 2024, claims the benefit of U.S. Provisional Patent Application No. 63/512,482, filed July 7, 2023. TECHNICAL FIELD [0002] This disclosure relates to processing of audio data. BACKGROUND [0003] Computer-mediated reality systems are being developed to allow computing devices to augment or add to, remove or subtract from, or generally modify existing reality experienced by a user. Computer-mediated reality systems (which may also be referred to as “extended reality systems,” or “XR systems”) may include, as examples, virtual reality (VR) systems, augmented reality (AR) systems, and mixed reality (MR) systems. The perceived success of computer-mediated reality systems is generally related to the ability of such computer-mediated reality systems to provide a realistically immersive experience in terms of both the visual and audio experience where the visual and audio experience align in ways expected by the user. [0004] Although the human visual system is more sensitive than the human auditory system (e.g., in terms of perceived localization of various objects within the scene), ensuring an adequate auditory experience is an increasingly important factor in ensuring a realistically immersive experience, particularly as the visual experience improves to permit better localization of visual objects that enable the user to better identify sources of audio content. SUMMARY [0005] This disclosure generally relates to techniques for scaling audio sources in extended reality systems. 
Rather than require users to only operate extended reality systems in locations that permit one-to-one correspondence in terms of spacing with a source location at which the extended reality scene was captured and/or for which the extended reality scene was generated, various aspects of the techniques enable an extended reality system to scale a source location to accommodate a playback location. As such, if the source location includes microphones that are spaced 10 meters (10M) apart, the extended reality system may scale that spacing resolution of 10M to accommodate a scale of a playback location using a scaling factor that is determined based on a source dimension defining a size of the source location and a playback dimension defining a size of a playback location. Using the scaling provided in accordance with various aspects of the techniques described in this disclosure, the extended reality system may improve reproduction of the soundfield to modify a location of audio sources to accommodate the size of the playback space. [0006] However, even when scaling is employed, there are instances where the playback location, or in other words, a real world space, is irregular (e.g., slanted walls, vaulted ceilings, domed ceilings, slanted ceilings, etc.) or a representation of the real world space is incomplete (e.g., a scan or mapping of the real world space contains elements, such as furniture, lighting fixtures, etc., that prevent a complete scan or mapping of the real world space). [0007] To accommodate irregular or incomplete representations of the real world space, various aspects of the techniques may enable the extended reality system to obtain a tolerance that defines a percentage of the extended reality scene that remains outside of the real world space. 
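The microphone-spacing example above reduces to simple arithmetic: the scaling factor is the playback dimension divided by the source dimension, and source spacings are multiplied by that factor. The sketch below uses hypothetical figures (a 4 m-wide playback room) to make the computation concrete; it is not drawn from the patent text beyond the 10 m source spacing.

```python
# Microphones spaced 10 m apart in a 10 m-wide source (capture) space,
# reproduced in a 4 m-wide playback room. The scaling factor is the
# playback dimension divided by the source dimension.
source_dim = 10.0        # metres, width of the source space
playback_dim = 4.0       # metres, width of the playback room (assumed)
rescale_factor = playback_dim / source_dim

mic_spacing = 10.0       # metres, spacing at the source location
scaled_spacing = mic_spacing * rescale_factor  # spacing in playback space
```

With these figures the 10 m spacing is reproduced as a 4 m spacing, so the full scene fits within the playback room.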
The creator of the extended reality scene may define the tolerance (which may be specified in the bitstream, or the user may specify and/or select a tolerance) by which to modify scaling of the extended reality scene to accommodate the real world space. In addition, various aspects of the techniques may enable the extended reality system to modify the scaling in three dimensions to accommodate irregular real world spaces. [0008] By including tolerance and scaling in three dimensions, various aspects of the techniques may enable the extended reality system to provide a more immersive experience that can account for irregular or incomplete representations of the real world space. In enabling such scaling, the extended reality system may improve an immersive experience for the user when consuming the extended reality scene given that the extended reality scene more closely matches the playback space. The user may then experience the entirety of the extended reality scene safely within the confines of the permitted playback space. In this respect, the techniques may improve operation of the extended reality system itself. [0009] In one example, the techniques are directed to a device configured to process an audio bitstream, the device comprising: a memory configured to store the audio bitstream representative of