JP-7855735-B2 - Method, system, and apparatus for acoustic 3D spread modeling for voxel-based geometric representations
Inventors
- Setiawan, Panji
- Terentiv, Leon
- Fischer, Daniel
- Fersch, Christof Joseph
Assignees
- Dolby International AB
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2023-06-13
- Priority Date
- 2022-06-15
Claims (16)
- A method for rendering audio in an audio scene, performed by one or more processors, the method comprising: a step (S101) of receiving a voxel-based audio scene representation of the audio scene, wherein the audio scene representation includes indications of spread voxels (205; 305) representing a 3D spread, together with a plurality of audio source signals for audio sources associated with the 3D spread; a step (S102) of obtaining coordinates of an intersection point (201, 202; 301) within the 3D spread; a step (S103) of determining one or more line segments (203, 204; 303, 304) that pass through the intersection point (201, 202; 301) and extend along respective coordinate directions of the audio scene representation, wherein endpoints (203a, 203b, 204a, 204b; 303a, 303b, 304a, 304b) of each line segment (203, 204; 303, 304) are determined based on coordinates of one or more spread voxels (205; 305); and a step (S104) of assigning an audio source signal from among the plurality of audio source signals to an audio source position (308a, 308b, 309a, 309b) in the audio scene based on the one or more line segments (203, 204; 303, 304).
- The method according to claim 1, wherein the intersection point is one of the geometric center of the 3D spread and the centroid of the 3D spread.
- The method according to claim 1, wherein the endpoints of each line segment are determined based on extreme coordinate values of the 3D spread along the respective coordinate directions, and the length of each line segment corresponds to the maximum dimension of the projection of the 3D spread onto the respective coordinate direction.
- The method according to claim 1, wherein the audio scene representation further indicates occluding voxels, and wherein assigning the audio source signal comprises assigning the audio source signal to coordinates within a voxel other than an occluding voxel.
- The method according to claim 4, wherein the audio scene representation further indicates unfilled voxels, and wherein assigning the audio source signal comprises assigning the audio source signal to coordinates on each line segment that are closest to an endpoint of the respective line segment and located within a spread voxel or an unfilled voxel.
- The method according to claim 1, wherein assigning the audio source signal further comprises determining one or more possible target positions for assigning the audio source signal based on the one or more line segments.
- The method according to claim 6, wherein the audio scene representation further indicates unfilled voxels, and wherein determining the one or more possible target positions comprises selecting, for the one or more possible target positions, coordinates that are closest to the endpoints of each line segment and located within a spread voxel or an unfilled voxel.
- The method according to claim 6, wherein determining the one or more possible target positions comprises selecting, for the one or more possible target positions, coordinates that are closest to the endpoints of each line segment and located within a spread voxel.
- The method according to claim 6, further comprising: selecting the audio source position from the one or more possible target positions based on a predefined minimum distance between audio sources; and assigning an audio source signal from among the plurality of audio source signals to the selected audio source position.
- The method according to claim 1, further comprising the step of obtaining a mapping indicating the assignment of the audio source signal to the audio source location.
- The method according to claim 10, further comprising assigning a gain to the audio source position based at least in part on the mapping.
- The method according to claim 1, further comprising: obtaining coordinates of a listener position; and rendering the assigned audio source signal based on a reference distance between the listener position and the 3D spread.
- The method according to claim 12, further comprising rendering the audio source signal based on occlusion and diffraction modeling.
- An apparatus (1100) for rendering audio in a voxel-based audio scene, the apparatus comprising one or more processors (1101, 1102) configured to perform a method, the method comprising: a step (S101) of receiving a voxel-based audio scene representation of the audio scene, wherein the audio scene representation includes indications of spread voxels (205; 305) representing a 3D spread, together with a plurality of audio source signals for audio sources associated with the 3D spread; a step (S102) of obtaining coordinates of an intersection point (201, 202; 301) within the 3D spread; a step (S103) of determining one or more line segments (203, 204; 303, 304) that pass through the intersection point (201, 202; 301) and extend along respective coordinate directions of the audio scene representation, wherein endpoints (203a, 203b, 204a, 204b; 303a, 303b, 304a, 304b) of each line segment (203, 204; 303, 304) are determined based on coordinates of one or more spread voxels (205; 305); and a step (S104) of assigning an audio source signal from among the plurality of audio source signals to an audio source position (308a, 308b, 309a, 309b) in the audio scene based on the one or more line segments (203, 204; 303, 304).
- A program comprising instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 13.
- A computer-readable storage medium storing the program described in claim 15.
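The segment construction of claims 1 to 3 can be illustrated with a minimal sketch. Assuming the spread voxels are given as a boolean 3D occupancy grid and the intersection point is taken as the centroid of the spread (claim 2), one possible reading of steps S101 to S104 is as follows; the function name and grid representation are hypothetical and not taken from the patent:

```python
import numpy as np

def spread_source_positions(spread_voxels: np.ndarray):
    """Sketch of the claimed steps on a boolean 3D occupancy grid.

    spread_voxels[x, y, z] is True where a voxel belongs to the 3D spread.
    Returns the intersection point, one axis-aligned line segment per
    coordinate direction, and the segment endpoints as candidate
    audio source positions.
    """
    coords = np.argwhere(spread_voxels)      # voxel indices of the spread
    center = coords.mean(axis=0)             # S102: centroid as intersection point
    segments = []
    for axis in range(3):                    # S103: one segment per coordinate axis
        lo, hi = coords[:, axis].min(), coords[:, axis].max()
        p_lo, p_hi = center.copy(), center.copy()
        p_lo[axis], p_hi[axis] = lo, hi      # endpoints at extreme coordinates
        segments.append((p_lo, p_hi))        # length = extent of the projection
    # S104: the segment endpoints serve as candidate source positions
    positions = [p for seg in segments for p in seg]
    return center, segments, positions
```

Each segment spans the full projection of the spread along its axis, matching claim 3; a flat spread simply yields a zero-length segment along the thin axis.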
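Claims 5 and 7 place a source at the coordinates on a segment that are closest to an endpoint while lying in a spread voxel or an unfilled voxel. A hedged sketch, assuming boolean occupancy grids and a simple walk from the endpoint toward the intersection point (the helper name and stepping scheme are hypothetical):

```python
import numpy as np

def snap_to_valid(endpoint, center, spread, unfilled):
    """Walk from a segment endpoint toward the intersection point and
    return the first integer voxel coordinate lying in a spread voxel
    or an unfilled voxel; return None if no such voxel is found."""
    endpoint = np.asarray(endpoint, dtype=float)
    center = np.asarray(center, dtype=float)
    direction = center - endpoint
    n = int(np.abs(direction).max()) + 1     # enough steps to visit each voxel
    for t in np.linspace(0.0, 1.0, n + 1):
        p = np.round(endpoint + t * direction).astype(int)
        if spread[tuple(p)] or unfilled[tuple(p)]:
            return tuple(p)
    return None
```

Restricting the test to `spread[tuple(p)]` alone gives the stricter variant of claim 8, where positions must lie within the spread itself.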
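Claim 9 selects audio source positions from the possible target positions subject to a predefined minimum distance between sources. A greedy pass over the candidates is one possible reading of this constraint (the selection strategy is an assumption; the claim does not fix one):

```python
import numpy as np

def select_positions(candidates, d_min):
    """Greedily keep candidate target positions that are at least
    d_min away (Euclidean distance) from every position kept so far."""
    selected = []
    for c in candidates:
        c = np.asarray(c, dtype=float)
        if all(np.linalg.norm(c - s) >= d_min for s in selected):
            selected.append(c)
    return [tuple(s) for s in selected]
```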
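Claim 12 renders the assigned signals based on a reference distance between the listener position and the 3D spread. As a rough illustration only, a common convention applies no attenuation inside the reference distance and 1/r attenuation beyond it; the actual MPEG-I distance model is more elaborate and is not specified here:

```python
import math

def distance_gain(listener, spread_point, ref_distance):
    """Simple 1/r distance attenuation relative to a reference distance:
    unity gain at or inside ref_distance, decaying with distance beyond it.
    (Illustrative model only; not the patent's or the standard's formula.)"""
    d = math.dist(listener, spread_point)
    return min(1.0, ref_distance / d) if d > 0 else 1.0
```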
Description
Cross-Reference to Related Applications

This application claims priority to U.S. Provisional Application No. 63/352,360, filed 15 June 2022, and U.S. Provisional Application No. 63/441,120, filed 25 January 2023, each of which is incorporated herein by reference in its entirety.

Technical Field

This disclosure broadly relates to a method for rendering audio within an audio scene, in particular based on a voxel-based audio scene representation of the audio scene. This disclosure further relates to corresponding apparatus and computer program products. While this specification describes several embodiments with particular reference to the disclosure, it will be understood that this disclosure is not limited to such fields of use and is applicable in broader contexts.

Background

Any discussion of background art throughout this disclosure should not be construed as an admission that such art is widely known or forms part of common general knowledge in the field. The Moving Picture Experts Group (MPEG) is an alliance of working groups established jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) to set standards for media coding, including audio coding. MPEG is organized under ISO/IEC SC29, and its audio group is currently identified as Working Group (WG) 6. WG6 is currently developing the MPEG-I audio standard. The new MPEG-I standard enables different perspectives, viewpoints, and listening positions for audio experiences by supporting various degrees of freedom, such as 3-degrees-of-freedom (3DoF) or 6-degrees-of-freedom (6DoF) motion, in virtual reality (VR), augmented reality (AR), mixed reality (MR), and/or extended reality (XR) applications, as well as various movements within and around such scenes.
6DoF interaction extends the 3DoF spherical video/audio experience, which is limited to head rotations (pitch, yaw, roll), by adding translational motion (forward/backward, up/down, left/right), allowing navigation within the virtual environment (e.g., physically walking around a room). For audio rendering in VR, AR, MR, and XR applications, object-based approaches are widely used, representing complex auditory scenes as multiple distinct audio objects, each associated with parameters or metadata defining its position and trajectory within the scene. Alternatively, audio rendering in such environments may use Higher-Order Ambisonics (HOA).

Audio objects are typically represented as point sources (without extent). As used herein, an audio source with "extent" is an audio source waveform associated with a spatial region (a region larger than a point). For example, a piano can be represented as an audio source with a cuboid extent (e.g., a stereo or mono signal) instead of simply as a point source. The use of extent allows for an improved audio experience, for example, when the user moves around a virtual piano object in a VR, AR, MR, or XR environment. In this example, the extent representing the piano for audio rendering need not reproduce the exact physical geometry of a real piano.

To reflect the sonic effects of audio objects with spatial properties, such audio objects may be represented by voxel-based geometry. Voxels for audio rendering are important for media environments implemented in both hardware and software, such as video games and/or VR, AR, MR, and XR environments. However, there remains a need for improved rendering of 3D spread sound effects represented by voxel-based geometry; in particular, it is desirable to simplify the process and reduce the computational load. In the following, exemplary embodiments of this disclosure are described with reference to the accompanying drawings.
- An example of a method for rendering audio within an audio scene according to embodiments of this disclosure is shown.
- An example of a voxel-based audio scene representation of an audio scene according to an embodiment of this disclosure is shown.
- An example of assigning an audio source to an audio source position within an audio scene according to embodiments of this disclosure is shown.
- Another example of assigning an audio source to an audio source position within an audio scene according to embodiments of this disclosure is shown.
- An exemplary use case of a method for rendering audio within an audio scene according to embodiments of this disclosure is shown.
- An exemplary use case of a method for rendering audio within an audio scene according to embodiments of this disclosure is shown.
- An exemplary use case of a method for rendering audio within an audio scene according to embodiments of this disclosure is shown.
- An exemplary use case of a method for rendering audio within an audio scene according to embodiments of this disclosure is shown.
Thi