CN-121985284-A - Acoustic horn immersive sound field rendering method and system
Abstract
Embodiments of this application provide an acoustic horn immersive sound field rendering method and system, relating to the technical field of sound field rendering. The method comprises: obtaining a user position and head pose in a virtual space; obtaining the position and intensity of each sound source in the virtual space; identifying the focus sound source currently attended to by the user according to the user position, the head pose, and the sound source position and intensity; identifying sound sources other than the focus sound source as ambient sound sources; performing sound field rendering on the ambient sound sources to obtain a shared ambient sound field signal, wherein fewer computing resources are allocated to the ambient sound sources than to the focus sound source; convolving the original audio signal of the focus sound source with a head acoustic transfer function to obtain a personalized high-resolution signal; and superposing the shared ambient sound field signal and the personalized high-resolution signal to generate a binaural signal, which is played through an acoustic horn array. The application can improve the sense of immersion and the practical effectiveness of virtual reality training.
Inventors
- LU ZUOHU
- YANG YONGGUANG
Assignees
- 深圳市云科实业有限公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-02-06
Claims (10)
- 1. An acoustic horn immersive sound field rendering method, characterized by comprising the following steps: acquiring a user position and a head pose in a virtual space; acquiring the position and intensity of each sound source in the virtual space; identifying a focus sound source currently attended to by the user according to the user position, the head pose, the sound source position, and the intensity; identifying sound sources other than the focus sound source as ambient sound sources; performing sound field rendering on the ambient sound sources to obtain a shared ambient sound field signal, wherein fewer computing resources are allocated to the ambient sound sources than to the focus sound source; convolving the original audio signal of the focus sound source with the head acoustic transfer function to obtain a personalized high-resolution signal; and superposing the shared ambient sound field signal and the personalized high-resolution signal to generate a binaural signal, and playing the binaural signal through an acoustic horn array.
- 2. The method of claim 1, wherein the step of convolving the original audio signal of the focus sound source with the head acoustic transfer function to obtain the personalized high-resolution signal comprises: acquiring the head orientation of the user according to the head pose; retrieving, according to the head orientation, a head acoustic transfer function corresponding to the head orientation from a pre-stored head transfer function database; and convolving the original audio signal of the focus sound source with the retrieved head acoustic transfer function to obtain the personalized high-resolution signal.
- 3. The method of claim 1, wherein the step of superposing the shared ambient sound field signal with the personalized high-resolution signal to generate a binaural signal played through an acoustic horn array comprises: acquiring position and pose information of a virtual object in the virtual space and geometric information of the virtual object; for each focus sound source, calculating the sound propagation path from the focus sound source to both ears of the user; predicting a possible occlusion event according to the position and pose information of the virtual object and the geometric information of the virtual object, in combination with a preset operation path of the current training task; and, when an occlusion event is predicted, superposing the shared ambient sound field signal and the personalized high-resolution signal to generate the binaural signal and playing the binaural signal through the acoustic horn array, wherein the acoustic horns control delay and gain using beamforming so that the binaural signal forms a sound pressure peak at the target position of the user's ear.
- 4. The method of claim 1, wherein the step of identifying the focus sound source currently attended to by the user according to the user position, the head pose, the sound source position, and the intensity comprises: calculating the distance between the sound source position and the user's head according to the user position and the sound source position to obtain a Euclidean distance; determining a potential focus sound source according to the Euclidean distance; acquiring the head orientation of the user according to the head pose; and, if the potential focus sound source lies within a preset angular range in front of the user's head orientation or the intensity of the potential focus sound source is greater than a preset threshold, identifying the potential focus sound source as the focus sound source currently attended to by the user.
- 5. The method according to claim 1, wherein the method further comprises: acquiring the processor utilization and the processor temperature in real time; and adjusting the ambient sound field rendering strategy according to the processor utilization and the processor temperature so as to guarantee the rendering computing resources of the focus sound source.
- 6. The method of claim 5, wherein the step of adjusting the ambient sound field rendering strategy according to the processor utilization and the processor temperature to guarantee the rendering computing resources of the focus sound source comprises: acquiring a processor utilization threshold and a processor temperature threshold; and, when the processor utilization exceeds the utilization threshold or the processor temperature exceeds the temperature threshold, adopting a strategy of progressively degrading the ambient sound field rendering, wherein the strategy comprises at least one of the following: reducing the reflection calculation precision of the ambient sound sources; and reducing the update frequency of background sounds.
- 7. The method of claim 6, wherein the step of acquiring the processor utilization threshold and the processor temperature threshold comprises: evaluating the heat conduction performance of the processor's heat dissipation system to obtain a heat conduction performance evaluation result; determining correction parameters for the processor utilization threshold and the processor temperature threshold according to the heat conduction performance evaluation result; and correcting a preset utilization and a preset temperature according to the correction parameters to obtain the processor utilization threshold and the processor temperature threshold.
- 8. The method according to claim 1, wherein the method further comprises: monitoring the virtual environment for non-focus acoustic events; and, when a non-focus acoustic event occurs, adjusting the sound field rendering strategy to guarantee the rendering resources of the focus sound source, wherein the sound field rendering strategy comprises at least one of the following: performing progressive degradation rendering on the shared ambient sound field signal; and reducing the update frequency of haptic feedback and non-critical visual effects so that the focus sound source task can preempt processor resources.
- 9. The method of claim 8, wherein the step of monitoring the virtual environment for non-focus acoustic events comprises: receiving real-time information of all sound sources, wherein the real-time information comprises the loudness and frequency of each sound source; and determining that a sound source constitutes a non-focus acoustic event when its loudness exceeds a loudness threshold and its frequency exceeds a frequency threshold.
- 10. An acoustic horn immersive sound field rendering system, the system comprising: a user information acquisition module, configured to acquire the position and head pose of a user in a virtual space; a sound source information acquisition module, configured to acquire the position and intensity of each sound source in the virtual space; a focus sound source determination module, configured to identify a focus sound source currently attended to by the user according to the user position, the head pose, the sound source position, and the intensity; an ambient sound source determination module, configured to identify sound sources other than the focus sound source as ambient sound sources; a shared ambient sound field signal obtaining module, configured to perform sound field rendering on the ambient sound sources to obtain a shared ambient sound field signal, wherein fewer computing resources are allocated to the ambient sound sources than to the focus sound source; a personalized high-resolution signal obtaining module, configured to convolve the original audio signal of the focus sound source with the head acoustic transfer function to obtain a personalized high-resolution signal; and a binaural signal obtaining module, configured to superpose the shared ambient sound field signal and the personalized high-resolution signal to generate a binaural signal and play the binaural signal through the acoustic horn array.
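Claims 5 to 9 describe protecting the focus sound source's rendering resources by degrading ambient rendering when the processor is under pressure and by flagging non-focus acoustic events. The following minimal Python sketch illustrates one way such logic could look; the threshold values, step sizes, and function names are illustrative assumptions and not values taken from the application.

```python
def ambient_render_settings(cpu_utilization, cpu_temperature,
                            util_threshold=0.85, temp_threshold=85.0):
    """Progressively degrade ambient rendering when the processor is under pressure.

    The thresholds are placeholders; per claim 7 they would be corrected
    according to an evaluation of the cooling system's heat conduction.
    """
    settings = {"reflection_order": 2, "background_update_hz": 30}
    over_util = cpu_utilization > util_threshold
    over_temp = cpu_temperature > temp_threshold
    if over_util or over_temp:
        settings["reflection_order"] = 1       # reduce reflection calculation precision
        settings["background_update_hz"] = 15  # reduce background-sound update frequency
    if over_util and over_temp:
        settings["reflection_order"] = 0       # direct sound only
        settings["background_update_hz"] = 5
    return settings


def is_non_focus_event(loudness_db, frequency_hz,
                       loudness_threshold_db=70.0, frequency_threshold_hz=4000.0):
    """Claim-9 style test: both loudness and frequency must exceed their thresholds."""
    return loudness_db > loudness_threshold_db and frequency_hz > frequency_threshold_hz
```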
Description
Acoustic horn immersive sound field rendering method and system

Technical Field

The application relates to the technical field of sound field rendering, and in particular to an acoustic horn immersive sound field rendering method and system.

Background

In the related art, when an acoustic horn performs sound field rendering, it often faces a troublesome problem: how to keep spatial sound localization sufficiently accurate while avoiding excessive computation that makes the system sluggish. This imbalance between spatial resolution and computational efficiency often makes it difficult for users to accurately determine the source of a sound in the virtual world, and overloads the computing units that support the system.

Disclosure of Invention

The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the application provides an acoustic horn immersive sound field rendering method and system, aiming to solve the problems caused by the imbalance between spatial resolution and computational efficiency when the acoustic horn performs sound field rendering, such as users finding it difficult to accurately judge the sound source, excessive load on the system's computing units, blurred sound source localization, slow system response, and impaired audio-visual synchronization in multi-user collaborative training scenarios.

In a first aspect, an embodiment of the present application provides an acoustic horn immersive sound field rendering method, including: acquiring a user position and a head pose in a virtual space; acquiring the position and intensity of each sound source in the virtual space; identifying a focus sound source currently attended to by the user according to the user position, the head pose, the sound source position, and the intensity; identifying sound sources other than the focus sound source as ambient sound sources; performing sound field rendering on the ambient sound sources to obtain a shared ambient sound field signal, wherein fewer computing resources are allocated to the ambient sound sources than to the focus sound source; convolving the original audio signal of the focus sound source with the head acoustic transfer function to obtain a personalized high-resolution signal; and superposing the shared ambient sound field signal and the personalized high-resolution signal to generate a binaural signal, and playing the binaural signal through the acoustic horn array.

Further, on the basis of the above method, the step of convolving the original audio signal of the focus sound source with the head acoustic transfer function to obtain the personalized high-resolution signal comprises: acquiring the head orientation of the user according to the head pose; retrieving, according to the head orientation, a head acoustic transfer function corresponding to the head orientation from a pre-stored head transfer function database; and convolving the original audio signal of the focus sound source with the retrieved head acoustic transfer function to obtain the personalized high-resolution signal.
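As an illustration of the personalization and superposition steps just described, the following Python/NumPy sketch looks up a head-related impulse response by azimuth relative to the head orientation, convolves the focus source with it, and superposes the result onto a pre-rendered shared ambient field. The azimuth-only database, the nearest-neighbour lookup, and all names here are simplifying assumptions, not the application's implementation.

```python
import numpy as np

# Hypothetical database of head-related impulse responses keyed by azimuth
# (degrees); a real database would also index elevation and hold measured
# left/right responses per direction.
hrir_db = {az: (np.random.randn(128), np.random.randn(128)) for az in range(0, 360, 5)}

def lookup_hrir(head_azimuth_deg, source_azimuth_deg):
    """Pick the stored impulse-response pair closest to the source direction
    expressed relative to the current head orientation."""
    relative = (source_azimuth_deg - head_azimuth_deg) % 360
    key = min(hrir_db, key=lambda az: abs((az - relative + 180) % 360 - 180))
    return hrir_db[key]

def render_binaural(focus_audio, head_azimuth_deg, source_azimuth_deg,
                    shared_ambient, ambient_gain=0.5):
    """Personalize the focus source and superpose it onto the shared ambient field.

    focus_audio    : mono samples of the focus sound source
    shared_ambient : pre-rendered shared ambient sound field, shape (n, 2)
    """
    left_ir, right_ir = lookup_hrir(head_azimuth_deg, source_azimuth_deg)
    left = np.convolve(focus_audio, left_ir)
    right = np.convolve(focus_audio, right_ir)
    n = min(len(left), len(shared_ambient))
    personalized = np.stack([left[:n], right[:n]], axis=1)
    return ambient_gain * shared_ambient[:n] + personalized  # binaural signal for the horn array
```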
In some preferred embodiments, the step of superposing the shared ambient sound field signal with the personalized high-resolution signal to generate a binaural signal played through the acoustic horn array comprises: acquiring position and pose information of a virtual object in the virtual space and geometric information of the virtual object; for each focus sound source, calculating the sound propagation path from the focus sound source to both ears of the user; predicting a possible occlusion event according to the position and pose information of the virtual object and the geometric information of the virtual object, in combination with a preset operation path of the current training task; and, when an occlusion event is predicted, superposing the shared ambient sound field signal and the personalized high-resolution signal to generate the binaural signal and playing the binaural signal through the acoustic horn array, wherein the acoustic horns control delay and gain using beamforming so that the binaural signal forms a sound pressure peak at the target position of the user's ear.

Still further, the step of identifying the focus sound source currently attended to by the user based on the user position, the head pose, the sound source position, and the intensity includes: calculating the distance between the sound source position and the user's head according to the user position and the sound source position to obtain a Euclidean distance; determining a potential focus sound source according to the Euclidean distance; acquiring the head orientation of the user according to the head pose; and, if the potential focus sound source lies within a preset angular range in front of the user's head orientation or the intensity of the potential focus sound source is greater than a preset threshold, identifying the potential focus sound source as the focus sound source currently attended to by the user.
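A minimal sketch of the focus-source selection rule described above, assuming a NumPy representation of positions and an angle test against the head's forward vector; the distance limit, angular range, and intensity threshold below are illustrative placeholders rather than values specified by the application.

```python
import numpy as np

def identify_focus_sources(user_pos, head_forward, sources,
                           max_distance=10.0, half_angle_deg=30.0,
                           intensity_threshold=0.8):
    """Select focus sources per the rule above: a potential focus source (close
    enough by Euclidean distance) becomes a focus source if it lies within the
    preset angular range in front of the head, or if its intensity exceeds the
    preset threshold. All other sources are treated as ambient sources.

    sources : iterable of dicts with 'position' (length-3) and 'intensity' (float)
    """
    focus, ambient = [], []
    fwd = np.asarray(head_forward, dtype=float)
    fwd /= np.linalg.norm(fwd)
    for src in sources:
        offset = np.asarray(src["position"], dtype=float) - np.asarray(user_pos, dtype=float)
        dist = np.linalg.norm(offset)                      # Euclidean distance to the head
        if dist == 0.0 or dist > max_distance:
            ambient.append(src)                            # not a potential focus source
            continue
        cos_angle = float(np.dot(offset / dist, fwd))
        in_front = cos_angle >= np.cos(np.radians(half_angle_deg))
        loud = src["intensity"] > intensity_threshold
        (focus if (in_front or loud) else ambient).append(src)
    return focus, ambient
```

In practice, the returned ambient list would feed the low-cost shared rendering path, while each focus source would go through the head acoustic transfer function convolution path described earlier.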