CN-122002183-A - Sound effect optimization method, device, equipment and storage medium

Abstract

The invention relates to the technical field of electric digital data processing and discloses a sound effect optimization method, apparatus, device, and storage medium for improving the adaptive capability of a sound field. The sound effect optimization method comprises: obtaining relative position relationship data between each speaker and a user; performing acoustic space modeling based on the relative position relationship, combined with analysis of the room impulse response, to obtain a real-time spatial model containing geometric attributes and acoustic propagation characteristics; calculating and allocating channel mapping weights based on the real-time spatial model to generate a channel allocation scheme that changes dynamically with the position of the user; and adjusting the equalizer curve, delay compensation, and phase parameters of each speaker in real time according to the channel allocation scheme.

Inventors

WANG QI
ZHAO LIFENG

Assignees

南京乐韵瑞信息技术有限公司

Dates

Publication Date: 20260508
Application Date: 20251216

Claims (15)

1. A sound effect optimization method, characterized by comprising the following steps: acquiring relative position relationship data between each speaker and a user; performing acoustic space modeling based on the relative position relationship, in combination with analysis of the room impulse response, to obtain a real-time spatial model containing geometric attributes and acoustic propagation characteristics; calculating and allocating channel mapping weights based on the real-time spatial model to generate a channel allocation scheme that changes dynamically with the position of the user; and, according to the channel allocation scheme, adjusting the equalizer curve, delay compensation, and phase parameters of each speaker in real time.
2. The sound effect optimization method according to claim 1, wherein the acquiring relative position relationship data between each speaker and a user comprises: obtaining first relative position data between devices by preliminary calculation from the wireless communication signal strength and time of flight between a master speaker and each slave speaker; acquiring visual scene information through a smart camera associated with the system, and identifying the position and contour of the user using a computer vision algorithm to obtain second relative position data between the user and the master speaker; and inputting the first relative position data and the second relative position data into a Kalman filter model for data fusion and trajectory prediction, and outputting the dynamic relative position relationship between each speaker and the user.
3. The sound effect optimization method according to claim 1, wherein the performing spatial modeling according to the relative position relationship data to obtain a real-time spatial model describing the spatial distribution relationship between the speakers and the user comprises: constructing, based on the relative position relationship, an initial spatial network describing the geometric position relationship between the speakers and the user; based on the initial spatial network, playing a known test signal through at least one speaker, receiving the test signal through at least one microphone, and extracting the impulse response of the room by calculating the cross-correlation function between the test signal and the received signal; and analyzing the impulse response, identifying and quantifying the energy and delay of the direct sound, the early reflections, and the reverberant sound, and mapping them onto the initial spatial network to generate a real-time spatial model having both geometric attributes and acoustic propagation characteristics.
4. The sound effect optimization method according to claim 3, wherein the constructing, based on the relative position relationship, an initial spatial network describing the geometric position relationship between the speakers and the user comprises: acquiring absolute position coordinates of each speaker and of the user based on the relative position relationship; connecting each speaker with the user position point to construct a spatial triangular mesh describing the sound wave propagation paths; and labeling the device type attribute of each node in the spatial triangular mesh and calculating the straight-line distance between adjacent nodes to complete construction of the initial spatial network.
5. The sound effect optimization method according to claim 3, wherein the playing a known test signal through at least one speaker based on the initial spatial network, receiving the test signal through at least one microphone, and extracting the impulse response of the room by calculating the cross-correlation function between the test signal and the received signal comprises: dynamically selecting one or more optimal speakers as test signal emission sources according to the topology and node density of the initial spatial network, and generating and playing a composite excitation signal that combines an exponential frequency sweep with a pseudo-random sequence; synchronously collecting the reverberation signals reflected by the room through a microphone array distributed in the user's typical listening area, and applying an adaptive filter based on the minimum mean square error criterion to perform echo cancellation and background noise suppression on the collected signals; and performing a fast cross-correlation operation between the preprocessed received signal and the original excitation signal, and using a frequency-domain weighted Wiener deconvolution algorithm to separate and extract a high signal-to-noise-ratio impulse response that reflects only the acoustic transmission characteristics of the room.
6. The sound effect optimization method according to claim 3, wherein the analyzing the impulse response, identifying and quantifying the energy and delay of the direct sound, the early reflections, and the reverberant sound, and mapping them onto the initial spatial network to generate a real-time spatial model having both geometric attributes and acoustic propagation characteristics comprises: dividing the impulse response into time-domain intervals for the direct sound, the early reflections, and the reverberant sound based on a dynamic threshold detection algorithm, and quantitatively recording the arrival time, sound pressure level, and spectral characteristics of the acoustic components in each interval; inputting the quantized early reflection sequence into a cluster analysis model to identify the main reflection clusters, and reconstructing the corresponding virtual reflecting surfaces in the initial spatial network by acoustic ray tracing according to the arrival time differences and azimuth information; and fusing the decay time parameters of the reverberant sound field with the reconstructed virtual reflecting surface structure, and labeling the initial spatial network with acoustic attributes, to generate a real-time spatial model comprising the geometric topology and the acoustic propagation paths.
7. The sound effect optimization method according to claim 1, wherein the calculating and allocating channel mapping weights based on the real-time spatial model to generate a channel allocation scheme that changes dynamically with the position of the user comprises: calculating a binaural acoustic transfer function from each speaker to the user position based on the geometric relationship between each speaker and the user and on the acoustic propagation characteristics in the real-time spatial model; taking the binaural acoustic transfer function as the sound field reconstruction target, decomposing a standard multichannel signal using a vector base amplitude panning (VBAP) algorithm based on the principle of sound image localization, and calculating the initial weight coefficient corresponding to each speaker; and, in combination with real-time movement data of the user, applying a psychoacoustic model to dynamically optimize and smoothly transition the initial weights according to the constraint that the binaural acoustic transfer function places on sound image localization, to generate the channel allocation scheme.
8. The sound effect optimization method according to claim 7, wherein the calculating a binaural acoustic transfer function from each speaker to the user position based on the geometric relationship between each speaker and the user and on the acoustic propagation characteristics in the real-time spatial model comprises: extracting, from the real-time spatial model, the direct sound propagation paths from each speaker to the two ears of the user, and calculating the length difference and azimuth angle of each path; simulating and calculating the superposition effect of the early reflections and the reverberant sound on each propagation path using the room reflection parameters obtained from the impulse response analysis; and, based on the principle of acoustic wave interference, combining the amplitude, delay, and phase relationships of the direct sound and the reflected sound to generate a binaural acoustic transfer function that includes the spatial filtering effect.
9. The sound effect optimization method according to claim 7, wherein the taking the binaural acoustic transfer function as the sound field reconstruction target, decomposing a standard multichannel signal using a vector base amplitude panning (VBAP) algorithm based on the principle of sound image localization, and calculating the initial weight coefficient corresponding to each speaker comprises: mapping each channel of the standard multichannel signal onto a virtual sound source plane to generate a target sound field distribution containing the azimuth information of each virtual sound source; constructing a mathematical model of sound field reconstruction based on the vector base amplitude panning algorithm, with the target sound field distribution as the desired sound image and the binaural acoustic transfer function as the acoustic constraint of the reconstruction; and solving the mathematical model of sound field reconstruction to calculate, under the acoustic constraint, the reconstruction contribution of each speaker to each virtual sound source in the target sound field, thereby obtaining the initial weight coefficient of each speaker for each channel signal.
10. The sound effect optimization method according to claim 7, wherein the applying, in combination with real-time movement data of the user, a psychoacoustic model to dynamically optimize and smoothly transition the initial weights according to the constraint that the binaural acoustic transfer function places on sound image localization, to generate the channel allocation scheme comprises: calculating the vector deviation between the current sound image position and the target sound image position based on the user's head position and orientation data acquired in real time, in combination with the binaural acoustic transfer function; inputting the vector deviation into the psychoacoustic model to generate dynamic compensation coefficients for the initial weights of the speakers; and applying the dynamic compensation coefficients to the initial weight coefficients using a first-order inertial smoothing algorithm to generate the channel allocation scheme.
11. The sound effect optimization method according to claim 1, wherein the adjusting the equalizer curve, delay compensation, and phase parameters of each speaker in real time according to the channel allocation scheme comprises: generating a personalized target equalization curve for each speaker based on the weight role of the speaker in the channel allocation scheme and the acoustic propagation paths in the real-time spatial model, wherein a near-field vocal speaker is given enhanced mid-frequency clarity and a far-field surround speaker is given an extended high-frequency sense of space; calculating microsecond-level delay compensation values from the differences in acoustic path length of the speakers, taking time-synchronized arrival of the sound waves at the user position as the reference, and applying group delay calibration to all speakers; and calculating and applying an inverse compensation filter by analyzing the impulse response coherence of the speakers in the crossover frequency bands, eliminating the phase offset caused by interference among multiple speakers and completing the collaborative optimization of the acoustic parameters.
12. The sound effect optimization method according to claim 11, wherein the calculating microsecond-level delay compensation values from the differences in acoustic path length of the speakers, taking time-synchronized arrival of the sound waves at the user position as the reference, and applying group delay calibration to all speakers comprises: selecting the speaker with the longest acoustic path to the user position as the reference, and calculating the sound wave propagation time difference of each other speaker relative to the reference; converting the sound wave propagation time difference into a digital delay parameter with sample precision, and configuring a corresponding fractional delay filter; and implementing delay calibration with sub-sample precision by polyphase filter interpolation, so that the sound waves emitted by all speakers are synchronized at the user position.
13. A sound effect optimization apparatus, characterized in that the sound effect optimization apparatus comprises: an acquisition module, configured to acquire relative position relationship data between each speaker and a user; a modeling module, configured to perform acoustic space modeling based on the relative position relationship, in combination with analysis of the room impulse response, to obtain a real-time spatial model containing geometric attributes and acoustic propagation characteristics; a generation module, configured to calculate and allocate channel mapping weights based on the real-time spatial model and generate a channel allocation scheme that changes dynamically with the position of the user; and an adjustment module, configured to adjust the equalizer curve, delay compensation, and phase parameters of each speaker in real time according to the channel allocation scheme.
14. An electronic device comprising a memory and at least one processor, the memory having instructions stored therein; the at least one processor invoking the instructions in the memory to cause the electronic device to perform the sound effect optimization method of any one of claims 1-12.
15. A computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the sound effect optimization method of any one of claims 1-12.
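
The delay calibration recited in claim 12 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the 343 m/s speed of sound, and the 48 kHz sample rate are assumptions made for illustration, and plain linear interpolation stands in for the polyphase filter interpolation named in the claim.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: speed of sound in air at about 20 degrees C


def delay_compensation(path_lengths_m, sample_rate_hz=48000):
    """Return {speaker: (whole_samples, fractional_samples)}.

    The speaker with the longest acoustic path is the reference and gets
    zero delay; every closer speaker is delayed so that all wavefronts
    reach the listening position at the same time.
    """
    longest = max(path_lengths_m.values())
    delays = {}
    for name, length in path_lengths_m.items():
        time_diff_s = (longest - length) / SPEED_OF_SOUND_M_S
        total_samples = time_diff_s * sample_rate_hz
        whole = math.floor(total_samples)
        delays[name] = (whole, total_samples - whole)
    return delays


def apply_delay(signal, whole, frac):
    """Delay `signal` by whole + frac samples.

    The fractional part uses linear interpolation,
    y[n] = (1 - frac) * x[n - whole] + frac * x[n - whole - 1],
    a simple stand-in for a polyphase fractional delay filter.
    """
    padded = [0.0] * (whole + 1) + list(signal)
    return [(1.0 - frac) * padded[n + 1] + frac * padded[n]
            for n in range(len(signal))]


if __name__ == "__main__":
    # Hypothetical path lengths in metres from each speaker to the listener.
    paths = {"front_left": 2.0, "front_right": 2.4, "rear_left": 3.7}
    for speaker, (whole, frac) in delay_compensation(paths).items():
        print(f"{speaker}: {whole} samples + {frac:.3f}")
```

Splitting the delay into a whole-sample part and a fractional part mirrors the two stages of the claim: the whole-sample part can be served by a simple buffer, while only the remainder needs an interpolating filter.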

Description

Sound effect optimization method, device, equipment and storage medium

Technical Field

The present invention relates to the technical field of electric digital data processing, and in particular to a sound effect optimization method, apparatus, device, and storage medium.

Background

Currently, mainstream multi-speaker systems mostly use a fixed channel allocation mode: audio is rendered according to roles preset at device initialization, such as the left and right channels, the center channel, and the low-frequency effects channel. Such systems typically rely on static configuration parameters (e.g., speaker distance, channel role) for equalizer (EQ) and delay compensation, support only manual calibration by the user, and cannot respond to device displacement or user movement during use. This static sound field optimization has an obvious limitation: when a speaker is moved or the listener's position changes, the system cannot sense the change in the spatial relationship, which causes problems such as sound image localization deviation, reduced dialogue clarity, and overlapping or missing low-frequency response. In addition, traditional methods lack the ability to analyze the room impulse response in real time and have difficulty adapting to the acoustic propagation characteristics of complex environments, which ultimately affects the consistency and stability of the listening experience.

Disclosure of Invention

The invention provides a sound effect optimization method, apparatus, device, and storage medium, to solve the problem in the prior art that a static sound field configuration cannot adapt to device displacement and listener movement, resulting in sound image localization offset, reduced audio clarity, and an inconsistent listening experience.
The invention provides a sound effect optimization method, comprising: obtaining relative position relationship data between each speaker and a user; performing acoustic space modeling based on the relative position relationship, combined with analysis of the room impulse response, to obtain a real-time spatial model comprising geometric attributes and acoustic propagation characteristics; calculating and allocating channel mapping weights based on the real-time spatial model to generate a channel allocation scheme that changes dynamically with the position of the user; and adjusting the equalizer curve, delay compensation, and phase parameters of each speaker in real time according to the channel allocation scheme.

In a possible implementation, acquiring the relative position relationship data between each speaker and the user comprises: obtaining first relative position data between devices by preliminary calculation from the wireless communication signal strength and time of flight between a master speaker and each slave speaker; acquiring visual scene information through a smart camera associated with the system, and identifying the position and contour of the user using a computer vision algorithm to obtain second relative position data between the user and the master speaker; and inputting the first relative position data and the second relative position data into a Kalman filter model for data fusion and trajectory prediction, and outputting the dynamic relative position relationship between each speaker and the user.
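
The Kalman-filter fusion and trajectory prediction step can be illustrated with a minimal Python sketch. It is only one possible reading, not the patent's implementation: it assumes that both data sources have already been converted into noisy estimates of the same scalar quantity (for example, the user's offset from one speaker along one axis), and the class name, the constant-velocity motion model, and all noise values are hypothetical.

```python
class ConstantVelocityKalman1D:
    """Scalar constant-velocity Kalman filter (state: position, velocity)."""

    def __init__(self, position=0.0, velocity=0.0, variance=1.0, accel_noise=0.5):
        self.p = position
        self.v = velocity
        # 2x2 state covariance matrix.
        self.P = [[variance, 0.0], [0.0, variance]]
        self.q = accel_noise  # assumed white-noise acceleration intensity

    def predict(self, dt):
        """Propagate the state dt seconds ahead (trajectory prediction)."""
        self.p += self.v * dt
        (p00, p01), (p10, p11) = self.P
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q * dt ** 3 / 3.0,
             p01 + dt * p11 + self.q * dt ** 2 / 2.0],
            [p10 + dt * p11 + self.q * dt ** 2 / 2.0,
             p11 + self.q * dt],
        ]

    def update(self, measured_position, measurement_variance):
        """Fuse one position measurement (observation matrix H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + measurement_variance      # innovation variance
        k0, k1 = p00 / s, p10 / s           # Kalman gain
        residual = measured_position - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        # P <- (I - K H) P
        self.P = [
            [(1.0 - k0) * p00, (1.0 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


if __name__ == "__main__":
    kf = ConstantVelocityKalman1D()
    # Hypothetical per-frame readings in metres: (radio-derived, camera-derived).
    frames = [(2.1, 2.02), (1.8, 2.01), (2.3, 1.99), (1.9, 2.00)]
    for radio_m, camera_m in frames:
        kf.predict(dt=0.1)
        kf.update(radio_m, measurement_variance=0.25)   # coarse radio ranging
        kf.update(camera_m, measurement_variance=0.01)  # finer visual estimate
        print(round(kf.p, 3), round(kf.v, 3))
```

Calling `update` once per source with a different measurement variance is what performs the fusion: the source declared less noisy pulls the estimate harder, and `predict` carries the estimate forward between measurements.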
In a possible implementation, performing spatial modeling according to the relative position relationship data to obtain a real-time spatial model describing the spatial distribution relationship between the speakers and the user comprises: constructing, based on the relative position relationship, an initial spatial network describing the geometric position relationship between the speakers and the user; based on the initial spatial network, playing a known test signal through at least one speaker, receiving the test signal through at least one microphone, and extracting the impulse response of the room by calculating the cross-correlation function between the test signal and the received signal; and analyzing the impulse response, identifying and quantifying the energy and delay of the direct sound, the early reflections, and the reverberant sound, and mapping them onto the initial spatial network to generate the real-time spatial model having both geometric attributes and acoustic propagation characteristics.

In a possible implementation, constructing the initial spatial network describing the geometric position relationship between the speakers and the user based on the relative position relationship comprises: acquiring absolute position coordinates of each speaker and of the user based on the relative position relationship, connecting each speaker with the user position point to co