US-12627935-B2 - Deep learning-based method for acoustic feedback suppression in closed-loop system


Abstract

A deep learning-based method for acoustic feedback suppression in a closed-loop system. The method includes applying an offline-trained closed-loop system suppression model, built based on deep learning, to process an audio signal input, and then feeding the processed audio signal to a sound reproduction unit of the closed-loop system for playback to achieve acoustic feedback suppression. The model is trained by: modeling the closed-loop system, generating a unit impulse response of an acoustic feedback path by simulation, calculating a maximum stable gain according to each simulated unit impulse response, and generating a closed-loop signal based on the maximum stable gain; generating an open-loop target signal under an open-loop condition by using the audio signal input to the closed-loop system; and forming parallel training data of the model from the closed-loop signal and the open-loop target signal, and training the model by using the generated parallel training data.

Inventors

Chengshi ZHENG
Xiaodong Li

Assignees

INSTITUTE OF ACOUSTICS, CHINESE ACADEMY OF SCIENCES

Dates

Publication Date: 20260512
Application Date: 20220825
Priority Date: 20220714

Claims (8)

1. A deep learning-based method for acoustic feedback suppression in a closed-loop system, the method comprising: applying an offline trained closed-loop system suppression model to the closed-loop system, processing an audio signal input to the closed-loop system, and then feeding the processed audio signal to a sound reproduction unit of the closed-loop system for playback to achieve acoustic feedback suppression, the closed-loop system suppression model being built based on deep learning; and modeling the closed-loop system, generating a unit impulse response of an acoustic feedback path by simulation, and calculating a maximum stable gain according to the unit impulse response, and generating a closed-loop signal based on the maximum stable gain; generating an open-loop target signal under an open-loop condition by using the audio signal input to the closed-loop system; forming parallel training data of the model by putting the closed-loop signal and the open-loop target signal together, and training the model by using the generated parallel training data, wherein the model is trained in an offline training mode in the method, comprising the following steps in the training:
step 1: modeling the closed-loop system of acoustic feedback, and generating a unit impulse response of an acoustic feedback path by simulation according to an application scenario;
step 2: establishing a training open-loop system based on deep learning; calculating a maximum stable gain according to the unit impulse response of the acoustic feedback path, determining a forward gain of the open-loop system based on the maximum stable gain, inputting an audio signal and generating an open-loop signal as target audio for training, and in the closed-loop system, inputting a noisy audio signal and generating a noisy closed-loop signal with feedback, the closed-loop signal and the open-loop signal together constituting parallel training data of the model;
step 3: performing feature extraction of the training data and target mapping of a deep learning neural network;
step 4: designing a deep learning neural network architecture and hyper-parameters; and
step 5: selecting an appropriate loss function to train the model to obtain a trained closed-loop system suppression model,
wherein generating a unit impulse response of an acoustic feedback path by simulation is expressed as:
$$f(t) = \sin\left(2\pi f_{env} t + \varphi_{env}\right) r(t) \exp\left(-\sigma P(t - t_f)\right)$$
$$P(t - t_f) = \begin{cases} 0, & t < t_f \\ t - t_f, & t \ge t_f \end{cases}$$
where $f_{env}$ is a modulation frequency, $\varphi_{env}$ is a random phase, $r(t)$ is a zero-mean Gaussian process, $\sigma$ is a decay function, where $\sigma \ge 0$, and $t_f$ represents the time when exponential decay of the transfer function starts; in the closed-loop system, a forward path amplification module is expressed as:
$$g(t) = G\,\delta(t - \tau f_s)$$
where $\delta(\cdot)$ represents the Dirac function, and $G$ is in the range of $G \in [0.5\,G_{max},\ 0.999\,G_{max}]$; and a signal $u(t)$ not subject to feedback suppression processing fed to a speaker and a microphone pickup signal $y(t)$ are obtained from $f(t)$, $g(t)$ and $v(t)$, where $v(t)$ is an external audio signal.
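As an illustration only, the simulated feedback-path impulse response of claim 1 could be generated along the following lines. This is a minimal sketch, not the patented implementation: the function name `simulate_feedback_ir`, the interpretation of t in seconds, and all numeric values (sampling rate, modulation frequency, decay, onset time, number of taps) are assumptions that do not appear in the source.

```python
import numpy as np

def simulate_feedback_ir(n_taps, fs, f_env, sigma, t_f, rng):
    """Sketch of f(t) = sin(2*pi*f_env*t + phi_env) * r(t) * exp(-sigma * P(t - t_f))."""
    t = np.arange(n_taps) / fs                # time axis in seconds (assumption)
    phi_env = rng.uniform(0.0, 2.0 * np.pi)   # random phase phi_env
    r = rng.standard_normal(n_taps)           # zero-mean Gaussian process r(t)
    ramp = np.maximum(t - t_f, 0.0)           # P(t - t_f): 0 before t_f, t - t_f afterwards
    return np.sin(2.0 * np.pi * f_env * t + phi_env) * r * np.exp(-sigma * ramp)

# Hypothetical parameter values chosen only for demonstration.
rng = np.random.default_rng(0)
f = simulate_feedback_ir(n_taps=1024, fs=16000, f_env=5.0, sigma=200.0, t_f=0.005, rng=rng)
```

Drawing many such responses with different random seeds and parameters would give the variety of feedback paths that the training procedure relies on.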
2. The deep learning-based method for acoustic feedback suppression in a closed-loop system according to claim 1, wherein the closed-loop system comprises a forward path amplification module and a delay module; and the modeling of the closed-loop system of acoustic feedback is expressed as:
$$y(t) = v(t) + u(t) * f(t)$$
where $t$ is sampling time, $*$ is the convolution operation, $v(t)$ is the external audio signal, $u(t) = y(t) * g(t)$, with $u(t)$ being a time-domain signal fed to a speaker, $g(t)$ being a unit impulse response of the forward path, $y(t)$ being a pickup signal of the closed-loop system, and $f(t)$ being the unit impulse response of the acoustic feedback path.
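The closed-loop relations y(t) = v(t) + u(t) * f(t) and u(t) = y(t) * g(t), with g(t) = G δ(t − τ f_s), can be simulated sample by sample to produce training signals. The sketch below is one possible way to do this; the name `simulate_closed_loop`, the integer `delay` (standing for τ f_s in samples) and the test signals are assumptions, and a delay of at least one sample is required so that the recursion is causal.

```python
import numpy as np

def simulate_closed_loop(v, f, G, delay):
    """Run y(t) = v(t) + (u * f)(t) with u(t) = G * y(t - delay), one sample at a time."""
    if delay < 1:
        raise ValueError("delay must be at least one sample for a causal loop")
    n = len(v)
    y = np.zeros(n)   # microphone pickup signal
    u = np.zeros(n)   # loudspeaker signal
    for t in range(n):
        if t >= delay:
            u[t] = G * y[t - delay]          # forward path: gain plus pure delay
        k = min(t + 1, len(f))
        # feedback term: sum over m of f[m] * u[t - m]
        y[t] = v[t] + np.dot(f[:k], u[t::-1][:k])
    return u, y

rng = np.random.default_rng(0)
v = rng.standard_normal(64)
# With a zero feedback path the loop is open and u(t) = G * v(t - delay).
u_open, _ = simulate_closed_loop(v, np.zeros(8), G=2.0, delay=4)
# Single-tap feedback of 0.5 with unit gain: an impulse decays geometrically.
imp = np.zeros(8)
imp[0] = 1.0
u_imp, y_imp = simulate_closed_loop(imp, np.array([0.5]), G=1.0, delay=1)
```

The open-loop case reproduces the delayed, amplified input, which matches the form of the training target s(t) = G v(t − τ f_s) used later in the claims.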
3. The deep learning-based method for acoustic feedback suppression in a closed-loop system according to claim 2, wherein generating a unit impulse response of an acoustic feedback path by simulation comprises: the delay module performing Fourier transform on signals in the closed-loop system, which is expressed as:
$$Y(\omega) = V(\omega) + U(\omega) F(\omega)$$
$$U(\omega) = Y(\omega) G(\omega)$$
where $\omega$ is an angular frequency, $Y(\omega)$ is the Fourier transform of $y(t)$, $F(\omega)$ is the Fourier transform of $f(t)$, $V(\omega)$ is the Fourier transform of $v(t)$, $U(\omega)$ is the Fourier transform of $u(t)$, $G(\omega)$ is the Fourier transform of $g(t)$, and frequency-related gains in the forward path are unified in the feedback path $F(\omega)$.
4. The deep learning-based method for acoustic feedback suppression in a closed-loop system according to claim 3, wherein $G(\omega)$ is set to a constant $G$, and if $G$ is related to the angular frequency, a transfer function in the closed-loop system is:
$$\frac{U(\omega)}{V(\omega)} = \frac{G}{1 - G F(\omega)}$$
according to the Nyquist instability criterion, if a loop gain function meets the following conditions:
$$\begin{cases} \angle\, G F(\omega) = 2n\pi \\ |G F(\omega)| \ge 1 \end{cases}, \quad n = 0, 1, 2, \ldots$$
where $\angle(\cdot)$ represents taking a phase and $|\cdot|$ represents taking a modulus; that is, at a specific frequency where the angular frequency is $\omega$, if the modulus of the loop gain function is greater than or equal to 1, and a phase angle of the loop gain function is an integer multiple $n$ of $2\pi$, a sound reinforcement system oscillates, resulting in howling, and thus the maximum stable gain $G_{max}$ of the closed-loop system is obtained, which is expressed as:
$$G_{max} = \frac{1}{\max_{\omega \in \Omega} |F(\omega)|}$$
$$\Omega = \{\omega \mid \angle\left(F(\omega) \exp(-j\omega\tau f_s)\right) = 2n\pi\}$$
where $\Omega$ is a set of frequencies that satisfy the phase condition of the Nyquist instability criterion, $\tau$ corresponds to delays of all signal processing systems in the sound reinforcement system, $f_s$ is a sampling frequency, and $j$ is the imaginary unit.
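A numerical estimate of the maximum stable gain in claim 4 can be sketched as follows. This is an approximation under stated assumptions rather than the patent's procedure: the phase condition is evaluated on a discrete FFT grid, a frequency is treated as belonging to Ω when the imaginary part of F(ω)exp(−jω·delay) changes sign while its real part is positive, and if no such bin is found the code falls back to the conservative bound 1/max|F(ω)| over all bins. The names `max_stable_gain`, `delay` (τ f_s in samples) and `n_fft` are hypothetical.

```python
import numpy as np

def max_stable_gain(f, delay, n_fft=4096):
    """Estimate G_max = 1 / max_{omega in Omega} |F(omega)| on an FFT grid."""
    F = np.fft.rfft(f, n_fft)
    omega = 2.0 * np.pi * np.arange(F.size) / n_fft   # rad/sample
    loop = F * np.exp(-1j * omega * delay)            # feedback path with loop delay
    im, re = loop.imag, loop.real
    # Phase is a multiple of 2*pi where imag crosses zero while real is positive.
    crossing = (im[:-1] * im[1:] <= 0.0) & (re[:-1] > 0.0)
    mags = np.abs(F[:-1])[crossing]
    if mags.size == 0:
        mags = np.abs(F)                              # conservative fallback (assumption)
    return 1.0 / mags.max()

# A single-tap feedback path of 0.25 has |F| = 0.25 everywhere, so G_max = 4.
g_max = max_stable_gain(np.array([0.25]), delay=4)

# Forward gain drawn from the range stated in claim 1.
rng = np.random.default_rng(0)
G = rng.uniform(0.5, 0.999) * g_max
```

For the single-tap example the loop recursion is u(t) = G v(t − 4) + 0.25 G u(t − 4), which is stable exactly when G < 4, in agreement with the estimate.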
5. The deep learning-based method for acoustic feedback suppression in a closed-loop system according to claim 4, wherein if the closed-loop system further comprises an adaptive feedback cancellation module and a post-processing module, the transfer function in the closed-loop system is expressed as:
$$\frac{U(\omega)}{V(\omega)} = \frac{G H(\omega)}{1 - G H(\omega)\left(F(\omega) - \hat{F}(\omega)\right)}$$
where $\hat{F}(\omega)$ is the Fourier transform of $\hat{f}(t)$, with $\hat{f}(t)$ being a unit impulse response of the feedback path identified by an adaptive method; and $H(\omega)$ is the Fourier transform of $h(t)$, with $h(t)$ being a unit impulse response of the post-processing module; the closed-loop system with adaptive feedback cancellation and post-processing becomes unstable if a loop gain function satisfies the following conditions:
$$\begin{cases} \angle\, G H(\omega)\left(F(\omega) - \hat{F}(\omega)\right) = 2n\pi \\ \left|G H(\omega)\left(F(\omega) - \hat{F}(\omega)\right)\right| \ge 1 \end{cases}, \quad n = 0, 1, 2, \ldots;$$
and in this case, the maximum stable gain of the closed-loop system is expressed as:
$$G_{max} = \frac{1}{\max_{\omega \in \Omega} \left|H(\omega)\left(F(\omega) - \hat{F}(\omega)\right)\right|}$$
$$\Omega = \left\{\omega \mid \angle\left(\left(H(\omega)\left(F(\omega) - \hat{F}(\omega)\right)\right) \exp(-j\omega\tau f_s)\right) = 2n\pi\right\}.$$
6. The deep learning-based method for acoustic feedback suppression in a closed-loop system according to claim 1, wherein the mapping target of the deep learning neural network comprises: mixing the external audio signal $v(t)$ and a noise signal $n(t)$ according to a certain signal-to-noise ratio to obtain a mixed noisy audio input signal $z(t)$:
$$z(t) = v(t) + \alpha\, n(t)$$
where $\alpha$ is the amount of injected noise calculated according to the signal-to-noise ratio; using $z(t)$ as an input to the closed-loop system to obtain a noisy signal $u(t)$ with feedback; and using $u(t)$ as an input signal to the neural network, and mapping a target signal $s(t)$, which is expressed as:
$$s(t) = G\, v(t - \tau f_s)$$
performing K-point short-time Fourier transforms on $u(t)$ and $s(t)$, respectively, to obtain complex spectra $U(k,l)$ and $S(k,l)$ thereof at a time frame $l$ and a frequency band $k$, the complex spectra being expressed as:
$$S(k,l) = \sum_{\mu=0}^{K-1} s(lR + \mu)\, w(\mu)\, e^{-j 2\pi k \mu / K}, \qquad U(k,l) = \sum_{\mu=0}^{K-1} u(lR + \mu)\, w(\mu)\, e^{-j 2\pi k \mu / K}$$
where $w(\mu)$ is a window function, $R$ is a frame shift, and $\mu$ is a sum variable; expressing $S(k,l)$ and $U(k,l)$ as the form of a real part and an imaginary part:
$$S(k,l) = S_r(k,l) + i\, S_i(k,l)$$
$$U(k,l) = U_r(k,l) + i\, U_i(k,l)$$
where $S_r(k,l)$ and $S_i(k,l)$ are the real part and the imaginary part of $S(k,l)$, respectively, and $U_r(k,l)$ and $U_i(k,l)$ are the real part and the imaginary part of $U(k,l)$, respectively; using a complex spectral mapping learning method, training the neural network to learn mapping from $\{U_r(k,l), U_i(k,l)\}$ to $\{S_r(k,l), S_i(k,l)\}$, which process is expressed as:
$$\{\tilde{S}_r^c, \tilde{S}_i^c\} = \mathcal{G}(U_r^c, U_i^c; \Phi)$$
$$S^c = |S|^{\beta_c} \exp\left(j \angle(S)\right)$$
where $\mathcal{G}(\cdot, \cdot; \Phi)$ is a mapping function of the deep learning neural network, with $\Phi$ being a network parameter, $(\cdot)^c$ represents a compression operation function, $S$ is an independent variable for the compression operation function, and $\beta_c \in [0,1]$ is a compression coefficient; $\tilde{S}_r^c$ and $\tilde{S}_i^c$ are real and imaginary parts of a compressed complex spectrum $\tilde{S}^c(k,l)$ of an estimated signal, respectively, and $U_r^c$ and $U_i^c$ are real and imaginary parts of a compressed complex spectrum of an input signal, respectively.
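The feature pipeline of claim 6 (noise mixing, K-point STFT, power-law compression, real/imaginary split) can be sketched as below. Several points are assumptions not stated in the source: α is derived from the ratio of mean signal and noise powers, a Hann window with K = 256 and R = 128 is used, β_c = 0.5 is picked arbitrarily, and the mixed signal z(t) stands in for the closed-loop signal u(t) to keep the example short. The function names are hypothetical.

```python
import numpy as np

def mix_at_snr(v, n, snr_db):
    """Return z = v + alpha * n and alpha for a requested SNR in dB."""
    # Assumption: SNR is defined on mean powers; the source gives no formula for alpha.
    alpha = np.sqrt(np.mean(v ** 2) / (np.mean(n ** 2) * 10.0 ** (snr_db / 10.0)))
    return v + alpha * n, alpha

def stft(x, K, R, window):
    """K-point STFT with frame shift R; result is indexed as X[k, l]."""
    n_frames = 1 + (len(x) - K) // R
    frames = np.stack([x[l * R:l * R + K] * window for l in range(n_frames)], axis=1)
    return np.fft.fft(frames, axis=0)

def compress(S, beta_c):
    """Power-law compression of the magnitude; the phase is left unchanged."""
    return np.abs(S) ** beta_c * np.exp(1j * np.angle(S))

rng = np.random.default_rng(0)
v = rng.standard_normal(4000)            # stand-in for the external audio v(t)
n = rng.standard_normal(4000)            # stand-in for the noise n(t)
z, alpha = mix_at_snr(v, n, snr_db=10.0)

K, R = 256, 128
U = stft(z, K, R, np.hanning(K))         # z used in place of u(t) for brevity
U_c = compress(U, beta_c=0.5)
features = np.stack([U_c.real, U_c.imag])   # network input {U_r^c, U_i^c}
```

The target spectrum S(k,l) would be processed the same way from s(t) = G v(t − τ f_s), giving the pair {S_r^c, S_i^c} that the network is trained to predict.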
7. The deep learning-based method for acoustic feedback suppression in a closed-loop system according to claim 6, wherein a mean squared error between an estimated result and a training target is directly used as the loss function, and the complex spectra and magnitude spectra are constrained in the loss function; and a magnitude spectrum and complex spectrum mixed loss function $L_{Mag+RI}$, a magnitude spectrum loss function $L_{Mag}$ and a complex spectrum loss function $L_{RI}$ are respectively expressed as:
$$L_{Mag+RI} = \lambda L_{RI} + (1 - \lambda) L_{Mag},$$
$$L_{Mag} = \left\| |\tilde{S}^c| - |S^c| \right\|_F^2,$$
$$L_{RI} = \left\| \tilde{S}_r^c - S_r^c \right\|_F^2 + \left\| \tilde{S}_i^c - S_i^c \right\|_F^2$$
where $\lambda$ is a weight coefficient with a value between 0 and 1, and $\|\cdot\|_F$ represents a Frobenius norm, abbreviated as F-norm.
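The mixed loss of claim 7 can be written in a few lines. The sketch below uses NumPy on complex arrays purely for illustration; an actual training setup would use an automatic-differentiation framework, and the function name `mag_ri_loss` and the default λ = 0.5 are assumptions.

```python
import numpy as np

def mag_ri_loss(S_est_c, S_ref_c, lam=0.5):
    """L_Mag+RI = lam * L_RI + (1 - lam) * L_Mag on compressed complex spectra."""
    # Squared Frobenius norm of the real-part and imaginary-part errors.
    l_ri = (np.sum((S_est_c.real - S_ref_c.real) ** 2)
            + np.sum((S_est_c.imag - S_ref_c.imag) ** 2))
    # Squared Frobenius norm of the magnitude error.
    l_mag = np.sum((np.abs(S_est_c) - np.abs(S_ref_c)) ** 2)
    return lam * l_ri + (1.0 - lam) * l_mag

ref = np.array([[1.0 + 0.0j]])
loss_same = mag_ri_loss(ref, ref)                            # identical spectra
loss_phase = mag_ri_loss(np.array([[0.0 + 1.0j]]), ref)      # same magnitude, wrong phase
loss_scale = mag_ri_loss(np.array([[2.0 + 0.0j]]), ref)      # same phase, wrong magnitude
```

In the phase-only case the magnitude term is zero and only the real/imaginary term contributes, which shows why the two terms are combined: the magnitude term alone cannot penalize phase errors.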
8. The deep learning-based method for acoustic feedback suppression in a closed-loop system according to claim 1, wherein when the trained model is applied to the closed-loop system, the model outputs a compressed complex spectrum $\tilde{S}^c(k,l)$ of an estimated target signal, and $\tilde{S}^c(k,l)$ is decompressed to recover a complex spectrum $\tilde{S}(k,l)$, which is expressed as:
$$\tilde{S}(k,l) = \left|\tilde{S}^c(k,l)\right|^{1/\beta_c} \exp\left(j \angle\left(\tilde{S}^c(k,l)\right)\right)$$
where $\beta_c$ is a compression coefficient, $j$ is the imaginary unit, and $\angle(\cdot)$ represents taking a phase; and inverse Fourier transform is performed on the complex spectrum and an overlap-add method is then applied to obtain a time-domain form $\tilde{s}(t)$ of the estimated signal.
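The decompression and overlap-add reconstruction of claim 8 can be checked with a round trip, sketched below. The assumptions are: a periodic Hann analysis window with 50% overlap (so that the shifted windows sum to one and plain overlap-add reconstructs the interior samples), β_c = 0.5 (decompression needs β_c > 0), and a directly compressed spectrum standing in for the network output. The names `decompress` and `overlap_add` are hypothetical.

```python
import numpy as np

def decompress(S_c, beta_c):
    """Invert the power-law compression: |S_c|^(1/beta_c) with the phase of S_c."""
    return np.abs(S_c) ** (1.0 / beta_c) * np.exp(1j * np.angle(S_c))

def overlap_add(S, R):
    """Inverse FFT of each frame S[:, l], then overlap-add with frame shift R."""
    K, n_frames = S.shape
    frames = np.fft.ifft(S, axis=0).real
    out = np.zeros((n_frames - 1) * R + K)
    for l in range(n_frames):
        out[l * R:l * R + K] += frames[:, l]
    return out

K, R, beta_c = 256, 128, 0.5
window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(K) / K)   # periodic Hann
rng = np.random.default_rng(0)
x = rng.standard_normal(9 * R)
n_frames = 1 + (len(x) - K) // R
S = np.stack([np.fft.fft(x[l * R:l * R + K] * window) for l in range(n_frames)], axis=1)

# Stand-in for the model output: the compressed spectrum of the signal itself.
S_c = np.abs(S) ** beta_c * np.exp(1j * np.angle(S))
x_rec = overlap_add(decompress(S_c, beta_c), R)
```

Samples covered by two overlapping frames are recovered exactly up to floating-point error; the first and last R samples are covered by only one frame and remain attenuated by the window.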

Description

TECHNICAL FIELD

The present invention relates to the field of acoustic feedback suppression of closed-loop systems. The closed-loop systems mentioned in the present invention are a category of systems whose system inputs are influenced by system outputs, including, for example, hearing aid systems and public address systems. The present invention specifically relates to a deep learning-based method for acoustic feedback suppression in a closed-loop system.

BACKGROUND

Sound reinforcement systems are widely used in multimedia classrooms, local conference systems, hearing aids and cochlear implants. Such an electro-acoustic system includes at least one microphone, one amplifier and one sound generating unit such as a speaker. Acoustic feedback arises when the microphone and the speaker are in the same acoustic environment and are acoustically coupled because of the small distance between them. That is, the microphone picks up an external audio signal, and the audio signal passes through the amplifier and is played back by the speaker; it then passes through a feedback path, is collected by the microphone and amplified by the amplifier again, and is played back by the speaker again, thereby forming positive feedback in a continuous cycle. When a frequency meets the Nyquist instability conditions, the signal magnitude increases continuously and howling occurs. Too large a signal magnitude can even cause serious damage to audio equipment. Therefore, suppression of acoustic feedback not only improves the sound reinforcement performance of the system, but also ensures the stability and safety of the sound reinforcement system.

SUMMARY OF THE INVENTION

An object of the present invention is to overcome the problem in the prior art that the signal magnitude becomes too large and can cause serious damage to audio equipment.

To achieve the above object, the present invention is implemented by the following technical solution.
The present invention proposes a deep learning-based method for acoustic feedback suppression in a closed-loop system, the method including: applying an offline trained closed-loop system suppression model to the closed-loop system, processing an audio signal input to the closed-loop system, and then feeding the processed audio signal to a sound reproduction unit of the closed-loop system for playback to achieve acoustic feedback suppression, the closed-loop system suppression model being built based on deep learning; and modeling the closed-loop system, generating a unit impulse response of an acoustic feedback path by simulation, and calculating a maximum stable gain according to the unit impulse response, and generating a closed-loop signal based on the maximum stable gain; generating an open-loop target signal under an open-loop condition by using the audio signal input to the closed-loop system; forming parallel training data of the model by putting the closed-loop signal and the open-loop target signal together, and training the model by using the simulated parallel training data.
As one of the improvements of the above technical solution, the model is trained in an offline training mode in the method, including the following steps in the training:
step 1: modeling the closed-loop system of acoustic feedback, and generating a unit impulse response of an acoustic feedback path by simulation according to an application scenario;
step 2: establishing a training open-loop system based on deep learning; calculating a maximum stable gain according to the unit impulse response of the acoustic feedback path, determining a forward path gain of the open-loop system based on the maximum stable gain, inputting an audio signal and generating an open-loop signal as target audio for training, and in the closed-loop system, inputting a noisy audio signal and generating a noisy closed-loop signal with feedback, the closed-loop signal and the open-loop signal together constituting parallel training data of the model;
step 3: performing feature extraction of the training data and target mapping of a deep learning neural network;
step 4: designing a deep learning neural network architecture and hyper-parameters; and
step 5: selecting an appropriate loss function to train the model to obtain a trained closed-loop system suppression model.

As one of the improvements of the above technical solution, the closed-loop system includes a forward path amplification module and a delay module; and the modeling of the closed-loop system of acoustic feedback is expressed as:
$$y(t) = v(t) + u(t) * f(t)$$
where $t$ is sampling time, $*$ is the convolution operation, $v(t)$ is an external audio signal, $u(t) = y(t) * g(t)$, with $u(t)$ being a time-domain signal fed to a speaker, $g(t)$ being a unit impulse response of a forward path of the closed-loop system, and $y(t)$ being a pickup signal, and $f(t)$ is the unit impulse response of the acoustic feedback path.

As one of the improvements of the above technical solut