CN-121999792-A - Scheduled sampling non-autoregressive learning method for acoustic feedback suppression
Abstract
The invention discloses a scheduled sampling non-autoregressive learning method for acoustic feedback suppression, belonging to the technical fields of audio signal processing and deep learning. The method comprises: constructing a non-autoregressive open-loop training framework; constructing a two-stage finite-boundary scheduled sampling mechanism; constructing a finite-order Neumann series approximation operator that approximately models the infinite impulse response recursive structure of acoustic feedback and generates feedback estimation features consistent with closed-loop operating behavior; and training the model in stages so that it gradually adapts to closed-loop inference conditions according to a preset probability scheduling strategy. By introducing scheduled sampling and the finite-order Neumann series approximation, the model can adaptively learn the intrinsic patterns of the feedback signal as it evolves over time, so that stable performance is maintained under different system gains, acoustic coupling strengths and operating scenarios, giving the method good engineering adaptability and application prospects.
Inventors
- LIANG RUIYU
- NI YE
- WANG QINGYUN
- TANG GUICHEN
- XIE YUE
- BAO YONGQIANG
- ZOU CAIRONG
- ZHAO XIAOYAN
Assignees
- Nanjing Institute of Technology (南京工程学院)
Dates
- Publication Date
- 20260508
- Application Date
- 20260205
Claims (6)
- 1. A scheduled sampling non-autoregressive learning method for acoustic feedback suppression, characterized by comprising the following steps: Step A, constructing a non-autoregressive open-loop training framework, namely generating, under a set boundary stable gain condition and based on a teacher-forcing strategy, reference input features that contain no recursive dependence, and constructing initial training sample pairs; Step B, constructing a two-stage finite-boundary scheduled sampling mechanism, wherein the training process is divided into two stages, the first stage uses teacher-forced input, and the second stage dynamically selects the input source of the current frame between the teacher-forced input and the model prediction result according to a preset probability scheduling strategy; Step C, constructing a finite-order Neumann series approximation operator, approximately modeling the infinite impulse response recursive structure of acoustic feedback, and generating feedback estimation features consistent with closed-loop operating behavior; Step D, training the model in stages, comprising: Step D1, a first stage, namely performing non-autoregressive preliminary training of the acoustic feedback suppression model structure using the training samples constructed in Step A to obtain a base model; and Step D2, a second stage, namely constructing, on the base model, new training samples from the input signal selected by the preset probability scheduling strategy of Step B combined with the feedback estimation features generated by the finite-order Neumann series approximation operator of Step C, and continuing to train the base model with the new training samples so that the model gradually adapts to closed-loop inference conditions, thereby obtaining the final trained acoustic feedback suppression model.
- 2. The scheduled sampling non-autoregressive learning method for acoustic feedback suppression according to claim 1, wherein constructing the non-autoregressive open-loop training framework in Step A specifically comprises: A1, determining a boundary stable gain that is 2 to 4 dB lower than the maximum stable gain; A2, based on the boundary stable gain and clean target speech, simulating acoustic feedback by convolution to construct a mixed sample containing the acoustic feedback and the clean target speech, and forming a sample pair from the mixed sample and the clean target speech; A3, determining a multi-scale Fourier transform loss function for the training process, and performing supervised learning with the clean target speech as the label.
- 3. The scheduled sampling non-autoregressive learning method for acoustic feedback suppression according to claim 2, wherein the two-stage finite-boundary scheduled sampling mechanism constructed in Step B specifically comprises: B1, in the first stage, freezing the structural parameters of the acoustic feedback suppression model, driving the neural network with teacher-forced input, and generating an acoustic feedback signal in non-autoregressive form under a non-recursive condition to serve as the candidate feedback signal in the training process; B2, in the second stage, dynamically selecting between the teacher-forced input and the model prediction result with a preset sampling probability, wherein the sampling probability changes with the training epoch and is expressed by a formula (not reproduced in this text) in which the sampling probability at a given training epoch is determined by an initial probability and a lower-limit probability of teacher-forced input, the lower-limit probability being non-zero, together with two parameters that control the offset position and the decay rate of the sampling probability curve.
- 4. The scheduled sampling non-autoregressive learning method for acoustic feedback suppression according to claim 3, wherein in Step C the finite-order Neumann series approximation operator is constructed by expanding the output signal of the closed-loop feedback system as an infinite-order Neumann series and truncating it at a high order, the truncation order being determined by the characteristics of the acoustic feedback path and the system gain, and the input speech sequence is convolved with the finite-order Neumann series approximation operator and superimposed to obtain the acoustic feedback signal of the approximated closed-loop feedback system.
- 5. The scheduled sampling non-autoregressive learning method for acoustic feedback suppression according to claim 4, wherein the acoustic feedback suppression model structure is an asymmetric encoder-decoder structure comprising asymmetric dual-path encoders, an attention fusion module, parallel time-frequency-LSTM modules and a decoder, wherein the asymmetric dual-path encoders perform feature extraction and downsampling on the input microphone mixture signal through two convolution modules; the attention fusion module fuses the output features of the two encoder paths and outputs the fused features; the parallel time-frequency-LSTM modules take the fused features as input and perform multi-angle time-frequency modeling, the output features of the individual time-frequency-LSTM modules being concatenated to form rich joint time-frequency features; and the decoder upsamples and restores time-frequency dependencies through two cascaded deconvolution modules, estimates a spectral mask, applies it to the input microphone mixture signal, and outputs enhanced speech with the acoustic feedback removed.
- 6. The scheduled sampling non-autoregressive learning method for acoustic feedback suppression according to claim 5, wherein the staged training of Step D is specifically: Step D1, a first stage, namely inputting the mixed sample constructed in Step A into the first encoder path, inputting the sample pair into the second encoder path, taking the clean target speech as the training target, and training the acoustic feedback suppression model structure to obtain a base model; and Step D2, a second stage, namely, on the base model, selecting the input signal source according to the preset sampling probability, replacing the candidate feedback estimation features with the feedback estimation features generated by the finite-order Neumann series approximation to construct a new training sample, taking the new training sample and the sample pair consisting of the new training sample and the clean target speech as the respective inputs of the two encoder paths, taking the clean target speech as the training target, and continuing to train the base model so that the model gradually adapts to closed-loop inference conditions, thereby obtaining the final trained acoustic feedback suppression model.
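The second-stage sample construction described in claims 3, 4 and 6 can be illustrated with a minimal Python sketch. Several parts are assumptions rather than the patented formulation: the probability schedule formula is not reproduced in the claims, so `teacher_forcing_probability` uses a sigmoid-shaped decay with a non-zero floor purely for illustration; all function names, default values and the frame length are hypothetical; and `open_loop_ir` stands for the system gain combined with the acoustic feedback path, assumed to start with at least one sample of delay and to be small enough that higher-order terms decay.

```python
import numpy as np


def teacher_forcing_probability(epoch, p_init=1.0, p_min=0.2, offset=10.0, rate=0.5):
    """ASSUMED schedule: sigmoid-shaped decay from p_init toward a non-zero floor p_min.

    offset shifts the curve along the epoch axis and rate sets how fast it decays.
    """
    return p_min + (p_init - p_min) / (1.0 + np.exp(rate * (epoch - offset)))


def neumann_operator(open_loop_ir, order):
    """Sum of the 1st to order-th self-convolutions of the open-loop impulse response."""
    h = np.asarray(open_loop_ir, dtype=float)
    op = np.zeros(order * (len(h) - 1) + 1)
    term = h.copy()                      # k-fold self-convolution, starting at k = 1
    for _ in range(order):
        op[:len(term)] += term
        term = np.convolve(term, h)      # advance to the (k + 1)-fold convolution
    return op


def select_source(clean, predicted, p_teacher, frame_len, rng):
    """Per frame, keep the teacher-forced (clean) signal with probability p_teacher."""
    n_frames = -(-len(clean) // frame_len)            # ceiling division
    use_teacher = rng.random(n_frames) < p_teacher
    mask = np.repeat(use_teacher, frame_len)[:len(clean)]
    return np.where(mask, clean, predicted)


def stage2_mixture(clean, predicted, open_loop_ir, order, epoch, frame_len, rng):
    """Clean speech plus feedback approximated by the truncated Neumann operator."""
    p_teacher = teacher_forcing_probability(epoch)
    source = select_source(clean, predicted, p_teacher, frame_len, rng)
    feedback = np.convolve(source, neumann_operator(open_loop_ir, order))[:len(clean)]
    return clean + feedback


# Hypothetical usage with synthetic signals standing in for speech.
rng = np.random.default_rng(0)
clean = rng.standard_normal(64)
predicted = clean + 0.1 * rng.standard_normal(64)   # stand-in for a model output
mixture = stage2_mixture(clean, predicted, [0.0, 0.3, -0.2],
                         order=4, epoch=12, frame_len=16, rng=rng)
```

Because the operator is a fixed finite impulse response, the feedback for a whole utterance comes from a single convolution, so no sample-by-sample recursion is needed during training.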
Description
Scheduled sampling non-autoregressive learning method for acoustic feedback suppression
Technical Field
The invention belongs to the technical fields of audio signal processing and deep learning, and particularly relates to a scheduled sampling non-autoregressive learning method for acoustic feedback suppression.
Background
In audio amplification systems such as hearing aids, wireless earphones and in-vehicle voice systems, the loudspeaker output signal is coupled back to the microphone through the acoustic path, forming a closed-loop feedback structure. When the system gain is high, this readily causes spectral coloration, howling and speech distortion, seriously degrading the user experience. As modern audio devices increasingly adopt high amplification gain, open-fit wearing structures and transparent listening modes, the acoustic coupling effect is further amplified and the stability margin of the system is significantly reduced, so that traditional feedback suppression methods struggle to meet practical requirements.
Existing acoustic feedback suppression techniques mainly comprise phase-modulation feedback control, notch-filter-based howling suppression and adaptive feedback cancellation. The phase-modulation method breaks the positive feedback path by introducing frequency-dependent phase perturbation; the notch-filter method suppresses the narrowband resonant frequencies that cause howling; and the adaptive feedback cancellation method estimates the feedback path and subtracts the feedback component. Under high system gain, however, the performance of these conventional methods tends to degrade rapidly, and they easily introduce spectral artifacts or erroneously remove speech components, reducing speech intelligibility.
In recent years, with the wide application of deep learning to speech-related tasks, acoustic feedback suppression methods based on deep neural networks have attracted attention. These methods exploit the ability of neural networks to model complex nonlinear acoustic mappings and improve feedback suppression performance to some extent. However, existing deep-learning-based feedback suppression methods still have obvious limitations in their training strategies, which fall into the following two classes:
1) The first class recursively feeds the model output back as subsequent input during training, simulating the closed-loop feedback behavior of a real system in order to construct the feedback signals required for training. Although this reflects the accumulation characteristics of the feedback signal more faithfully, it has high computational complexity, an unstable training process and a need to unroll long time sequences, so it is difficult to scale to large datasets and long-speech scenarios, which severely restricts practical application.
2) To improve training efficiency, the second class adopts a teacher-forcing training strategy, which assumes that the feedback is completely suppressed in every frame, thereby avoiding recursive execution and enabling non-autoregressive training similar to that of a speech enhancement model. Although this strategy markedly reduces training cost, there is a significant mismatch between the training stage and the real closed-loop inference stage: at inference the model depends entirely on its own past predictions, so it is prone to exposure bias, prediction errors accumulate and propagate through the closed-loop feedback, and the model performs poorly in a real system.
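The contrast between the two training strategies described above can be made concrete with a small sketch. It is an illustration under stated assumptions, not an implementation from any cited method: the function names are hypothetical, the forward path is reduced to a scalar `gain`, `model` is any callable that maps the microphone history to the current output sample, and `feedback_ir` is assumed to have a zero first tap (at least one sample of delay in the loop).

```python
import numpy as np


def closed_loop_mixture(clean, feedback_ir, gain, model):
    """Autoregressive simulation: each loudspeaker sample depends on earlier model outputs."""
    n = len(clean)
    mic = np.zeros(n)
    out = np.zeros(n)                        # loudspeaker signal
    for t in range(n):
        # Only past loudspeaker samples can reach the microphone (tap 0 is skipped).
        taps = range(1, min(t, len(feedback_ir) - 1) + 1)
        mic[t] = clean[t] + sum(feedback_ir[k] * out[t - k] for k in taps)
        out[t] = gain * model(mic[: t + 1])  # the model is called once per sample
    return mic


def teacher_forced_mixture(clean, feedback_ir, gain):
    """Open-loop construction: assume the model output equals the clean speech."""
    out = gain * np.asarray(clean, dtype=float)
    return clean + np.convolve(out, feedback_ir)[: len(clean)]
```

With a perfect model the two constructions coincide, which is the teacher-forcing assumption. With an imperfect model, for example `lambda hist: hist[-1]` (no suppression at all), the closed-loop mixture accumulates feedback that the teacher-forced mixture never contains, which is the train/inference mismatch behind exposure bias. The per-sample loop also shows why the recursive strategy is costly for long utterances.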
Therefore, designing a training method that retains the efficiency of non-autoregressive training while effectively narrowing the gap with autoregressive closed-loop inference has become the key to improving the practical performance of acoustic feedback suppression models.
Disclosure of Invention
In view of the technical problems in the prior art, the invention provides a scheduled sampling non-autoregressive learning method for acoustic feedback suppression, which retains the efficiency of teacher-forcing training and gradually adapts to real closed-loop operating conditions during training by introducing a controlled scheduled sampling mechanism and a finite-order Neumann series approximation of the infinite impulse response of acoustic feedback.
To solve the above technical problems, the invention adopts the following technical scheme: a scheduled sampling non-autoregressive learning method for acoustic feedback suppression, comprising the following steps:
Step A, constructing a non-autoregressive open-loop training framework, namely generating reference