CN-121983017-A - Multi-domain fusion voice noise reduction method, system and computer readable storage medium


Abstract

The invention discloses a multi-domain fusion voice noise reduction method, system and computer readable storage medium, relating to the technical field of voice enhancement and signal processing. The method identifies candidate line spectra by constructing a composite score over power-spectrum peaks and applies adaptive notch filtering to them. It then combines wavelet adaptive thresholding with improved log-spectral amplitude estimation for compensation, and fuses the notched and compensated signals with weights derived from their spectral energy entropy. Spectral peak detection with adaptive notching rapidly suppresses narrow-band line-spectrum interference; an adaptive threshold based on the wavelet-coefficient energy ratio η, together with OM-LSA spectral compensation, effectively removes broadband background noise; three-dimensional frequency-domain Kalman filtering provides recursive, robust estimation in the time-frequency domain; and a lightweight residual CNN with fixed, untrained weights finally reconstructs signal detail and equalizes the spectrum. The method thereby achieves an optimal balance between high noise-reduction gain and low distortion in complex noise environments.

Inventors

WANG YIMING
CHEN ZHIYUAN
SHI YUTIAN
LIU SHUCHANG
WAN XIANG
GENG QINGQING
ZOU HUI
XU XINYUAN
LIU YUZHOU
HE CHANGXUN
Qiao Tianchang
LI PEIQI
LI YIHANG
WANG HANJIE

Assignees

Nanjing University of Posts and Telecommunications (南京邮电大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-02-09

Claims (10)

1. A multi-domain fusion voice noise reduction method, characterized by comprising the following steps: S1, estimating the power spectral density of an input signal, constructing a composite score from the peak amplitude ratio, adjacent-peak contrast, noise-to-peak ratio and local power-spectrum smoothness, and identifying candidate line-spectrum frequencies and their corresponding bandwidths; S2, performing adaptive notch processing at the identified candidate frequencies, using the ratio of the correlation coefficient to the root-mean-square amplitude as the iteration termination condition; S3, performing wavelet decomposition on the notched signal, adaptively determining a threshold and a retention coefficient from the energy distribution of each decomposition layer, and applying threshold processing and reconstruction to the detail coefficients and the approximation coefficients respectively; S4, performing spectrum compensation on the signal reconstructed in S3 using an improved log-spectral amplitude estimation method; S5, performing adaptive weighted fusion of the notched signal from S2 and the spectrum-compensated signal from S4, with weights determined by their spectral energy entropy; S6, in the short-time Fourier transform domain, applying Kalman filtering to the fused signal from S5 frequency bin by frequency bin and frame by frame, wherein the process noise variance and the measurement noise variance are dynamically adjusted according to wavelet energy interpolation, spectral gradient and instantaneous signal-to-noise ratio; and S7, calculating the residual between the output signal of S6 and the output signal of S5, and applying detail recovery and signal equalization to the residual with a lightweight fixed-weight convolutional network to obtain the final noise-reduced signal.
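Step S4 names an "improved" log-spectral amplitude estimator and the abstract names OM-LSA, but the improvement itself is not given in this text. As a hedged reference point, the sketch below implements the classical Ephraim-Malah LSA gain, the OM-LSA style gain G = G_LSA^p · G_min^(1-p), and a decision-directed a priori SNR estimate. The function names `lsa_gain`, `omlsa_gain`, `enhance_frames` and the defaults (`alpha=0.98`, `g_min=0.1`, a -25 dB floor on the a priori SNR) are illustrative assumptions; the speech-presence probability and the noise PSD are taken as given inputs rather than estimated. It relies on `scipy.special.exp1` for the exponential integral.

```python
import numpy as np
from scipy.special import exp1  # exponential integral E1(v)


def lsa_gain(xi, gamma):
    """Ephraim-Malah log-spectral amplitude gain.

    xi: a priori SNR, gamma: a posteriori SNR (linear, per bin)."""
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    v = np.maximum(xi * gamma / (1.0 + xi), 1e-10)  # avoid E1(0) = inf
    return xi / (1.0 + xi) * np.exp(0.5 * exp1(v))


def omlsa_gain(xi, gamma, p_speech, g_min=0.1):
    """OM-LSA style gain: LSA gain when speech is present, floor g_min otherwise."""
    return lsa_gain(xi, gamma) ** p_speech * g_min ** (1.0 - p_speech)


def enhance_frames(power, noise_psd, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Per-frame LSA gains with a decision-directed a priori SNR.

    power: |Y|^2, shape (n_frames, n_bins); noise_psd: shape (n_bins,)."""
    gains = np.empty_like(power, dtype=float)
    prev = np.zeros(power.shape[1])  # |X_hat|^2 of the previous frame
    for t, p in enumerate(power):
        gamma = p / noise_psd
        xi = alpha * prev / noise_psd + (1 - alpha) * np.maximum(gamma - 1, 0)
        xi = np.maximum(xi, xi_min)
        g = lsa_gain(xi, gamma)
        gains[t] = g
        prev = (g ** 2) * p  # clean-power estimate feeds the next frame
    return gains
```

The decision-directed recursion is the usual way to keep the a priori SNR smooth across frames; with a steady high-SNR input the gain starts low in the first frame and approaches 1 within a couple of frames.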
2. The method of claim 1, wherein the composite score in S1 is calculated by the following formula: [formula not available in the source text]; wherein the composite score is computed from the normalized peak amplitude ratio, the normalized adjacent-peak ratio, the normalized noise-to-peak ratio and the normalized local power smoothness. In the spectrum screening process, a peak height threshold and a minimum frequency interval threshold are set simultaneously and are used to determine the effective spectral components among the candidate frequencies.
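The composite-score formula is missing from the text, so the following is only a sketch of how the four named features, the peak-height threshold and the minimum-frequency-interval rule could fit together. Assumptions not taken from the patent: a Hann-windowed periodogram as the PSD estimate, the median PSD as the noise floor, a weighted sum with hypothetical weights `(0.4, 0.2, 0.2, 0.2)`, feature definitions scaled to [0, 1] so that larger means more line-like (peak power relative to the strongest peak, contrast against a ring of neighboring bins, one minus the noise-to-peak ratio, and one minus local spectral flatness as the smoothness term), and a half-power bandwidth. `line_spectrum_candidates` is a hypothetical name.

```python
import numpy as np


def line_spectrum_candidates(x, fs, weights=(0.4, 0.2, 0.2, 0.2), height_ratio=10.0,
                             score_min=0.7, min_sep_hz=20.0, half_win=5):
    """Return [(freq_hz, bandwidth_hz, score)] sorted by descending score."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(len(x))
    psd = np.abs(np.fft.rfft(x * win)) ** 2 / (fs * np.sum(win ** 2))
    df = fs / len(x)
    floor = np.median(psd) + 1e-20  # robust noise-floor estimate
    w1, w2, w3, w4 = weights
    raw = []
    for k in range(half_win, len(psd) - half_win):
        p = psd[k]
        # peak-height threshold: a local maximum well above the noise floor
        if not (p > psd[k - 1] and p >= psd[k + 1] and p > height_ratio * floor):
            continue
        local = psd[k - half_win:k + half_win + 1]
        ring = np.concatenate([psd[k - half_win:k - 1], psd[k + 2:k + half_win + 1]])
        contrast = 1.0 - np.mean(ring) / p  # adjacent-peak contrast
        npr = floor / p                      # noise-to-peak ratio (small = line-like)
        flatness = np.exp(np.mean(np.log(local + 1e-20))) / np.mean(local)
        lo, hi = k, k                        # half-power (-3 dB) bandwidth
        while lo > 0 and psd[lo - 1] > p / 2:
            lo -= 1
        while hi < len(psd) - 1 and psd[hi + 1] > p / 2:
            hi += 1
        raw.append((k * df, (hi - lo + 1) * df, p, contrast, npr, flatness))
    if not raw:
        return []
    p_max = max(r[2] for r in raw)
    scored = []
    for f, bw, p, contrast, npr, flatness in raw:
        score = w1 * p / p_max + w2 * contrast + w3 * (1 - npr) + w4 * (1 - flatness)
        if score >= score_min:
            scored.append((f, bw, score))
    scored.sort(key=lambda c: c[2], reverse=True)
    # minimum-frequency-interval rule: keep the best-scoring peak in each neighborhood
    kept = []
    for c in scored:
        if all(abs(c[0] - q[0]) >= min_sep_hz for q in kept):
            kept.append(c)
    return kept
```

Because every term is bounded by 1 and the weights sum to 1, the score stays in [0, 1]; a noise spike with a negligible peak ratio can reach at most 0.6 under these weights, which is why `score_min=0.7` rejects it.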
3. The multi-domain fusion speech noise reduction method according to claim 1, wherein the notch processing in S2 employs an infinite impulse response (IIR) notch filter, and the bandwidth scaling factor of the notch filter is dynamically adjusted within the range 1.0 to 1.3.
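A minimal sketch of an IIR notch whose bandwidth scaling factor is confined to 1.0 to 1.3. The coefficients follow a standard second-order bilinear-transform notch design with unit gain at DC and Nyquist. The widening loop is an assumed reading of the S2 termination rule: the residual tone amplitude at f0 (correlation with a complex exponential) divided by the output RMS, evaluated on the second half of the signal to skip the filter's start-up transient. `tol`, `step` and all function names are hypothetical.

```python
import numpy as np


def notch_coeffs(f0, bw_hz, fs, scale=1.0):
    """Second-order IIR notch at f0 Hz with -3 dB bandwidth scale * bw_hz."""
    if not 1.0 <= scale <= 1.3:
        raise ValueError("bandwidth scaling factor must lie in [1.0, 1.3]")
    w0 = 2 * np.pi * f0 / fs
    beta = np.tan(np.pi * scale * bw_hz / fs)  # tan(bandwidth_rad / 2)
    g = 1.0 / (1.0 + beta)
    b = g * np.array([1.0, -2.0 * np.cos(w0), 1.0])
    a = np.array([1.0, -2.0 * g * np.cos(w0), 2.0 * g - 1.0])
    return b, a


def biquad(b, a, x):
    """Direct-form-I second-order IIR filtering (a[0] assumed to be 1)."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y


def adaptive_notch(x, f0, bw_hz, fs, tol=0.05, step=0.1):
    """Widen the notch from scale 1.0 toward 1.3 until the residual tone is small."""
    x = np.asarray(x, dtype=float)
    ref = np.exp(-2j * np.pi * f0 * np.arange(len(x)) / fs)
    half = len(x) // 2  # skip the start-up transient
    scale = 1.0
    while True:
        y = biquad(*notch_coeffs(f0, bw_hz, fs, scale), x)
        tail = y[half:]
        resid = 2 * abs(np.dot(tail, ref[half:])) / len(tail)
        ratio = resid / (np.sqrt(np.mean(tail ** 2)) + 1e-12)
        if ratio < tol or scale >= 1.3 - 1e-9:
            return y, scale, ratio
        scale = min(scale + step, 1.3)
```

In practice `scipy.signal.lfilter(b, a, x)` would replace the pure-Python `biquad` loop, which is kept here only to stay dependency-free.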
4. The multi-domain fusion speech noise reduction method according to claim 1, wherein the number of wavelet decomposition layers in S3 is determined by the following formula: [formula not available in the source text]; wherein L is the number of wavelet decomposition layers and sam is the sampling rate of the input signal, and the wavelet basis function used for the decomposition is coif4 or sym4.
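The sampling-rate formula for L is not reproduced, so it is not sketched here. One constraint any value of L must respect is the signal-length limit used by PyWavelets' `pywt.dwt_max_level`, floor(log2(N / (filter_len - 1))). The helper below applies that limit to the two bases named in claim 4 (coif4 has 24 filter taps, sym4 has 8); `max_dwt_level` and `clamp_levels` are hypothetical names.

```python
import math

# Filter lengths of the wavelet bases named in claim 4 (coifN: 6N taps, symN: 2N taps).
FILTER_LEN = {"coif4": 24, "sym4": 8}


def max_dwt_level(n_samples, wavelet="coif4"):
    """Deepest useful DWT level for a signal of n_samples."""
    flen = FILTER_LEN[wavelet]
    if n_samples < flen - 1:
        return 0
    return int(math.floor(math.log2(n_samples / (flen - 1))))


def clamp_levels(requested, n_samples, wavelet="coif4"):
    """Cap a level count (e.g. from the sampling-rate formula) at the length limit."""
    return max(1, min(requested, max_dwt_level(n_samples, wavelet)))
```

For a one-second frame at 8 kHz this gives at most 8 levels with coif4 and 10 with sym4, so the longer coif4 filter leaves less room for deep decompositions.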
5. The multi-domain fusion speech noise reduction method according to claim 1, wherein in S3 the threshold for the j-th layer wavelet coefficients is calculated as [formula not available in the source text]; wherein the threshold scaling factor adjusts the overall threshold strength (a value greater than 1 strengthens denoising and a value smaller than 1 weakens it), the noise standard deviation of the j-th layer wavelet coefficients is estimated from the median absolute deviation of those coefficients, and a further term denotes the number of wavelet coefficients of the j-th layer. At the same time, the retention coefficient of the j-th layer is calculated as [formula not available in the source text]; wherein the retention upper limit, which protects the structural integrity of the signal, takes values from 0.95 to 0.98; the basic retention coefficient, which prevents excessive distortion of high-noise layers or the introduction of artifacts, takes values from 0.1 to 0.3; and the energy gain factor, which adaptively adjusts the denoising strength according to the energy share of each layer, takes values from 8 to 12. The energy ratio of the j-th layer wavelet coefficients is $\eta_j = E_j / E_{\text{total}}$, where $E_j = \sum_k d_{j,k}^2$ is the total energy of the j-th layer wavelet coefficients $d_{j,k}$ and $E_{\text{total}} = \sum_j E_j$ is the total energy of the wavelet coefficients of all layers.
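The sketch below computes the layer energy ratio η_j = E_j / E_total defined in claim 5. The threshold and retention formulas are not reproduced, so two assumptions stand in for them: the universal threshold λ_j = α σ_j sqrt(2 ln N_j) with a MAD noise estimate σ_j = MAD / 0.6745, and a hypothetical retention rule β_j = min(β_max, β_0 + γ η_j) using the claimed ranges (β_max 0.95 to 0.98, β_0 0.1 to 0.3, γ 8 to 12). How the retention coefficient is applied is also assumed: coefficients above λ_j pass unchanged and those below are scaled by β_j instead of being zeroed. Input follows the `pywt.wavedec` ordering `[cA_L, cD_L, ..., cD_1]`, for example from `pywt.wavedec(x, 'coif4', level=L)`; the approximation is passed through unchanged here.

```python
import numpy as np


def adaptive_wavelet_threshold(coeffs, alpha=1.0, beta_max=0.97, beta_0=0.2, gamma=10.0):
    """Process [cA, cD_L, ..., cD_1]; return (coeffs, eta, thresholds, retentions)."""
    coeffs = [np.asarray(c, dtype=float) for c in coeffs]
    energies = np.array([np.sum(c ** 2) for c in coeffs])  # E_j
    eta = energies / (energies.sum() + 1e-20)              # eta_j = E_j / E_total
    out = [coeffs[0].copy()]                               # approximation kept as-is
    lam, beta = [], []
    for j, d in enumerate(coeffs[1:], start=1):
        mad = np.median(np.abs(d - np.median(d)))
        sigma = mad / 0.6745                               # MAD-based noise std
        lj = alpha * sigma * np.sqrt(2 * np.log(max(len(d), 2)))  # assumed threshold
        bj = min(beta_max, beta_0 + gamma * eta[j])        # assumed retention rule
        # keep large coefficients, attenuate (not zero) the small ones
        out.append(np.where(np.abs(d) > lj, d, bj * d))
        lam.append(lj)
        beta.append(bj)
    return out, eta, np.array(lam), np.array(beta)
```

Under this rule a layer that carries much of the energy keeps nearly all of its sub-threshold coefficients (β near the upper limit), while a low-energy, noise-dominated layer keeps only about β_0 of them.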
6. The multi-domain fusion speech noise reduction method according to claim 1, wherein the fusion weight in S5 is calculated as [formula not available in the source text]; wherein H denotes the Shannon entropy function of the average spectral energy, and the two fused inputs are the notch-processed output signal of S2 and the spectrum-compensated output signal of S4.
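Claim 6 fixes the ingredients (a Shannon entropy H of the average spectral energy of the S2 and S4 outputs) but not the weight formula. The sketch assumes w = H(s4) / (H(s2) + H(s4)) applied to the notched signal and 1 - w to the compensated one, so the path with the more concentrated, lower-entropy spectrum receives the larger weight. The Hann-framed average power spectrum (256-sample frames, 128-sample hop) and all function names are also assumptions.

```python
import numpy as np


def mean_power_spectrum(x, frame=256, hop=128):
    """Average |STFT|^2 over Hann-windowed frames."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame:
        x = np.pad(x, (0, frame - len(x)))
    win = np.hanning(frame)
    frames = np.stack([x[s:s + frame] * win for s in range(0, len(x) - frame + 1, hop)])
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)


def spectral_entropy(x):
    """Shannon entropy (bits) of the normalized average power spectrum."""
    p = mean_power_spectrum(x)
    p = p / (p.sum() + 1e-20)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def entropy_fusion(s_notch, s_comp):
    """Fuse the S2 and S4 outputs; returns (fused_signal, weight_on_notch_path)."""
    h1, h2 = spectral_entropy(s_notch), spectral_entropy(s_comp)
    w = h2 / (h1 + h2 + 1e-20)  # assumed: lower entropy earns more weight
    fused = w * np.asarray(s_notch, dtype=float) + (1 - w) * np.asarray(s_comp, dtype=float)
    return fused, w
```

The two inputs must have the same length, which holds here because both paths start from the same input signal.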
7. The multi-domain fusion speech noise reduction method according to claim 1, wherein the process noise variance Q and the measurement noise variance R in S6 are set as follows: [formulas not available in the source text]; in these formulas, a process noise reference value and a measurement noise reference value serve as baselines; a parameter obtained from spectral energy distribution estimation represents the proportion of each frequency band's energy in the total signal energy; the current frame signal energy is used together with prevE, the previous frame signal energy; p is the process noise power; and coefficients determined by an optimization algorithm adjust how strongly each of these physical quantities influences Q and R.
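Since the exact Q and R formulas are missing, this is a minimal sketch of a scalar random-walk Kalman filter run per STFT frequency bin across frames on magnitudes, with the noisy phase reused. The adaptation rules are hypothetical stand-ins for quantities named in claim 7: Q grows with the relative change between the current and previous frame energies (E and prevE), and R grows as the instantaneous per-bin SNR falls, with the noise floor taken as the 10th percentile of each bin's power. `q0`, `r0`, `c_q` and `c_r` play the roles of the reference values and tuning coefficients but are not values from the patent; they are in squared-magnitude units and need rescaling for other signal levels. Any STFT, for example `scipy.signal.stft`, can supply `Y`.

```python
import numpy as np


def stft_kalman(Y, q0=1e-3, r0=1e-2, c_q=1.0, c_r=4.0):
    """Per-bin scalar Kalman filtering of |Y| over frames; Y has shape (n_frames, n_bins)."""
    mag = np.abs(Y)
    n_frames, n_bins = mag.shape
    noise_psd = np.percentile(mag ** 2, 10, axis=0) + 1e-12  # crude noise floor (assumed)
    x = mag[0].copy()             # state: clean magnitude estimate per bin
    P = np.full(n_bins, r0)       # estimate variance per bin
    out = np.empty_like(mag)
    out[0] = x
    prev_e = np.sum(mag[0] ** 2) + 1e-12
    for t in range(1, n_frames):
        z = mag[t]
        e = np.sum(z ** 2) + 1e-12
        Q = q0 * (1 + c_q * abs(e - prev_e) / prev_e)  # track faster when energy jumps
        snr = z ** 2 / noise_psd                        # instantaneous per-bin SNR
        R = r0 * (1 + c_r / (1 + snr))                  # trust low-SNR bins less
        P = P + Q                                       # predict (random walk)
        K = P / (P + R)                                 # Kalman gain
        x = x + K * (z - x)                             # update
        P = (1 - K) * P
        out[t] = x
        prev_e = e
    return out * np.exp(1j * np.angle(Y))               # reuse the noisy phase
```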
8. The multi-domain fusion voice noise reduction method of claim 1, wherein the residual convolutional network in S7 consists of 2 to 5 one-dimensional convolutional layers with kernel lengths of 3 to 7, includes a residual gating mechanism, and keeps all network parameters fixed during deployment and use, requiring no training or optimization.
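A sketch of a fixed-weight residual branch within the limits of claim 8: three one-dimensional convolution layers with 5, 3 and 7 taps, hand-set kernels that are never trained, and a sigmoid gate driven by the local energy of the residual. The kernel values, the tanh nonlinearity, the gate parameters and the final additive restoration `s6 + f(s5 - s6)` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical fixed kernels: 3 layers with 5, 3 and 7 taps; hand-set, never trained.
_HANN7 = np.hanning(9)[1:-1]
KERNELS = [
    np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0,  # binomial low-pass
    np.array([-0.25, 1.5, -0.25]),               # mild detail emphasis, unit DC gain
    _HANN7 / _HANN7.sum(),                       # 7-tap Hann smoother
]
GATE_KERNEL = np.ones(5) / 5.0                   # local-energy window for the gate


def _sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def fixed_residual_cnn(residual, gate_gain=4.0, gate_bias=-2.0):
    """Fixed-weight 1-D conv stack whose output is gated by local residual energy."""
    r = np.asarray(residual, dtype=float)
    if len(r) < 7:
        raise ValueError("residual shorter than the longest kernel")
    h = r
    for k in KERNELS:
        h = np.tanh(np.convolve(h, k, mode="same"))  # conv + bounded nonlinearity
    energy = np.convolve(r ** 2, GATE_KERNEL, mode="same")
    gate = _sigmoid(gate_gain * energy / (energy.mean() + 1e-12) + gate_bias)
    return gate * h


def restore(s5, s6):
    """Add the gated, filtered residual (s5 - s6) back onto the Kalman output s6."""
    s5 = np.asarray(s5, dtype=float)
    s6 = np.asarray(s6, dtype=float)
    return s6 + fixed_residual_cnn(s5 - s6)
```

When the Kalman output already matches the fused signal the residual is zero and `restore` returns `s6` unchanged; the gate opens mainly where the residual carries local energy.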
9. A multi-domain fusion voice noise reduction system for implementing the multi-domain fusion voice noise reduction method according to any one of claims 1 to 8, characterized by comprising a line spectrum identification unit, a notch processing unit, a wavelet threshold processing unit, a frequency spectrum compensation unit, an entropy fusion unit, a frequency-domain Kalman filtering unit, a residual convolution unit and a control module. The line spectrum identification unit detects candidate line-spectrum frequencies and their bandwidths in the input signal; the notch processing unit is connected to the line spectrum identification unit and performs adaptive notch filtering at the candidate line-spectrum frequencies; the wavelet threshold processing unit is connected to the notch processing unit and performs wavelet decomposition and adaptive thresholding on the notched signal; the frequency spectrum compensation unit is connected to the wavelet threshold processing unit and performs log-spectral amplitude compensation on the reconstructed signal; the entropy fusion unit is connected to both the notch processing unit and the frequency spectrum compensation unit and performs adaptive weighted fusion of the two signal paths based on spectral energy entropy; the frequency-domain Kalman filtering unit is connected to the entropy fusion unit and performs per-frequency-bin Kalman filtering in the short-time Fourier transform domain; the residual convolution unit is connected to both the frequency-domain Kalman filtering unit and the entropy fusion unit and performs detail recovery and signal equalization through a fixed-weight convolutional network; and the control module is connected to the line spectrum identification unit, the notch processing unit, the wavelet threshold processing unit, the frequency spectrum compensation unit, the entropy fusion unit, the frequency-domain Kalman filtering unit and the residual convolution unit, and coordinates the processing sequence and dynamically adjusts system parameters.
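The units in claim 9 do not form a simple chain: the notch output feeds both the wavelet path and the entropy fusion, and the residual unit reads both the fusion and Kalman outputs. The hypothetical wiring below makes that data flow explicit with pluggable callables. `NoiseReductionPipeline` and its field names are illustrative, the `trace` list is a minimal stand-in for the control module's sequencing role, and the additive combination in S7 is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

import numpy as np

Signal = np.ndarray


@dataclass
class NoiseReductionPipeline:
    identify: Callable[[Signal], List[Tuple[float, float]]]      # line spectrum identification
    notch: Callable[[Signal, List[Tuple[float, float]]], Signal]  # notch processing
    wavelet: Callable[[Signal], Signal]                           # wavelet threshold processing
    compensate: Callable[[Signal], Signal]                        # spectrum compensation
    fuse: Callable[[Signal, Signal], Signal]                      # entropy fusion
    kalman: Callable[[Signal], Signal]                            # frequency-domain Kalman
    residual: Callable[[Signal], Signal]                          # residual convolution
    trace: List[str] = field(default_factory=list)                # control-module stand-in

    def _stage(self, name, fn, *args):
        self.trace.append(name)  # record the processing sequence
        return fn(*args)

    def run(self, x: Signal) -> Signal:
        self.trace.clear()
        lines = self._stage("S1", self.identify, x)
        s2 = self._stage("S2", self.notch, x, lines)
        s3 = self._stage("S3", self.wavelet, s2)
        s4 = self._stage("S4", self.compensate, s3)
        s5 = self._stage("S5", self.fuse, s2, s4)  # both S2 and S4 outputs feed fusion
        s6 = self._stage("S6", self.kalman, s5)
        return s6 + self._stage("S7", self.residual, s5 - s6)  # assumed additive restoration
```

With trivial stand-ins, for instance identity functions for most units, the pipeline reduces to simple arithmetic, which makes it easy to check the wiring before substituting real implementations.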
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of a multi-domain fusion speech noise reduction method according to any of claims 1 to 8.

Description

Technical Field

The invention relates to the technical field of voice enhancement and signal processing, and in particular to a multi-domain fusion voice noise reduction method, system and computer readable storage medium.

Background

In practical application scenarios such as industrial sites, vehicle-mounted and airborne environments, conference recording and communication links, voice signals are commonly corrupted by a combination of two typical kinds of noise: narrow-band line spectra and their harmonics, caused by equipment characteristics such as motor operation, inverter operation and rectification residue; and broadband Gaussian or near-Gaussian noise from the environmental background, sensor background and transmission channels. Existing schemes have clear limitations against such composite noise. Traditional adaptive filtering (such as NLMS) can effectively suppress line-spectrum components but handles broadband noise poorly; methods based on spectral subtraction or improved log-spectral amplitude estimation can handle broadband noise but leave obvious line-spectrum peaks; statistical modeling methods and single Kalman filtering adapt poorly to non-stationary line spectra; and end-to-end deep learning models are somewhat effective but have inherent drawbacks, including dependence on large amounts of training data, high computational resource requirements, weak interpretability and limited portability across engineering deployments. The industry therefore needs a composite noise reduction solution that requires no training, is interpretable, supports real-time deployment, and can effectively handle narrow-band line spectra and broadband noise at the same time.
Disclosure of Invention

To overcome the defects of the prior art, the invention provides a multi-domain fusion voice noise reduction method, system and computer readable storage medium that solve the problems described above. To achieve this purpose, the invention adopts the following technical scheme, a multi-domain fusion voice noise reduction method comprising the following steps: S1, estimating the power spectral density of an input signal, constructing a composite score from the peak amplitude ratio, adjacent-peak contrast, noise-to-peak ratio and local power-spectrum smoothness, and identifying candidate line-spectrum frequencies and their corresponding bandwidths; S2, performing adaptive notch processing at the identified candidate frequencies, using the ratio of the correlation coefficient to the root-mean-square amplitude as the iteration termination condition; S3, performing wavelet decomposition on the notched signal, adaptively determining a threshold and a retention coefficient from the energy distribution of each decomposition layer, and applying threshold processing and reconstruction to the detail coefficients and the approximation coefficients respectively; S4, performing spectrum compensation on the signal reconstructed in S3 using an improved log-spectral amplitude estimation method; S5, performing adaptive weighted fusion of the notched signal from S2 and the spectrum-compensated signal from S4, with weights determined by their spectral energy entropy; S6, in the short-time Fourier transform domain, applying Kalman filtering to the fused signal from S5 frequency bin by frequency bin and frame by frame, wherein the process noise variance and the measurement noise variance are dynamically adjusted according to wavelet energy interpolation, spectral gradient and instantaneous signal-to-noise ratio; and S7, calculating the residual between the output signal of S6 and the output signal of S5, and applying detail recovery and signal equalization to the residual with a lightweight fixed-weight convolutional network to obtain the final noise-reduced signal. Preferably, the composite score in S1 is calculated by the following formula: [formula not available in the source text]; wherein the composite score is computed from the normalized peak amplitude ratio, the normalized adjacent-peak ratio, the normalized noise-to-peak ratio and the normalized local power smoothness. In the spectrum screening process, a peak height threshold and a minimum frequency interval threshold are set simultaneously and are used to determine the effective spectral components among the candidate frequencies. Preferably, the notch processing in S2 employs an infinite impulse response (IIR) notch filter, and the bandwidth scaling factor of the notch filter is dynamically adjusted within the range 1.0 to 1.3. Preferably, the number of wavelet decomposition layers in S3 is determined