CN-120690222-B - Method and system for real-time noise reduction of call voice based on dynamic noise perception


Abstract

The invention discloses a method and a system for real-time noise reduction of call voice based on dynamic noise perception. The method collects an original noisy voice signal; performs framing and windowing preprocessing on the noisy digital voice signal; extracts the spectral features of the noise and classifies the noise as stationary noise, non-stationary noise, or burst noise according to those features; constructs a lightweight neural network structure through model pruning and quantization; adjusts parameters and strategies according to the stationary, non-stationary, and burst noise types to achieve optimized noise reduction; performs windowing and overlap-add post-processing on the noise-reduced voice signal; and outputs the noise-reduced voice signal.

Inventors

WANG ZIBIN
WU BIN
ZHANG HAIBIN
NING MUXUAN
ZHANG MINGZHI

Assignees

江西亚瑞科技有限责任公司

Dates

Publication Date: 20260512
Application Date: 20250527

Claims (6)

1. A lightweight real-time call voice noise reduction method based on dynamic noise perception, characterized by comprising the following steps:
   Step 1, collecting an original noisy voice signal, and converting the analog voice signal into a digital voice signal through an analog-to-digital converter;
   Step 2, performing framing and windowing preprocessing on the noisy digital voice signal to obtain a digital voice signal to be processed;
   Step 3, analyzing the digital voice signal to be processed, extracting the spectral features of the noise, and classifying the noise as stationary noise, non-stationary noise, or burst noise according to the spectral features of the noise;
   Step 4, constructing a lightweight neural network structure through model pruning and quantization, and training the model on a large number of noisy and clean voice samples to obtain a lightweight RNN model;
   Step 5, using the lightweight RNN model to adjust parameters and strategies according to the stationary, non-stationary, and burst noise types so as to achieve optimized noise reduction;
   Step 6, performing windowing and overlap-add post-processing on the noise-reduced voice signal, and outputting the noise-reduced voice signal;
   wherein the lightweight neural network structure is constructed by improving an existing RNN structure, the improvement comprising the following steps:
   Step 1, lightening the basic structure: compressing the hidden layer dimension, and adopting a bidirectional structure that reduces the unidirectional hidden layer dimension to 32-64; introducing a parameter sharing mechanism that shares the weight matrix across time steps, and coupling the forget gate and the input gate into a logic circuit [formula missing];
   Step 2, building a lightweight unit: removing the output gate of the conventional LSTM and introducing a skip connection across time steps;
   Step 3, merging gates: combining the input gate and the forget gate into a complementary relationship;
   Step 4, structured pruning: performing block-wise pruning on the 4 gating matrices of the LSTM, and removing 30%-50% of the parameters based on a sensitivity analysis of gradient magnitude;
   Step 5, applying dynamic fixed-point quantization to the hidden state, thereby obtaining the lightweight neural network;
   wherein the hidden layer dimension is compressed according to the following expression: [formula missing], where [symbol missing] is the input dimension and [symbol missing] is the hidden layer dimension, [symbol missing] being compressed to 1/4 of the original design, and FLOPs denotes the number of floating-point operations.
2. The lightweight real-time call voice noise reduction method based on dynamic noise perception according to claim 1, wherein the framing preprocessing divides the noisy digital voice signal into frames of fixed length with a certain length of overlap between adjacent frames, and the windowing preprocessing applies a Hamming window to each frame of the voice signal.
3. The lightweight real-time call voice noise reduction method based on dynamic noise perception according to claim 1, wherein the skip connection is introduced by adding a skip connection term [symbol missing] that directly introduces the hidden state of the previous two steps, using the following formula: [formula missing], where α is a learnable attenuation factor, W is the input-to-hidden weight matrix, and U is the hidden-to-hidden weight matrix, which is consistent with the conventional RNN and is responsible for fusing the current input with the most recent state, and where [symbol missing] is a linear transformation of the input vector [symbol missing].
4. The lightweight real-time call voice noise reduction method based on dynamic noise perception according to claim 1, wherein in step 3 the input is divided, after linear transformation, into 3 independent gates, namely the input gate [symbol missing], the forget gate [symbol missing], and the output gate [symbol missing], each gate being computed as: [formulas missing]; where the sigmoid function is used; the input gate and the forget gate are combined into a complementary relationship by using [symbol missing] and [symbol missing] in place of the independent parameters; then [formula missing]; where, in the above formula, [symbol missing] represents the cell state at the current time; [symbol missing] represents the cell state at the previous time; [symbol missing] represents the output value of the forget gate, generated by a sigmoid function; [symbol missing] represents the candidate state, generated by applying an activation function to a linear combination of the input and the hidden state; and [symbol missing] represents element-wise multiplication.
5. The lightweight real-time call voice noise reduction method based on dynamic noise perception according to claim 3, wherein dynamic fixed-point quantization is applied to the hidden state as follows: [formula missing], where b = 4; in the above formula, b is the number of quantization bits and [symbol missing] is the original floating-point value, such as a hidden state in the neural network.
6. A system for implementing the lightweight real-time call voice noise reduction method based on dynamic noise perception according to any one of claims 1 to 5, characterized by comprising a voice acquisition module (1), a preprocessing module (2), a dynamic noise perception module (3), a lightweight noise reduction module (4), a post-processing module (5), and an output module (6), wherein:
   the voice acquisition module (1) acquires voice signals and converts the analog signals into digital signals through an analog-to-digital converter;
   the preprocessing module (2) performs framing and windowing preprocessing on the acquired noisy digital voice signal;
   the dynamic noise perception module (3) extracts the spectral features of the noise in real time through Mel spectral analysis, classifies the noise as stationary noise, non-stationary noise, or burst noise according to the noise spectral features, and dynamically adjusts the parameters and strategies of the noise reduction model according to the noise type;
   the lightweight noise reduction module (4) constructs a lightweight neural network structure through model pruning and quantization, and trains the model on a large number of noisy and clean voice samples to obtain a lightweight RNN model;
   the post-processing module (5) performs windowing and overlap-add post-processing on the noise-reduced voice signal; and
   the output module (6) outputs the noise-reduced voice signal to a loudspeaker or a storage device.
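
Claims 1 and 4 describe merging the input gate and the forget gate into a complementary pair and removing the output gate, but the formulas themselves are not reproduced in this text. The following is a minimal sketch of that idea, assuming the widely used coupled form in which the input gate equals one minus the forget gate, so that the cell state becomes f * c_prev + (1 - f) * c_tilde. The function name `coupled_cell_step`, the parameter keys `Wf`, `Uf`, `bf`, `Wc`, `Uc`, `bc`, and the choice of taking the hidden state as tanh of the cell state are illustrative assumptions and are not taken from the patent.

```python
import math


def sigmoid(z):
    """Logistic function used for the forget gate."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def affine(W, x, U, h, b):
    """Compute W x + U h + b element by element."""
    return [wx + uh + bi for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]


def coupled_cell_step(x, h_prev, c_prev, p):
    """One time step of a hypothetical cell with coupled gates and no output gate."""
    # Forget gate: output of a sigmoid, as described in claim 4.
    f = [sigmoid(z) for z in affine(p["Wf"], x, p["Uf"], h_prev, p["bf"])]
    # Candidate state: activation of a linear combination of input and hidden state.
    c_tilde = [math.tanh(z) for z in affine(p["Wc"], x, p["Uc"], h_prev, p["bc"])]
    # Assumed coupling: the input gate is (1 - f), so no separate input-gate weights exist.
    c = [ft * cp + (1.0 - ft) * ct for ft, cp, ct in zip(f, c_prev, c_tilde)]
    # Assumption: with the output gate removed, the hidden state is tanh of the cell state.
    h = [math.tanh(ci) for ci in c]
    return h, c
```

In this sketch only two weight sets remain (forget gate and candidate state), compared with four in a conventional LSTM cell, and each new cell value is a convex combination of the previous cell value and the candidate.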

Description

Method and system for real-time noise reduction of call voice based on dynamic noise perception

Technical Field

The invention relates to the technical field of voice signal processing, and in particular to a method and a system for real-time noise reduction of call voice based on dynamic noise perception.

Background

In the field of voice signal processing, background noise is one of the main factors affecting speech quality and speech recognition accuracy. With the rapid development of voice communication, voice recognition, and voice control technologies, the demand for real-time noise reduction is increasing. In real-time voice call scenarios in particular, voice signals are often subject to interference from complex background noise (e.g., industrial noise, environmental noise, and device noise). Furthermore, embedded devices (e.g., industrial robots, smart home devices, and in-vehicle systems) often have limited computing power and memory resources. Existing deep learning noise reduction models perform well, but their computational complexity and memory usage are high, which makes them difficult to deploy directly on lightweight devices. Traditional noise reduction methods (such as spectral subtraction and Wiener filtering) have limited effectiveness on dynamic and non-stationary noise and struggle to remove complex noise, which degrades speech quality. In addition, conventional noise reduction methods are generally optimized for specific types of noise and lack the ability to adapt to dynamic noise environments. In practical applications, the noise type and intensity may change at any time, so a method is required that can sense the noise characteristics in real time and dynamically adjust the noise reduction strategy.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a method and a system for real-time noise reduction of call voice based on dynamic noise perception. To solve the above technical problems, the invention provides the following technical scheme: a lightweight real-time call voice noise reduction method based on dynamic noise perception, comprising the following steps:

Step 1, collecting an original noisy voice signal, and converting the analog voice signal into a digital voice signal through an analog-to-digital converter;
Step 2, performing framing and windowing preprocessing on the noisy digital voice signal to obtain a digital voice signal to be processed;
Step 3, analyzing the digital voice signal to be processed, extracting the spectral features of the noise, and classifying the noise as stationary noise, non-stationary noise, or burst noise according to the spectral features of the noise;
Step 4, constructing a lightweight neural network structure through model pruning and quantization, and training the model on a large number of noisy and clean voice samples to obtain a lightweight RNN model;
Step 5, using the lightweight RNN model to adjust parameters and strategies according to the stationary, non-stationary, and burst noise types so as to achieve optimized noise reduction;
Step 6, performing windowing and overlap-add post-processing on the noise-reduced voice signal, and outputting the noise-reduced voice signal.
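
Steps 2 and 6 above (framing with overlap, windowing, and overlap-add) can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the patented implementation: the names `frame_len` and `hop`, the helper functions, and the choice to normalize the overlap-add output by the summed Hamming window (instead of applying a separate synthesis window) are all hypothetical. The noise reduction of step 5 would operate on the windowed frames between `analyze` and `overlap_add`; here the frames are passed through unchanged.

```python
import math


def hamming(n):
    """Hamming window of length n: 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_signal(x, frame_len, hop):
    """Split x into fixed-length frames; adjacent frames overlap by frame_len - hop samples."""
    count = 1 + (len(x) - frame_len) // hop
    return [x[i * hop:i * hop + frame_len] for i in range(count)]


def analyze(x, frame_len, hop):
    """Framing followed by a Hamming window on every frame (step 2)."""
    w = hamming(frame_len)
    return [[s * wk for s, wk in zip(frame, w)] for frame in frame_signal(x, frame_len, hop)]


def overlap_add(frames, frame_len, hop):
    """Overlap-add the frames and divide by the summed window (assumed form of step 6)."""
    w = hamming(frame_len)
    n = hop * (len(frames) - 1) + frame_len
    out = [0.0] * n
    norm = [0.0] * n
    for i, frame in enumerate(frames):
        for k in range(frame_len):
            out[i * hop + k] += frame[k]
            norm[i * hop + k] += w[k]
    # The Hamming window never reaches zero, so the normalizer is always positive.
    return [o / d for o, d in zip(out, norm)]
```

With unmodified frames, `overlap_add(analyze(x, 16, 8), 16, 8)` reproduces the covered part of `x` up to floating-point rounding, which is a quick check that the framing and reconstruction are consistent before any noise reduction is placed in between.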

As a preferred technical scheme of the invention, the framing preprocessing divides the noisy digital voice signal into frames of fixed length with a certain length of overlap between adjacent frames, and the windowing preprocessing applies a Hamming window to each frame of the voice signal.

As a preferred technical scheme of the invention, the lightweight neural network structure is constructed by improving an existing RNN structure, the improvement comprising the following steps:

Step 1, lightening the basic structure: compressing the hidden layer dimension, and adopting a bidirectional structure that reduces the unidirectional hidden layer dimension to 32-64; introducing a parameter sharing mechanism that shares the weight matrix across time steps, and coupling the forget gate and the input gate into a logic circuit [formula missing];
Step 2, building a lightweight unit: removing the output gate of the conventional LSTM and introducing a skip connection across time steps;
Step 3, merging gates: combining the input gate and the forget gate into a complementary relationship;
Step 4, structured pruning: performing block-wise pruning on the 4 gating matrices of the LSTM, and removing 30%-50% of the parameters based on a sensitivity analysis of gradient magnitude;
Step 5, applying dynamic fixed-point quantization to the hidden state so as to