EP-4485458-B1 - NOISE SIGNAL PROCESSING METHOD

Inventors

WANG, ZHE

Dates

Publication Date: 20260513
Application Date: 20141009

Claims (3)

1. A linear prediction-based noise signal processing method, wherein the method comprises: acquiring (S51) a noise signal; obtaining, by using a Levinson-Durbin algorithm, a linear prediction coefficient according to the noise signal; filtering (S52) the noise signal according to the linear prediction coefficient, to obtain a linear prediction residual signal; obtaining energy of the linear prediction residual signal according to the linear prediction residual signal; obtaining (S53) a spectral envelope of the linear prediction residual signal according to the linear prediction residual signal; obtaining a spectral detail of the linear prediction residual signal according to the spectral envelope of the linear prediction residual signal; and encoding (S54) the linear prediction coefficient, the energy of the linear prediction residual signal, and the spectral detail of the linear prediction residual signal; characterised in that the spectral detail of the linear prediction residual signal is a difference between the spectral envelope of the linear prediction residual signal and a spectral envelope of random noise excitation.
2. A computer-readable storage medium, tangibly embodying computer program code, which, when executed by a computer unit, causes the computer unit to perform the method according to claim 1.
3. A computer program configured to cause a computer to execute the method of claim 1.
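The encoder-side parameter extraction of claim 1 (steps S51 to S53, stopping before the actual encoding in S54) can be illustrated with a minimal pure-Python sketch. Several details are assumptions not stated in the claim: the filter form A(z) = 1 + a1 z^-1 + ... + ap z^-p with order 10, unwindowed autocorrelation, residual energy taken as mean energy per sample, the spectral envelope taken as mean DFT power per equal-width band in dB, and the random noise excitation taken as seeded Gaussian noise scaled to the residual energy. All function names are hypothetical.

```python
import cmath
import math
import random


def autocorr(x, order):
    """Biased autocorrelation r[0..order] of frame x (no window applied)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err): a[0] == 1.0, err is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. all-zero) frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """Analysis filter A(z): e[n] = x[n] + sum_k a[k] * x[n-k], zero initial state."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1)) for n in range(len(x))]


def band_envelope_db(e, bands):
    """Hypothetical envelope: mean DFT power per equal-width band, in dB."""
    n = len(e)
    half = n // 2
    power = [abs(sum(e[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) ** 2
             for f in range(half)]
    width = half // bands
    # Small floor avoids log10(0) for silent bands.
    return [10.0 * math.log10(sum(power[b * width:(b + 1) * width]) / width + 1e-12)
            for b in range(bands)]


def analyze_noise_frame(x, order=10, bands=8, seed=0):
    """Return (lpc, residual energy, spectral detail in dB per band) for frame x."""
    a, _ = levinson_durbin(autocorr(x, order), order)
    e = lp_residual(x, a)
    energy = sum(v * v for v in e) / len(e)
    # Random noise excitation with the same energy; a shared seed is assumed so
    # that a decoder could regenerate the same reference sequence.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, math.sqrt(energy)) for _ in range(len(e))]
    env_res = band_envelope_db(e, bands)
    env_noise = band_envelope_db(noise, bands)
    # Spectral detail: residual envelope minus random-excitation envelope.
    detail = [r - q for r, q in zip(env_res, env_noise)]
    return a, energy, detail
```

For a 256-sample frame, `analyze_noise_frame` returns 11 coefficients (order 10 plus the leading 1), one energy value and 8 per-band detail values; quantization and bitstream packing for S54 are deliberately left out.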

Description

TECHNICAL FIELD

The present invention relates to the audio signal processing field, and in particular, to a noise processing method, a noise generation method, an encoder, a decoder, and an encoding and decoding system.

BACKGROUND

Speech is present in only about 40% of the time in voice communication; the rest of the time contains silence or background noise (collectively referred to as background noise below). To reduce the transmission bandwidth spent on background noise, discontinuous transmission (DTX) systems and comfort noise generation (CNG) technology were introduced. DTX means that, during a background noise period, an encoder encodes and sends an audio signal only intermittently according to a policy, instead of continuously encoding and sending every frame. A frame that is intermittently encoded and sent in this way is generally referred to as a silence insertion descriptor (SID) frame. A SID frame generally contains characteristic parameters of the background noise, such as an energy parameter and a spectrum parameter. On the decoder side, a decoder can generate a continuous background noise signal from the background noise parameters obtained by decoding the SID frames. The method for generating continuous background noise during a DTX period on the decoder side is referred to as CNG. The objective of CNG is not to accurately recreate the background noise signal on the encoder side, because a large amount of time-domain background noise information is lost in the discontinuous encoding and transmission of the background noise signal. Rather, the objective of CNG is to generate, on the decoder side, background noise that meets the user's subjective auditory perception requirements, thereby reducing the user's discomfort. In existing CNG technology, comfort noise is generally obtained by a linear prediction-based method, that is, by using random noise excitation on the decoder side to excite a synthesis filter.
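The division of labor between DTX on the encoder side and CNG on the decoder side can be illustrated with a small scheduling sketch. The fixed SID update interval of 8 frames, the frame labels SPEECH/SID/NO_DATA, and the function names are assumptions for illustration only; a real codec's DTX policy (for example any hangover period before entering DTX) may differ and is not modeled here.

```python
from typing import List, Optional


def dtx_schedule(is_speech: List[bool], sid_interval: int = 8) -> List[str]:
    """Label each frame SPEECH, SID or NO_DATA under a hypothetical DTX policy.

    Speech frames are always encoded. In a background noise period a SID frame
    is sent on the first noise frame and then every `sid_interval` frames;
    the frames in between are not transmitted at all.
    """
    labels: List[str] = []
    since_sid: Optional[int] = None  # frames since the last SID in this noise period
    for speech in is_speech:
        if speech:
            labels.append("SPEECH")
            since_sid = None  # a new noise period will start with a fresh SID
        elif since_sid is None or since_sid >= sid_interval:
            labels.append("SID")
            since_sid = 1
        else:
            labels.append("NO_DATA")
            since_sid += 1
    return labels


def cng_parameter_source(labels: List[str]) -> List[Optional[int]]:
    """Decoder view: index of the SID whose parameters drive CNG for each frame.

    Speech frames map to None; SID and NO_DATA frames reuse the latest SID.
    """
    sources: List[Optional[int]] = []
    last_sid: Optional[int] = None
    for i, label in enumerate(labels):
        if label == "SPEECH":
            sources.append(None)
        else:
            if label == "SID":
                last_sid = i
            sources.append(last_sid)
    return sources
```

For one speech frame followed by ten noise frames, `dtx_schedule` yields SID frames at indices 1 and 9, and `cng_parameter_source` maps frames 1 to 8 to SID 1 and frames 9 and 10 to SID 9, so the decoder keeps producing noise even though only two noise frames were transmitted.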
Although background noise can be obtained by such a method, the generated comfort noise still differs from the original background noise in terms of the user's subjective auditory perception. When a continuously encoded frame transitions to a comfort noise (CN) frame, this perceptual difference may cause subjective discomfort to the user. The use of CNG is specified in the adaptive multi-rate wideband (AMR-WB) standard of the 3rd Generation Partnership Project (3GPP), and the CNG technology of AMR-WB is also based on linear prediction. In the AMR-WB standard, a SID frame includes a quantized background noise energy coefficient and a quantized linear prediction coefficient, where the background noise energy coefficient is the logarithmic energy of the background noise, and the quantized linear prediction coefficient is expressed as quantized immittance spectral frequency (ISF) coefficients. On the decoder side, the energy and the linear prediction coefficient of the current background noise are estimated from the energy coefficient information and the linear prediction coefficient information contained in the SID frame. A random noise sequence is generated by a random number generator and used as the excitation signal for generating comfort noise. The gain of the random noise sequence is adjusted according to the estimated energy of the current background noise, so that the energy of the random noise sequence matches the estimated energy of the current background noise. The gain-adjusted random excitation then excites a synthesis filter whose coefficients are the estimated linear prediction coefficients of the current background noise. The output of the synthesis filter is the generated comfort noise.
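The decoder-side procedure described above can be sketched as follows. This is not the 3GPP reference implementation: ISF dequantization and ISF-to-LP conversion are skipped (LP coefficients are passed in directly), the energy is assumed to be given as log2 of the mean per-sample energy, and uniform noise from Python's `random` module stands in for the codec's own random number generator. The function name `comfort_noise` is hypothetical.

```python
import math
import random


def comfort_noise(lpc, log2_energy, n, seed=0):
    """Sketch of linear-prediction CNG: gain-adjusted random excitation -> 1/A(z).

    lpc: [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p (assumed form).
    log2_energy: assumed log2 of the estimated mean per-sample noise energy.
    Returns (excitation, output).
    """
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    target = 2.0 ** log2_energy
    actual = sum(v * v for v in noise) / n
    # Gain adjustment: make the random sequence's energy match the estimate.
    gain = math.sqrt(target / actual)
    excitation = [gain * v for v in noise]
    # All-pole synthesis filter 1/A(z): y[t] = u[t] - sum_k a_k * y[t-k].
    p = len(lpc) - 1
    y = []
    for t, u in enumerate(excitation):
        acc = u
        for k in range(1, min(p, t) + 1):
            acc -= lpc[k] * y[t - k]
        y.append(acc)
    return excitation, y
```

With `lpc = [1.0]` (that is, A(z) = 1) the synthesis filter is transparent and the output equals the gain-adjusted excitation; with real LP coefficients the filter imposes the estimated spectral envelope, but, as the following paragraph notes, no spectral detail beyond that envelope.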
In a method that generates comfort noise by using a random noise sequence as the excitation signal, relatively comfortable noise can be obtained and the spectral envelope of the original background noise can be roughly recovered, but the spectral detail of the original background noise may be lost. As a result, the generated comfort noise still differs from the original background noise in terms of subjective auditory perception, and this difference may cause auditory discomfort to the user when a continuously encoded speech segment transitions to a comfort noise segment. Khaled Helmi El-Maleh, "Classification-Based Techniques for Digital Coding of Speech-plus-Noise", thesis submitted to McGill University, January 2004, discloses a linear prediction-based noise coding method.

SUMMARY

In view of this, to resolve the foregoing problem, embodiments of the present invention provide a comfort noise generation method, an apparatus, and a system. According to a noise processing method, a noise generation method, an encoder, a decoder, and an encoding-decoding system that are in the embodime