JP-7855043-B2 - Harmonic conversion


Inventors

Ekstrand, Per
Villemoes, Lars Falck

Assignees

Dolby International AB

Dates

Publication Date: 20260507
Application Date: 20240924
Priority Date: 20090918

Claims (13)

An audio signal processing device that converts an input audio signal with a conversion factor T to generate an output audio signal, the audio signal processing device having one or more components that perform the steps of: extracting a frame of L time-domain sample values of the input audio signal using a decomposition window of length L with the function v(n) = sin((π/L)(n + 0.5)), 0 ≤ n < L; converting the L time-domain sample values into M complex frequency-domain coefficients, which includes determining a frequency-domain oversampling factor F and determining M according to L and F; changing the phase of the complex frequency-domain coefficients using the conversion factor T; converting the modified frequency-domain coefficients into M modified time-domain sample values; and generating a frame of L time-domain output sample values of the output audio signal from the M modified time-domain sample values using a composite window; wherein M = F * L, and F is the frequency-domain oversampling factor determined in response to frequency-domain oversampling information received in an encoded bitstream; and wherein the frame of L time-domain output sample values of the output audio signal contains multiple high-frequency components that are not present in the frame of L time-domain sample values of the input audio signal, at least one of the high-frequency components being generated using the conversion factor T and at least one of the high-frequency components being generated using a second conversion factor T2, where T is not equal to T2.
The audio signal processing device according to claim 1, wherein the oversampling factor F is (T+1)/2 or greater, and the conversion factor T is an integer greater than 1.
The audio signal processing device according to claim 1, wherein changing the phase includes multiplying the phase by the conversion factor T.
The audio signal processing device according to claim 1, wherein the decomposition window of length L is zero-padded with an additional (F-1)*L zeros.
The audio signal processing device according to claim 1, wherein the one or more components further perform the steps of: shifting the decomposition window by a decomposition stride along the input audio signal to generate a series of frames of the input audio signal; shifting a series of frames of L time-domain output sample values by a combined stride; and generating the output audio signal by overlapping and adding the series of shifted frames of L time-domain output sample values.
The audio signal processing device according to claim 5, wherein the one or more components further increase the sampling rate of the output audio signal by the conversion factor T to produce a converted output audio signal.
The audio signal processing device according to claim 6, wherein the combined stride is T times the decomposition stride.
A method performed by an audio signal processing device, which converts an input audio signal with a conversion factor T to generate an output audio signal, the method including the steps of: extracting a frame of L time-domain sample values of the input audio signal using a decomposition window of length L with the function v(n) = sin((π/L)(n + 0.5)), 0 ≤ n < L; converting the L time-domain sample values into M complex frequency-domain coefficients, which includes determining a frequency-domain oversampling factor F and determining M according to L and F; changing the phase of the complex frequency-domain coefficients using the conversion factor T; converting the modified frequency-domain coefficients into M modified time-domain sample values; and generating a frame of L time-domain output sample values of the output audio signal from the M modified time-domain sample values using a composite window; wherein M = F * L, and F is the frequency-domain oversampling factor determined in response to frequency-domain oversampling information received in an encoded bitstream; and wherein the frame of L time-domain output sample values of the output audio signal contains multiple high-frequency components that are not present in the frame of L time-domain sample values of the input audio signal, at least one of the high-frequency components being generated using the conversion factor T and at least one of the high-frequency components being generated using a second conversion factor T2, where T is not equal to T2.
The method according to claim 8, wherein converting the L time-domain sample values into M complex frequency-domain coefficients is performed by executing one of the Fourier transform, fast Fourier transform, discrete Fourier transform, or wavelet transform.
The method according to claim 8, wherein the oversampling factor F is (T+1)/2 or greater, and the conversion factor T is an integer greater than 1.
The method according to claim 8, wherein the input audio signal includes low-frequency components of the audio signal.
A non-transitory computer-readable medium having instructions for execution by an audio signal processing device, wherein, when executed by the audio signal processing device, the instructions cause the audio signal processing device to perform the method described in claim 8.
A computer program for causing a computer to perform the method described in claim 8.
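
As an informal illustration only, and not the claimed implementation, the per-frame steps recited in claims 1 and 8 can be sketched in Python using the standard library alone. The function names (`sine_window`, `dft`, `process_frame`) are hypothetical. The sketch also makes several assumptions that the claims do not state: the composite window is taken to be the same sine window, the zero padding is appended after the frame, the first L of the M modified samples are kept, and a naive DFT stands in for a fast transform.

```python
import cmath
import math


def sine_window(L):
    """Window from the claims: v(n) = sin((pi / L) * (n + 0.5)), 0 <= n < L."""
    return [math.sin(math.pi / L * (n + 0.5)) for n in range(L)]


def dft(x, inverse=False):
    """Naive O(M^2) discrete Fourier transform (a stand-in for an FFT)."""
    M = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(M):
        acc = sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / M)
                  for n in range(M))
        out.append(acc / M if inverse else acc)
    return out


def process_frame(frame, T, F):
    """Process one frame of L samples with integer factors T and F."""
    L = len(frame)
    M = F * L  # transform size, M = F * L
    v = sine_window(L)
    # Window the frame, then append (F - 1) * L zeros (assumed placement).
    padded = [frame[n] * v[n] for n in range(L)] + [0.0] * (M - L)
    X = dft(padded)
    # Keep each magnitude and multiply each phase by T.
    Y = [abs(c) * cmath.exp(1j * T * cmath.phase(c)) for c in X]
    y = dft(Y, inverse=True)
    # Assumption: the composite window is the same sine window, applied to
    # the first L of the M modified time-domain samples.
    return [y[n].real * v[n] for n in range(L)]


if __name__ == "__main__":
    frame = [0.0] * 8
    frame[3] = 1.0  # unit impulse at index 3
    print(process_frame(frame, T=2, F=2))
```

Under these assumptions, a unit impulse at index 3 of an 8-sample frame reappears at index 6 when T = 2 and F = 2, because multiplying the phase by T scales the impulse position, measured from the start of the frame, by T. With T = 1 the frame is simply weighted by the window twice.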

Description

This invention relates to the encoding of audio signals, and in particular to the conversion of signals in frequency and/or the expansion or compression of signals in time. In other words, this invention relates to the modification of time scales and/or frequency scales. More specifically, this invention relates to high-frequency reconstruction (HFR) including a frequency-domain harmonic transposer.
HFR technologies, such as Spectral Band Replication (SBR), can significantly improve the encoding efficiency of traditional perceptual audio codecs. Combined with MPEG-4 Advanced Audio Coding (AAC), HFR technology forms a highly efficient audio codec. This codec is already in use within the XM Satellite Radio system and Digital Radio Mondiale, and is standardized within organizations such as 3GPP® and the DVD Forum. The combination of AAC and SBR is called aacPlus, which is part of the MPEG-4 standard, where it is referred to as the High Efficiency AAC Profile. Generally, HFR technology can be combined with any perceptual audio codec in a backward-compatible manner, thus offering the potential to upgrade established broadcast systems such as MPEG-2 Layer 2 used in the Eureka DAB system. The HFR conversion method, when combined with an audio codec, can enable wide-bandwidth audio at ultra-low bitrates.
The fundamental idea behind HFR is the observation that there is usually a strong correlation between the characteristics of a signal in its high-frequency range and those of the same signal in its low-frequency range. Therefore, a good approximation of the original high-frequency range of the signal can be achieved by converting the signal from the low-frequency range to the high-frequency range. The concept of conversion was established in WO98/57436 as a method for regenerating high-frequency bands from lower-frequency bands of audio signals. Applying this concept in audio coding and/or speech coding results in substantial bitrate savings.
While the following refers to audio coding, it should be noted that the methods and systems described are equally applicable to speech coding and to unified speech and audio coding. In HFR-based audio coding systems, low-bandwidth signals are presented to the core waveform encoder, while higher frequencies are regenerated on the decoder side using the conversion of the low-bandwidth signals and additional side information. This side information is typically encoded at a very low bitrate and describes the target spectral shape. Because the core coded signal has a narrow bandwidth and a low bitrate, it becomes increasingly important to regenerate or synthesize the high band, i.e., the high-frequency range of the audio signal, with perceptually pleasing characteristics.
Conventional techniques include several methods for harmonic high-frequency reconstruction, such as harmonic transposition or time stretching. One method is based on a phase vocoder, which operates on the principle of performing frequency analysis with sufficiently high frequency resolution. Signal modification is performed in the frequency domain before the signal is resynthesized. The signal modification may be time stretching or transposition.
One of the underlying problems with these methods is the conflict between the high frequency resolution needed to obtain high-quality conversion of steady-state sounds and the system's temporal response to transient or percussive sounds. In other words, while high frequency resolution is beneficial for the conversion of steady-state signals, it typically requires a large window size, which becomes detrimental when dealing with the transient portion of a signal. One approach to address this problem is to adaptively vary the converter window as a function of the input signal characteristics, for example, by using window switching.
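To make the frame-based processing of such a phase-vocoder scheme more concrete, the following Python sketch shows only the framing and overlap-add part. It is a sketch under stated assumptions rather than the method of this document. The function `overlap_add_stretch` and its parameters are hypothetical names, the per-frame frequency-domain modification is passed in as a caller-supplied function `modify_frame`, and the relation "combined stride = T times the decomposition stride" is taken from the claims.

```python
def overlap_add_stretch(signal, L, analysis_stride, T, modify_frame):
    """Frame a signal, modify each frame, and overlap-add the results.

    signal          -- list of input samples
    L               -- frame length in samples
    analysis_stride -- hop between successive input frames
    T               -- integer factor; the output hop is T * analysis_stride
    modify_frame    -- hypothetical callback mapping L samples to L samples
    """
    synthesis_stride = T * analysis_stride
    # Number of complete frames that fit in the input (0 if it is too short).
    n_frames = max(0, (len(signal) - L) // analysis_stride + 1)
    if n_frames == 0:
        return []
    out = [0.0] * ((n_frames - 1) * synthesis_stride + L)
    for i in range(n_frames):
        start = i * analysis_stride
        frame = signal[start:start + L]
        y = modify_frame(frame)
        pos = i * synthesis_stride  # output frames are spaced T times wider
        for n in range(L):
            out[pos + n] += y[n]  # overlap-add into the output buffer
    return out


if __name__ == "__main__":
    x = [float(i) for i in range(16)]
    stretched = overlap_add_stretch(x, L=4, analysis_stride=2, T=2,
                                    modify_frame=lambda f: list(f))
    print(len(stretched))
```

With an identity `modify_frame` this only re-spaces the frames. In a real phase vocoder the callback would apply the windowing and the frequency-domain phase modification, and the window and strides would be chosen so that the overlapping frames sum smoothly.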
Typically, for the steady portion of a signal, a long window is used to achieve high frequency resolution. For the transient portion of a signal, on the other hand, a short window is used to achieve a good transient response of the converter, i.e., good temporal resolution. However, this approach has the drawback that signal analysis measures, such as transient detection, must be incorporated into the conversion system. Such signal analysis measures often involve a decision step that triggers the switching of signal processing, for example, a decision on the presence of a transient. Furthermore, such measures typically affect the reliability of the system and can introduce signal artifacts when the signal processing is switched, for example, when the window size is switched.
This invention solves the aforementioned problems related to the transient performance of harmonic conversion without the need for window switching. Furthermore, improved harmonic conversion is achieved without adding significant complexity.
EP0940015B1/WO98/57436
This figure shows the Dirac at a speci