EP-4738349-A2 - METHODS FOR PHASE ECU F0 INTERPOLATION SPLIT AND RELATED CONTROLLER


Abstract

A method and an apparatus for increased frequency resolution in a frequency domain concealment operation for a lost audio frame are provided. At least one bin vector of a spectral representation for at least one tone is obtained, wherein the at least one bin vector includes three consecutive bin values for the at least one tone. The method comprises selecting, responsive to whether each of the three consecutive bin values has a complex value or a real value, between a first interpolation method using absolute values of the three consecutive bin values when at least one bin value has a real value, and a second interpolation method using complex-valued coefficients of the three consecutive bin values when all three bin values have complex values. The selected interpolation method is used to calculate a fractional offset for estimating a frequency of the at least one tone for frame reconstruction.

Inventors

SEHLSTEDT, MARTIN

Assignees

Telefonaktiebolaget LM Ericsson (publ)

Dates

Publication Date: 20260506
Application Date: 20200220

Claims (15)

A method for increased frequency resolution in a frequency domain concealment operation for a lost audio frame, in which concealment operation an estimate of at least one tone in a spectral representation of a prototype frame is refined by performing a frequency interpolation using three consecutive bin values, the method comprising: performing a Fast Fourier Transform (FFT) on the prototype frame to generate a spectral representation having complex-valued coefficients for frequency bins between DC and half sampling frequency and real-valued coefficients for DC and half sampling frequency bins; obtaining at least one bin vector of the spectral representation for at least one tone, wherein the at least one bin vector includes three consecutive bin values for the at least one tone; responsive to whether each of the three consecutive bin values has a complex value or a real value, selecting between a first interpolation method using absolute values of the three consecutive bin values when at least one bin value has a real value, and a second interpolation method using complex-valued coefficients of the three consecutive bin values when all three bin values have complex values; and using the selected interpolation method to calculate a fractional offset for estimating a frequency of the at least one tone for frame reconstruction.
The method of claim 1, wherein the prototype frame is extracted from a previously decoded signal.
The method of claim 1 or 2, wherein the second interpolation method uses complex-valued coefficients of the three consecutive bin values to calculate the fractional offset using the equation: $\delta = K_{jacob} \, \mathrm{RE}\left\{ \frac{X_{k-1} - X_{k+1}}{2X_k - X_{k-1} - X_{k+1}} \right\}$, where $K_{jacob}$ is a scaling coefficient, RE{} denotes taking the real part, and $X_{k-1}$, $X_k$, and $X_{k+1}$ are the three consecutive complex-valued coefficients.
The method of claim 3, wherein the scaling coefficient $K_{jacob}$ is 1.1429.
The method of any one of claims 1 to 4, wherein the first interpolation method calculates the fractional offset using absolute values of the three consecutive bin values according to: $\delta = \frac{|X_{k+1}| - |X_{k-1}|}{4|X_k| - 2|X_{k-1}| - 2|X_{k+1}|}$.
The method of any one of claims 1 to 5, wherein the fractional offset is combined with a coarse resolution peak location $k$ to estimate the frequency according to: $f_k = (k + \delta)\,C$, where $C$ is the coarse resolution in Hz/bin.
The method of claim 6, wherein the coarse resolution $C$ equals $f_s / N_{FFT}$, where $f_s$ is the sampling frequency and $N_{FFT}$ is the FFT length.
An apparatus for increased frequency resolution in a frequency domain concealment operation for a lost audio frame, in which concealment operation an estimate of at least one tone in a spectral representation of a prototype frame is refined by performing a frequency interpolation using three consecutive bin values, the apparatus adapted to: perform a Fast Fourier Transform (FFT) on the prototype frame to generate a spectral representation having complex-valued coefficients for frequency bins between DC and half sampling frequency and real-valued coefficients for DC and half sampling frequency bins; obtain at least one bin vector of the spectral representation for at least one tone, wherein the at least one bin vector includes three consecutive bin values for the at least one tone; select, responsive to whether each of the three consecutive bin values has a complex value or a real value, between a first interpolation method using absolute values of the three consecutive bin values when at least one bin value has a real value, and a second interpolation method using complex-valued coefficients of the three consecutive bin values when all three bin values have complex values; and use the selected interpolation method to calculate a fractional offset for estimating a frequency of the at least one tone for frame reconstruction.
The apparatus of claim 8, adapted to extract the prototype frame from a previously decoded signal.
The apparatus of claim 8 or 9, wherein the second interpolation method uses complex-valued coefficients of the three consecutive bin values to calculate the fractional offset using the equation: $\delta = K_{jacob} \, \mathrm{RE}\left\{ \frac{X_{k-1} - X_{k+1}}{2X_k - X_{k-1} - X_{k+1}} \right\}$, where $K_{jacob}$ is a scaling coefficient, RE{} denotes taking the real part, and $X_{k-1}$, $X_k$, and $X_{k+1}$ are the three consecutive complex-valued coefficients.
The apparatus of claim 10, wherein the scaling coefficient $K_{jacob}$ is 1.1429.
The apparatus of any one of claims 8 to 11, wherein the first interpolation method calculates the fractional offset using absolute values of the three consecutive bin values according to: $\delta = \frac{|X_{k+1}| - |X_{k-1}|}{4|X_k| - 2|X_{k-1}| - 2|X_{k+1}|}$.
The apparatus of any one of claims 8 to 12, adapted to combine the fractional offset with a coarse resolution peak location $k$ to estimate the frequency according to: $f_k = (k + \delta)\,C$, where $C$ is the coarse resolution in Hz/bin.
The apparatus of claim 13, wherein the coarse resolution $C$ equals $f_s / N_{FFT}$, where $f_s$ is the sampling frequency and $N_{FFT}$ is the FFT length.
An audio decoder comprising the apparatus according to any one of claims 8 to 14.
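
The selection between the two interpolation methods recited in the claims can be illustrated with a minimal Python sketch. It is not the claimed implementation, only one possible reading of it under stated assumptions: the function names `fractional_offset` and `tone_frequency`, the layout of `spectrum` as a list holding bins 0 to N_FFT/2 of a real-input FFT, and the `ValueError` raised for a peak without two neighbours are all hypothetical choices made for illustration. Only the two formulas, the value 1.1429, and the relation f_k = (k + δ)·C are taken from the claims.

```python
K_JACOB = 1.1429  # scaling coefficient value recited in the claims


def fractional_offset(spectrum, k):
    """Return the fractional offset (delta) for a coarse peak at bin k.

    Assumption: `spectrum` holds bins 0..N_FFT/2 of a real-input FFT, so
    the first bin (DC) and the last bin (half the sampling frequency) are
    real-valued and every bin in between is complex-valued.
    """
    last = len(spectrum) - 1
    if not 1 <= k <= last - 1:
        # Assumption: peaks without a neighbour on both sides are rejected.
        raise ValueError("peak bin needs a neighbour on both sides")

    xm, x0, xp = spectrum[k - 1], spectrum[k], spectrum[k + 1]

    # No guard against a zero denominator is included in this sketch.
    if k - 1 == 0 or k + 1 == last:
        # At least one of the three bins is real-valued (DC or half the
        # sampling frequency): first method, based on absolute values.
        am, a0, ap = abs(xm), abs(x0), abs(xp)
        return (ap - am) / (4 * a0 - 2 * am - 2 * ap)

    # All three bins are complex-valued: second method, based on the
    # complex coefficients and scaled by K_JACOB.
    return K_JACOB * ((xm - xp) / (2 * x0 - xm - xp)).real


def tone_frequency(spectrum, k, fs, n_fft):
    """Estimate the tone frequency in Hz as (k + delta) * C with C = fs / n_fft."""
    coarse_resolution = fs / n_fft  # Hz per bin
    return (k + fractional_offset(spectrum, k)) * coarse_resolution
```

In this sketch the choice of method depends only on the position of the peak: a peak at bin 1 or at the bin just below half the sampling frequency takes the absolute-value branch, and every other peak takes the complex-valued branch.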

Description

TECHNICAL FIELD

The present disclosure relates generally to a method for controlling a concealment method for a lost audio frame. The present disclosure also relates to a controller configured to control a concealment method for a lost audio frame of a received audio signal.

BACKGROUND

Transmission of speech/audio over modern communications channels/networks is mainly done in the digital domain using a speech/audio codec. This may involve taking the analog signal and digitizing it by sampling with an analog-to-digital converter (ADC) to obtain digital samples. These digital samples may be further grouped into frames that contain samples from a consecutive period of 10 - 40 ms, depending on the application. These frames may then be processed using a compression algorithm, which reduces the number of bits that need to be transmitted while still achieving as high a quality as possible. The encoded bit stream is then transmitted as data packets over the digital network to the receiver. In the receiver, the process is reversed. The data packets may first be decoded to recreate the frame of digital samples, which may then be input to a digital-to-analog converter (DAC) to recreate an approximation of the input analog signal at the receiver. Figure 1 provides an example of a block diagram of an audio transfer using an audio encoder and decoder over a network, such as a digital network, using the above-described approach.

When the data packets are transmitted over the network, some data packets may be dropped by the network due to traffic load, or dropped as a result of bit errors that make the digital data invalid for decoding. When these events happen, the decoder needs to replace the output signal during periods where it is impossible to do the actual decoding. This replacement process is typically called frame/packet loss concealment (PLC). Figure 2 illustrates a block diagram of a decoder 200 including packet loss concealment.
When a Bad Frame Indicator (BFI) indicates a lost or corrupted frame, PLC 202 may create a signal to replace the lost/corrupted frame. Otherwise, i.e. when the BFI does not indicate a lost or corrupted frame, the received signal is decoded by a stream decoder 204. A frame erasure may be signalled to the decoder by setting the bad frame indicator variable for the current frame active, i.e. BFI=1. The decoded or concealed frame is then input to DAC 206 to output an analog signal. Frame/packet loss concealment may also be referred to as error concealment unit (ECU).

There are numerous ways of doing packet loss concealment in a decoder. Some examples are replacing the lost frame with silence and repeating the last frame (or decoding of the last frame parameters). Other approaches try to replace the frame with the most likely continuation of the audio signal. For noise-like signals, one approach may generate noise with a similar spectral structure. For tonal signals, one may first estimate the characteristics of the tones present (frequency, amplitude, and phase) and use these parameters to generate a continuation of the tones at the corresponding temporal locations of lost frames.

Another approach for an ECU is the Phase ECU, described in 3GPP TS 26.477 V15.0.0 clause 5.4.3.5 and WO2014/123471A1, where the decoder may continuously save a prototype of the decoded signal during normal decoding. This prototype may be used in case of a lost frame. The prototype is spectrally analyzed, and the noise and tonal ECU functions are combined in the spectral domain. The Phase ECU identifies tones in the spectrum and calculates a spectral temporal replacement of related spectral bins. The other (non-tonal) bins may be handled as noise and are scrambled to avoid tonal artifacts in these spectral regions. The resulting recreated spectrum is transformed into the time domain by an inverse fast Fourier transform (FFT), and the signal is processed to create a replacement of the lost frame.
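
The BFI-controlled switch between stream decoder 204 and PLC 202, combined with the two simplest concealment strategies mentioned above (silence and repetition of the last frame), might be sketched in Python as follows. This is an illustrative sketch only and does not represent the Phase ECU: the names `conceal_by_repetition` and `decode_stream`, the `(bfi, payload)` packet layout, and the caller-supplied `decode` function are hypothetical.

```python
def conceal_by_repetition(last_good_frame, frame_len):
    """Return a replacement frame for a lost or corrupted frame.

    Repeats the last correctly decoded frame, or outputs silence when no
    frame has been decoded yet.
    """
    if last_good_frame is None:
        return [0.0] * frame_len
    return list(last_good_frame)


def decode_stream(packets, decode, frame_len):
    """Yield one output frame per received packet slot.

    Assumptions: `packets` is an iterable of (bfi, payload) pairs, where
    bfi == 1 marks a lost or corrupted frame, and `decode` is a
    caller-supplied function standing in for the stream decoder.
    """
    last_good = None
    for bfi, payload in packets:
        if bfi == 1:
            # Bad frame: conceal instead of decoding.
            frame = conceal_by_repetition(last_good, frame_len)
        else:
            # Good frame: decode normally and remember it for later concealment.
            frame = decode(payload)
            last_good = frame
        yield frame
```

Each yielded frame would then be passed on to the DAC stage, whether it was decoded or concealed.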
When the audio codec is based on the modified discrete cosine transform (MDCT), the creation of the replacement includes the windowing, time domain aliasing (TDA) and inverse TDA (ITDA) related to the lapped MDCT, so as to create an integrated continuation of the already decoded signal. This method may ensure continued use of the MDCT memory and creation of the MDCT memory that is to be used when normal decoding is resumed. The first correctly decoded frame in the transition from PLC to normal operation is also known as the first good frame.

Figure 3 illustrates a time alignment and signal diagram of the Phase ECU and the recreation of the signal from a PLC prototype. Figure 3 also shows the timing relation between the encoder (top, after encoder input) and the decoder (second line, after decoder synthesis) at the point of the first lost frame in the decoder. It also illustrates how the time-advanced recreated signal of the Phase ECU is positioned and used to continue using windowing, TDA, ITDA and synthesis windowing to recreate the missing frame segment a