CN-113963705-B - Audio encoder and decoder for frequency domain processor and time domain processor

Abstract

An audio encoder for encoding an audio signal comprises a first encoding processor (600) for encoding a first audio signal portion in the frequency domain. The first encoding processor (600) comprises a time-to-frequency converter (602); an analyzer (604) for analyzing a frequency domain representation up to a maximum frequency to determine a first spectral portion to be encoded with a first spectral resolution and a second spectral portion to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution; and a spectral encoder (606) for encoding the first spectral portion with the first spectral resolution and the second spectral portion with the second spectral resolution. The audio encoder further comprises a second encoding processor (610) for encoding a second, different audio signal portion in the time domain, a controller (620), and an encoded signal former (630).
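
As an illustration of the two spectral resolutions named in the abstract, the following is a minimal sketch, not the patented analyzer. It assumes a hypothetical decision rule (a magnitude threshold relative to the strongest line) and a hypothetical band size; the function name `analyze_spectrum` and its parameters are invented for this example. Lines in the first portion are kept one by one, while the remaining lines are summarized by a single energy value per band, which is one possible reading of "a lower spectral resolution".

```python
def analyze_spectrum(spectrum, band_size=4, threshold_ratio=0.5):
    """Split spectral lines into two portions using an assumed criterion.

    Lines whose magnitude reaches threshold_ratio times the largest
    magnitude form the first portion and are kept line by line.
    All other lines form the second portion and are reduced to one
    energy value per band of band_size lines.
    """
    peak = max(abs(v) for v in spectrum)
    first_portion = {}   # line index -> spectral value (full resolution)
    band_energies = []   # one energy per band (reduced resolution)
    for start in range(0, len(spectrum), band_size):
        energy = 0.0
        for i in range(start, min(start + band_size, len(spectrum))):
            value = spectrum[i]
            if peak > 0 and abs(value) >= threshold_ratio * peak:
                first_portion[i] = value
            else:
                energy += value * value
        band_energies.append(energy)
    return first_portion, band_energies


# Two strong tonal lines (indices 1 and 6) on a weak noise floor.
example_spectrum = [0.1, 8.0, 0.2, 0.1, 0.3, 0.2, 6.0, 0.1]
first_portion, band_energies = analyze_spectrum(example_spectrum)
```

In this sketch the strong lines survive at full resolution even when they lie high in the spectrum, and the weaker content around them costs only one value per band.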

Inventors

  • Sasha Dish
  • Martin Dietz
  • Marcus Matras
  • Guillaume Fox
  • Marley Laveli
  • Matthias Neusinger
  • Marcus Schneier
  • Benjamin Schubert
  • Bernhard Grey

Assignees

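  • Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. (Fraunhofer Society for the Advancement of Applied Research)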

Dates

Publication Date
20260421
Application Date
20150724
Priority Date
20150724

Claims (20)

  1. An audio encoder for encoding an audio signal, comprising: a first encoding processor for encoding a first audio signal portion of the audio signal in the frequency domain, wherein the first encoding processor comprises: a time-to-frequency converter for converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion, and a spectral encoder for encoding the frequency domain representation to obtain an encoded spectral representation of the first audio signal portion; a second encoding processor for encoding a second audio signal portion of the audio signal in the time domain, wherein the second audio signal portion is different from the first audio signal portion; a cross processor for computing initialization data of the second encoding processor from the encoded spectral representation of the first audio signal portion, such that the second encoding processor is initialized to encode, in the time domain, the second audio signal portion of the audio signal temporally immediately following the first audio signal portion; a controller configured to analyze the audio signal and to determine which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; an encoded signal former for forming an encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion; and a preprocessor configured to preprocess the first audio signal portion and the second audio signal portion, wherein the preprocessor comprises a resampler for resampling the audio signal to the sampling rate of the second encoding processor to obtain a resampled audio signal, and a prediction analyzer configured to determine a prediction coefficient using the resampled audio signal, or wherein the preprocessor comprises a long-term prediction analysis stage for determining one or more long-term prediction parameters for the first audio signal portion.
  2. The audio encoder of claim 1, wherein the audio signal comprises a high frequency band and a low frequency band, and wherein the second encoding processor comprises: a sample rate converter for converting the second audio signal portion into a representation having a lower sampling rate than the sampling rate of the audio signal, wherein the representation having the lower sampling rate does not include the high frequency band of the audio signal; a time-domain low-band encoder for time-domain encoding the representation having the lower sampling rate; and a time domain bandwidth extension encoder for encoding the high frequency band in a parametric manner.
  3. The audio encoder of claim 1, wherein the preprocessor comprises a prediction analyzer for determining prediction coefficients, and wherein the encoded signal former is configured to introduce an encoded version of the prediction coefficients into the encoded audio signal.
  4. The audio encoder of claim 1, wherein the cross processor comprises: a spectral decoder for calculating a decoded version of the first encoded signal portion, and a delay stage for delaying the decoded version of the first encoded signal portion to obtain a delayed version and feeding the delayed version into a de-emphasis stage of the second encoding processor for initialization.
  5. The audio encoder of claim 1, wherein the cross processor comprises: a spectral decoder for calculating a decoded version of the first encoded signal portion, and a weighted prediction coefficient analysis filter block for filtering the decoded version of the first encoded signal portion to obtain a filter output and feeding the filter output to an innovative codebook determiner of the second encoding processor for initialization.
  6. The audio encoder of claim 1, wherein the cross processor comprises: a spectral decoder for calculating a decoded version of the first encoded signal portion, and an analysis filtering stage for filtering the decoded version of the first encoded signal portion, or a pre-emphasized version derived from the decoded version of the first encoded signal portion by a pre-emphasis stage, to obtain a filter residual signal and feeding the filter residual signal to an adaptive codebook determiner of the second encoding processor for initialization.
  7. The audio encoder of claim 1, wherein the cross processor comprises: a spectral decoder for calculating a decoded version of the first encoded signal portion, and a pre-emphasis filter for filtering the decoded version of the first encoded signal portion to obtain a pre-emphasized version and feeding the pre-emphasized version or a delayed pre-emphasized version to a synthesis filtering stage of the second encoding processor for initialization.
  8. The audio encoder of claim 1, wherein the first encoding processor is configured to perform shaping of spectral values of the frequency domain representation using prediction coefficients derived from the first audio signal portion to obtain shaped spectral values, and wherein the first encoding processor is further configured to perform quantization and entropy encoding of the shaped spectral values of the frequency domain representation.
  9. The audio encoder of claim 1, wherein the cross processor comprises: a noise shaper for shaping quantized spectral values of the frequency domain representation using LPC coefficients derived from the first audio signal portion; a spectral decoder for decoding a spectrally shaped spectral portion of the frequency domain representation with a high spectral resolution to obtain a decoded spectral representation; and a frequency-to-time converter for converting the decoded spectral representation into the time domain to obtain a decoded first audio signal portion, wherein a sampling rate associated with the decoded first audio signal portion is different from a sampling rate of the audio signal, and a sampling rate associated with an output signal of the frequency-to-time converter is different from a sampling rate associated with the audio signal input into the time-to-frequency converter.
  10. The audio encoder of claim 1, wherein the second encoding processor comprises at least one element of the following group of elements: a prediction analysis filter; an adaptive codebook stage; an innovative codebook stage; an estimator for estimating an innovative codebook entry; an ACELP/gain coding stage; a prediction synthesis filter stage; a de-emphasis stage; and a bass post-filter analysis stage.
  11. The audio encoder of claim 1, wherein the second encoding processor has a second sampling rate associated therewith, wherein the first encoding processor has a first sampling rate associated therewith, the first sampling rate being different from the second sampling rate, wherein the cross processor comprises a frequency-to-time converter for generating a time domain signal at the second sampling rate, and wherein the frequency-to-time converter comprises: a selector for selecting a portion of the spectrum input to the frequency-to-time converter according to a ratio of the first sampling rate and the second sampling rate; a transform processor having a transform length different from the transform length of the time-to-frequency converter; and a synthesis windower for windowing with a window comprising a different number of window coefficients than the window used by the time-to-frequency converter.
  12. An audio decoder for decoding an encoded audio signal, comprising: a first decoding processor for decoding a first encoded audio signal portion of the encoded audio signal in the frequency domain, wherein the first decoding processor is configured to reconstruct a first set of first spectral portions in a waveform-preserving manner to generate a spectrum with gaps, wherein the gaps in the spectrum are filled with an intelligent gap filling (IGF) technique, comprising frequency regeneration that applies parametric data and uses the reconstructed first set of first spectral portions, to obtain a decoded spectral representation, and wherein the first decoding processor comprises a frequency-to-time converter for converting the decoded spectral representation into the time domain to obtain a decoded first audio signal portion; a second decoding processor for decoding a second encoded audio signal portion of the encoded audio signal in the time domain to obtain a decoded second audio signal portion; a cross processor for computing initialization data of the second decoding processor from the decoded spectral representation of the first encoded audio signal portion, such that the second decoding processor is initialized to decode, in the time domain, the second encoded audio signal portion of the encoded audio signal temporally following the first encoded audio signal portion; and a combiner for combining the decoded first audio signal portion and the decoded second audio signal portion to obtain a decoded audio signal.
  13. The audio decoder of claim 12, wherein the second decoding processor comprises: a time domain low-band decoder for decoding to obtain a low-band time-domain signal; a resampler for resampling the low-band time-domain signal; a time domain bandwidth extension decoder for synthesizing a high frequency band of the time domain output signal; and a mixer for mixing the synthesized high frequency band of the time domain output signal and the resampled low-band time-domain signal.
  14. The audio decoder of claim 12, wherein the first decoding processor comprises an adaptive long-term prediction post-filter for post-filtering the decoded first audio signal portion, wherein the post-filter is controlled by one or more long-term prediction parameters comprised in the encoded audio signal.
  15. The audio decoder of claim 12, wherein the cross processor further comprises: a further frequency-to-time converter operating at a first effective sampling rate different from a second effective sampling rate associated with the frequency-to-time converter of the first decoding processor, to obtain a further decoded first audio signal portion in the time domain, wherein the signal output by the further frequency-to-time converter has a second sampling rate different from the first sampling rate, wherein the first sampling rate is associated with the output of the frequency-to-time converter of the second decoding processor, and wherein the further frequency-to-time converter comprises: a selector for selecting a portion of the spectrum input into the further frequency-to-time converter according to a ratio of the first sampling rate and the second sampling rate; a transform processor having a transform length different from the transform length of the frequency-to-time converter; and a synthesis windower using a window that includes a different number of coefficients than the window used by the frequency-to-time converter.
  16. The audio decoder of claim 12, wherein the cross processor comprises: a delay stage for delaying the further decoded first audio signal portion and feeding a delayed version of the further decoded first audio signal portion into a de-emphasis stage of the second decoding processor for initialization.
  17. The audio decoder of claim 12, wherein the cross processor comprises a pre-emphasis filter and a delay stage for filtering and delaying the further decoded first audio signal portion and feeding the delay stage output into a prediction synthesis filter of the second decoding processor for initialization.
  18. The audio decoder of claim 12, wherein the cross processor comprises a prediction analysis filter for generating a prediction residual signal from the further decoded first audio signal portion or from the pre-emphasized further decoded first audio signal portion and feeding the prediction residual signal into a codebook combiner of the second decoding processor.
  19. The audio decoder of claim 12, wherein the cross processor comprises a switch for feeding the further decoded first audio signal portion into an analysis stage of a resampler of the second decoding processor for initialization.
  20. The audio decoder of claim 12, wherein the second decoding processor comprises at least one element of a group of elements, the group of elements comprising: a stage for decoding ACELP gains and the innovative codebook; an adaptive codebook synthesis stage; an ACELP post-processor; a prediction synthesis filter; and a de-emphasis stage.
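
The initialization idea running through claims 1, 4 and 16 (feeding the decoded frequency-domain portion into a de-emphasis stage of the time-domain branch) can be illustrated with a minimal sketch. It is not the patented implementation: the first-order filters, the factor `BETA`, the sample values and all function names are assumptions made for this example, and the delay stage named in the claims is omitted. The sketch only shows why a recursive filter in the time-domain branch benefits from memory taken from the preceding portion.

```python
BETA = 0.68  # illustrative pre-emphasis factor, not taken from the patent


def pre_emphasis(samples, prev_input):
    """p[n] = x[n] - BETA * x[n-1], where x[-1] is prev_input."""
    out = []
    for s in samples:
        out.append(s - BETA * prev_input)
        prev_input = s
    return out


def de_emphasis(samples, prev_output):
    """y[n] = p[n] + BETA * y[n-1], where y[-1] is prev_output."""
    out = []
    for s in samples:
        prev_output = s + BETA * prev_output
        out.append(prev_output)
    return out


# Stand-in for the decoded output of the frequency-domain branch.
first_portion = [0.0, 0.5, 1.0, 0.5]
# Portion that immediately follows in time and goes to the time-domain branch.
second_portion = [0.2, -0.3, -0.8, -0.4]

# The encoder side sees the true previous input sample.
emphasized = pre_emphasis(second_portion, prev_input=first_portion[-1])

# Initialized: filter memory is the last decoded sample of the first portion.
warm = de_emphasis(emphasized, prev_output=first_portion[-1])
# Not initialized: filter memory starts at zero.
cold = de_emphasis(emphasized, prev_output=0.0)

warm_error = max(abs(a - b) for a, b in zip(warm, second_portion))
cold_error = max(abs(a - b) for a, b in zip(cold, second_portion))
```

With the memory taken from the first portion, the second portion is recovered up to rounding error. With zero memory, the first sample is off by `BETA` times the last sample of the first portion, and the error decays by a factor of `BETA` per sample, which would appear as a transient at the switching point.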

Description

Audio encoder and decoder for frequency domain processor and time domain processor

The present application is a divisional application of Chinese patent application No. 201580049740.7, entitled "Audio encoder and decoder using a frequency domain processor with full band gap filling and a time domain processor", dated March 15, 2017.

Technical Field

The present invention relates to audio signal encoding and decoding, and in particular to audio signal processing using parallel frequency domain and time domain encoder/decoder processors.

Background

Perceptual coding of audio signals for the purpose of data reduction, for efficient storage or transmission of these signals, is a widely used practice. In particular, when the lowest bit rates are to be achieved, the coding employed leads to a reduction in audio quality, which is usually caused mainly by an encoder-side limitation of the bandwidth of the audio signal to be transmitted. Here, the audio signal is typically low-pass filtered such that no spectral waveform content remains above a certain predetermined cut-off frequency.

In contemporary codecs there are well-known methods for decoder-side signal recovery through audio signal bandwidth extension (BWE), e.g. Spectral Band Replication (SBR), which operates in the frequency domain, or so-called time domain bandwidth extension (TD-BWE), which is a post-processor in speech coders operating in the time domain.

In addition, there are several combined time/frequency domain coding concepts, such as those known under the terms AMR-WB+ or USAC.

All these combined time/frequency domain coding concepts have in common that the frequency domain encoder relies on bandwidth extension techniques that introduce a band limitation into the input audio signal, and that the portion above a crossover or boundary frequency is encoded with a low-resolution coding concept and synthesized at the decoder side. 
These concepts therefore rely mainly on pre-processor technology at the encoder side and corresponding post-processing functions at the decoder side.

In general, the time domain encoder is selected for signals that are usefully encoded in the time domain (e.g., speech signals), and the frequency domain encoder is selected for non-speech signals, music signals, and the like. However, especially for non-speech signals having prominent harmonics in the high frequency band, prior art frequency domain encoders have a reduced accuracy and thus a reduced audio quality, because such prominent harmonics can only be encoded separately in a parametric manner or are eliminated completely in the encoding/decoding process.

Furthermore, there are concepts in which the time domain encoding/decoding branch additionally relies on a bandwidth extension that also encodes the upper frequency range parametrically, whereas the lower frequency range is typically encoded using ACELP or any other CELP-related coder (e.g. a speech coder). This bandwidth extension functionality increases the bit rate efficiency but introduces a further inflexibility, because both coding branches, namely the frequency domain coding branch and the time domain coding branch, are band limited due to a spectral band replication process or a bandwidth extension process operating above a certain crossover frequency that is substantially below the maximum frequency comprised in the input audio signal.

Related subject matter of the prior art includes:

  • SBR as a post-processor to waveform decoding [1-3]
  • MPEG-D USAC core switching [4]
  • MPEG-H 3D IGF [5]

The following papers and patents describe methods which are believed to constitute the prior art of the present application:

[1] M. Dietz, L. Liljeryd, K. Kjörling and O. Kunz, "Spectral Band Replication, a novel approach in audio coding," 112th AES Convention, Munich, Germany, 2002.
[2] S. Meltzer, R. Böhm and F. Henn, "SBR enhanced audio codecs for digital broadcasting such as "Digital Radio Mondiale" (DRM)," 112th AES Convention, Munich, Germany, 2002.
[3] T. Ziegler, A. Ehret, P. Ekstrand and M. Lutzky, "Enhancing mp3 with SBR: Features and Capabilities of the new mp3PRO Algorithm," 112th AES Convention, Munich, Germany, 2002.
[4] MPEG-D USAC standard.
[5] PCT/EP2014/065109.

In MPEG-D USAC, a switchable core encoder is described. However, in USAC the band-limited core is restricted to always transmitting a low-pass filtered signal. Therefore, certain music signals containing prominent high-frequency content, such as full-band sweeps or triangle sounds, cannot be reproduced faithfully.

Disclosure of Invention

It is an object of the present invention to provide an improved concept for audio coding.

This object is achieved by an audio encoder, an audio decoder, an audio encoding method, an audio decoding method or a machine-readable storage medium according to an embodiment of the invention.

The invention is based on the finding that a time domain encoding/decoding processor may be combined w