CA-3292081-C - INTEGRATION OF HIGH FREQUENCY AUDIO RECONSTRUCTION TECHNIQUES
Abstract
A method for decoding an encoded audio bitstream is disclosed. The method includes receiving the encoded audio bitstream and decoding the audio data to generate a decoded lowband audio signal. The method further includes extracting high frequency reconstruction metadata and filtering the decoded lowband audio signal with an analysis filterbank to generate a filtered lowband audio signal. The method also includes extracting a flag indicating whether either spectral translation or harmonic transposition is to be performed on the audio data and regenerating a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata in accordance with the flag. The high frequency regeneration is performed as a post-processing operation with a delay of 3010 samples per audio channel.
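The flag-controlled regeneration step summarized above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `PatchingMode` enum and its numeric values, the `Subbands` layout, and the function names are hypothetical. Spectral translation is modeled only as a copy of consecutive subbands (as the Background describes for SBR spectral patching); envelope adjustment and the other metadata-driven steps are omitted, and harmonic transposition is left to a caller-supplied function rather than implemented.

```python
from enum import Enum
from typing import Callable, List

# Hypothetical layout: one list of complex samples per filterbank subband.
Subbands = List[List[complex]]


class PatchingMode(Enum):
    # The numeric values are assumptions; the text only distinguishes two cases.
    SPECTRAL_TRANSLATION = 0
    HARMONIC_TRANSPOSITION = 1


def spectral_translation(lowband: Subbands, start: int, count: int) -> Subbands:
    """Copy `count` consecutive lowband subbands, beginning at index `start`."""
    if start < 0 or count < 0 or start + count > len(lowband):
        raise ValueError("patch source range lies outside the lowband")
    return [list(band) for band in lowband[start:start + count]]


def regenerate_highband(
    lowband: Subbands,
    mode: PatchingMode,
    start: int,
    count: int,
    transposer: Callable[[Subbands, int], Subbands],
) -> Subbands:
    """Select the regeneration method according to the extracted flag."""
    if mode is PatchingMode.SPECTRAL_TRANSLATION:
        return spectral_translation(lowband, start, count)
    # Harmonic transposition is delegated to a caller-supplied function.
    return transposer(lowband, count)


def combine(lowband: Subbands, highband: Subbands) -> Subbands:
    """Stack the lowband subbands and the regenerated highband subbands."""
    return [list(band) for band in lowband] + [list(band) for band in highband]
```

Passing the transposer in as a parameter keeps the sketch honest about what it does not cover: only the dispatch on the flag and the subband copy are shown.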
Inventors
- Kristofer Kjoerling
- Lars Villemoes
- Heiko Purnhagen
- Per Ekstrand
Assignees
- DOLBY INTERNATIONAL AB
Dates
- Publication Date: 2026-05-05
- Application Date: 2019-04-25
- Priority Date: 2018-04-25
Claims (5)
- 1. A method for performing high frequency reconstruction of an audio signal, the method comprising: receiving an encoded audio bitstream, the encoded audio bitstream including audio data representing a lowband portion of the audio signal and high frequency reconstruction metadata, wherein the high frequency reconstruction metadata includes envelope scale factors; decoding the audio data to generate a decoded lowband audio signal; extracting from the encoded audio bitstream the high frequency reconstruction metadata, the high frequency reconstruction metadata including operating parameters for a high frequency reconstruction process, the operating parameters including a patching mode parameter located in a backward-compatible extension container of the encoded audio bitstream, wherein a first value of the patching mode parameter indicates spectral translation and a second value of the patching mode parameter indicates harmonic transposition by phase-vocoder frequency spreading; filtering the decoded lowband audio signal to generate a filtered lowband audio signal; regenerating a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata, wherein the regenerating includes spectral translation if the patching mode parameter is the first value and the regenerating includes harmonic transposition by phase-vocoder frequency spreading if the patching mode parameter is the second value; and combining the filtered lowband audio signal with the regenerated highband portion to form a wideband audio signal, wherein the filtering, regenerating, and combining are performed as a post-processing operation with a delay of 3010 samples per audio channel, so that a composition time applies to a 3011-th audio sample within an audio composition unit.
- 2. The method of claim 1 wherein the harmonic transposition by phase-vocoder frequency spreading is performed with an estimated complexity at or below 4.5 million of operations per second and at or below 3 kWords of memory.
- 3. A non-transitory computer-readable medium having instructions which, when executed by a computing device or system, cause said computing device or system to execute the method of claim 1.
- 4. An audio processing unit for performing high frequency reconstruction of an audio signal, the audio processing unit comprising: an input interface for receiving an encoded audio bitstream, the encoded audio bitstream including audio data representing a lowband portion of the audio signal and high frequency reconstruction metadata, wherein the high frequency reconstruction metadata includes envelope scale factors; a core audio decoder for decoding the audio data to generate a decoded lowband audio signal; a deformatter for extracting from the encoded audio bitstream the high frequency reconstruction metadata, the high frequency reconstruction metadata including operating parameters for a high frequency reconstruction process, the operating parameters including a patching mode parameter located in a backward-compatible extension container of the encoded audio bitstream, wherein a first value of the patching mode parameter indicates spectral translation and a second value of the patching mode parameter indicates harmonic transposition by phase-vocoder frequency spreading; an analysis filterbank for filtering the decoded lowband audio signal to generate a filtered lowband audio signal; a high frequency regenerator for reconstructing a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata, wherein the reconstructing includes a spectral translation if the patching mode parameter is the first value and the reconstructing includes harmonic transposition by phase-vocoder frequency spreading if the patching mode parameter is the second value; and a combiner for combining the filtered lowband audio signal with the regenerated highband portion to form a wideband audio signal, wherein the analysis filterbank and high frequency regenerator are performed in a post-processor with a delay of 3010 samples per audio channel, so that a composition time applies to a 3011-th audio sample within an audio composition unit.
- 5. The audio processing unit of claim 4 wherein the harmonic transposition by phase-vocoder frequency spreading is performed with an estimated complexity at or below 4.5 million of operations per second and at or below 3 kWords of memory.
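The relation between the 3010-sample post-processing delay and the 3011-th sample recited in claims 1 and 4 is simple arithmetic, worked through in the sketch below. The function names and the 48 kHz sample rate used in the usage line are assumptions chosen for illustration; the claims state the delay only in samples.

```python
HFR_DELAY_SAMPLES = 3010  # post-processing delay per audio channel, from the claims


def composition_sample_number(delay_samples: int = HFR_DELAY_SAMPLES) -> int:
    """1-based position of the sample to which the composition time applies.

    Skipping `delay_samples` samples lands on the next one, so a delay of
    3010 samples points at the 3011-th sample (zero-based index 3010).
    """
    return delay_samples + 1


def delay_milliseconds(sample_rate_hz: float,
                       delay_samples: int = HFR_DELAY_SAMPLES) -> float:
    """Convert the delay to milliseconds for an assumed output sample rate."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return 1000.0 * delay_samples / sample_rate_hz


if __name__ == "__main__":
    print(composition_sample_number())            # 3011
    print(round(delay_milliseconds(48_000), 3))   # 62.708 (assumed 48 kHz rate)
```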
Description
INTEGRATION OF HIGH FREQUENCY AUDIO RECONSTRUCTION TECHNIQUES
This application is a divisional of Canadian Patent Application No. 3,098,064, filed on April 25, 2019.
TECHNICAL FIELD
Embodiments pertain to audio signal processing, and more specifically, to encoding, decoding, or transcoding of audio bitstreams with control data specifying that either a base form of high frequency reconstruction (“HFR”) or an enhanced form of HFR is to be performed on the audio data.
BACKGROUND OF THE INVENTION
A typical audio bitstream includes both audio data (e.g., encoded audio data) indicative of one or more channels of audio content, and metadata indicative of at least one characteristic of the audio data or audio content. One well known format for generating an encoded audio bitstream is the MPEG-4 Advanced Audio Coding (AAC) format, described in the MPEG standard ISO/IEC 14496-3:2009. In the MPEG-4 standard, AAC denotes “advanced audio coding” and HE-AAC denotes “high-efficiency advanced audio coding.”
The MPEG-4 AAC standard defines several audio profiles, which determine which objects and coding tools are present in a compliant encoder or decoder. Three of these audio profiles are (1) the AAC profile, (2) the HE-AAC profile, and (3) the HE-AAC v2 profile. The AAC profile includes the AAC low complexity (or “AAC-LC”) object type. The AAC-LC object is the counterpart to the MPEG-2 AAC low complexity profile, with some adjustments, and includes neither the spectral band replication (“SBR”) object type nor the parametric stereo (“PS”) object type. The HE-AAC profile is a superset of the AAC profile and additionally includes the SBR object type. The HE-AAC v2 profile is a superset of the HE-AAC profile and additionally includes the PS object type.
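The superset relationship among the three profiles can be expressed as nested sets of object types. This is a simplified sketch: it lists only the object types named in the text above, whereas the actual profiles in ISO/IEC 14496-3 may contain others, and the constant and function names are hypothetical.

```python
# Object types named in the text; real profiles may include more.
AAC_PROFILE = frozenset({"AAC-LC"})
HE_AAC_PROFILE = AAC_PROFILE | {"SBR"}        # AAC profile plus SBR
HE_AAC_V2_PROFILE = HE_AAC_PROFILE | {"PS"}   # HE-AAC profile plus PS


def supports(profile: frozenset, required_object_types: set) -> bool:
    """Return True if every required object type is part of the profile."""
    return set(required_object_types) <= profile


if __name__ == "__main__":
    # Each profile is a strict superset of the previous one.
    print(AAC_PROFILE < HE_AAC_PROFILE < HE_AAC_V2_PROFILE)  # True
    # A stream that needs SBR is outside the plain AAC profile.
    print(supports(AAC_PROFILE, {"AAC-LC", "SBR"}))          # False
```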
WO 2019/207036 PCT/EP2019/060600
The SBR object type contains the spectral band replication tool, which is an important high frequency reconstruction (“HFR”) coding tool that significantly improves the compression efficiency of perceptual audio codecs. SBR reconstructs the high frequency components of an audio signal on the receiver side (e.g., in the decoder). Thus, the encoder needs to only encode and transmit low frequency components, allowing for a much higher audio quality at low data rates. SBR is based on replication of the sequences of harmonics, previously truncated in order to reduce data rate, from the available bandwidth limited signal and control data obtained from the encoder. The ratio between tonal and noise-like components is maintained by adaptive inverse filtering as well as the optional addition of noise and sinusoidals.
In the MPEG-4 AAC standard, the SBR tool performs spectral patching (also called linear translation or spectral translation), in which a number of consecutive Quadrature Mirror Filter (QMF) subbands are copied (or "patched") from a transmitted lowband portion of an audio signal to a highband portion of the audio signal, which is generated in the decoder.
Spectral patching or linear translation may not be ideal for certain audio types, such as musical content with relatively low cross over frequencies. Therefore, techniques for improving spectral band replication are needed.
Brief Description of Embodiments of the Invention
A first class of embodiments relates to a method for decoding an encoded audio bitstream. The method includes receiving the encoded audio bitstream and decoding the audio data to generate a decoded lowband audio signal. The method further includes extracting high frequency reconstruction metadata and filtering the decoded lowband audio signal with an analysis filterbank to generate a filtered lowband audio signal.
The method further includes extracting a flag indicating whether either spectral translation or harmonic transposition is to be performed on the audio data and regenerating a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata in accordance with the flag. Finally, the method includes combining the filtered lowband audio signal and the regenerated highband portion to form a wideband audio signal.
A second class of embodiments relates to an audio decoder for decoding an encoded audio bitstream. The decoder includes an input interface for receiving the encoded audio bitstream where the encoded audio bitstream includes audio data representing a lowband portion of an audio signal and a core decoder for decoding the audio data to generate a decoded lowband audio signal. The decoder also includes a demultiplexer for extracting from the encoded audio bitstream high frequency reconstruction metadata where the high frequency reconstruction metadata includes operating parameters for a high frequency reconstruction process that linearly translates a consecutive number of subbands from a lowband portion of the audio signal to a highband portion of the audio s