RU-2861606-C1 - INTEGRATION OF HIGH-FREQUENCY RECONSTRUCTION TECHNIQUES WITH REDUCED POST-PROCESSING DELAY


Abstract

FIELD: computing. SUBSTANCE: the invention relates to computing for processing audio data. The method comprises filtering a decoded low-band audio signal to generate a filtered low-band audio signal, and reconstructing a high-band portion of the audio signal using the filtered low-band audio signal and high-frequency reconstruction metadata, wherein the reconstruction includes spectral translation if an insertion mode parameter has a first value, and the reconstruction includes harmonic transposition using frequency range stretching by a phase vocoder if the insertion mode parameter has a second value, wherein the filtering, reconstruction and combining are performed as a post-processing operation with a delay of 3010 samples per audio channel, and wherein the spectral translation includes preserving the ratio between tonal and noise-like components using adaptive inverse filtering. EFFECT: increased accuracy of high-frequency reconstruction. 8 cl, 7 dwg, 4 tbl

Inventors

KJOERLING, KRISTOFER
VILLEMOES, LARS
PURNHAGEN, HEIKO
EKSTRAND, PER

Dates

Publication Date: 20260506
Application Date: 20251211
Priority Date: 20180425

Claims (8)

1. A method for reconstructing high frequencies of an audio signal, the method including:
receiving an encoded audio bitstream, wherein the encoded audio bitstream comprises audio data representing a low-range portion of an audio signal and high-frequency reconstruction metadata; wherein the high-frequency reconstruction metadata comprises envelope scale factors;
decoding the audio data to generate a decoded low-range audio signal;
extracting the high frequency reconstruction metadata from the encoded audio bitstream, the high frequency reconstruction metadata comprising operating parameters for a high frequency reconstruction process, the operating parameters including an insertion mode parameter located in a backward compatible extension container of the encoded audio bitstream, wherein a first value of the insertion mode parameter indicates spectral translation, and a second value of the insertion mode parameter indicates harmonic transposition using frequency range stretching by a phase vocoder;
filtering the decoded low-range audio signal to generate a filtered low-range audio signal; and
reconstructing the high-range portion of the audio signal using the filtered low-range audio signal and the high-frequency reconstruction metadata, wherein the reconstruction includes spectral translation if the insertion mode parameter has the first value, and the reconstruction includes harmonic transposition using frequency range stretching by a phase vocoder if the insertion mode parameter has the second value,
wherein the filtering, reconstruction and combining are performed as a post-processing operation with a delay of 3010 samples per audio channel, and wherein the spectral translation includes maintaining the ratio between tonal and noise-like components using adaptive inverse filtering.
2. The method according to claim 1, characterized in that the backward compatible extension container further comprises a flag indicating whether additional pre-processing is used to avoid discontinuities in the shape of the spectral envelope of the high-range portion when the insertion mode parameter is equal to the first value, wherein the first value of the flag enables additional pre-processing, and the second value of the flag disables additional pre-processing.
3. The method according to claim 2, characterized in that the additional pre-processing includes calculating a pre-amplification curve using a linear prediction filter coefficient.
4. The method according to claim 1, characterized in that the backward compatible extension container further comprises a flag indicating whether it is necessary to apply signal-adaptive resampling in the frequency domain when the insertion mode parameter is equal to the second value, wherein the first value of the flag enables signal-adaptive resampling in the frequency domain, and the second value of the flag disables signal-adaptive resampling in the frequency domain.
5. The method according to claim 4, characterized in that the signal-adaptive resampling in the frequency domain is applied only to frames containing a transient signal.
6. The method according to claim 1, characterized in that the harmonic transposition using frequency range stretching by a phase vocoder is performed with an estimated complexity of 4.5 million operations per second or less and 3 kilowords of memory or less.
7. A non-volatile machine-readable medium containing instructions that, when executed by a processor, perform the method according to claim 1.
8. An audio processing unit for reconstructing high frequencies of an audio signal, wherein the audio processing unit comprises:
an input interface for receiving an encoded audio bitstream, wherein the encoded audio bitstream comprises audio data representing a low-range portion of an audio signal and high-frequency reconstruction metadata; wherein the high-frequency reconstruction metadata comprises envelope scale factors;
a main audio decoder for decoding audio data to generate a decoded low-range audio signal;
a deformatter for extracting the high frequency reconstruction metadata from the encoded audio bitstream, the high frequency reconstruction metadata comprising operating parameters for a high frequency reconstruction process, the operating parameters including an insertion mode parameter located in a backward compatible extension container of the encoded audio bitstream, wherein a first value of the insertion mode parameter indicates spectral translation, and a second value of the insertion mode parameter indicates harmonic transposition using frequency range stretching by a phase vocoder;
an analysis filter unit for filtering the decoded low-range audio signal to generate a filtered low-range audio signal; and
a high frequency reconstruction device for reconstructing a high-range portion of the audio signal using the filtered low-range audio signal and the high frequency reconstruction metadata, wherein the reconstruction includes spectral translation if the insertion mode parameter has the first value, and the reconstruction includes harmonic transposition using frequency range stretching by a phase vocoder if the insertion mode parameter has the second value,
wherein the analysis filter unit, the high frequency reconstruction device and a synthesis filter unit are implemented in a post-processor with a delay of 3010 samples per audio channel, and wherein the spectral translation includes preserving the ratio between tonal and noise-like components using adaptive inverse filtering.
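
For illustration only, the mode selection of claims 1, 2 and 4 can be sketched in Python as follows. This is a minimal sketch and not the claimed implementation: the names `InsertionMode`, `ExtensionContainer`, `select_hfr_tools` and `delay_ms`, the numeric values of the enum, and the tool labels are all hypothetical. The claims only speak of a "first" and a "second" value and do not define a data layout.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

# Delay stated in claims 1 and 8, in samples per audio channel.
POST_PROCESSING_DELAY_SAMPLES = 3010


class InsertionMode(Enum):
    # Hypothetical numeric values; the claims only name a first and a second value.
    SPECTRAL_TRANSLATION = 0
    HARMONIC_TRANSPOSITION = 1


@dataclass
class ExtensionContainer:
    """Hypothetical view of the parameters carried in the extension container."""
    insertion_mode: InsertionMode
    pre_processing: bool = False       # flag of claim 2
    adaptive_resampling: bool = False  # flag of claim 4


def select_hfr_tools(container: ExtensionContainer) -> List[str]:
    """Return labels of the reconstruction tools that the parameters enable."""
    if container.insertion_mode is InsertionMode.SPECTRAL_TRANSLATION:
        # Claim 1: spectral translation keeps the tonal-to-noise ratio
        # by means of adaptive inverse filtering.
        tools = ["spectral_translation", "adaptive_inverse_filtering"]
        if container.pre_processing:
            tools.append("pre_processing")
        return tools
    # Second value: harmonic transposition by a phase vocoder.
    tools = ["harmonic_transposition"]
    if container.adaptive_resampling:
        tools.append("signal_adaptive_frequency_domain_resampling")
    return tools


def delay_ms(sample_rate_hz: int) -> float:
    """Convert the stated delay from samples to milliseconds."""
    return 1000.0 * POST_PROCESSING_DELAY_SAMPLES / sample_rate_hz
```

In this sketch each flag is consulted only in the mode to which its claim ties it, so a pre-processing flag has no effect when harmonic transposition is selected. Assuming a sample rate of 48 kHz (a value not stated in the claims), `delay_ms(48000)` evaluates to roughly 62.7 ms.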

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application No. 62/662,296, filed April 25, 2018, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

Embodiments of the present invention relate to audio signal processing, and in particular to encoding, decoding, or transcoding audio bitstreams with control data specifying whether to perform either a basic form of high frequency reconstruction ("HFR") or an enhanced form of HFR on the audio data.

BACKGROUND OF THE INVENTION

A typical audio bitstream contains both audio data (e.g., encoded audio data) characterizing one or more channels of audio content and metadata specifying at least one characteristic of the audio data or audio content. One well-known format for generating an encoded audio bitstream is the MPEG-4 Advanced Audio Coding (AAC) format, described in the MPEG standard ISO/IEC 14496-3:2009. In the MPEG-4 standard, AAC stands for Advanced Audio Coding, and HE-AAC stands for High Efficiency Advanced Audio Coding.

The MPEG-4 AAC standard defines several audio profiles that determine which objects and coding tools are present in a compliant encoder or decoder. Three of these audio profiles are (1) the AAC profile, (2) the HE-AAC profile, and (3) the HE-AAC v2 profile. The AAC profile contains the AAC Low Complexity (or "AAC-LC") object type. The AAC-LC object is the counterpart of the MPEG-2 AAC Low Complexity profile, with some enhancements, and contains neither the Spectral Band Replication ("SBR") object type nor the Parametric Stereo ("PS") object type. The HE-AAC profile is a superset of the AAC profile and additionally contains the SBR object type. The HE-AAC v2 profile is a superset of the HE-AAC profile and additionally contains the PS object type.
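
The superset relationship between the three profiles can be pictured with plain sets. The following Python fragment is only an illustrative sketch: the variable names are hypothetical, and each profile is reduced to the object types named in the text above.

```python
# Each profile is modeled as the set of object types named in the text.
AAC_PROFILE = frozenset({"AAC-LC"})
HE_AAC_PROFILE = AAC_PROFILE | {"SBR"}       # adds Spectral Band Replication
HE_AAC_V2_PROFILE = HE_AAC_PROFILE | {"PS"}  # adds Parametric Stereo

# Each profile is a strict superset of the previous one.
assert AAC_PROFILE < HE_AAC_PROFILE < HE_AAC_V2_PROFILE
```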
The SBR object type contains the spectral band replication tool, which is an important high-frequency reconstruction (HFR) coding tool that significantly increases the compression efficiency of audio codecs. SBR reconstructs the high-frequency components of the audio signal at the receiver (e.g., in the decoder). The encoder therefore needs to encode and transmit only the low-frequency components, which enables significantly higher audio quality at low data rates. SBR is based on replicating the sequences of harmonics, previously truncated in order to reduce the data rate, from the available bandwidth-limited signal and control data obtained from the encoder. The ratio between tonal and noise-like components is preserved by adaptive inverse filtering, as well as by the optional addition of noise and sinusoids. In the MPEG-4 AAC standard, the SBR tool performs spectral insertion (also called "linear translation" or "spectral translation"), in which a number of consecutive quadrature mirror filter (QMF) subbands are copied (or "inserted") from the transmitted low-band portion of the audio signal into the high-band portion of the audio signal that is generated at the decoder. Spectral insertion, or linear translation, may not be ideal for certain types of audio, such as musical content with relatively low crossover frequencies. Therefore, techniques for improving spectral band replication are needed.
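
The copying of consecutive QMF subbands described above can be sketched as follows. This is a deliberately simplified illustration and not the SBR tool of the standard: the function name `patch_high_band`, its parameters, and the wrap-around rule for reusing source subbands are assumptions, and the sketch omits envelope adjustment, inverse filtering, and the addition of noise and sinusoids.

```python
from typing import List

Subband = List[complex]  # QMF samples of one subband over the time slots of a frame


def patch_high_band(low_band: List[Subband], num_high: int,
                    patch_start: int = 0) -> List[Subband]:
    """Build num_high high-band subbands by copying consecutive low-band subbands.

    Copying starts at subband index patch_start of the low band. When the
    source subbands run out, copying wraps around to patch_start again
    (an assumption made only to keep the sketch short).
    """
    source = low_band[patch_start:]
    if not source:
        raise ValueError("no low-band subbands available for copying")
    # list(...) makes a copy so that later gain adjustment of the high band
    # would not modify the low-band samples.
    return [list(source[k % len(source)]) for k in range(num_high)]


# Three low-band subbands, one time slot each; fill four high-band subbands
# starting from low-band subband 1.
low = [[1 + 0j], [2 + 0j], [3 + 0j]]
high = patch_high_band(low, num_high=4, patch_start=1)
```

Because the high band is only a shifted copy of the low band, the harmonic structure of the source is generally not preserved at the new frequencies, which is one reason the text notes that this approach may not be ideal for musical content with low crossover frequencies.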
BRIEF DESCRIPTION OF EMBODIMENTS OF THE INVENTION

In a first aspect, the claimed invention relates to a method for performing high frequency reconstruction of an audio signal, the method comprising: receiving an encoded audio bitstream, wherein the encoded audio bitstream comprises audio data representing a low-range portion of an audio signal and high-frequency reconstruction metadata, wherein the high-frequency reconstruction metadata comprises envelope scale factors; decoding the audio data to generate a decoded low-range audio signal; extracting the high frequency reconstruction metadata from the encoded audio bitstream, the high frequency reconstruction metadata comprising operating parameters for a high frequency reconstruction process, the operating parameters including an insertion mode parameter located in a backward compatible extension container of the encoded audio bitstream, wherein a first value of the insertion mode parameter indicates spectral translation, and a second value of the insertion mode parameter indicates harmonic transposition using frequency range stretching by a phase vocoder; filtering the decoded low-range audio signal to generate a filtered low-range audio signal; and reconstructing the high-range portion of the audio signal using the filtered low-range audio signal and the high-frequency reconstruction metadata, wherein the reconstruction includes spectral translation if the insertion mode parameter has the first value, and the reconstruction includes harmonic transposition using frequency range stretching by a phase vocoder if the insertion mode parameter has the second value, wherein the filtering, reconstruction and combining are performed as a post-processing operation with a delay of 3010 samples per audio channel, and wherein the spectral tran