EP-4738352-A2 - PROCESSING OF AUDIO SIGNALS DURING HIGH FREQUENCY RECONSTRUCTION

Abstract

The application relates to HFR (High Frequency Reconstruction/Regeneration) of audio signals. In particular, the application relates to a method and system for performing HFR of audio signals having large variations in energy level across the low frequency range which is used to reconstruct the high frequencies of the audio signal. A system configured to generate a plurality of high frequency subband signals covering a high frequency interval from a plurality of low frequency subband signals is described. The system comprises means for receiving the plurality of low frequency subband signals; means for receiving a set of target energies, each target energy covering a different target interval within the high frequency interval and being indicative of the desired energy of one or more high frequency subband signals lying within the target interval; means for generating the plurality of high frequency subband signals from the plurality of low frequency subband signals and from a plurality of spectral gain coefficients associated with the plurality of low frequency subband signals, respectively; and means for adjusting the energy of the plurality of high frequency subband signals using the set of target energies.
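The two decoder-side operations named in the abstract (generating high frequency subbands from gain-scaled low frequency subbands, then adjusting their energy towards the target energies) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the applicant's implementation: the function names, the cyclic copy-up used to fill the high band, the use of mean squared magnitude as the energy measure, and the rule `gain = sqrt(target / current)` for the envelope adjustment value. The spectral gain coefficients are taken as given inputs, since this passage does not say how they are derived.

```python
import math

def band_energy(samples):
    """Mean squared magnitude of one subband signal (assumed energy measure)."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)

def generate_high_bands(low_bands, spectral_gains, num_high_bands):
    """Scale each low band by its spectral gain coefficient, then fill the
    high band by cycling through the scaled low bands (assumed copy-up)."""
    scaled = [[gain * s for s in band]
              for band, gain in zip(low_bands, spectral_gains)]
    return [list(scaled[k % len(scaled)]) for k in range(num_high_bands)]

def adjust_energy(high_bands, target_intervals):
    """Apply one envelope adjustment value per high band.

    target_intervals is a list of (start, stop, target_energy) tuples, where
    start/stop index high bands (stop exclusive). Each band in an interval
    gets its own gain so that its energy matches the interval's target.
    """
    adjusted = [list(band) for band in high_bands]
    gains = [1.0] * len(high_bands)
    for start, stop, target_energy in target_intervals:
        for k in range(start, stop):
            current = band_energy(high_bands[k])
            gains[k] = math.sqrt(target_energy / current) if current > 0 else 0.0
            adjusted[k] = [gains[k] * s for s in high_bands[k]]
    return adjusted, gains

# Two low bands with a large energy difference, two time slots each.
low_bands = [[4.0, 4.0], [1.0, 1.0]]
spectral_gains = [0.5, 1.0]  # hypothetical coefficients that reduce the gap
high_bands = generate_high_bands(low_bands, spectral_gains, num_high_bands=4)
adjusted, gains = adjust_energy(high_bands, [(0, 2, 1.0), (2, 4, 0.25)])
```

In this toy input the two bands inside each target interval start with different energies, so they receive different adjustment values even though they share one target energy.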

Inventors

KJOERLING, KRISTOFER

Assignees

Dolby International AB

Dates

Publication Date: 2026-05-06
Application Date: 2011-07-14

Claims (7)

1. A system (601, 703) for generating a plurality of high frequency audio subband signals (604) covering a high frequency interval from a plurality of low frequency audio subband signals (602), the system (601, 703) configured to:
   - receive the plurality of low frequency subband signals (602);
   - receive a set of target energies, each target energy covering a different target interval (130) within the high frequency interval and being indicative of the desired energy of one or more high frequency subband signals lying within the target interval (130);
   - generate the plurality of high frequency subband signals (604) from the plurality of low frequency subband signals (602) and from a plurality of spectral gain coefficients associated with the plurality of low frequency subband signals (602), respectively, wherein generating the plurality of high frequency subband signals (604) comprises scaling the plurality of low frequency subband signals (602) using the respective plurality of spectral gain coefficients; and
   - adjust the energy (203) of the plurality of high frequency subband signals (604) using the set of target energies, wherein adjusting the energy comprises determining, for each target interval (130), a different envelope adjustment value for each of the high frequency subband signals within the target interval (130).
2. A system for generating a bitstream (904), the system configured to:
   - receive an audio signal (903);
   - generate an audio bitstream (906) from the audio signal (903);
   - generate control data (905) from the audio signal (903), wherein generating control data (905) comprises:
     - analyzing the spectral shape of the audio signal (903) to determine a degree of spectral envelope discontinuities introduced when re-generating a high frequency component of the audio signal (903) from a low frequency component of the audio signal (903); and
     - generating control data (905) for controlling the re-generation of the high frequency component based on the degree of discontinuities, wherein determining said degree of spectral envelope discontinuities comprises determining a ratio information, the ratio information determined by studying the lowest frequencies of the low frequency component and the highest frequencies of the low frequency component, wherein a high value of the determined ratio information is indicative of a high degree of spectral envelope discontinuities and a low value of the determined ratio information is indicative of a low degree of spectral envelope discontinuities; and
   - combine the control data (905) with the audio bitstream (906) to form the bitstream (904).
3. A method for generating a plurality of high frequency audio subband signals (604) covering a high frequency interval from a plurality of low frequency audio subband signals (602), the method comprising:
   - receiving the plurality of low frequency subband signals (602);
   - receiving a set of target energies, each target energy covering a different target interval (130) within the high frequency interval and being indicative of the desired energy of one or more high frequency subband signals (604) lying within the target interval (130);
   - generating the plurality of high frequency subband signals (604) from the plurality of low frequency subband signals (602) and from a plurality of spectral gain coefficients associated with the plurality of low frequency subband signals (602), respectively, wherein generating the plurality of high frequency subband signals (604) comprises scaling the plurality of low frequency subband signals (602) using the respective plurality of spectral gain coefficients; and
   - adjusting the energy of the plurality of high frequency subband signals (604) using the set of target energies, wherein adjusting the energy comprises determining, for each target interval (130), a different envelope adjustment value for each of the high frequency subband signals within the target interval (130).
4. A method for generating a bitstream (904), the method comprising:
   - receiving an audio signal (903);
   - generating an audio bitstream (906) from the audio signal (903);
   - generating control data (905) from the audio signal (903), wherein generating control data (905) comprises:
     - analysing the spectral shape of the audio signal (903) to determine a degree of spectral envelope discontinuities introduced when re-generating a high frequency component of the audio signal (903) from a low frequency component of the audio signal (903); and
     - generating control data (905) for controlling the re-generation of the high frequency component based on the degree of discontinuities, wherein determining said degree of spectral envelope discontinuities includes determining a ratio information by studying the lowest frequencies of the low frequency component and the highest frequencies of the low frequency component, wherein a high value of the determined ratio information is indicative of a high degree of spectral envelope discontinuities and a low value of the determined ratio information is indicative of a low degree of spectral envelope discontinuities; and
   - combining the control data (905) with the audio bitstream (906) to form the bitstream (904).
5. A software program adapted for execution on a processor and for performing the method steps of claim 3 or claim 4 when carried out on a computing device.
6. A storage medium comprising an encoded bitstream, wherein the encoded bitstream comprises control data generated by performing the method steps of claim 4.
7. A storage medium comprising a software program adapted for execution on a processor and for performing the method steps of claim 3 or claim 4 when carried out on a computing device.
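The encoder-side claims describe ratio information obtained by studying the lowest and the highest frequencies of the low frequency component, where a high ratio indicates a high degree of spectral envelope discontinuities. The following is a minimal sketch of one way such a ratio could be computed. The function names, the choice of mean band energy as the quantity compared, the `edge_width` parameter, the `threshold` value and the `flatten_low_band` flag are all hypothetical and are not taken from the claims.

```python
def discontinuity_ratio(low_band_energies, edge_width=2):
    """Ratio of the mean energy in the lowest bands to the mean energy in
    the highest bands of the low frequency component (assumed measure)."""
    if edge_width < 1 or len(low_band_energies) < 2 * edge_width:
        raise ValueError("need at least 2 * edge_width band energies")
    lowest = low_band_energies[:edge_width]
    highest = low_band_energies[-edge_width:]
    mean_lowest = sum(lowest) / edge_width
    mean_highest = sum(highest) / edge_width
    # Guard against division by zero for silent upper bands.
    return mean_lowest / max(mean_highest, 1e-12)

def make_control_data(low_band_energies, threshold=4.0):
    """Build hypothetical control data: a high ratio suggests that copying
    the low band upwards would create a step at each patch border."""
    ratio = discontinuity_ratio(low_band_energies)
    return {"ratio": ratio, "flatten_low_band": ratio > threshold}

# A low band with a steep spectral slope versus a flat one.
steep = make_control_data([16.0, 16.0, 4.0, 1.0, 1.0, 1.0])
flat = make_control_data([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
```

With these toy inputs the steep spectrum yields a ratio of 16 and sets the flag, while the flat spectrum yields a ratio of 1 and does not.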

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a European divisional application of European patent application EP 23157011.0 (reference: D10060EP11), for which EPO Form 1001 was filed 16 February 2023.

TECHNICAL FIELD

The application relates to HFR (High Frequency Reconstruction/Regeneration) of audio signals. In particular, the application relates to a method and system for performing HFR of audio signals having large variations in energy level across the low frequency range which is used to reconstruct the high frequencies of the audio signal.

BACKGROUND OF THE INVENTION

HFR technologies, such as the Spectral Band Replication (SBR) technology, allow the coding efficiency of traditional perceptual audio codecs to be improved significantly. In combination with MPEG-4 Advanced Audio Coding (AAC), HFR forms a very efficient audio codec, which is already in use within the XM Satellite Radio system and Digital Radio Mondiale, and is also standardized within 3GPP, DVD Forum and others. The combination of AAC and SBR is called aacPlus. It is part of the MPEG-4 standard, where it is referred to as the High Efficiency AAC Profile (HE-AAC). In general, HFR technology can be combined with any perceptual audio codec in a backward and forward compatible way, thus offering the possibility to upgrade already established broadcasting systems like the MPEG Layer-2 used in the Eureka DAB system. HFR methods can also be combined with speech codecs to allow wide band speech at ultra low bit rates.

The basic idea behind HFR is the observation that there is usually a strong correlation between the characteristics of the high frequency range of a signal and the characteristics of the low frequency range of the same signal. Thus, a good approximation of the original input high frequency range of a signal can be achieved by a signal transposition from the low frequency range to the high frequency range.
This concept of transposition was established in WO 98/57436, which is incorporated by reference, as a method to recreate a high frequency band from a lower frequency band of an audio signal. A substantial saving in bit-rate can be obtained by using this concept in audio coding and/or speech coding. In the following, reference will be made to audio coding, but it should be noted that the described methods and systems are equally applicable to speech coding and to unified speech and audio coding (USAC).

High Frequency Reconstruction can be performed in the time domain or in the frequency domain, using a filterbank or transform of choice. The process usually involves several steps, where the two main operations are firstly to create a high frequency excitation signal, and subsequently to shape the high frequency excitation signal to approximate the spectral envelope of the original high frequency spectrum.

The step of creating a high frequency excitation signal may e.g. be based on single sideband modulation (SSB), where a sinusoid with frequency ω is mapped to a sinusoid with frequency ω + Δω, where Δω is a fixed frequency shift. In other words, the high frequency signal may be generated from the low frequency signal by a "copy-up" operation of low frequency subbands to high frequency subbands. A further approach to creating a high frequency excitation signal may involve harmonic transposition of low frequency subbands. Harmonic transposition of order T is typically designed to map a sinusoid of frequency ω of the low frequency signal to a sinusoid with frequency Tω, with T > 1, of the high frequency signal.

The HFR technology may be used as part of source coding systems, where assorted control information to guide the HFR process is transmitted from an encoder to a decoder along with a representation of the narrow band / low frequency signal.
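The two excitation approaches described above differ only in how a sinusoid's frequency is mapped: copy-up adds a fixed shift Δω, whereas harmonic transposition of order T multiplies by T. The sketch below applies both mappings to the partials of a harmonic tone. The function names and the example frequencies are illustrative assumptions; the code works on frequency values only and is not a signal-processing implementation of either method.

```python
def copy_up(frequencies, shift):
    """SSB-style mapping: each frequency f becomes f + shift."""
    return [f + shift for f in frequencies]

def harmonic_transpose(frequencies, order):
    """Harmonic transposition of order T: each frequency f becomes T * f."""
    if order <= 1:
        raise ValueError("transposition order T must be greater than 1")
    return [order * f for f in frequencies]

def is_harmonic_series(frequencies, fundamental):
    """True if every frequency is an integer multiple of the fundamental."""
    return all(f % fundamental == 0 for f in frequencies)

# Partials of a tone with a 300 Hz fundamental (integer Hz for exact checks).
partials = [300, 600, 900]

shifted = copy_up(partials, shift=1000)          # [1300, 1600, 1900]
transposed = harmonic_transpose(partials, 2)     # [600, 1200, 1800]

print(is_harmonic_series(shifted, 300))     # False
print(is_harmonic_series(transposed, 300))  # True
```

For this example, the shifted partials are no longer integer multiples of the original fundamental, while the transposed partials still are.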
For systems where no additional control signal can be transmitted, the process may be applied on the decoder side with the suitable control data estimated from the available information on the decoder side.

The aforementioned envelope adjustment of the high frequency excitation signal aims at accomplishing a spectral shape that resembles the spectral shape of the original highband. In order to do so, the spectral shape of the high frequency signal has to be modified. Put differently, the adjustment to be applied to the highband is a function of the existing spectral envelope and the desired target spectral envelope.

For systems that operate in the frequency domain, e.g. HFR systems implemented in a pseudo-QMF filterbank, prior art methods are suboptimal in this regard, since the creation of the highband signal, by means of combining several contributions from the source frequency range, introduces an artificial spectral envelope into the highband to be envelope adjusted. In other words, the highband or high frequency signal generated from the low frequency signal during the HFR process typically exhibits an artificial spectral envelope (typically comprising spectral di