US-RE50881-E1 - Encoding an information signal

Abstract

The transient problem may be sufficiently addressed, and for this purpose, a further delay on the side of the decoding may be reduced if a new SBR frame class is used wherein the frame boundaries are not shifted, i.e. the grid boundaries are still synchronized with the frame boundaries, but wherein a transient position indication is additionally used as a syntax element so as to be used, on the encoder and/or decoder sides, within the frames of this new frame class for determining the grid boundaries within these frames.

Inventors

Markus Schnell
Michael Schuldt
Manfred Lutzky
Manuel Jander

Assignees

FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V.

Dates

Publication Date: 20260505
Application Date: 20230724

Claims (20)

1 . An encoder comprising a low-frequency portion encoder for encoding a low-frequency portion of an information signal in units of frames of the information signal; a localizer for localizing transients within the information signal; an associator for, as a function of the localization, associating a respective reconstruction mode from among at least two possible reconstruction modes with the frames of the information signal, and, for frames which have associated therewith a first one of the at least two possible reconstruction modes, associating a respective transient position indication with these frames; a generator for generating a representation of a spectral envelope of a high-frequency portion of the information signal in a temporal grid which depends on reconstruction modes associated with the frames, such that frames which have the first one of the at least two possible reconstruction modes associated therewith, the frame boundaries of these frames coincide with grid boundaries of the grid, and the grid boundaries of the grid within these frames depend on the transient position indication; and a combiner for combining the encoded low-frequency portion, the representation of the spectral envelope and information on the associated reconstruction modes and the transient position indications into an encoded information signal, wherein at least one of the low-frequency portion encoder, the localizer, the associator, the generator and the combiner comprises a hardware implementation.
2 . The encoder as claimed in claim 1 , wherein the generator is formed such that the grid boundaries within the frame, which have the first one of the at least two possible reconstruction modes associated therewith, are located such that they specify at least a first grid area whose position within the respective frame depends on the transient position indication, and whose temporal extension is smaller than ⅓ of a length of the frames, as well as a second and/or a third grid area(s) which take(s) up the remaining part of the respective frame from the first grid area to the frame boundary, which is leading in terms of time and/or trailing in terms of time, of the respective frame.
3 . The encoder as claimed in claim 2 , wherein the generator and the combiner are formed to introduce, for a frame having the first reconstruction mode associated with it which comprises three grid areas and wherein the first grid area among the three grid areas is closer to a preceding frame than a predetermined value, one or several spectral envelope values describing the spectral envelope with a respective frequency resolution, only for the first and third grid areas, into the encoded information signal, and to introduce no spectral envelope value into the encoded information signal for the second grid area of this frame.
4 . The encoder as claimed in claim 2 , wherein the generator and the combiner are formed to introduce, for a frame having the first reconstruction mode associated with it, which comprises only two grid areas and wherein the first grid area borders on the frame boundary which is trailing in terms of time, one or several spectral envelope values, for both grid areas, said one or several spectral envelope value(s) describing the spectral envelope with a respective frequency resolution, into the encoded information signal, and to also use, for determining the spectral envelope value(s) for the first grid area, parts of the information signal located in the extension grid area in the subsequent frame which borders on the trailing frame boundary, and to shorten a grid area, which is leading in terms of time, of the subsequent frame as is specified by the reconstruction mode of the subsequent frame, so as to start only at the extension grid area.
5 . The encoder as claimed in claim 3 , wherein the generator and the combiner are formed to introduce one or several spectral envelope values into the encoded information signal for a frame having the second reconstruction mode associated with it or having the first reconstruction mode associated with it, but for which neither the condition that it comprises three grid areas and that, at the same time, the first grid area among the three grid areas is located closer to the preceding frame than the predetermined value, nor the condition that it comprises only two grid areas and that, at the same time, the first grid area borders on the frame boundary which is trailing in terms of time, are fulfilled, for each grid area of this frame.
6 . The encoder as claimed in claim 2 , wherein the generator is formed such that the first grid area borders on the frame boundary, leading in terms of time, of the respective frame if there is no second grid area, and wherein the first grid area borders on the frame boundary, trailing in terms of time, of the respective frame if no third grid area exists.
7 . The encoder as claimed in claim 1 , wherein the generator is formed such that the grid boundaries within frames which have the second of the at least two possible reconstruction modes associated with them are located such that they are equally distributed over time, so that these frames only comprise one grid area or are subdivided into equally sized grid areas.
8 . The encoder as claimed in claim 1 , wherein the associator is formed to associate a frame subdivision number indication with each frame which has the second of the at least two possible reconstruction modes associated with it, the generator being formed such that the grid boundaries within these frames subdivide these frames into a number of grid areas, said number depending on the respective frame subdivision number indication.
9 . The encoder as claimed in claim 1 , wherein the generator is formed such that the frame boundaries of the frames coincide with grid boundaries of the grid independently of the possible reconstruction modes associated with the frames.
10 . The encoder as claimed in claim 1 , wherein the generator comprises an analysis filter bank which generates a set of spectral values for each filter bank time slot of the information signal, each frame with a length of several filter bank time slots, and the generator further comprising an averager for averaging the energy spectral values in the resolution of the grid.
11 . A decoder comprising an extractor for extracting, from the encoded information signal, an encoded low-frequency portion of an information signal, a representation of a spectral envelope of a high-frequency portion of the information signal, information on reconstruction modes associated with frames of the information signal and corresponding with one, respectively, of at least two reconstruction modes, and transient position indications associated with frames, in each case, which have a first one of the at least two reconstruction modes associated with them; a low-frequency portion decoder for decoding the encoded low-frequency portion of the information signal in units of frames of the information signal; a provider for providing a preliminary high-frequency portion signal on the basis of the decoded low-frequency portion; and an adaptor for spectrally adapting the preliminary high-frequency portion signal to the spectral envelopes by means of spectral weighting of the preliminary high-frequency portion signal as a function of the representation of the spectral envelopes in a temporal grid which depends on the reconstruction modes associated with the frames, such that for frames having the first one of the at least two possible reconstruction modes associated with them, the frame boundaries of these frames coincide with grid boundaries of the grid, and the grid boundaries of the grid within these frames depend on the transient position indication, wherein at least one of the extractor, the low-frequency portion decoder, the provider, and the adaptor comprises a hardware implementation.
12 . The decoder as claimed in claim 11 , wherein the adaptor for spectrally adapting is formed such that the grid boundary, or grid boundaries, within a frame having the first one of the at least two possible reconstruction modes associated with it is/are located such that it/they specify/specifies at least a first grid area whose position within the respective frame depends on the transient position indication, and whose temporal extension is smaller than ⅓ of a length of the frames, as well as a second and/or third grid area(s) which take(s) up the remaining part of the respective frame from the first grid area up to the frame boundary, which is leading in terms of time, or trailing in terms of time, of the respective frame.
13 . The decoder as claimed in claim 12 , wherein the extractor is formed to expect one or several spectral envelope values in the encoded information signal, and to extract same from the encoded information signal, only for the first and third grid areas, for a frame having the first reconstruction mode associated with it which comprises three grid areas and wherein the first grid area among the three grid areas is closer to a preceding frame than a predetermined value, said one or several spectral envelope values describing the spectral envelope with a respective frequency resolution, and to obtain, for the second grid area, one or several spectral envelope values for the representation of the spectral envelope from the grid area, which is the last in terms of time, of the preceding frame.
14 . The decoder as claimed in claim 12 , wherein the extractor is formed to expect one or several spectral envelope values in the encoded information signal, and to extract same from the encoded information signal, for both grid areas, for a frame having the first reconstruction mode associated with it which comprises two grid areas and wherein the first grid area borders on the frame boundary, trailing in terms of time, of the frame, said one or several spectral envelope values describing the spectral envelope with a respective frequency resolution, and to obtain from the spectral envelope value(s) for the first grid area one or several spectral envelope value(s) for a supplementary grid area in the subsequent frame, said supplementary grid area bordering on the trailing frame boundary, and to shorten accordingly a grid area, leading in terms of time, of the subsequent frame, as is defined by the reconstruction mode of the subsequent frame, so as to start only at the supplementary grid area, wherein the temporal grid within the subsequent frame is subdivided, the adaptor for spectral adaptation being formed to perform the adaptation in the subdivided temporal grid.
15 . The decoder as claimed in claim 13 , wherein the extractor is formed to introduce one or several spectral envelope values into the encoded information signal, or to extract same from the encoded information signal, for a frame having the second reconstruction mode associated with it or having the first reconstruction mode associated with it, but for which neither the condition that it comprises three grid areas and that, at the same time, the first grid area among the three grid areas is located closer to the preceding frame than the predetermined value, nor the condition that it comprises only two grid areas and that, at the same time, the first grid area borders on the frame boundary which is trailing in terms of time, are fulfilled, for each grid area of this frame.
16 . The decoder as claimed in claim 15 , wherein the adaptor for spectrally adapting is formed such that the first grid area borders on the frame boundary, leading in terms of time, of the respective frame if there is no second grid area, and wherein the first grid area borders on the frame boundary, trailing in terms of time, of the respective frame if no third grid area exists.
17 . The decoder as claimed in claim 11 , wherein the adaptor for spectrally adapting is formed such that the grid boundaries within frames which have the second of the at least two possible reconstruction modes associated with them are located such that they are equally distributed over time, so that these frames only comprise one grid area or are subdivided into equally sized grid areas.
18 . The decoder as claimed in claim 11 , wherein the extractor is formed to extract, from the encoded information signal, also a frame subdivision number indication which is associated, in each case, with frames which have the second of the possible reconstruction modes associated with them, the adaptor for spectrally adapting being formed such that the grid boundaries within these frames are subdivided into a number of grid areas, said number depending on the respective frame subdivision number indication.
19 . The decoder as claimed in claim 11 , wherein the adaptor for spectrally adapting is formed such that the frame boundaries of the frames coincide with grid boundaries of the grid independently of the possible reconstruction modes associated with the frames.
20 . The decoder as claimed in claim 11 , wherein the adaptor for spectrally adapting comprises an analysis filter bank which generates a set of spectral values for each filter bank time slot of the information signal, each frame with a length of several filter bank time slots, and the adaptor for spectrally adapting further comprising a determinator for determining the energy of the spectral values in the resolution of the grid.
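
The grid construction recited in claims 2, 6, 7 and 8 (and mirrored in claims 12, 16, 17 and 18) can be illustrated with a minimal sketch. It is not the claimed implementation: the function name, the mode labels "uniform" and "transient", the parameter names and the measurement of time in filter bank time slots are all assumptions made for illustration, and the extension grid area of claims 4 and 14 is not modelled.

```python
def grid_boundaries(num_slots, mode, transient_pos=None, transient_len=2, num_areas=1):
    """Return the sorted grid boundaries (in time slots) of one frame.

    The frame boundaries 0 and num_slots are always grid boundaries,
    so the grid stays synchronized with the frames.
    All names and mode labels here are hypothetical.
    """
    if mode == "uniform":
        # Second reconstruction mode: one grid area or equally sized grid areas.
        if num_areas < 1 or num_slots % num_areas != 0:
            raise ValueError("frame must divide into equally sized grid areas")
        step = num_slots // num_areas
        return list(range(0, num_slots + 1, step))

    if mode == "transient":
        # First reconstruction mode: a short first grid area is placed at the
        # position given by the transient position indication.
        if transient_pos is None or not 0 <= transient_pos < num_slots:
            raise ValueError("transient position must lie inside the frame")
        if 3 * transient_len >= num_slots:
            raise ValueError("first grid area must be shorter than 1/3 of the frame")
        start = transient_pos
        end = min(transient_pos + transient_len, num_slots)
        # The set removes duplicates when the first grid area touches a frame
        # boundary, leaving two grid areas instead of three.
        return sorted({0, start, end, num_slots})

    raise ValueError("unknown mode: %r" % (mode,))


if __name__ == "__main__":
    print(grid_boundaries(16, "uniform", num_areas=4))
    print(grid_boundaries(16, "transient", transient_pos=5))
    print(grid_boundaries(16, "transient", transient_pos=14))
```

In this sketch a transient at the very start or end of the frame yields only two grid areas, which corresponds to the cases in which no second or no third grid area exists.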

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

Notice: More than one reissue application has been filed on Jul. 24, 2023 for the reissue of U.S. Pat. No. 8,126,721, including pending subject reissue application Ser. No. 18/225,619, allowed reissue patent application Ser. No. 18/225,488, and pending reissue application Ser. Nos. 18/225,505, 18/225,527, 18/225,553, and 18/225,568. This application, Ser. No. 18/225,619, filed Jul. 24, 2023, is a reissue application of issued U.S. Pat. No. 8,126,721, issued Feb. 28, 2012, which claims priority from Provisional U.S. patent application Ser. No. 60/862,033, which was filed on Oct. 18, 2006, and is incorporated herein in its entirety by reference. This application claims priority from Provisional U.S. patent application Ser. No. 60/862,033, which was filed on Oct. 18, 2006, and is incorporated herein in its entirety by reference.

TECHNICAL FIELD

The present invention relates to information signal encoding such as audio encoding, and, in that context, in particular to SBR (spectral band replication) encoding.

BACKGROUND

In applications having a very small bit rate available, it is known, in the context of encoding audio signals, to use an SBR technique for encoding. Only the low-frequency portion is encoded fully, i.e. at an adequate temporal and spectral resolution. For the high-frequency portion, only the spectral envelope, or the envelope of the spectral temporal curve of the audio signal, is detected and encoded. On the decoder side, the low-frequency portion is retrieved from the encoded signal and is subsequently used to reconstruct, or “replicate”, the high-frequency portion therefrom. However, to adapt the energy of the high-frequency portion, which has thus been preliminarily reconstructed, to the actual energy within the high-frequency portion of the original audio signal, the spectral envelope transmitted is used, on the decoder side, for spectral weighting of the high-frequency portion reconstructed preliminarily.
For the above effort to be worthwhile, it is important, of course, that the number of bits used for transmitting the spectral envelopes be as small as possible. It is therefore desirable for the temporal grid within which the spectral envelope is encoded to be as coarse as possible. On the other hand, however, too coarse a grid leads to audible artefacts, which is noticeable, in particular, with transients, i.e. at locations where the high-frequency portions predominate rather than, as usual, the low-frequency portions, or where there is at least a rapid increase in the amplitude of the high-frequency portions. In audio signals, such transients correspond, for example, to the beginning of a note, such as the striking of a piano string or the like. If the grid is too coarse over the time period of a transient, this may lead to audible artefacts in the decoder-side reconstruction of the entire audio signal. The reason is that, on the decoder side, the high-frequency signal is reconstructed from the low-frequency portion in that, within the grid area, the spectral energy of the decoded low-frequency portion is normalized and then adapted to the transmitted spectral envelope by means of weighting. In other words, spectral weighting is simply performed within the grid area so as to reproduce the high-frequency portion from the low-frequency portion. However, if the grid area around the transient is too large, a lot of energy will be located, within this grid area, in addition to the energy of the transient, in the background and/or chord portion of the low-frequency portion which is used for reproducing the high-frequency portion. Said background portion is co-amplified by the weighting factor, even though this does not result in a good estimation of the high-frequency portion. Across the entire grid area, this will lead to an audible artefact which, in addition, will set in even before the actual transient. This problem may also be referred to as “pre-echo”.
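
The pre-echo effect described above can be made concrete with a toy calculation. The following is a minimal sketch under simplifying assumptions, not the procedure of the standard: a single frequency band, per-slot energies instead of filter bank samples, an energy-domain gain per grid area, and invented function names and signal values.

```python
def area_gains(lf_energy, hf_energy, boundaries):
    """One energy gain per grid area: mean original HF energy over mean LF energy."""
    gains = []
    for start, end in zip(boundaries, boundaries[1:]):
        lf_mean = sum(lf_energy[start:end]) / (end - start)
        hf_mean = sum(hf_energy[start:end]) / (end - start)
        gains.append(hf_mean / lf_mean if lf_mean > 0 else 0.0)
    return gains


def reconstruct_hf(lf_energy, gains, boundaries):
    """Weight the LF energy of every time slot with the gain of its grid area."""
    out = []
    for gain, (start, end) in zip(gains, zip(boundaries, boundaries[1:])):
        out.extend(gain * e for e in lf_energy[start:end])
    return out


# Hypothetical frame of 8 time slots: steady background in the LF portion,
# with a transient in slot 5 that dominates the HF portion.
lf = [1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0]
hf = [0.0, 0.0, 0.0, 0.0, 0.0, 8.0, 0.0, 0.0]

coarse_grid = [0, 8]       # one grid area for the whole frame
fine_grid = [0, 5, 6, 8]   # short grid area around the transient

coarse = reconstruct_hf(lf, area_gains(lf, hf, coarse_grid), coarse_grid)
fine = reconstruct_hf(lf, area_gains(lf, hf, fine_grid), fine_grid)

if __name__ == "__main__":
    print("coarse:", coarse)  # non-zero HF energy in slots 0..4: pre-echo
    print("fine:  ", fine)    # HF energy confined to the transient slot
```

With the coarse grid, the single gain also amplifies the background in the slots preceding the transient, so high-frequency energy appears before the transient. With the fine grid, the gain of the leading grid area is zero in this toy example, at the cost of transmitting three envelope values instead of one.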
The problem could be solved if the grid area around the transient were fine enough for the transient/background ratio of the part of the low-frequency portion within this grid area to be improved. Small grid areas or small grid boundary distances, however, conflict with the above-outlined desire for a low bit consumption for encoding the spectral envelopes. In the ISO/IEC 14496-3 standard (simply referred to as “the standard” below), an SBR encoding is described in the context of the AAC encoder. The AAC encoder encodes the low-frequency portion in a frame-by-frame manner. For each such SBR frame, the above-specified time and frequency resolution is defined at which the spectral envelope of the high-frequency portion is encoded in this frame. To address the problem that transients may also fall on SBR frame boundaries, the standard allows that the temporal grid may temporarily be defined such that the grid boundaries do not necessarily coincide with the frame boundaries. Rather, in this sta