CN-116631417-B - Audio decoder, method of providing a decoded audio signal, and computer program


Abstract

An audio decoder for providing a decoded audio signal representation based on an encoded audio signal representation is disclosed. The audio decoder is configured to adjust decoding parameters according to configuration information and to decode one or more audio frames using current configuration information. The audio decoder is configured to compare configuration information in a configuration structure associated with one or more frames to be decoded with the current configuration information and, if the configuration information in the configuration structure, or a relevant part of the configuration information in the configuration structure, is different from the current configuration information, to transition to decoding using the configuration information in the configuration structure as new configuration information. The audio decoder is configured to consider stream identifier information included in the configuration structure when comparing the configuration information, such that a difference between a stream identifier previously acquired by the audio decoder and a stream identifier represented by the stream identifier information in the configuration structure results in the transition. Corresponding methods and computer programs are also disclosed.
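The comparison described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class `ConfigStructure`, its field names and the function `needs_transition` are hypothetical, the "relevant part" of the configuration is modelled as an opaque byte string, and treating an unconfigured decoder as needing a transition is an assumption. The 16-bit range check follows the stream identifier width stated in claim 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfigStructure:
    """Hypothetical stand-in for the configuration structure carried with a frame."""
    decoder_settings: bytes  # the "relevant part" of the configuration, opaque here
    stream_id: int           # stream identifier information

    def __post_init__(self) -> None:
        # Claim 1 represents the stream identifier as a 16-bit value.
        if not 0 <= self.stream_id <= 0xFFFF:
            raise ValueError("stream_id must fit into 16 bits")


def needs_transition(current: Optional[ConfigStructure],
                     incoming: ConfigStructure) -> bool:
    """Return True if decoding should switch to the incoming configuration."""
    if current is None:
        # Assumption: a decoder without any configuration must be set up first.
        return True
    if incoming.decoder_settings != current.decoder_settings:
        return True
    # Identical settings can still belong to a different stream, so the
    # stream identifier takes part in the comparison as well.
    return incoming.stream_id != current.stream_id


# Two streams with identical settings but different stream identifiers.
low_rate = ConfigStructure(b"\x01\x02", stream_id=1)
high_rate = ConfigStructure(b"\x01\x02", stream_id=2)
print(needs_transition(low_rate, high_rate))
```

In this sketch the two example streams share every decoder setting, so only the stream identifier reveals that the frames come from a different stream.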

Inventors

Max Noyndov
MARCUS FELIX
Marseius HildenBrand
Lucas Schuster
INGO HOFFMAN
Bernd Herman
Nicholas Riterbosch

Assignees

Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung

Dates

Publication Date: 20260512
Application Date: 20180110
Priority Date: 20170110

Claims (9)

1. An audio decoder (100; 200) for providing a decoded audio signal representation (112; 212) based on an encoded audio signal representation (110; 210; 312; 412; 550; 600; 700; 800), wherein the audio decoder is configured to adjust decoding parameters according to configuration information (110a; 222c; 332; 424; 1010, 1030), wherein the audio decoder is configured to decode one or more audio frames using the current configuration information (140; 240), and wherein the audio decoder is configured to compare configuration information (110a; 222c; 332; 424; 1010, 1030) in a configuration structure associated with one or more frames (222) to be decoded with the current configuration information (140; 240) and, if the configuration information in the configuration structure associated with the one or more frames to be decoded, or a relevant part (10200a, 10200b, 1022a, 1024b, 1026a, 1050a) of the configuration information in the configuration structure associated with the one or more frames to be decoded, is different from the current configuration information, to transition to decoding using the configuration information in the configuration structure associated with the one or more frames to be decoded as new configuration information; wherein the audio decoder is configured to consider stream identifier information (230; streamID, 1050a, streamidentifier) included in the configuration structure when comparing the configuration information, such that a difference between a stream identifier previously acquired by the audio decoder and a stream identifier represented by the stream identifier information in the configuration structure associated with the one or more frames to be decoded results in the transition, wherein the stream identifier is represented by a bitstream syntax element, the bitstream syntax element being represented by a 16-bit value.
2. A method for providing a decoded audio signal representation based on an encoded audio signal representation, wherein the method comprises adjusting decoding parameters according to configuration information (110a; 222c; 332; 424; 1010, 1030), wherein the method comprises decoding one or more audio frames using the current configuration information (140; 240), and wherein the method comprises comparing configuration information (110a; 222c; 332; 424; 1010, 1030) in a configuration structure associated with one or more frames (222) to be decoded with the current configuration information, and wherein the method comprises transitioning to decoding using the configuration information in the configuration structure associated with the one or more frames to be decoded as new configuration information if the configuration information in the configuration structure associated with the one or more frames to be decoded, or a relevant portion (10200a, 10200b, 1022a, 1024b, 1026a, 1050a) of the configuration information in the configuration structure associated with the one or more frames to be decoded, is different from the current configuration information; wherein the method comprises considering stream identifier information (230; streamID, 1050a, streamidentifier) included in the configuration structure when comparing the configuration information, such that a difference between a stream identifier previously acquired in audio decoding and a stream identifier represented by the stream identifier information in the configuration structure associated with the one or more frames to be decoded results in the transition, wherein the stream identifier is represented by a bitstream syntax element, the bitstream syntax element being represented by a 16-bit value.
3. A computer program product comprising a computer program for performing the method according to claim 2 when the computer program is run on a computer.
4. An audio decoder (100; 200) for providing a decoded audio signal representation (112; 212) based on an encoded audio signal representation (110; 210; 312; 412; 550; 600; 700; 800), wherein the audio decoder is configured to adjust decoding parameters according to configuration information (110a; 222c; 332; 424; 1010, 1030), wherein the audio decoder is configured to decode one or more audio frames using the current configuration information (140; 240), and wherein the audio decoder is configured to compare configuration information (110a; 222c; 332; 424; 1010, 1030) in a configuration structure associated with one or more frames (222) to be decoded with the current configuration information (140; 240) and, if the configuration information in the configuration structure associated with the one or more frames to be decoded, or a relevant part (10200a, 10200b, 1022a, 1024b, 1026a, 1050a) of the configuration information in the configuration structure associated with the one or more frames to be decoded, is different from the current configuration information, to transition to decoding using the configuration information in the configuration structure associated with the one or more frames to be decoded as new configuration information; wherein the audio decoder is configured to consider stream identifier information (230; streamID, 1050a, streamidentifier) included in the configuration structure when comparing the configuration information, such that a difference between a stream identifier previously acquired by the audio decoder and a stream identifier represented by the stream identifier information in the configuration structure associated with the one or more frames to be decoded results in the transition, wherein the audio decoder is configured to perform a crossfade if a configuration change is detected.
5. An audio decoder (100; 200) for providing a decoded audio signal representation (112; 212) based on an encoded audio signal representation (110; 210; 312; 412; 550; 600; 700; 800), wherein the audio decoder is configured to adjust decoding parameters according to configuration information (110a; 222c; 332; 424; 1010, 1030), wherein the audio decoder is configured to decode one or more audio frames using the current configuration information (140; 240), and wherein the audio decoder is configured to compare configuration information (110a; 222c; 332; 424; 1010, 1030) in a configuration structure associated with one or more frames (222) to be decoded with the current configuration information (140; 240) and, if the configuration information in the configuration structure associated with the one or more frames to be decoded, or a relevant part (10200a, 10200b, 1022a, 1024b, 1026a, 1050a) of the configuration information in the configuration structure associated with the one or more frames to be decoded, is different from the current configuration information, to transition to decoding using the configuration information in the configuration structure associated with the one or more frames to be decoded as new configuration information; wherein the audio decoder is configured to consider stream identifier information (230; streamID, 1050a, streamidentifier) included in the configuration structure when comparing the configuration information, such that a difference between a stream identifier previously acquired by the audio decoder and a stream identifier represented by the stream identifier information in the configuration structure associated with the one or more frames to be decoded results in the transition, wherein the audio decoder is configured to obtain and process an audio frame representation comprising random access information (222b), wherein the random access information comprises a configuration structure; wherein the audio decoder is configured to crossfade between audio information (272) represented by an audio frame (220) processed before reaching the audio frame representation comprising the random access information and audio information (276) obtained based on the audio frame representation (222) comprising the random access information after initializing the audio decoder with the configuration structure of the random access information, if the audio decoder finds that the configuration information in the configuration structure of the random access information, or a relevant part of the configuration information in the configuration structure of the random access information, is different from the current configuration information (240).
6. The audio decoder according to claim 5, wherein the audio decoder is configured to continue decoding without performing an initialization of the audio decoder if the audio decoder has decoded an audio frame immediately preceding an audio frame represented by an audio frame representation comprising the random access information, and if the audio decoder finds that the relevant part of the configuration information (222c) in the configuration structure of the random access information is identical to the current configuration information (240).
7. The audio decoder according to claim 5 or 6, wherein the audio decoder is configured to perform an initialization of the audio decoder using the configuration structure of the random access information if the audio decoder has not decoded an audio frame immediately preceding an audio frame represented by an audio frame representation comprising the random access information.
8. A method for providing a decoded audio signal representation based on an encoded audio signal representation, wherein the method comprises adjusting decoding parameters according to configuration information (110a; 222c; 332; 424; 1010, 1030), wherein the method comprises decoding one or more audio frames using the current configuration information (140; 240), and wherein the method comprises comparing configuration information (110a; 222c; 332; 424; 1010, 1030) in a configuration structure associated with one or more frames (222) to be decoded with the current configuration information, and wherein the method comprises transitioning to decoding using the configuration information in the configuration structure associated with the one or more frames to be decoded as new configuration information if the configuration information in the configuration structure associated with the one or more frames to be decoded, or a relevant portion (10200a, 10200b, 1022a, 1024b, 1026a, 1050a) of the configuration information in the configuration structure associated with the one or more frames to be decoded, is different from the current configuration information; wherein the method comprises considering stream identifier information (230; streamID, 1050a, streamidentifier) included in the configuration structure when comparing the configuration information, such that a difference between a stream identifier previously acquired in audio decoding and a stream identifier represented by the stream identifier information in the configuration structure associated with the one or more frames to be decoded results in the transition, wherein the method comprises performing a crossfade if a configuration change is detected.
9. A computer program product comprising a computer program for performing the method according to claim 8 when the computer program is run on a computer.

Description

Audio decoder, method of providing a decoded audio signal, and computer program

The present application is a divisional application of the application with a filing date of January 10, 2018, international application number PCT/EP2018/050575, Chinese application number 201880017357.7, entitled "Audio decoder, audio encoder, method of providing a decoded audio signal, method of providing an encoded audio signal, audio stream using a stream identifier, audio stream provider and computer program".

Technical Field

Embodiments according to the invention relate to an audio decoder for providing a decoded audio signal representation based on an encoded audio signal representation.

Other embodiments according to the invention relate to an audio encoder for providing an encoded audio signal representation.

Other embodiments according to the invention relate to a method of providing a decoded audio signal representation.

Other embodiments according to the invention relate to a method of providing an encoded audio signal representation.

Other embodiments according to the invention relate to an audio stream.

Other embodiments according to the invention relate to an audio stream provider.

Other embodiments according to the invention relate to a computer program for performing one of these methods.

Background

In the following, the problems underlying the various aspects of the invention, and possible usage scenarios of embodiments according to the invention, are described.

There are situations in which a transition is made between different audio streams or between different coded sequences of audio frames. For example, different sequences of audio frames may comprise different audio content between which a transition should be made.

For example, when using MPEG-D USAC (ISO/IEC 23003-3 + Amd.1 + Amd.2 + Amd.3) in an adaptive streaming use case, a situation may occur where two streams within a so-called adaptation set (e.g. a group of two or more streams between which a user may switch) have exactly the same configuration structure, even though their bit rates differ. This may occur, for example, if the encoder chooses to operate with exactly the same encoding tools for both bit rate settings. For example, the audio encoder may use the same basic encoding settings (which are also signaled to the audio decoder) but may still provide a different representation of the audio values. For example, when a lower bit rate is desired, the audio encoder may quantize the spectral values more coarsely, which requires fewer bits, even though the basic encoder settings or decoder settings remain unchanged.

This in itself (i.e. two streams within an adaptation set having exactly the same configuration structure even though their bit rates differ) is not a problem. However, it has been found that, in the adaptive streaming use case, the decoder should know whether the subsequently received access units (or "frames") originate from the same stream or whether a stream change has occurred.

It has been found that, if a stream change has been detected, the audio decoder will in some cases run through a designated sequence of operational steps.

The decoder instance is shut down properly, and the temporarily internally stored decoded signal portions are fed to the decoder output, a process known as "flushing".

The decoder re-instantiates and reconfigures itself using the configuration information associated with the changed stream.

The decoder "pre-rolls" the embedded access units, which are piggybacked in the immediate play-out frame (IPF). This pre-rolling of access units puts the decoder into a fully initialized state, so that the output from decoding the first frame is a fully compliant decoded audio signal.
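The sequence described above (delivering the internally buffered output, re-instantiating the decoder with the new configuration, and pre-rolling the embedded access units) can be sketched as follows. This is only an illustrative model under stated assumptions: `ToyDecoder` and `handle_stream_change` are hypothetical names, access units and audio are represented by strings, and discarding the output of the pre-rolled access units is an assumption of the sketch rather than something stated in the text.

```python
from typing import List, Sequence, Tuple


class ToyDecoder:
    """Hypothetical decoder model; a real decoder keeps overlap and filter state."""

    def __init__(self, config: str) -> None:
        self.config = config
        self.pending: List[str] = []  # decoded output not yet delivered
        self.decoded_units = 0        # access units decoded since instantiation

    def decode(self, access_unit: str) -> str:
        self.decoded_units += 1
        return f"{self.config}:{access_unit}"

    def flush(self) -> List[str]:
        """Hand over the internally buffered output and empty the buffer."""
        out, self.pending = self.pending, []
        return out


def handle_stream_change(old: ToyDecoder, new_config: str,
                         preroll_units: Sequence[str],
                         first_unit: str) -> Tuple[ToyDecoder, List[str]]:
    # Step 1: shut down the old instance and deliver its buffered output.
    output = old.flush()
    # Step 2: re-instantiate the decoder with the new stream's configuration.
    new = ToyDecoder(new_config)
    # Step 3: pre-roll the embedded access units to initialize internal state.
    # Assumption: their decoded output is not played out.
    for unit in preroll_units:
        new.decode(unit)
    # The first regular frame is now decoded by a fully initialized decoder.
    output.append(new.decode(first_unit))
    return new, output


old_decoder = ToyDecoder("A")
old_decoder.pending = ["A:tail"]
new_decoder, played = handle_stream_change(old_decoder, "B", ["p1", "p2"], "f1")
print(played)
```

The returned list contains the tail of the old stream followed by the first frame of the new stream, which is the pair of signals that an optional transition smoothing would operate on.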
Optionally, the audio output delivered by the decoder when it is shut down and the output from decoding the first access unit with the reconfigured decoder are, for example, crossfaded over a short period of time, depending on the respective bitstream signaling elements.

For example, all of the above steps may be performed with the sole goal of obtaining a "seamless" transition from the decoded audio of one stream to the decoded audio of another stream. "Seamless" means that the stream transition itself is free of audible artifacts and glitches. The stream change may in fact be perceptually noticeable because of, for example, a change in overall coding quality, audio bandwidth or timbre. However, the actual point in time of the transition does not itself give rise to an audible impression. In other words, there is no "click", "noise burst" or similar objectionable sound at the transition point.

It has been found that the information whether a stream change has occurred can be obtained by analyzing the configuration structure embedded in the immediate play-out frame and comparing it with the configurati