US-20260128049-A1 - METHOD, APPARATUS, AND MEDIUM FOR ENCODING AND DECODING OF AUDIO BITSTREAMS WITH FLEXIBLE BLOCK-BASED SYNTAX


Abstract

A method for transmitting audio signals of an immersive audio program, the method comprising: generating packets of data comprising portions of a bitstream of the audio signals, wherein the bitstream comprises a plurality of frames, wherein each frame of the plurality of frames comprises a plurality of blocks, wherein the generating comprises: assembling a packet of data comprising one or more blocks of the plurality of blocks, wherein blocks from different frames are combined into a single packet and/or blocks are transmitted out of order; and transmitting the packets of data via a packet-based network.

Inventors

Holger Hoerich
Malte Schmidt
Heiko Purnhagen
Kristofer Kjorling
Christof Joseph FERSCH
Daniel Fischer

Assignees

DOLBY INTERNATIONAL AB

Dates

Publication Date: 20260507
Application Date: 20230915

Claims (18)

1. A method for transmitting audio signals of an immersive audio program, the method comprising: generating packets of data comprising portions of a bitstream of the audio signals, wherein the bitstream comprises a plurality of frames, wherein each frame of the plurality of frames comprises a plurality of blocks, wherein the generating comprises: assembling a packet of data comprising one or more blocks of the plurality of blocks, wherein blocks from different frames are combined into a single packet and/or blocks are transmitted out of order; and transmitting the packets of data via a packet-based network.
2. The method of claim 1, wherein each block of the plurality of blocks comprises identifying information.
3. The method of claim 2, wherein the identifying information comprises at least one of a block ID, where the block ID indicates which set of signals of the entire immersive audio program is carried by that block, a corresponding frame number associated with the block, and/or a priority for retransmission, where a high priority for retransmission signals that this block shall be preferred at the decoder over another block with the same block ID and frame counter but lower priority for retransmission.
4. The method of claim 1, wherein each frame of the plurality of frames carries audio data, preferably all audio data, that represents a continuous segment, such as a time period, of the audio signals of the immersive audio program with a start time, an end time, and a duration.
5. A method for decoding an audio signal, the method comprising: receiving packets of data comprising portions of a bitstream of the audio signals of the immersive audio program, wherein the bitstream comprises a plurality of frames, wherein each frame of the plurality of frames comprises a plurality of blocks; determining a set of blocks of the plurality of blocks addressed to a device; and decoding the set of blocks addressed to the device and skipping decoding the blocks of the plurality of blocks not addressed to the device.
6. A method for transmitting an audio stream, the method comprising: transmitting the audio stream, wherein the audio stream comprises a plurality of frames, wherein each frame of the plurality of frames comprises a plurality of blocks, wherein the transmitting comprises transmitting configuration information for the audio stream out of band.
7. The method of claim 6, wherein transmitting configuration information for the audio stream out of band comprises: transmitting the audio stream via a first network and/or a first network protocol; and transmitting the configuration information via a second network and/or a second network protocol.
8. The method of claim 7, wherein the first network protocol is a User Datagram Protocol (UDP) and the second network protocol is a Transmission Control Protocol (TCP).
9. A method for decoding audio signals, the method comprising: receiving a bitstream of the audio signals of an immersive audio program, the bitstream comprising: information corresponding to a signaling of static configuration aspects, and static metadata; and mapping one or more channel elements to one or more devices based on the information and/or the static metadata.
10. The method of claim 9, wherein the bitstream is received by a plurality of decoders configured to decode the bitstream, wherein each decoder of the plurality of decoders is configured to decode a portion of the bitstream.
11. The method of claim 9, wherein the bitstream further comprises dynamic metadata.
12. The method of claim 9, wherein the bitstream comprises a plurality of blocks, wherein each block of the plurality of blocks comprises: information that enables for a portion of the block to be skipped during decoding, wherein the portion is not needed for a device; and dynamic metadata.
13. A method for re-transmitting blocks of audio signals of an immersive audio program, the method comprising: transmitting one or more blocks of a bitstream of the audio signals, wherein the bitstream comprises a plurality of blocks, wherein each of the one or more blocks of the bitstream has been previously transmitted; and wherein each of the one or more blocks comprises a decoding priority indicator.
14. The method of claim 13, wherein the decoding priority indicator indicates to a decoder an order of priority for decoding the one or more blocks of the bitstream.
15. The method of claim 13, wherein each block of the one or more blocks comprises a same block ID.
16. The method of claim 13, wherein transmitting one or more blocks of the bitstream comprises reducing a data rate in comparison to the previous transmission.
17. The method of claim 16, wherein reducing the data rate comprises at least one of reducing a signal to noise ratio of the audio signal, reducing a bandwidth of the audio signal, and/or reducing a channel count of the audio signal.
18. A method for receiving audio signals of an immersive audio program, the method comprising: receiving, by at least one device, packets of data comprising portions of a bitstream of the audio signals from a packet-based network; extracting blocks of the bitstream from a packet of the packets of data; skipping over blocks not addressed to the at least one device; ordering extracted blocks based on their decode or presentation time; identifying whether multiple versions of a block, each having a different priority, are present in the ordered extracted blocks, and when multiple versions of the block are present in the ordered extracted blocks, retaining the highest priority version of the block and removing any lower priority versions of the block to produce a stream of blocks; and providing the stream of blocks to a decoder.
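
As a non-authoritative illustration of the receiving steps recited in claim 18, the following Python sketch extracts blocks from packets, skips blocks not addressed to the device, orders the remaining blocks, and retains only the highest priority version of each block. The Block fields, the function name build_block_stream, the representation of a packet as a list of blocks, and the use of the frame number as a stand-in for decode or presentation time are assumptions made for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical block model; field names are assumptions for this sketch."""
    block_id: int       # which set of signals of the program the block carries
    frame_number: int   # frame the block belongs to (stand-in for decode time)
    priority: int       # retransmission priority; a higher value is preferred
    payload: bytes = b""


def build_block_stream(packets: Iterable[List[Block]],
                       addressed_ids: Set[int]) -> List[Block]:
    """Return the ordered, de-duplicated blocks to hand to a decoder."""
    # Extract blocks from every packet, skipping blocks not addressed to us.
    extracted = [block
                 for packet in packets
                 for block in packet
                 if block.block_id in addressed_ids]

    # Keep one version per (frame number, block ID): the highest priority one.
    best: Dict[Tuple[int, int], Block] = {}
    for block in extracted:
        key = (block.frame_number, block.block_id)
        if key not in best or block.priority > best[key].priority:
            best[key] = block

    # Order by frame number, then block ID (assumed proxy for decode time).
    return [best[key] for key in sorted(best)]


if __name__ == "__main__":
    packets = [
        # Frame 2 arrives before frame 1, alongside a block for another device.
        [Block(1, 2, 0, b"f2"), Block(2, 1, 0, b"other")],
        # Two versions of the same block with different priorities.
        [Block(1, 1, 0, b"f1-low"), Block(1, 1, 1, b"f1-high")],
    ]
    for block in build_block_stream(packets, addressed_ids={1}):
        print(block.frame_number, block.block_id, block.payload)
```

In this sketch, de-duplication is performed before the final sort for simplicity; the resulting stream is the same as ordering first and then removing lower priority versions.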

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority from U.S. Provisional Application Ser. No. 63/378,499, filed on 5 Oct. 2022, and U.S. Provisional Application Ser. No. 63/578,543, filed on 24 Aug. 2023, each of which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

This disclosure relates generally to audio signal processing, and more specifically to audio source coding and decoding for low latency interchange of audio signals of immersive audio programs between devices.

BACKGROUND

Streaming of audio is common in today's society. Audio streaming is becoming increasingly demanding: users' expectations of quality are rising, and users' setups are becoming more complex in both the number and the types of speakers. At least part of the streaming path is normally a wireless link, which must then be of good quality; as many have probably experienced, this is not always the case.

Therefore, there is a need to define an interchange format for use cases in which a certain format is streamed from the cloud or a server and subsequently transcoded on a device to a more suitable low latency format for distribution over wireless (or, in some cases, wired) links. Exemplary use cases are in-home connectivity and phone-to-automotive connectivity, although the format may be beneficial in any scenario where low latency distribution of audio signals from a single device to one or more connected devices is desired.

In addition to the audio information that is streamed wirelessly, other types of information may be incorporated into the stream. Such other types of information are then also affected by the quality of the wireless link and may suffer drawbacks similar to those for the audio.

It would therefore be advantageous to overcome problems associated with wireless streaming for different types of streamed audio combined with other types of information or signals.

BRIEF SUMMARY

An object of the present disclosure is to at least partly overcome the above problems with wireless streaming of audio combined with other types of information.

According to a first aspect of the disclosure a method for transmitting audio signals of an immersive audio program is provided, the method comprising generating packets of data comprising portions of a bitstream of the audio signals, wherein the bitstream comprises a plurality of frames, wherein each frame of the plurality of frames comprises a plurality of blocks, wherein the generating comprises assembling a packet of data comprising one or more blocks of the plurality of blocks, wherein blocks from different frames are combined into a single packet and/or blocks are transmitted out of order; and transmitting the packets of data via a packet-based network.

According to a second aspect of the disclosure a method for decoding an audio signal is provided, the method comprising receiving packets of data comprising portions of a bitstream of the audio signals of the immersive audio program, wherein the bitstream comprises a plurality of frames, wherein each frame of the plurality of frames comprises a plurality of blocks; determining a set of blocks of the plurality of blocks addressed to a device; and decoding the set of blocks addressed to the device and skipping decoding the blocks of the plurality of blocks not addressed to the device.

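
By way of a hedged example only, the first and second aspects can be pictured with the short Python sketch below. It fills packets greedily up to a byte budget, so that blocks from different frames may end up in the same packet, and it selects the blocks addressed to a device by block ID. The Block fields, the function names assemble_packets and blocks_for_device, the greedy fill strategy, and the max_payload_bytes parameter are all hypothetical choices made for illustration; the disclosure does not prescribe them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Block:
    """Hypothetical block model; field names are assumptions for this sketch."""
    frame_number: int   # frame the block belongs to
    block_id: int       # which set of signals the block carries
    payload: bytes


def assemble_packets(blocks: Iterable[Block],
                     max_payload_bytes: int) -> List[List[Block]]:
    """Greedily group blocks into packets without regard to frame boundaries."""
    packets: List[List[Block]] = []
    current: List[Block] = []
    used = 0
    for block in blocks:
        size = len(block.payload)
        # Start a new packet when adding this block would exceed the budget.
        if current and used + size > max_payload_bytes:
            packets.append(current)
            current, used = [], 0
        # A block larger than the budget is still sent, alone in its packet.
        current.append(block)
        used += size
    if current:
        packets.append(current)
    return packets


def blocks_for_device(packet: List[Block],
                      device_block_ids: Set[int]) -> List[Block]:
    """Keep the blocks addressed to a device; the rest are skipped."""
    return [block for block in packet if block.block_id in device_block_ids]


if __name__ == "__main__":
    blocks = [
        Block(0, 0, b"a" * 40), Block(0, 1, b"b" * 40),
        Block(1, 0, b"c" * 40), Block(1, 1, b"d" * 40),
    ]
    packets = assemble_packets(blocks, max_payload_bytes=120)
    for index, packet in enumerate(packets):
        print(index, [(b.frame_number, b.block_id) for b in packet])
    # A device that renders only block ID 0 decodes a subset of each packet.
    print([(b.frame_number, b.block_id)
           for b in blocks_for_device(packets[0], {0})])
```

With a 120-byte budget and 40-byte blocks, the first packet in this example carries both blocks of frame 0 together with one block of frame 1, which is one way blocks from different frames may share a packet.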
According to a third aspect of the disclosure a method for transmitting an audio stream is provided, the method comprising transmitting the audio stream, wherein the audio stream comprises a plurality of frames, wherein each frame of the plurality of frames comprises a plurality of blocks, wherein the transmitting comprises transmitting configuration information for the audio stream out of band.

According to a fourth aspect of the disclosure a method for decoding audio signals is provided, the method comprising receiving a bitstream of the audio signals of an immersive audio program, the bitstream comprising information corresponding to a signaling of static configuration aspects and static metadata; and mapping one or more channel elements to one or more devices based on the information and/or the static metadata.

According to a fifth aspect of the disclosure a method for re-transmitting blocks of audio signals of an immersive audio program is provided, the method comprising transmitting one or more blocks of a bitstream of the audio signals, wherein the bitstream comprises a plurality of blocks, wherein each of the one or more blocks of the bitstream has been previously transmitted; and wherein each of the one or more blocks comprises a decoding priority indicator.

According to a sixth aspect of the disclosure a method for receiving audio signals of an immersive audio program is provided, the method comprising receiving, by at least one device, packets of data comprising portions of a bitstream of the audio signals from a packet-