EP-4348639-B1 - AUDIO ENCODING BASED ON LINK DATA

Inventors

SHAHBAZI MIRZAHASANLOO, TAHER
LINSKY, JOEL
OLIVIERI, FERDINANDO
BATRA, MAYANK

Dates

Publication Date: 20260506
Application Date: 20220525

Claims (15)

A device (102) comprising: a memory; and one or more processors (120) configured to: obtain link data (168) corresponding to a communication link (172) to a second device (202), the link data received from the second device (202) of the communication link and including a received signal strength indicator, RSSI (304); generate a link budget (150) based on the link data, wherein generating the link budget comprises determining a difference between a transmission power used by the device (102) to transmit over the communication link (172) and a received signal strength determined from the RSSI (304) included in the link data (168) received from the second device (202); select, at least partially based on the link budget, between an ambisonics mode (140, 330, 332) and a stereo mode (142, 334); and encode audio data (123) according to the selected ambisonics mode or stereo mode to generate encoded audio data (129).
The device of claim 1, wherein the one or more processors are configured to: select a higher order ambisonics mode (330) as a coding mode based on the link budget exceeding a first threshold (322); select a lower order ambisonics mode (332) as the coding mode based on the link budget exceeding a second threshold (324) and not exceeding the first threshold; select the stereo mode (334) as the coding mode based on the link budget exceeding a third threshold (326) and not exceeding the second threshold; and select a mono mode (336) as the coding mode based on the link budget not exceeding the third threshold.
The device of claim 1, wherein the device further comprises one or more microphones (616) coupled to the one or more processors, the one or more microphones configured to capture audio data (123) for encoding, wherein the device further comprises a modem (130, 626) coupled to the one or more processors, the modem configured to send the encoded audio data to the second device.
The device of claim 3, wherein the encoded audio data is sent to the second device via a media packet that includes metadata indicating a coding mode.
The device of claim 3, wherein the one or more processors are further configured to: receive a media packet from the second device via the modem; and based on the media packet including the link data, extract the link data from the media packet.
The device of claim 5, wherein the media packet is processed at a link layer of a multi-layer software stack.
The device of claim 5, wherein the one or more processors are further configured to: extract audio data from the media packet; and provide the audio data and the link data to a shared memory coupled to the one or more processors.
The device of claim 1, wherein selection of a coding mode is further based on whether a count of successive mode transition indicators exceeds a transition count threshold.
The device of claim 1, wherein the one or more processors are further configured to: process perceptual data (509) using a psychoacoustic model (510) to generate perceptual importance information (511); and based on selection of the ambisonics mode and the perceptual importance information (511), compress ambisonics data based on a spatio-temporal priority model to generate compressed ambisonics data for transmission.
The device of claim 1, wherein the one or more processors are integrated in at least one of a mobile phone, a tablet computer device, a camera device, a virtual reality headset, a mixed reality headset, an augmented reality headset, or a vehicle.
A method performed at a device (102) that comprises one or more processors (120), the method comprising: obtaining, at the one or more processors, link data (168) corresponding to a communication link (172) to a second device (202), the link data received from the second device (202) of the communication link and including a received signal strength indicator, RSSI (304); generating, at the one or more processors, a link budget (150) based on the link data, wherein generating the link budget comprises determining a difference between a transmission power used by the device (102) to transmit over the communication link (172) and a received signal strength determined from the RSSI (304) included in the link data (168) received from the second device (202); selecting, at the one or more processors and at least partially based on the link budget, between an ambisonics mode (140, 330, 332) and a stereo mode (142, 334); and encoding, at the one or more processors, audio data (123) according to the selected ambisonics mode or stereo mode to generate encoded audio data (129).
The method of claim 11, further comprising sending the encoded audio data to the second device, wherein the encoded audio data is sent to the second device via a media packet that includes metadata indicating the selected coding mode.
The method of claim 11, further comprising: receiving a media packet from the second device; and based on the media packet including the link data, extracting the link data from the media packet.
The method of claim 11, wherein the selecting is further based on whether a count of successive mode transition indicators exceeds a transition count threshold.
The method of claim 11, wherein the communication link (172) comprises an Institute of Electrical and Electronics Engineers, IEEE, 802.11 type network.
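
The selection logic recited in claims 1, 2, and 8 can be illustrated with a minimal sketch. It is not the patented implementation. The claims specify only "a difference" between transmission power and received signal strength, so the sign convention here (received strength minus transmit power, so that a larger value indicates a stronger link) is an assumption chosen to match the threshold ordering of claim 2. The threshold values, the function and class names, and the way successive transition indicators are counted are all hypothetical.

```python
from enum import Enum


class CodingMode(Enum):
    MONO = 0
    STEREO = 1
    LOWER_ORDER_AMBISONICS = 2
    HIGHER_ORDER_AMBISONICS = 3


def link_budget_db(tx_power_dbm: float, rssi_dbm: float) -> float:
    """Difference between transmit power and the RSSI reported by the peer.

    Assumption: computed as RSSI minus transmit power, so a larger
    (less negative) value means less loss on the link.
    """
    return rssi_dbm - tx_power_dbm


def select_mode(budget_db: float,
                first: float = -60.0,   # hypothetical first threshold (322)
                second: float = -70.0,  # hypothetical second threshold (324)
                third: float = -80.0    # hypothetical third threshold (326)
                ) -> CodingMode:
    """Map a link budget to a coding mode using the ordering of claim 2."""
    if budget_db > first:
        return CodingMode.HIGHER_ORDER_AMBISONICS
    if budget_db > second:
        return CodingMode.LOWER_ORDER_AMBISONICS
    if budget_db > third:
        return CodingMode.STEREO
    return CodingMode.MONO


class ModeSelector:
    """Switch modes only after enough successive transition indicators.

    One possible reading of claim 8: each update whose candidate mode
    differs from the current mode counts as a transition indicator, and
    the switch happens once the count exceeds the threshold.
    """

    def __init__(self, initial: CodingMode, transition_count_threshold: int = 3):
        self.mode = initial
        self.transition_count_threshold = transition_count_threshold
        self._count = 0

    def update(self, budget_db: float) -> CodingMode:
        candidate = select_mode(budget_db)
        if candidate is self.mode:
            self._count = 0  # the run of transition indicators is broken
        else:
            self._count += 1
            if self._count > self.transition_count_threshold:
                self.mode = candidate
                self._count = 0
        return self.mode


if __name__ == "__main__":
    selector = ModeSelector(CodingMode.HIGHER_ORDER_AMBISONICS)
    # Transmit power of 10 dBm; the peer reports a steadily weak RSSI of -65 dBm.
    for _ in range(4):
        print(selector.update(link_budget_db(10.0, -65.0)).name)
```

With the default transition count threshold of 3, the selector in this sketch keeps the higher order ambisonics mode for the first three weak reports and drops to stereo on the fourth, which avoids toggling modes on a brief fluctuation of the link.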

Description

Field

The present disclosure is generally related to encoding audio data.

Description of Related Art

Advances in technology have resulted in smaller and more powerful computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless telephones such as mobile and smart phones, tablets and laptop computers that are small, lightweight, and easily carried by users. These devices can communicate voice and data packets over wireless networks. Further, many such devices incorporate additional functionality such as a digital still camera, a digital video camera, a digital recorder, and an audio file player. Also, such devices can process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these devices can include significant computing capabilities.
One application of such devices includes providing wireless immersive audio to a user. As an example, a headphone device worn by a user can receive streaming audio data from a remote server for playback to the user. To illustrate, the headphone device detects a rotation of the user's head and transmits head tracking information to the remote server. The remote server updates an audio scene based on the head tracking information, generates binaural audio data based on the updated audio scene, and transmits the binaural audio data to the headphone device for playback to the user. Performing audio scene updates and binauralization at the remote server enables the user to experience an immersive audio experience via a headphone device that has relatively limited processing resources.
However, due to latencies associated with transmitting information such as head motion data between the headphone device and the remote server, updating the audio data at the remote server based on the received information, and transmitting the updated binaural audio data to the headphone device, such a system can result in an unnaturally high latency. To illustrate, the time delay between the rotation of the user's head and the corresponding modified spatial audio being played out at the user's ears can be unnaturally long, which may diminish the user's experience.
Although latency may be reduced by transmitting the audio scene data to the headphone device and performing adjustments to the audio scene at the headphone device, the amount of audio data that can be transferred between the audio source and the headphone device can be limited by the quality of the communication link between the audio source and the headphone device. However, the quality of the communication link can fluctuate during a communication session. Fluctuations in the quality of the communication link can cause additional delays and playback interruptions at the headphone device when the amount of audio data being transmitted exceeds the capacity of the communication link.
EP 1,176,750 A1 relates to a link quality determination unit for determining a link quality of a transmission link between an OFDM transmitter and an OFDM receiver of an OFDM transmission system. A first link quality measure determination unit determines a first link quality measure on the basis of a signal power variation or a signal-to-noise variation determined by a variation determination unit. A second link quality determination unit calculates a second link quality measure on the basis of an average signal-to-noise ratio based on the noise power and the signal power. To perform a link adaptation an overall link quality determination unit combines the first and second link quality measures into an overall link quality measure.
US 2012/0323568 A1 describes a method and arrangement in a network node for adapting a property of source coding to the quality of a communication link in packet switched conversational services in a communication system. The method comprises obtaining information related to the quality of a communication link. The method further comprises selecting a source coding mode with an associated source coding delay, based on the obtained information and the associated source coding delay. The selected source coding mode is selected from a set of at least two source coding modes associated with different source coding delays, and is to be used when source coding voice data to be transmitted over the communication link.
WO 2016/057926 A1 describes techniques for signaling channels for scalable coding of higher order ambisonic audio data. A device comprising a memory and a processor may be configured to perform the techniques. The memory may be configured to store the bitstream. The processor may be configured to obtain, from the bitstream, an indication of a number of channels specified in one or more layers in the bitstream, and obtain the channels specified in the one or more layers in the bitstream based on the indication of the number of channels.
US 2019/0198028 A1 describes techniques