US-20260129597-A1 - SYSTEMS AND METHODS FOR AUDIO TIMING AND SYNCHRONIZATION
Abstract
The present disclosure provides for novel systems and methods of audio transmission that alleviate shortcomings in the art and provide novel mechanisms for robust and scalable audio transmission. In some embodiments, a method is provided for receiving an input time datum, determining a time period from the input time datum to a second time, the second time occurring after the input time datum, determining a value for at least one parameter of an adjustable filter based on the time period, configuring the adjustable filter based on the value of the at least one parameter, determining an output time datum by applying the configured adjustable filter to the input time datum, and outputting the output time datum.
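The abstract's method can be pictured with a minimal sketch, assuming a simple exponential smoothing filter whose single parameter is derived from the elapsed time period; the class and helper names (`TimeFilter`, `alpha_from_period`, `tau_s`) are illustrative and not taken from the disclosure:

```python
# Illustrative sketch only, not the disclosed implementation: an adjustable
# filter whose smoothing coefficient is determined by the time period between
# the input time datum and a later second time.
class TimeFilter:
    def __init__(self):
        self.state = None  # last output time datum

    @staticmethod
    def alpha_from_period(period_s, tau_s=1.0):
        # Hypothetical parameter rule: a longer period yields a larger
        # coefficient, so newer data is weighted more heavily.
        return min(1.0, period_s / tau_s)

    def step(self, input_time, second_time):
        period = second_time - input_time          # time period from the input datum
        alpha = self.alpha_from_period(period)     # configure the adjustable filter
        if self.state is None:
            self.state = float(input_time)
        else:
            # Apply the configured filter to the input time datum.
            self.state += alpha * (input_time - self.state)
        return self.state                          # output time datum
```

The first call passes the input datum through unchanged; later calls blend each new input into the running estimate by the period-dependent coefficient.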
Inventors
- Kenneth A. Boehlke
- Harold T. Davis
Assignees
- DATAVAULT AI INC.
Dates
- Publication Date
- 20260507
- Application Date
- 20260105
Claims (20)
- 1 . A method, comprising: sampling, by at least one processor of a transmitter in a wireless audio system, in temporal coincidence, a timing synchronization function time value and a local system counter to obtain TSF/counter pairs; processing, by the at least one processor, the TSF/counter pairs with an adaptive estimator to determine a frequency coefficient and a delay coefficient of a first-order mapping between the local system counter and wall time; generating, by the at least one processor, a filtered wall-time value from the local system counter using the first-order mapping defined by the frequency coefficient and the delay coefficient; computing, by the at least one processor, for each audio block of a plurality of audio blocks, a presentation timestamp from the filtered wall-time value and computing a PlayTime value that accounts for at least a transmit-side buffer delay; forming, by the at least one processor, data packets, each data packet having: a header that includes the presentation timestamp and fields carrying the frequency coefficient, the delay coefficient, and the PlayTime value, and a payload that includes one or more encoded audio blocks; applying, by the at least one processor, error-resiliency coding to at least a portion of the data packets by adding redundancy symbols spanning multiple audio blocks from the plurality of audio blocks within an interleave span to recover from packet loss on a wireless medium; enforcing, by the at least one processor, during operation of the adaptive estimator, a rate limit on updates to the frequency coefficient within a bounded parts-per-million change per second; and transmitting, by the at least one processor, the data packets over the wireless medium for presentation at one or more receivers using the presentation timestamp while providing the frequency coefficient, the delay coefficient, and the PlayTime value for receiver-side timing recovery and resampling.
- 2 . The method of claim 1 , further comprising generating, by the at least one processor, a pseudo-beacon timing synchronization function value when a wireless network TSF is unavailable, and embedding the pseudo-beacon timing synchronization function value in the data packets for adoption by receivers as network time.
- 3 . The method of claim 1 , further comprising querying, by the at least one processor, the one or more receivers for a time function value and, upon receipt, using a returned value as the timing synchronization function for generating the filtered wall-time value.
- 4 . The method of claim 1 , wherein the presentation timestamp is an AVB transport protocol timestamp generated from a wall clock value adjusted by a latency normalization value.
- 5 . The method of claim 1 , wherein computing the PlayTime value further comprises determining a sample interval index and a fractional interpolation parameter for a polynomial resampler based on a comparison between the PlayTime and elapsed time at a start of each audio block.
- 6 . The method of claim 1 , wherein the interleave span for the error-resiliency coding is an integer multiple of an audio block size selected from four or eight packets per interleave.
- 7 . The method of claim 1 , wherein applying the error-resiliency coding comprises performing network coding across the interleave span such that each coded packet carries the redundancy symbols derived from at least two audio blocks from the plurality of audio blocks.
- 8 . The method of claim 1 , further comprising storing, by the at least one processor, a frequency-related parameter of the adaptive estimator in non-volatile memory and, on a subsequent operation, preloading the adaptive estimator with the stored parameter to reduce settling time.
- 9 . The method of claim 1 , wherein enforcing the rate limit on the updates to the frequency coefficient comprises limiting the update rate to approximately ±3 parts per million per second.
- 10 . The method of claim 1 , wherein the header of each data packet further includes an indicator identifying a receiver-side presentation-time locking mode selected between an analog phase-locked-loop mode and a software sample-rate conversion mode.
- 11 . A transmitter, comprising: at least one processor; and a memory storing computer code; wherein the at least one processor is configured to execute the computer code that causes the at least one processor to: sample, in temporal coincidence, a timing synchronization function time value and a local system counter to obtain TSF/counter pairs; process the TSF/counter pairs with an adaptive estimator to determine a frequency coefficient and a delay coefficient of a first-order mapping between the local system counter and wall time; generate a filtered wall-time value from the local system counter using the first-order mapping defined by the frequency coefficient and the delay coefficient; compute, for each audio block of a plurality of audio blocks, a presentation timestamp from the filtered wall-time value and compute a PlayTime value that accounts for at least a transmit-side buffer delay; form data packets, each data packet having: a header that includes the presentation timestamp and fields carrying the frequency coefficient, the delay coefficient, and the PlayTime value, and a payload that includes one or more encoded audio blocks; apply error-resiliency coding to at least a portion of the data packets by adding redundancy symbols spanning multiple audio blocks from the plurality of audio blocks within an interleave span to recover from packet loss on a wireless medium; enforce, during operation of the adaptive estimator, a rate limit on updates to the frequency coefficient within a bounded parts-per-million change per second; and transmit the data packets over the wireless medium for presentation at one or more receivers using the presentation timestamp while providing the frequency coefficient, the delay coefficient, and the PlayTime value for receiver-side timing recovery and resampling.
- 12 . The transmitter of claim 11 , wherein the at least one processor is further configured to generate a pseudo-beacon timing synchronization function value when a wireless network TSF is unavailable, and to embed the pseudo-beacon timing synchronization function value in the data packets for adoption by receivers as network time.
- 13 . The transmitter of claim 11 , wherein the at least one processor is further configured to query one or more receivers for a time function value and, upon receipt, to use a returned value as the timing synchronization function for generating the filtered wall-time value.
- 14 . The transmitter of claim 11 , wherein the presentation timestamp is an AVB transport protocol timestamp generated from a wall clock value adjusted by a latency normalization value.
- 15 . The transmitter of claim 11 , wherein computing the PlayTime value further comprises determining a sample interval index and a fractional interpolation parameter for a polynomial resampler based on a comparison between the PlayTime and elapsed time at a start of each audio block.
- 16 . The transmitter of claim 11 , wherein the interleave span for the error-resiliency coding is an integer multiple of an audio block size selected from four or eight packets per interleave.
- 17 . The transmitter of claim 11 , wherein applying the error-resiliency coding comprises performing network coding across the interleave span such that each coded packet carries the redundancy symbols derived from at least two audio blocks from the plurality of audio blocks.
- 18 . The transmitter of claim 11 , further comprising non-volatile memory storing a frequency-related parameter of the adaptive estimator, the at least one processor being configured, on a subsequent operation, to preload the adaptive estimator with the stored parameter to reduce settling time.
- 19 . The transmitter of claim 11 , wherein enforcing the rate limit on the updates to the frequency coefficient comprises limiting the update rate to approximately ±3 parts per million per second.
- 20 . The transmitter of claim 11 , wherein the header of each data packet further includes an indicator identifying a receiver-side presentation-time locking mode selected between an analog phase-locked-loop mode and a software sample-rate conversion mode.
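The timing-recovery core of the claims above can be sketched as follows, assuming microsecond units and a two-point slope estimate; the class name `AdaptiveEstimator`, the update rule, and the constant `MAX_PPM_PER_S` are illustrative choices, with only the first-order mapping (frequency and delay coefficients) and the roughly ±3 ppm-per-second slew limit taken from claims 1 and 9:

```python
# Illustrative sketch, not the patented estimator: fit wall_time ~= freq *
# counter + delay from temporally coincident TSF/counter pairs, while
# slew-limiting changes to the frequency coefficient (claim 9: ~±3 ppm/s).
MAX_PPM_PER_S = 3.0  # bound on frequency-coefficient change per second

class AdaptiveEstimator:
    def __init__(self, freq=1.0, delay=0.0):
        self.freq = freq    # frequency coefficient of the first-order mapping
        self.delay = delay  # delay coefficient of the first-order mapping
        self.prev = None    # previously sampled (TSF, counter) pair

    def update(self, tsf_us, counter_us):
        """Refine (freq, delay) from one TSF/counter pair sampled together."""
        if self.prev is not None:
            tsf0, ctr0 = self.prev
            dt_tsf, dt_ctr = tsf_us - tsf0, counter_us - ctr0
            if dt_ctr > 0:
                raw_freq = dt_tsf / dt_ctr  # instantaneous slope estimate
                # Enforce the rate limit: bound the coefficient change to
                # MAX_PPM_PER_S parts per million per elapsed second.
                elapsed_s = dt_tsf / 1e6
                max_step = MAX_PPM_PER_S * 1e-6 * max(elapsed_s, 1e-6)
                step = max(-max_step, min(max_step, raw_freq - self.freq))
                self.freq += step
        # Re-anchor the delay so the mapping passes through the latest pair.
        self.delay = tsf_us - self.freq * counter_us
        self.prev = (tsf_us, counter_us)

    def wall_time(self, counter_us):
        """Filtered wall-time value derived from the local system counter."""
        return self.freq * counter_us + self.delay
```

In this sketch the filtered wall-time value would feed the per-block presentation timestamp and PlayTime computations, while `freq` and `delay` are the coefficients carried in the packet header for receiver-side timing recovery.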
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation application of U.S. application Ser. No. 19/352,344, filed Oct. 7, 2025, which is a continuation application of U.S. application Ser. No. 18/222,337, filed Jul. 14, 2023, now U.S. Pat. No. 12,439,352, which claims the benefit of and priority to U.S. Provisional Patent Application No. 63/389,184, filed Jul. 14, 2022, each of which is incorporated by reference herein in its entirety.
FIELD OF THE DISCLOSURE
The present disclosure is related generally to the wireless distribution of high-quality audio signals and, in particular, to systems and methods of distributing high-bitrate, multichannel audio wirelessly while maintaining low latency.
BACKGROUND
Generally, a key element of a positive customer experience with wireless audio systems is a robust, low-latency wireless link. Low-latency audio is desirable for enabling good audio-to-video synchronization (or lip sync). For example, low-latency audio systems allow for compatibility with a broad range of televisions. A low-latency audio system will work with both low- and high-latency televisions because the transmitted audio can always be delayed to match the video. On the other hand, an audio system with high latency may be incompatible with low-latency televisions because the audio cannot be advanced to match the video. Low latency requires quick access to the radio medium as well as low computational times. While audio and video equipment has historically been connected by analog or digital point-to-point, one-way connections, an increasing portion of multimedia content is distributed over networks. For example, video and uncompressed audio may be streamed from an audio/video source in a media room or closet to a display and multiple speakers of a surround sound system in a remote room or rooms in a residence.
Due to the increased cost and complexity of retrofitting finished structures with cabling, in many cases data, including video and audio data, is transmitted from a source to a display, speakers, or other output devices over a network that includes a wireless communication link(s) utilizing low-cost radio technologies such as frequency modulation and spread spectrum modulation to transport packetized digital data. High-quality audio, whether or not combined with video, benefits from synchronization of outputs and minimization of system latency. That is, synchronization of the various outputs and minimization of system latency can be important to high-quality audio/video systems. For example, source-to-output delay or latency ("lip sync") can be important in audio/video systems, such as home theater systems, where a slight difference (e.g., on the order of 50 milliseconds (ms)) between the display of a video sequence and the output of the corresponding audio is noticeable. The human ear is even more sensitive to phase delay or channel-to-channel latency between the corresponding outputs of the different channels of multi-channel audio. For example, channel-to-channel latency greater than 1 microsecond (μs) may result in the perception of disjointed or blurry audio.

Generally, in a digital network, such as an audio/video system, a source of digital data transmits a stream of data packets to the network's endpoints, where the data is presented. In some implementations, a pair of clocks at each node of the network controls the time at which a particular datum is presented and the rate at which data is processed, for example, the rate at which an analog signal is digitized or digital data is converted to an analog signal for presentation. The actual or real time that an activity is to occur, such as presentation of a video datum, is determined by "wall time," the output of a "wall clock" at the node.
In some implementations, a sample or media clock controls the rate at which data is processed, for example, the rate at which blocks of digital audio data are introduced to a digital-to-analog converter.

Audio video bridging (AVB) is the common name of a set of technical standards developed by the Institute of Electrical and Electronics Engineers (IEEE) providing specifications directed to time-synchronized, low-latency streaming services over networks. The Precision Time Protocol (PTP), specified by "IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems," IEEE Std. 1588-2008, and adopted in IEEE 802.1AS-2011, "IEEE Standard for Local and Metropolitan Area Networks—Timing and Synchronization for Time-Sensitive Applications in Bridged Local Area Networks," describes a system enabling distributed wall clocks to be synchronized within 1 μs over seven network hops. In an AVB network, each network endpoint (e.g., a network node capable of transmitting and/or receiving a data stream) can include two clocks: a "wall" clock and a "media" or "sample" clock. In some embodiments, wall time output by the wall clock can determine the real or actual time of an event's oc