US-20260129364-A1 - SYSTEM AND METHOD TO CONCEAL DISCONTINUITIES IN AUDIO BLOCKS

Abstract

The present disclosure provides systems, methods, and audio devices for concealing discontinuities in wireless audio playback. In one embodiment, an audio device includes a wireless receiver and a replay buffer. The replay buffer stores audio blocks and detects a discontinuity in the sequence. A replacement audio block is generated by flipping time indices of a stored audio block and conditionally applying a vertical flip based on slope continuity. The replacement block is filtered using a glitch filter with coefficients selected according to frequency content of the stored block, and crossfaded with the stored block to produce output audio that conceals the discontinuity.

Inventors

Kenneth A. Boehlke

Assignees

DATAVAULT AI INC.

Dates

Publication Date: 2026-05-07
Application Date: 2026-01-05

Claims (20)

1. A method, comprising: computing, by at least one processor, for a previous audio block, an energy ratio between a bandlimited-derivative filtered version of the previous audio block and an unfiltered version of the previous audio block; determining, by the at least one processor, an octave-indexed frequency band based on the energy ratio; setting, by the at least one processor, a maximum number of repeat presentations inversely correlated to the octave-indexed frequency band; upon detecting that normal audio returns before the maximum number of repeat presentations is reached, crossfading, by the at least one processor, from a replayed signal, comprising the previous audio block, to the normal audio; and upon failing to detect that normal audio returns before the maximum number of repeat presentations is reached, fading, by the at least one processor, the replayed signal to zero amplitude over a fade-out interval.
2. The method of claim 1, wherein determining the octave-indexed frequency band comprises slicing the energy ratio into eight indexes corresponding to octave ranges calibrated by a stepped frequency tone.
3. The method of claim 1, wherein computing the bandlimited-derivative filtered version comprises applying a four-tap derivative filter having coefficients [−1, 1, 1, −1].
4. The method of claim 1, further comprising selecting, by the at least one processor, a glitch filter from a set of octave-indexed coefficient sets according to the octave-indexed frequency band; and selecting, by the at least one processor, a crossfade duration for transitioning from the replayed signal to the normal audio.
5. The method of claim 1, further comprising measuring, by the at least one processor, autocorrelation within a previous audio block to determine a relevance metric; wherein setting the maximum number of repeat presentations comprises increasing the maximum number of repeat presentations when the relevance metric indicates lower frequency content and decreasing the maximum number of repeat presentations when the relevance metric indicates higher frequency content.
6. The method of claim 5, further comprising generating, by the at least one processor, a horizontally flipped version and a vertically flipped version of the previous audio block; computing, by the at least one processor, for each of the horizontally flipped version and the vertically flipped version, a boundary continuity measure based on at least one of slope continuity and value continuity at a presentation-time boundary; and selecting, by the at least one processor, a candidate flipped audio block, whichever of the horizontally flipped version and the vertically flipped version has a greater boundary continuity measure.
7. The method of claim 1, wherein the detecting that normal audio returns comprises determining that a current audio block is available for scheduled playback with a presentation timestamp or sequence index equal to an expected successor of the previous audio block according to a block duration.
8. The method of claim 1, wherein the crossfading from the replayed signal to the normal audio is initiated at a presentation-time block boundary.
9. The method of claim 1, further comprising generating, by the at least one processor, a flipped version of the previous audio block to form the replayed signal; and applying, by the at least one processor, a glitch filter to the replayed signal after generating the flipped version and before initiating the crossfade to the normal audio.
10. The method of claim 1, wherein computing the energy ratio comprises computing, over a duration of the previous audio block, a sum of absolute values of samples of the bandlimited-derivative filtered version divided by a sum of absolute values of samples of the unfiltered version.
11. A system, comprising: at least one processor, and a non-transitory memory storing computer code; wherein the at least one processor is configured to execute the computer code that causes the at least one processor to: compute, for a previous audio block, an energy ratio between a bandlimited-derivative filtered version of the previous audio block and an unfiltered version of the previous audio block; determine an octave-indexed frequency band based on the energy ratio; set a maximum number of repeat presentations inversely correlated to the octave-indexed frequency band; upon detecting that normal audio returns before the maximum number of repeat presentations is reached, crossfade from a replayed signal, comprising the previous audio block, to the normal audio; and upon failing to detect that normal audio returns before the maximum number of repeat presentations is reached, fade the replayed signal to zero amplitude over a fade-out interval.
12. The system of claim 11, wherein the at least one processor is configured to determine the octave-indexed frequency band by slicing the energy ratio into eight indexes corresponding to octave ranges calibrated by a stepped frequency tone.
13. The system of claim 11, wherein the at least one processor is configured to compute the bandlimited-derivative filtered version by applying a four-tap derivative filter having coefficients [−1, 1, 1, −1].
14. The system of claim 11, wherein the at least one processor is further configured to select a glitch filter from a set of octave-indexed coefficient sets according to the octave-indexed frequency band, and select a crossfade duration for transitioning from the replayed signal to the normal audio.
15. The system of claim 11, wherein the at least one processor is further configured to measure autocorrelation within a previous audio block to determine a relevance metric, and wherein the at least one processor is configured to set the maximum number of repeat presentations by increasing the maximum number of repeat presentations when the relevance metric indicates lower frequency content and decreasing the maximum number of repeat presentations when the relevance metric indicates higher frequency content.
16. The system of claim 15, wherein the at least one processor is further configured to: generate a horizontally flipped version and a vertically flipped version of the previous audio block; compute, for each of the horizontally flipped version and the vertically flipped version, a boundary continuity measure based on at least one of slope continuity and value continuity at a presentation-time boundary; and select a candidate flipped audio block, whichever of the horizontally flipped version and the vertically flipped version has a greater boundary continuity measure.
17. The system of claim 11, wherein the at least one processor is configured to detect that normal audio returns by determining that a current audio block is available for scheduled playback with a presentation timestamp or sequence index equal to an expected successor of the previous audio block according to a block duration.
18. The system of claim 11, wherein the at least one processor is configured to initiate the crossfade from the replayed signal to the normal audio at a presentation-time block boundary.
19. The system of claim 11, wherein the at least one processor is further configured to generate a flipped version of the previous audio block to form the replayed signal and apply a glitch filter to the replayed signal after generating the flipped version and before initiating the crossfade to the normal audio.
20. The system of claim 11, wherein the at least one processor is configured to compute the energy ratio by computing, over a duration of the previous audio block, a sum of absolute values of samples of the bandlimited-derivative filtered version divided by a sum of absolute values of samples of the unfiltered version.
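
For illustration only, the energy-ratio computation recited in claims 1, 3, and 10 might be sketched in Python as follows. The filter coefficients and the sum-of-absolute-values ratio come from the claims. The zero-padding at the block start, the threshold values, the repeat-limit table, and all function names are hypothetical placeholders; the claims state only that the eight indexes are calibrated by a stepped frequency tone and that the repeat limit is inversely correlated with the band.

```python
# Four-tap derivative filter coefficients recited in claims 3 and 13.
DERIVATIVE_TAPS = (-1.0, 1.0, 1.0, -1.0)

# HYPOTHETICAL placeholders: seven ascending thresholds slice the ratio into
# eight indexes. Real values would come from a stepped-tone calibration.
HYPOTHETICAL_THRESHOLDS = (0.01, 0.03, 0.1, 0.3, 0.6, 1.0, 1.4)

# HYPOTHETICAL repeat limits, one per index, decreasing as the index rises.
HYPOTHETICAL_MAX_REPEATS = (8, 7, 6, 5, 4, 3, 2, 1)


def fir_filter(block, taps):
    """Causal FIR filter; samples before the block start are assumed zero."""
    out = []
    for n in range(len(block)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * block[n - k]
        out.append(acc)
    return out


def energy_ratio(block):
    """Sum of |filtered| divided by sum of |unfiltered| over the block."""
    denominator = sum(abs(x) for x in block)
    if denominator == 0.0:
        return 0.0  # assumption: a silent block is given a ratio of zero
    filtered = fir_filter(block, DERIVATIVE_TAPS)
    return sum(abs(y) for y in filtered) / denominator


def octave_index(ratio, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Count the thresholds at or below the ratio, giving an index 0..7."""
    return sum(1 for t in thresholds if ratio >= t)


def max_repeat_presentations(index, table=HYPOTHETICAL_MAX_REPEATS):
    """Look up the repeat limit; higher bands get fewer repeats."""
    return table[index]
```

Under these assumptions, a block dominated by low-frequency content yields a smaller ratio, a lower index, and therefore a larger repeat limit than a block dominated by mid-band content.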

Description

CROSS REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. application Ser. No. 19/315,482, filed Aug. 30, 2025, which is a continuation of U.S. application Ser. No. 18/196,319, filed May 11, 2023, now U.S. Pat. No. 12,407,983, which claims priority to and the benefit of U.S. Provisional Application No. 63/340,903, filed May 11, 2022, each of which is incorporated herein by reference in its entirety.
FIELD OF THE DISCLOSURE
The present disclosure relates generally to the wireless distribution of high-quality audio signals and, in particular, to a system and methods of distributing high-bitrate, multichannel audio wirelessly while maintaining low latency.
BACKGROUND
A robust, low-latency wireless link is key to a good wireless audio customer experience. Low-latency audio is desirable because it enables good audio-to-video synchronization (lip sync) across a broad range of televisions. If the wireless link has high latency, it will not work with low-latency televisions, because the audio cannot be advanced to match the video. A low-latency wireless link, on the other hand, works with both low-latency and high-latency televisions, because the transmitted audio can always be delayed to match the video.
SUMMARY
The present disclosure provides novel systems and methods of audio transmission that alleviate shortcomings in the art and provide novel mechanisms for resolving discontinuities in audio data. At times the wireless medium is busy and the transmitter has no opportunity to transmit audio. If the busy duration of the medium exceeds the latency requirements of the system, the audio will be delayed past the point in time at which it is scheduled to be played. This delayed audio may be dropped at the transmitter, if possible, or it may be dropped when received at the receiver. In either case, there may be a block or blocks of audio data that may advantageously be concealed at the receiver.
The present disclosure provides systems, methods, and audio devices for concealing discontinuities in wireless audio playback. In one embodiment, an audio device includes a wireless receiver and a replay buffer. The replay buffer stores audio blocks and detects a discontinuity in the sequence. A replacement audio block is generated by flipping time indices of a stored audio block and conditionally applying a vertical flip based on slope continuity. The replacement block is filtered using a glitch filter with coefficients selected according to frequency content of the stored block, and crossfaded with the stored block to produce output audio that conceals the discontinuity.
In various embodiments, an audio device includes a wireless receiver configured to obtain a sequence of audio blocks from a source device, and a replay buffer configured to store the sequence of audio blocks and detect discontinuities in the sequence. When a discontinuity is detected, the audio device may retrieve a stored audio block preceding the discontinuity, and generate a replacement audio block by performing at least one of: (i) flipping the time indices of the stored audio block, or (ii) vertically flipping the replacement audio block based on slope continuity. The replacement audio block may be further processed with an adaptive glitch filter having frequency coefficients based on the frequency content of the stored block. The processed replacement audio block is crossfaded with the stored audio block to generate smooth output audio that conceals the discontinuity.
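
As a rough illustration of the flipping and crossfading operations summarized above, a minimal Python sketch follows. It is not taken from the disclosure: the function names, the slope threshold, the choice of the final stored sample as the pivot for the vertical flip, and the linear crossfade ramp are all assumptions made for this example, and the glitch filter is omitted because its coefficients are not given here.

```python
def time_flip(block):
    """Reverse the time indices of a block (a horizontal flip)."""
    return list(reversed(block))


def vertical_flip(block, pivot):
    """Mirror each sample about a pivot value (a vertical flip)."""
    return [2.0 * pivot - x for x in block]


def make_replacement(stored, slope_threshold=1e-6):
    """Build a replacement block from the stored block.

    Assumed rule: after the time flip, the signal retraces its path, so the
    slope at the boundary changes sign. If the stored block ends on a
    non-negligible slope, mirroring about the final stored sample restores
    the original slope direction. The threshold is a hypothetical value.
    """
    if len(stored) < 2:
        return list(stored)
    flipped = time_flip(stored)
    boundary_value = stored[-1]
    boundary_slope = stored[-1] - stored[-2]
    if abs(boundary_slope) > slope_threshold:
        return vertical_flip(flipped, boundary_value)
    return flipped


def linear_crossfade(fade_out, fade_in):
    """Blend two equal-rate signals with a linear ramp (assumed ramp shape)."""
    n = min(len(fade_out), len(fade_in))
    if n == 1:
        return [float(fade_in[0])]
    result = []
    for i in range(n):
        w = i / (n - 1)  # weight rises from 0.0 to 1.0 across the overlap
        result.append((1.0 - w) * fade_out[i] + w * fade_in[i])
    return result
```

With a rising ramp `[0.0, 1.0, 2.0, 3.0]` as the stored block, `make_replacement` returns a block that starts at the final stored value and keeps rising. The sketch repeats the boundary sample at the seam; a practical implementation would need to handle that, for example within the crossfade.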
Further embodiments provide for adaptive filtering and crossfade-to-zero functionality when a maximum number of replacement audio blocks is output without receipt of normal audio, synchronization of playback timing across multiple audio devices based on a master clock, and integration of the discontinuity concealment functions into televisions, soundbars, gaming consoles, wireless speakers, earbuds, or other consumer audio systems. Additional embodiments are directed to methods and devices with a processor that executes instructions from memory to implement these processes, thereby ensuring robust, synchronized, and uninterrupted wireless audio playback in practical environments. Embodiments further provide for adaptive filtering and crossfade-to-zero functionality when a maximum number of replacement audio blocks is output without receipt of normal audio, synchronization of playback timing across multiple audio devices based on a master clock, and integration of the discontinuity concealment functions into televisions, soundbars, gaming consoles, wireless speakers, earbuds, or other consumer audio systems. Additional embodiments are directed to methods and non-transitory computer-readable media storing instructions to implement these processes, thereby ensuring robust, synchronized, and uninterrupted wireless audio playback in practical environments. Other embodiments of systems and methods include steps for concealing un-r