US-12620379-B1 - Systems, devices, and methods for dynamic synchronization of a prerecorded vocal backing track to a live vocal performance


Abstract

Disclosed are systems, methods, and devices that overcome timing and self-expression limitations experienced by vocalists when using prerecorded vocal backing tracks to enhance live performances. The disclosed systems, devices, and methods dynamically synchronize prerecorded vocal backing tracks with a live vocal stream by extracting vocal elements, such as phonemes, vector embeddings, or vocal audio spectra, from the live vocal performance in real time. These extracted vocal elements are matched against corresponding timestamped vocal elements previously derived from the prerecorded vocal backing track, enabling precise real-time adjustment and alignment of the backing track timing to the live performance. Additionally, the system enhances expressive performance by identifying prosody factors, such as pitch, vibrato, accent, stress, dynamics, and level, in the live vocal performance and dynamically adjusting the corresponding prerecorded prosody factors within predefined ranges. This preserves the naturalness and spontaneity of the vocalist's live performance, overcoming traditional limitations associated with prerecorded vocal backing tracks.
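The matching step summarized above can be illustrated with a minimal Python sketch. It assumes phonemes have already been recognized in both the live stream and the backing track by an upstream model (not shown), and it substitutes a simple greedy forward search for the machine-learning predictive algorithm the description says is typically used. `TimestampedElement`, `match_live_element`, `track_offsets`, the `lookahead` window, and the example phoneme timings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TimestampedElement:
    """A vocal element (here, a phoneme label) and its position in the backing track."""
    label: str
    time_s: float  # seconds from the start of the prerecorded vocal backing track


def match_live_element(
    live_label: str,
    reference: List[TimestampedElement],
    cursor: int,
    lookahead: int = 4,
) -> Optional[int]:
    """Return the index of the first reference element whose label equals live_label,
    searching forward from cursor within a small lookahead window, or None."""
    end = min(cursor + lookahead, len(reference))
    for i in range(cursor, end):
        if reference[i].label == live_label:
            return i
    return None


def track_offsets(
    live: List[Tuple[str, float]],
    reference: List[TimestampedElement],
) -> List[float]:
    """For each live (label, time_s) pair that matches, record live time minus
    backing-track time. Unmatched live elements are skipped, and the cursor only
    moves forward, so backing-track elements are never matched out of order."""
    offsets: List[float] = []
    cursor = 0
    for label, live_time in live:
        idx = match_live_element(label, reference, cursor)
        if idx is not None:
            offsets.append(live_time - reference[idx].time_s)
            cursor = idx + 1
    return offsets


reference = [
    TimestampedElement("HH", 0.00),
    TimestampedElement("EH", 0.08),
    TimestampedElement("L", 0.20),
    TimestampedElement("OW", 0.28),
]
# The live singer starts 50 ms late and gradually falls further behind.
live = [("HH", 0.05), ("EH", 0.15), ("L", 0.30), ("OW", 0.40)]
print([round(o, 3) for o in track_offsets(live, reference)])  # prints [0.05, 0.07, 0.1, 0.12]
```

Each offset says how far the live singer is behind (positive) or ahead of (negative) the recording at that phoneme; a synchronization engine could use these offsets to decide how much to compress or expand the backing track.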

Inventors

Clayton Janes

Assignees

Eidol Corporation

Dates

Publication Date: 2026-05-05
Application Date: 2025-10-06

Claims (20)

1. A method, comprising: identifying and extracting vocal elements in realtime from a live vocal stream, by at least one of one or more processors, the live vocal stream digitally representing a live vocal performance; dynamically controlling timing of a prerecorded vocal backing track in realtime using the vocal elements extracted from the live vocal stream matched to timestamped vocal elements from the prerecorded vocal backing track by at least one of the one or more processors; and outputting a resulting dynamically controlled prerecorded vocal backing track in realtime that is time-synchronized to the live vocal stream.
2. The method of claim 1, further comprising: capturing the live vocal performance to produce the live vocal stream.
3. The method of claim 1, further comprising: capturing the live vocal performance using analog-to-digital conversion.
4. The method of claim 1, further comprising: preprocessing the prerecorded vocal backing track before the live vocal performance by identifying, extracting, and time stamping backing track vocal elements, creating the timestamped vocal elements.
5. The method of claim 1, wherein: dynamically controlling the timing of the prerecorded vocal backing track in realtime includes using time compression and expansion of the prerecorded vocal backing track based on timing differences between the vocal elements extracted from the live vocal stream and the timestamped vocal elements from the prerecorded vocal backing track.
6. The method of claim 1, wherein: the timestamped vocal elements include timestamped phonemes; the vocal elements include phonemes; identifying and extracting the phonemes from the live vocal stream in realtime; and dynamically controlling the timing of the prerecorded vocal backing track in realtime using the phonemes extracted from the live vocal stream matched to the timestamped phonemes from the prerecorded vocal backing track.
7. The method of claim 1, wherein: the timestamped vocal elements include timestamped vector embeddings; the vocal elements include vector embeddings; identifying and extracting the vector embeddings from the live vocal stream in realtime; and dynamically controlling the timing of the prerecorded vocal backing track in realtime using the vector embeddings extracted from the live vocal stream matched to the timestamped vector embeddings from the prerecorded vocal backing track.
8. The method of claim 1, wherein: the timestamped vocal elements include timestamped vocal audio spectra; the vocal elements include vocal audio spectra; identifying and extracting the vocal audio spectra from the live vocal stream in realtime; and dynamically controlling the timing of the prerecorded vocal backing track in realtime using the vocal audio spectra extracted from the live vocal stream matched to the timestamped vocal audio spectra from the prerecorded vocal backing track.
9. The method of claim 1, wherein: the timestamped vocal elements include timestamped two or more types of vocal elements; the vocal elements include two or more types of vocal elements; identifying and extracting the two or more types of vocal elements from the live vocal stream in realtime; and dynamically controlling the timing of the prerecorded vocal backing track in realtime using the two or more types of vocal elements extracted from the live vocal stream matched to the timestamped two or more types of vocal elements from the prerecorded vocal backing track.
10. The method of claim 9, further comprising: obtaining a confidence weight by comparing the two or more types of vocal elements to the timestamped two or more types of vocal elements by at least one of the one or more processors; and dynamically controlling the timing of the prerecorded vocal backing track based at least in part on whether the confidence weight is above or below a predetermined confidence threshold by at least one of the one or more processors.
11. A system, comprising: a tangible medium that includes non-transitory computer-readable instructions that, when applied to one or more processors, instructs the one or more processors to perform a method comprising: identifying and extracting vocal elements in realtime from a live vocal stream by at least one of the one or more processors, the live vocal stream digitally representing a live vocal performance; dynamically controlling timing of a prerecorded vocal backing track in realtime using the vocal elements extracted from the live vocal stream matched to timestamped vocal elements from the prerecorded vocal backing track by at least one of the one or more processors; and outputting a resulting dynamically controlled prerecorded vocal backing track in realtime that is time-synchronized to the live vocal stream.
12. The system of claim 11, further comprising: the one or more processors.
13. The system of claim 11, further comprising: the one or more processors; and an analog-to-digital converter structured to digitally represent the live vocal performance as the live vocal stream.
14. The system of claim 11, wherein: the tangible medium instructs at least one of the one or more processors to dynamically control the timing of the prerecorded vocal backing track in realtime using time compression and expansion of the prerecorded vocal backing track based on timing differences between the vocal elements extracted from the live vocal stream and the timestamped vocal elements from the prerecorded vocal backing track.
15. The system of claim 11, wherein: the timestamped vocal elements include timestamped phonemes; the vocal elements include phonemes; the tangible medium instructs at least one of the one or more processors to identify and extract the phonemes from the live vocal stream; and the tangible medium instructs at least one of the one or more processors to dynamically control the timing of the prerecorded vocal backing track in realtime using the phonemes extracted from the live vocal stream matched to the timestamped phonemes from the prerecorded vocal backing track.
16. The system of claim 11, wherein: the timestamped vocal elements include timestamped vector embeddings; the vocal elements include vector embeddings; and the tangible medium further instructs at least one of the one or more processors to dynamically control the timing of the prerecorded vocal backing track in realtime using the vector embeddings extracted from the live vocal stream matched to timestamped vector embeddings from the prerecorded vocal backing track.
17. The system of claim 11, wherein: the timestamped vocal elements include timestamped vocal audio spectra; the vocal elements include vocal audio spectra; and the tangible medium further instructs at least one of the one or more processors to dynamically control the timing of the prerecorded vocal backing track in realtime using the vocal audio spectra extracted from the live vocal stream matched to timestamped vocal audio spectra from the prerecorded vocal backing track.
18. The system of claim 11, wherein: the timestamped vocal elements include timestamped two or more types of vocal elements; the vocal elements include two or more types of vocal elements; and the tangible medium further instructs at least one of the one or more processors to dynamically control the timing of the prerecorded vocal backing track in realtime using the two or more types of vocal elements matched to the timestamped two or more types of vocal elements from the prerecorded vocal backing track.
19. The system of claim 18, wherein: the tangible medium further instructs at least one of the one or more processors to obtain a confidence weight by comparing the two or more types of vocal elements to the timestamped two or more types of vocal elements; and the tangible medium further instructs at least one of the one or more processors to dynamically control the timing of the prerecorded vocal backing track based at least in part on whether the confidence weight is above or below a predetermined confidence threshold.
20. A method, comprising: delaying a live vocal performance through a broadcast delay resulting in a delayed live performance signal by at least one of one or more processors; identifying and extracting vocal elements in realtime from the delayed live performance signal by at least one of the one or more processors; dynamically controlling timing of a prerecorded vocal backing track using the vocal elements extracted from the delayed live performance signal matched to timestamped vocal elements from the prerecorded vocal backing track by at least one of the one or more processors; and outputting a resulting dynamically controlled prerecorded vocal backing track that is time-synchronized to the delayed live performance signal.
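Claims 5 and 10 describe time compression and expansion driven by timing differences, gated by a confidence weight compared against a predetermined threshold. The following is a minimal Python sketch under stated assumptions: the per-type match scores (for example, a phoneme-agreement score and an embedding cosine similarity) are assumed to be precomputed in [0, 1], and the weighted average, the weight and threshold values, and the names `confidence_weight`, `stretch_ratio`, and `next_ratio` are hypothetical. It only computes a playback-rate ratio between two matched anchors; actually stretching the audio would need a separate time-scale modification technique (for example, a phase vocoder or WSOLA), which is not shown.

```python
from typing import Sequence, Tuple


def confidence_weight(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-element-type match scores, each assumed in [0, 1]."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def stretch_ratio(ref_span: Tuple[float, float], live_span: Tuple[float, float]) -> float:
    """Live duration divided by backing-track duration between two matched anchors.
    A ratio above 1 expands (slows) the backing-track segment; below 1 compresses it."""
    ref_duration = ref_span[1] - ref_span[0]
    if ref_duration <= 0:
        raise ValueError("reference span must have positive duration")
    return (live_span[1] - live_span[0]) / ref_duration


def next_ratio(
    scores: Sequence[float],
    weights: Sequence[float],
    threshold: float,
    ref_span: Tuple[float, float],
    live_span: Tuple[float, float],
    current_ratio: float = 1.0,
) -> float:
    """Adopt a new stretch ratio only if the combined confidence is at or above the
    threshold; otherwise keep the current playback rate unchanged."""
    if confidence_weight(scores, weights) >= threshold:
        return stretch_ratio(ref_span, live_span)
    return current_ratio


# Two element types, e.g. phoneme agreement (0.9) and embedding similarity (0.7).
print(round(next_ratio([0.9, 0.7], [0.6, 0.4], 0.75, (1.0, 1.5), (1.1, 1.7)), 3))  # prints 1.2
```

Holding the previous ratio when confidence is low is one plausible reading of "above or below a predetermined confidence threshold"; it avoids jerking the backing track around on an uncertain match.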

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 19/182,460, filed on Apr. 17, 2025. The entire contents of U.S. patent application Ser. No. 19/182,460 are hereby incorporated by reference.

BACKGROUND

Audience enjoyment of live music often hinges on the quality and consistency of the vocalist's performance. Even seasoned professionals frequently encounter challenges during live performances. These challenges may include vocal strain from rigorous touring schedules, age-related changes in vocal range and stamina, lifestyle factors affecting vocal health, fatigue from travel and consecutive performances, and illness that degrades vocal quality. Such challenges may significantly diminish a vocalist's overall performance quality, undermining their confidence and detracting from the audience experience.

To address such performance challenges, performing artists may use prerecorded vocal backing tracks. A prerecorded vocal backing track is a previously captured recording of a vocalist's performance, intended to support, supplement, or entirely replace segments of their live vocal performance. Typically, such tracks are recorded in controlled settings, such as professional recording studios, to ensure optimal vocal quality. During live performances, a playback engineer manually cues and initiates playback of the prerecorded vocal backing track at precise moments. The front-of-house audio engineer then mixes the prerecorded vocal backing track with the live vocal signal during selected portions of the performance, occasionally substituting the prerecorded track entirely for specific song segments. In scenarios where a prerecorded vocal backing track fully replaces or significantly supplements live vocals, the vocalist often must mime or “lip-sync” their performance so it visually aligns with the prerecorded vocal track.
SUMMARY

The Inventor, through extensive experience in performance technology for major touring acts, has identified significant drawbacks in current prerecorded vocal backing track usage. First, while the prerecorded vocal backing track is in use, the vocalist's timing is critical. The vocalist must carefully mime the performance and make sure that their lip and mouth movements follow the prerecorded vocal backing track. Second, when the prerecorded vocal backing track is used to replace segments of a vocalist's live singing, unique nuances of their live performance, such as deliberate changes in timing, pitch, vibrato, and emphasis, are lost.

The Inventor's systems, devices, and methods overcome the timing issue discussed above by dynamically controlling the timing of a prerecorded vocal backing track in realtime so that it is time-synchronized to the live vocal performance. They overcome the self-expression issue by identifying prosody factors such as vibrato, accent, stress, and level (loudness or volume) in the live vocal performance. These prosody factors are then applied in realtime, within a preset range, to the corresponding prosody factors in the prerecorded vocal backing track.

The prerecorded vocal backing track is preprocessed, before the live vocal performance, to identify, extract, and timestamp vocal elements such as phonemes, vector embeddings, or vocal audio spectra. The system may also identify, extract, and timestamp prosody parameters such as level, vibrato, accent, pitch, and stress. Unlike music learning and practice systems that perform tempo matching (i.e., detect and match musical beats measured in beats per minute), timestamping vocal elements as described in this disclosure allows precise alignment of vocals within a prerecorded vocal backing track in realtime (i.e., approximately 30 milliseconds or less). This makes the timestamping described in this disclosure precise enough for miming or lip syncing in a live performance venue.
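Applying a live prosody factor to the recording "within a preset range" can be pictured as a clamped adjustment. The following Python sketch assumes each prosody factor (such as level in dB or vibrato depth in cents) has already been measured as one scalar per analysis frame in both signals; `ProsodyRange`, `apply_prosody`, and the `amount` blend parameter are hypothetical illustrations, not the disclosed implementation.

```python
from dataclasses import dataclass


@dataclass
class ProsodyRange:
    """Preset limit on how far one prosody factor may move away from the recording."""
    max_deviation: float  # in the factor's own units, e.g. dB for level, cents for pitch


def apply_prosody(recorded: float, live: float, preset: ProsodyRange, amount: float = 1.0) -> float:
    """Shift a recorded prosody value toward the live value.

    amount (0 to 1) scales how much of the live deviation is transferred; the
    result is then clamped so it never leaves recorded +/- preset.max_deviation."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    deviation = (live - recorded) * amount
    limit = abs(preset.max_deviation)
    deviation = max(-limit, min(limit, deviation))
    return recorded + deviation


print(apply_prosody(-12.0, -6.0, ProsodyRange(3.0)))  # prints -9.0
```

In this example the recording follows the singer's 6 dB push in loudness, but only as far as the 3 dB preset allows, which keeps the studio take's quality while still reflecting the live performance.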
Before the live performance, the prerecorded vocal backing track, along with the timestamped vocal elements extracted from it, is preloaded into a vocal backing track synchronization unit. During the live vocal performance, the vocal backing track synchronization unit aligns the prerecorded vocal backing track to match the timing of the live vocal performance in realtime. It does so by identifying and extracting vocal elements from the live vocal performance as they occur and then matching these extracted live stream vocal elements to the timestamped vocal elements. This extraction, matching, and alignment process may typically be accomplished using a machine-learning predictive algorithm. Once the timestamped vocal elements are matched, a dynamic synchronization engine, or algorithm, time-compresses or time-expands the vocal elements within the prerecorded vocal backing track to match the timing of the corresponding vocal elements in the live vocal performance. This entire process may take place in realtime (i.e.,