US-20260128056-A1 - AUDIO PROCESSING FOR DETECTING OCCURRENCES OF LOUD SOUND CHARACTERIZED BY BRIEF AUDIO BURSTS


Abstract

A boundary of a highlight of audiovisual content depicting an event is identified. The audiovisual content may be a broadcast, such as a television broadcast of a sporting event. The highlight may be a segment of the audiovisual content deemed to be of particular interest. Audio data for the audiovisual content is stored, and the audio data is automatically analyzed to detect one or more audio events indicative of one or more occurrences to be included in the highlight. Each audio event may be a brief, high-energy audio burst such as the sound made by a tennis serve. A time index within the audiovisual content, before or after the audio event, may be designated as the boundary, which may be the beginning or end of the highlight.
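To make the boundary-designation idea concrete, here is a minimal Python sketch that places highlight boundaries at fixed offsets before and after a detected audio event time. The function name and the offset values are illustrative assumptions, not specifics from this publication.

```python
# Minimal sketch: designate a time index before and after a detected
# audio event as the beginning and end of the highlight.
def highlight_bounds(event_time_s: float,
                     lead_s: float = 5.0,    # assumed offset before the event
                     trail_s: float = 10.0   # assumed offset after the event
                     ) -> tuple[float, float]:
    """Return (start, end) time indices bracketing an audio event."""
    start = max(0.0, event_time_s - lead_s)  # boundary before the event
    end = event_time_s + trail_s             # boundary after the event
    return start, end

# e.g., a tennis serve detected at t = 83.2 s
print(highlight_bounds(83.2))  # (78.2, 93.2)
```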

Inventors

  • Mihailo Stojancic
  • Warren Packard

Assignees

  • STATS LLC

Dates

Publication Date
2026-05-07
Application Date
2025-12-17

Claims (20)

  1. A computer-implemented method for generating a highlight video based on detected audio events, the method comprising: receiving, by one or more processors, audiovisual content; processing, by the one or more processors, the audiovisual content to identify at least a first audio event and a second audio event, wherein the first audio event and the second audio event are identified by: computing a normalized magnitude for a plurality of window samples at a plurality of respective positions of a time analysis window across the audiovisual content, and detecting high energy audio bursts based on the computed normalized magnitudes; extracting a first audiovisual highlight associated with the first audio event from the audiovisual content, wherein the first audiovisual highlight includes first metadata associated with an event; extracting a second audiovisual highlight associated with the second audio event from the audiovisual content, wherein the second audiovisual highlight includes second metadata associated with the event; generating a highlight video including the first audiovisual highlight and the second audiovisual highlight; and transmitting the highlight video to a user device.
  2. The method of claim 1, wherein the first metadata includes at least one of a phase of the event, a time of the event, and a score of the event.
  3. The method of claim 2, wherein the second metadata includes at least one of a phase of the event, a time of the event, and a score of the event.
  4. The method of claim 1, wherein the first metadata includes an excitement level.
  5. The method of claim 4, wherein the excitement level changes over the course of the first audiovisual highlight.
  6. The method of claim 4, wherein the excitement level of the first audiovisual highlight is greater than a threshold excitement level.
  7. The method of claim 1, wherein the user device is a smartphone.
  8. The method of claim 2, wherein the first metadata further includes boundaries indicating a beginning and an end to the first audiovisual highlight.
  9. The method of claim 1, wherein the first audiovisual highlight is included in the highlight video based on a viewing history of a user of the user device.
  10. A system for generating a highlight video based on detected audio events, comprising: a processor; and a memory having programming instructions stored thereon, which, when executed by the processor, perform one or more operations comprising: receiving, by one or more processors, audiovisual content; processing, by the one or more processors, the audiovisual content to identify at least a first audio event and a second audio event, wherein the first audio event and the second audio event are identified by: computing a normalized magnitude for a plurality of window samples at a plurality of respective positions of a time analysis window across the audiovisual content, and detecting high energy audio bursts based on the computed normalized magnitudes; extracting a first audiovisual highlight associated with the first audio event from the audiovisual content, wherein the first audiovisual highlight includes first metadata associated with an event; extracting a second audiovisual highlight associated with the second audio event from the audiovisual content, wherein the second audiovisual highlight includes second metadata associated with the event; generating a highlight video including the first audiovisual highlight and the second audiovisual highlight; and transmitting the highlight video to a user device.
  11. The system of claim 10, wherein the first metadata includes at least one of a phase of the event, a time of the event, and a score of the event.
  12. The system of claim 11, wherein the first metadata includes an excitement level.
  13. The system of claim 12, wherein the excitement level changes over the course of the first audiovisual highlight.
  14. The system of claim 10, wherein the user device is a smartphone.
  15. The system of claim 10, wherein the first metadata further includes boundaries indicating a beginning and an end to the first audiovisual highlight.
  16. The system of claim 10, wherein the first audiovisual highlight is included in the highlight video based on a viewing history of a user of the user device.
  17. A non-transitory computer readable medium including one or more sequences of instructions that, when executed by one or more processors, cause: receiving, by one or more processors, audiovisual content; processing, by the one or more processors, the audiovisual content to identify at least a first audio event and a second audio event, wherein the first audio event and the second audio event are identified by: computing a normalized magnitude for a plurality of window samples at a plurality of respective positions of a time analysis window across the audiovisual content, and detecting high energy audio bursts based on the computed normalized magnitudes; extracting a first audiovisual highlight associated with the first audio event from the audiovisual content, wherein the first audiovisual highlight includes first metadata associated with an event; extracting a second audiovisual highlight associated with the second audio event from the audiovisual content, wherein the second audiovisual highlight includes second metadata associated with the event; generating a highlight video including the first audiovisual highlight and the second audiovisual highlight; and transmitting the highlight video to a user device.
  18. The non-transitory computer readable medium of claim 17, wherein the first metadata includes at least one of a phase of the event, a time of the event, and a score of the event.
  19. The non-transitory computer readable medium of claim 17, wherein the first metadata includes an excitement level.
  20. The non-transitory computer readable medium of claim 17, wherein the user device is a smartphone.
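
Claim 1 above (mirrored by claims 10 and 17) recites computing a normalized magnitude for window samples at successive positions of a time analysis window, then detecting high-energy audio bursts from those magnitudes. The Python sketch below shows one plausible reading of that step; the window length, hop, detection threshold, and the use of mean absolute amplitude normalized by the global maximum are all assumptions made for illustration, not specifics from the claims.

```python
import numpy as np

def detect_bursts(audio: np.ndarray, sr: int,
                  win_s: float = 0.05,      # assumed window length (s)
                  hop_s: float = 0.025,     # assumed hop between positions (s)
                  threshold: float = 0.5    # assumed detection threshold
                  ) -> list[float]:
    """Slide a time analysis window across mono audio, compute a
    normalized magnitude per window position, and return the times
    (in seconds) of positions flagged as high-energy bursts."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    # Magnitude of the window samples at each window position
    # (mean absolute amplitude; an assumed measure).
    mags = np.array([np.abs(audio[i:i + win]).mean()
                     for i in range(0, len(audio) - win + 1, hop)])
    mags /= mags.max() + 1e-12            # normalize magnitudes to [0, 1]
    return [i * hop_s for i in np.flatnonzero(mags > threshold)]
```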

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of, and claims the benefit of priority to, U.S. application Ser. No. 18/421,178, filed on Jan. 24, 2024, which is a continuation of, and claims the benefit of priority to, U.S. application Ser. No. 17/681,115, filed on Feb. 25, 2022, now U.S. Pat. No. 11,922,968, issued on Mar. 5, 2024, which is a continuation of, and claims the benefit of priority to, U.S. application Ser. No. 16/553,025, filed on Aug. 27, 2019, now U.S. Pat. No. 11,264,048, issued on Mar. 1, 2022, which is a continuation-in-part of U.S. application Ser. No. 16/440,229, filed on Jun. 13, 2019, and a continuation-in-part of U.S. application Ser. No. 16/421,391, filed on May 23, 2019. U.S. application Ser. No. 16/440,229, filed on Jun. 13, 2019, claims the benefit of priority to U.S. Provisional Ser. No. 62/712,041, filed on Jul. 30, 2018, and U.S. Provisional Ser. No. 62/746,454, filed on Oct. 16, 2018. U.S. application Ser. No. 16/421,391, filed on May 23, 2019, claims the benefit of U.S. Provisional Ser. No. 62/680,955, filed on Jun. 5, 2018, U.S. Provisional Ser. No. 62/712,041, filed on Jul. 30, 2018, and U.S. Provisional Ser. No. 62/746,454, filed on Oct. 16, 2018, all of which are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The present document relates to techniques for identifying multimedia content and associated information on a television device or a video server delivering multimedia content, and enabling embedded software applications to utilize the multimedia content to provide content and services synchronous with that multimedia content. Various embodiments relate to methods and systems for providing automated audio analysis to identify and extract information from television programming content depicting sporting events, so as to create metadata associated with video highlights for in-game and post-game viewing.

DESCRIPTION OF THE RELATED ART

Enhanced television applications such as interactive advertising and enhanced program guides with pre-game, in-game and post-game interactive applications have long been envisioned. Existing cable systems that were originally engineered for broadcast television are being called on to support a host of new applications and services, including interactive television services and enhanced (interactive) programming guides.

Some frameworks for enabling enhanced television applications have been standardized. Examples include the OpenCable™ Enhanced TV Application Messaging Specification, as well as the Tru2way specification, which refer to interactive digital cable services delivered over a cable video network and which include features such as interactive program guides, interactive ads, games, and the like. Additionally, cable operator "OCAP" programs provide interactive services such as e-commerce shopping, online banking, electronic program guides, and digital video recording. These efforts have enabled the first generation of video-synchronous applications, synchronized with video content delivered by the programmer/broadcaster, and providing added data and interactivity to television programming. Recent developments in video/audio content analysis technologies and capable mobile devices have opened up an array of new possibilities in developing sophisticated applications that operate synchronously with live TV programming events.
These new technologies and advances in audio signal processing and computer vision, as well as the improved computing power of modern processors, allow for real-time generation of sophisticated programming content highlights accompanied by metadata that are currently lacking in the television and other media environments.

SUMMARY

A system and method are presented to enable automatic real-time processing of audio signals extracted from sporting event television programming content, for detecting, selecting, and tracking short bursts of high-energy audio events, such as tennis ball hits in a tennis match.

In at least one embodiment, initial audio signal analysis is performed in the time domain, so as to detect short bursts of high-energy audio and generate an indicator of potential occurrence of audio events of interest. In at least one embodiment, detected time-domain audio events are further processed and revised by invoking consideration of spectral characteristics of the audio signal in the neighborhood of detected time-domain audio events. A spectrogram is constructed for the analyzed audio signal, and pronounced spectral magnitude peaks are extracted by maximum magnitude suppression in a sliding 2-D diamond-shaped time-frequency area filter. In addition, a spectrogram time-spread range is constructed around audio event points previously obtained by the time-domain analysis, and a qualifier for each audio event point is established by counting spectral magnitude peaks in this time-spread range. The time-spread range can be established in any of a multitude of ways.
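
As a hedged illustration of the spectral stage just described, the SciPy sketch below builds a spectrogram, extracts pronounced magnitude peaks by maximum suppression within a sliding diamond-shaped time-frequency footprint, and then qualifies each time-domain event point by counting peaks inside a time-spread range around it. The diamond radius, the mean-magnitude noise floor, and the ±0.2 s spread are assumptions for illustration; as the summary notes, the time-spread range can be constructed in many ways.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import maximum_filter

def diamond_footprint(radius: int) -> np.ndarray:
    """Boolean 2-D diamond (L1 ball) used as the sliding area filter."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return np.abs(y) + np.abs(x) <= radius

def spectral_peaks(audio: np.ndarray, sr: int, radius: int = 3):
    """Build a spectrogram and keep only bins that are the maximum
    within a diamond-shaped time-frequency neighborhood
    (maximum-magnitude suppression)."""
    _freqs, times, sxx = spectrogram(audio, fs=sr)
    local_max = maximum_filter(sxx, footprint=diamond_footprint(radius))
    # Keep local maxima above an assumed noise floor (the mean magnitude).
    peaks = (sxx == local_max) & (sxx > sxx.mean())
    return times, peaks

def qualify_events(event_times, times, peaks, spread_s: float = 0.2):
    """For each time-domain event point, count spectral peaks inside a
    time-spread range around it; the count serves as the qualifier."""
    qualifiers = []
    for t in event_times:
        in_range = (times >= t - spread_s) & (times <= t + spread_s)
        qualifiers.append(int(peaks[:, in_range].sum()))
    return qualifiers

# Example: qualify bursts found by the time-domain stage (detect_bursts
# from the earlier sketch after the claims).
# times, peaks = spectral_peaks(audio, sr)
# print(qualify_events(detect_bursts(audio, sr), times, peaks))
```

Chaining the two sketches reproduces the two-stage flow the summary describes: time-domain detection proposes candidate event points, and the spectral qualifier confirms or rejects them.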