
CA-3100788-C - VIDEO PROCESSING FOR EMBEDDED INFORMATION CARD LOCALIZATION AND CONTENT EXTRACTION


Abstract

Metadata for one or more highlights of a video stream may be extracted from one or more card images embedded in the video stream. The highlights may be segments of the video stream, such as a broadcast of a sporting event, that are of particular interest. According to one method, video frames of the video stream are stored. One or more information cards embedded in a decoded video frame may be detected by analyzing one or more predetermined video frame regions. Image segmentation, edge detection, and/or closed contour identification may then be performed on identified video frame regions. Further processing may include obtaining a minimum rectangular perimeter area enclosing all remaining segments, which may then be further processed to determine precise boundaries of information cards. The card images may be analyzed to obtain metadata, which may be stored in association with at least one of the video frames.
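As a rough illustration of the localization steps the abstract describes (region analysis, segmentation, and a minimum rectangular perimeter enclosing the remaining segments), the core bounding-rectangle idea can be sketched with NumPy. This is not the patented implementation; the function names and the simple intensity threshold used as a stand-in for segmentation are assumptions for this sketch.

```python
import numpy as np

def min_bounding_rect(mask):
    """Return (top, left, bottom, right) of the minimal rectangle
    enclosing all True pixels in a boolean mask; None if mask is empty.
    (Hypothetical helper, not from the patent.)"""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1

def localize_card(frame, region, threshold=128):
    """Crop a predetermined region of the frame, segment it with a crude
    intensity threshold, and return the minimal rectangle (in frame
    coordinates) enclosing the segmented foreground."""
    top, left, bottom, right = region
    crop = frame[top:bottom, left:right]
    mask = crop > threshold            # stand-in for real segmentation
    rect = min_bounding_rect(mask)
    if rect is None:
        return None
    t, l, b, r = rect
    return top + t, left + l, top + b, left + r

# Example: a bright 4x6 "card" embedded in a dark 20x20 frame.
frame = np.zeros((20, 20), dtype=np.uint8)
frame[5:9, 3:9] = 200
print(localize_card(frame, (0, 0, 20, 20)))  # → (5, 3, 9, 9)
```

In practice the segmentation, edge detection, and contour steps named in the abstract would replace the bare threshold, but the bounding-rectangle computation stays the same.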

Inventors

  • Mihailo Stojancic
  • Warren Packard
  • Dennis Kanygin

Assignees

  • Thuuz, Inc.

Dates

Publication Date
2026-05-05
Application Date
2019-05-15
Priority Date
2019-05-14

Claims (20)

  1. A method for extracting metadata from a video stream, the method comprising: at a data store, storing video frames of the video stream; at a processor, automatically identifying and extracting a card image embedded in at least one video frame of the video frames by: determining whether a predetermined location of the card image is known based on a network associated with the video stream; based on determining that the predetermined location is known, processing the predetermined location, within the video frame, that defines a video frame region containing the card image; and sequentially processing a plurality of regions of the video frame to identify the video frame region containing the card image; at the processor, analyzing the card image to obtain metadata; and at a data store, storing the metadata in association with at least one of the video frames.
  2. The method of claim 1, wherein: the video stream comprises a broadcast of a sporting event; the video frames constitute a highlight deemed to be of particular interest to one or more users; and the metadata is descriptive of a status of the sporting event during the highlight.
  3. The method of claim 2, further comprising, at an output device, presenting the metadata during viewing of the highlight.
  4. The method of claim 3, wherein automatically identifying and extracting a card image and analyzing the card image to obtain the metadata are carried out, for a highlight, during viewing of the highlight.

Date Reçue/Date Received 2024-04-29
  5. The method of claim 1, further comprising localizing and extracting the card image from the video frame region.
  6. The method of claim 5, wherein localizing and extracting the card image from the video frame region comprises cropping the video frame to isolate the video frame region.
  7. The method of claim 5, wherein localizing and extracting the card image from the video frame region comprises: segmenting the video frame region, or a processed version of the video frame region, to generate a segmented image; and modifying pixel values of segments adjacent to boundaries of the segmented image.
  8. The method of claim 5, wherein localizing and extracting the card image from the video frame region comprises removing background from the video frame region or a processed version of the video frame region.
  9. The method of claim 5, wherein localizing and extracting the card image from the video frame region comprises: generating an edge image based on the video frame region with removed background; finding contours in the edge image; approximating the contours as polygons; and extracting a region enclosed by a minimum rectangular perimeter encompassing all of the contours to generate a perimeter rectangular image.
  10. The method of claim 9, further comprising iteratively: counting color-modified pixels for each edge of the perimeter rectangular image; and moving any boundary edge, with a number of color-modified pixels exceeding a threshold, inward.
  11. The method of claim 10, further comprising validating a quadrilateral detected within the region by: counting: a first number of pixels in the video frame region; a second number of pixels in a perimeter rectangular image; and a third number of pixels in an adjusted perimeter rectangular image; and comparing the first number, the second number, and the third number to determine whether an assumed quadrilateral within the region is viable.
  12. The method of claim 5, wherein localizing and extracting the card image from the video frame region comprises adjusting a left boundary of the card image.
  13. A non-transitory computer-readable medium for extracting metadata from a video stream, comprising instructions stored thereon, that when executed by a processor, perform the steps of: causing a data store to store video frames of the video stream; automatically identifying and extracting a card image embedded in at least one video frame of the video frames by: determining whether a predetermined location of the card image is known based on a network associated with the video stream; based on determining that the predetermined location is known, processing the predetermined location, within the video frame, that defines a video frame region containing the card image; and sequentially processing a plurality of regions of the video frame to identify the video frame region containing the card image; analyzing the card image to obtain metadata; and causing the data store to store the metadata in association with at least one of the video frames.
  14. The non-transitory computer-readable medium of claim 13, wherein: the video stream comprises a broadcast of a sporting event; the video frames constitute a highlight deemed to be of particular interest to one or more users; and the metadata is descriptive of a status of the sporting event during the highlight.
  15. The non-transitory computer-readable medium of claim 14, further comprising instructions stored thereon, that when executed by a processor, cause an output device to present the metadata during viewing of the highlight.
  16. The non-transitory computer-readable medium of claim 15, wherein automatically identifying and extracting a card image and analyzing the card image to obtain the metadata are carried out, for a highlight, during viewing of the highlight.
  17. The non-transitory computer-readable medium of claim 13, further comprising instructions stored thereon, that when executed by a processor, localize and extract the card image from the video frame region.
  18. The non-transitory computer-readable medium of claim 17, wherein localizing and extracting the card image from the video frame region comprises cropping the video frame to isolate the video frame region.
  19. The non-transitory computer-readable medium of claim 17, wherein localizing and extracting the card image from the video frame region comprises: segmenting the video frame region, or a processed version of the video frame region, to generate a segmented image; and modifying pixel values of segments adjacent to boundaries of the segmented image.
  20. The non-transitory computer-readable medium of claim 17, wherein localizing and extracting the card image from the video frame region comprises removing background from the video frame region or a processed version of the video frame region.
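Claims 10 and 11 describe iteratively counting color-modified pixels along each edge of the perimeter rectangular image and moving any edge whose count exceeds a threshold inward. A minimal sketch of that shrinking loop, assuming NumPy; all names are hypothetical and the threshold value is illustrative:

```python
import numpy as np

def tighten_boundaries(marked, rect, threshold=2):
    """Shrink a rectangle (top, left, bottom, right) over a boolean map of
    color-modified pixels: any edge whose row/column count of marked pixels
    exceeds 'threshold' moves one step inward, until every edge passes."""
    top, left, bottom, right = rect
    changed = True
    while changed and top < bottom - 1 and left < right - 1:
        changed = False
        if marked[top, left:right].sum() > threshold:
            top += 1; changed = True          # top edge too "dirty", move down
        if marked[bottom - 1, left:right].sum() > threshold:
            bottom -= 1; changed = True       # bottom edge moves up
        if marked[top:bottom, left].sum() > threshold:
            left += 1; changed = True         # left edge moves right
        if marked[top:bottom, right - 1].sum() > threshold:
            right -= 1; changed = True        # right edge moves left
    return top, left, bottom, right

# Example: marked background pixels hug the top row and left column.
marked = np.zeros((10, 10), dtype=bool)
marked[0, :] = True
marked[:, 0] = True
print(tighten_boundaries(marked, (0, 0, 10, 10)))  # → (1, 1, 10, 10)
```

The pixel-count comparison of claim 11 (region vs. perimeter vs. adjusted perimeter) would then decide whether the shrunken rectangle still plausibly bounds a card quadrilateral.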

Description

VIDEO PROCESSING FOR EMBEDDED INFORMATION CARD LOCALIZATION AND CONTENT EXTRACTION

TECHNICAL FIELD

[0017] The present document relates to techniques for identifying multimedia content and associated information on a television device or a video server delivering multimedia content, and enabling embedded software applications to utilize the multimedia content to provide content and services synchronous with that multimedia content. Various embodiments relate to methods and systems for providing automated video and audio analysis that are used to identify and extract information in sports television video content, and to create metadata associated with video highlights for in-game and post-game reviewing of the sports television video content.

DESCRIPTION OF THE RELATED ART

[0018] Enhanced television applications such as interactive advertising and enhanced program guides with pre-game, in-game and post-game interactive applications have long been envisioned. Existing cable systems that were originally engineered for broadcast television are being called on to support a host of new applications and services including interactive television services and enhanced (interactive) programming guides.

[0019] Some frameworks for enabling enhanced television applications have been standardized. Examples include the OpenCable™ Enhanced TV Application Messaging Specification, as well as the Tru2way specification, which refer to interactive digital cable services delivered over a cable video network and which include features such as interactive program guides, interactive ads, games, and the like. Additionally, cable operator "OCAP" programs provide interactive services such as e-commerce shopping, online banking, electronic program guides, and digital video recording. These efforts have enabled the first generation of video-synchronous applications, synchronized with video content delivered by the programmer/broadcaster, and providing added data and interactivity to television programming.

[0020] Recent developments in video/audio content analysis technologies and capable mobile devices have opened up an array of new possibilities in developing sophisticated applications that operate synchronously with live TV programming events. These new technologies and advances in computer vision and video processing, as well as improved computing power of modern processors, allow for real-time generation of sophisticated programming content highlights accompanied by metadata.

SUMMARY

[0021] Methods and systems are presented for automatically finding the location of an information card ("card image"), such as an information score board, in a video frame, or multiple video frames, in sports television broadcast programming. Also described are methods and systems for identifying text strings within various fields of the localized card image, and reading and interpreting textual information from various fields of the localized card image.

[0022] In at least one embodiment, the card image detection, localization, and reading are performed synchronously with respect to presentation of sports television programming content. In at least one embodiment, an automated process is provided for receiving a digital video stream, analyzing one or more frames of the digital video stream, and automatically detecting and localizing card image quadrilaterals. In another embodiment, an automated process is provided for analyzing one or more localized card images, recognizing and extracting text strings (for example, in text boxes), and reading information from extracted text boxes.

[0023] In yet another embodiment, detected text strings associated with particular fields within the card image are interpreted, thus providing immediate in-game information related to the content of the televised broadcast of the sporting event. The extracted in-frame information may be used to generate metadata related to automatically created custom video content as a set of highlights of broadcast television programming content associated with audiovisual and textual data.

[0024] In at least one embodiment, a method for extracting metadata from a video stream may include storing video frames of the video stream in a data store. At a processor, a card image embedded in at least one of the video frames may be automatically identified and extracted by determining whether a predetermined location of the card image is known based on a network associated with the video stream. Based on determining that the predetermined location is known, the predetermined location within the video frame, which defines a video frame region containing the card image, may be processed, and a plurality of regions of the video frame may be processed sequentially to identify the video frame region containing the card image. At the processor, the card image may be analyzed to obtain metadata, which may be stored in association with at least one of the video frames.
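The control flow of paragraph [0024] and claim 1 (use a predetermined card location when the broadcast network is known, otherwise sequentially process candidate regions of the frame) can be sketched as follows. The lookup table, the brightness-based detection heuristic, and all names here are illustrative assumptions, not the patented method:

```python
import numpy as np

# Hypothetical table mapping a broadcast network to a known card region
# given as (top, left, bottom, right) in frame coordinates.
KNOWN_CARD_REGIONS = {"network-a": (0, 0, 4, 8)}

def region_has_card(frame, region, min_bright=150, min_pixels=8):
    """Crude stand-in for card detection: a region 'contains a card' if
    enough of its pixels are bright (assumption for illustration only)."""
    top, left, bottom, right = region
    return (frame[top:bottom, left:right] > min_bright).sum() >= min_pixels

def find_card_region(frame, network, candidates):
    """Use the predetermined location when the network is known; otherwise
    sequentially probe candidate regions until one appears to hold a card."""
    region = KNOWN_CARD_REGIONS.get(network)
    if region is not None:
        return region
    for region in candidates:
        if region_has_card(frame, region):
            return region
    return None

frame = np.zeros((16, 16), dtype=np.uint8)
frame[10:14, 4:12] = 220                       # bright "scoreboard" card
candidates = [(0, 0, 8, 8), (10, 4, 14, 12)]
print(find_card_region(frame, "unknown-net", candidates))  # → (10, 4, 14, 12)
print(find_card_region(frame, "network-a", candidates))    # → (0, 0, 4, 8)
```

The selected region would then feed the localization and text-reading stages described in paragraphs [0021] through [0023].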