US-20260127871-A1 - Logo Recognition In Images And Videos

Abstract

Accurate detection of logos in media content on media presentation devices is addressed. Logos and products are detected in media content produced in retail deployments using a camera. Logo recognition uses saliency analysis, segmentation techniques, and stroke analysis to segment likely logo regions. Logo recognition may suitably employ feature extraction, signature representation, and logo matching. These three approaches make use of neural-network-based classification and optical character recognition (OCR). One method for OCR recognizes individual characters and then performs string matching. Another OCR method uses segment-level character recognition with N-gram matching. Synthetic image generation for training a neural net classifier and the transfer learning features of neural networks are employed to support fast addition of new logos for recognition.
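The N-gram matching mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which N-gram size or similarity measure is used, so the character trigrams, the Dice coefficient, the padding scheme, the 0.5 threshold, and the function names below are all assumptions made for illustration, not the patented method.

```python
from collections import Counter
from typing import List, Optional, Tuple


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Multiset of character n-grams; '_' padding marks word boundaries."""
    pad = "_" * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over n-gram multisets (assumed measure, range 0..1)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    overlap = sum((grams_a & grams_b).values())  # multiset intersection
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2.0 * overlap / total if total else 0.0


def best_logo_match(
    ocr_text: str, known_logos: List[str], threshold: float = 0.5
) -> Tuple[Optional[str], float]:
    """Return the closest logo name, or None if no score reaches the threshold."""
    if not known_logos:
        return None, 0.0
    score, name = max((dice_similarity(ocr_text, logo), logo) for logo in known_logos)
    return (name, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    # OCR misread the letter 'O' as the digit '0'; the n-gram overlap still
    # favors the intended name over the other candidates.
    print(best_logo_match("GRACEN0TE", ["Gracenote", "Grainger", "Granite"]))
```

Because the score depends on shared substrings rather than exact equality, a single misrecognized character lowers the score without eliminating the correct candidate, which is the property that makes N-gram matching attractive for noisy OCR output.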

Inventors

Jose Pio Pereira
Kyle Brocklehurst
Sunil Suresh Kulkarni
Peter Wendt

Assignees

GRACENOTE, INC.

Dates

Publication Date: 20260507
Application Date: 20250813

Claims (20)

1 . A method to detect a logo in images in video frames selected from a video stream, comprising: applying a saliency analysis and segmentation of selected regions in a selected video frame to determine segmented likely logo regions; processing the segmented likely logo regions using at least two of three techniques to generate corresponding matches, the three techniques including a first technique involving feature matching using correlation to generate a first match, a second technique involving neural network classification using a convolutional neural network to generate a second match, and a third technique involving text recognition using character segmentation and string matching to generate a third match; and deciding a most likely logo match by combining results from the generated matches that correspond to the at least two of three techniques.
2 . The method of claim 1 , wherein the saliency analysis comprises: applying a discrete cosine transform (DCT) on the segmented likely logo regions of an image in a selected video frame to determine spectral saliency of each segmented likely logo region.
3 . The method of claim 1 , wherein saliency detection comprises: applying a discrete cosine transform (DCT) on the segmented likely logo regions of an image in a selected video frame to determine spectral saliency of each likely logo region; and measuring multi-scale similarity at two higher scales and a smaller scale of the spectral saliency of each likely logo region.
4 . The method of claim 3 , wherein the multi-scale similarity measures include orientation gradient histograms, hue, saturation, value (HSV) histograms, and stroke width transform (SWT) statistics which include total number of strokes, number of horizontal strokes, number of vertical strokes, stroke density, and number of loops.
5 . The method of claim 1 , wherein segmentation comprises: applying a stroke width transform (SWT) analysis to the selected regions to generate SWT statistics; applying a graph based segmentation algorithm to establish word boxes around likely logo character strings; and analyzing each of the word boxes to produce a set of character segmentations to delineate the characters in the likely logo character strings.
6 . The method of claim 1 further comprising: combining neighboring keypoint regions with consistent aspect ratios and size to generate a new keypoint and region.
7 . The method of claim 1 further comprising: detecting and combining edge segments in a keypoint region; and binning sample points on selected edges according to angle and distance with reference to a dominant orientation of the selected edges.
8 . The method of claim 1 further comprising: using multiple text classifiers for robust logo text detection.
9 . The method of claim 1 further comprising: using stroke heuristics to select a text classifier.
10 . The method of claim 1 further comprising: using N-gram matching to recognize a segment.
11 . An apparatus comprising: at least one processor; and a memory in communication with the at least one processor, the memory including non-transitory computer-readable code which, when executed, causes the at least one processor to at least: apply a saliency analysis and segmentation of selected regions in a selected video frame to determine segmented likely logo regions; process the segmented likely logo regions using at least two of three techniques to generate corresponding matches, the three techniques including a first technique involving feature matching using correlation to generate a first match, a second technique involving neural network classification using a convolutional neural network to generate a second match, and a third technique involving text recognition using character segmentation and string matching to generate a third match; and decide a most likely logo match by combining results from the generated matches that correspond to the at least two of three techniques.
12 . The apparatus of claim 11 , wherein the saliency analysis comprises: applying a discrete cosine transform (DCT) on the segmented likely logo regions of an image in a selected video frame to determine spectral saliency of each segmented likely logo region.
13 . The apparatus of claim 11 , wherein saliency detection comprises: applying a discrete cosine transform (DCT) on the segmented likely logo regions of an image in a selected video frame to determine spectral saliency of each likely logo region; and measuring multi-scale similarity at two higher scales and a smaller scale of the spectral saliency of each likely logo region.
14 . The apparatus of claim 13 , wherein the multi-scale similarity measures include orientation gradient histograms, hue, saturation, value (HSV) histograms, and stroke width transform (SWT) statistics which include total number of strokes, number of horizontal strokes, number of vertical strokes, stroke density, and number of loops.
15 . A non-transitory computer-readable storage medium storing code which, when executed, causes a machine to at least: apply a saliency analysis and segmentation of selected regions in a selected video frame to determine segmented likely logo regions; process the segmented likely logo regions using at least two of three techniques to generate corresponding matches, the three techniques including a first technique involving feature matching using correlation to generate a first match, a second technique involving neural network classification using a convolutional neural network to generate a second match, and a third technique involving text recognition using character segmentation and string matching to generate a third match; and decide a most likely logo match by combining results from the generated matches that correspond to the at least two of three techniques.
16 . The computer-readable storage medium of claim 15 , wherein segmentation comprises: applying a stroke width transform (SWT) analysis to the selected regions to generate SWT statistics; applying a graph based segmentation algorithm to establish word boxes around likely logo character strings; and analyzing each of the word boxes to produce a set of character segmentations to delineate the characters in the likely logo character strings.
17 . The computer-readable storage medium of claim 15 further comprising: combining neighboring keypoint regions with consistent aspect ratios and size to generate a new keypoint and region.
18 . The computer-readable storage medium of claim 15 further comprising: detecting and combining edge segments in a keypoint region; and binning sample points on selected edges according to angle and distance with reference to a dominant orientation of the selected edges.
19 . The computer-readable storage medium of claim 15 further comprising: using multiple text classifiers for robust logo text detection.
20 . The computer-readable storage medium of claim 15 further comprising: using stroke heuristics to select a text classifier.
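
The final step of claims 1, 11, and 15 (deciding a most likely logo match by combining results from at least two of the three techniques) can be sketched as follows. The claims do not state how the results are combined, so the weighted score sum, the equal weights, the technique labels, and the names `Match`, `WEIGHTS`, and `decide_logo` are hypothetical choices made only to illustrate one possible combination rule.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Match(NamedTuple):
    technique: str  # "feature", "cnn", or "text" (labels assumed here)
    logo: str       # candidate logo reported by that technique
    score: float    # confidence in [0, 1]


# Hypothetical per-technique weights; the claims do not specify any weighting.
WEIGHTS: Dict[str, float] = {"feature": 1.0, "cnn": 1.0, "text": 1.0}


def decide_logo(
    matches: List[Match], min_techniques: int = 2
) -> Optional[Tuple[str, float]]:
    """Combine per-technique matches into one decision.

    Returns (logo, combined_score), or None when fewer than
    `min_techniques` distinct techniques produced a match.
    """
    for m in matches:
        if m.technique not in WEIGHTS:
            raise ValueError(f"unknown technique: {m.technique}")

    techniques = {m.technique for m in matches}
    if len(techniques) < min_techniques:
        return None

    # Sum weighted confidences per candidate logo across techniques.
    totals: Dict[str, float] = defaultdict(float)
    for m in matches:
        totals[m.logo] += WEIGHTS[m.technique] * m.score

    best = max(totals, key=lambda logo: totals[logo])
    weight_sum = sum(WEIGHTS[t] for t in techniques)
    return best, totals[best] / weight_sum


if __name__ == "__main__":
    results = [
        Match("feature", "Acme", 0.6),
        Match("cnn", "Acme", 0.8),
        Match("text", "Apex", 0.7),
    ]
    print(decide_logo(results))
```

In this sketch two techniques agreeing on one logo outweigh a single technique that reports a different one, and a lone technique yields no decision at all, mirroring the "at least two of three" requirement.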

Description

This is a continuation of U.S. patent application Ser. No. 18/507,560, filed Nov. 13, 2023, which is a continuation of U.S. patent application Ser. No. 17/672,963, filed Feb. 16, 2022, which is a continuation of U.S. patent application Ser. No. 16/841,681, filed on Apr. 7, 2020, which is a continuation of U.S. patent application Ser. No. 16/018,011, filed on Jun. 25, 2018 and issued as U.S. Pat. No. 10,614,582, which is a divisional of U.S. patent application Ser. No. 15/172,826, filed on Jun. 3, 2016 and issued as U.S. Pat. No. 10,007,863, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/171,820 entitled “Logo Recognition in Images and Videos” filed on Jun. 5, 2015, which are hereby incorporated by reference in their entirety.

CROSS REFERENCE TO RELATED APPLICATIONS

U.S. patent application Ser. No. 12/141,337 filed on Jun. 18, 2009 entitled “Method and Apparatus for Multi-dimensional Content Search and Video Identification” now issued as U.S. Pat. No. 8,171,030; U.S. application Ser. No. 12/141,163 filed on Jun. 18, 2008 entitled “Methods and Apparatus for Providing a Scalable Identification of Digital Video Sequences” now issued as U.S. Pat. No. 8,229,227; U.S. patent application Ser. No. 12/772,566 filed on May 3, 2010 entitled “Media Fingerprinting and Identification System” now issued as U.S. Pat. No. 8,195,689; U.S. application Ser. No. 12/788,796 filed on May 27, 2010 entitled “Multi-Media Content Identification Using Multi-Level Content Signature Correlation and Fast Similarity Search” now issued as U.S. Pat. No. 8,335,786; U.S. application Ser. No. 13/102,479 filed on May 6, 2011 entitled “Scalable, Adaptable, and Manageable System for Multimedia Identification” now issued as U.S. Pat. No. 8,655,878; and U.S. application Ser. No. 13/276,110 filed on Oct. 18, 2011 entitled “Distributed and Tiered Architecture for Content Search and Content Monitoring” now issued as U.S. Pat. No. 8,959,108, all of which are incorporated by reference herein in their entirety.

FIELD OF THE INVENTION

The present invention relates generally to methods for advertising, content retrieval, media monitoring, image and video processing. More specifically, the present invention relates to use of logo recognition, text detection, optical character recognition (OCR), machine learning techniques including neural net classifiers and support vector machines (SVM).

BACKGROUND OF THE INVENTION

Sponsored advertising is a large and dynamic business segment with more than $55 billion spent in 2014. The resulting ecosystem of sponsored advertising includes measurement for potential value of targets (teams, celebrity, retail, stadium spaces) and actual value as measured by “earned viewership” or promotion of the advertising brand. Harvesting of user generated content for displaying or content marketing is another business segment enabled by logo recognition systems. Additionally “competitive brand intelligence” of all media content including online videos, broadcast or streaming video, social images and outdoor display is another use case for more accurate logo recognition systems. Other applications include measurement of product placement within stores, detection and localization of products in retail aisles for a better shopping experience and to provide information for retail management. Additionally, other applications include logistics and industrial applications.

However, current solutions for logo recognition have various limitations. One constraint is time and cost to train a system to recognize new logos due in part to the effort to collect large numbers of trainable images. Another limitation is the accuracy to detect various types of logos in the presence of significant warp, occlusion, blur and varying lighting conditions. Another limitation of general current solutions is a weakness in detecting tiny and often distorted logos on cloth, such as logos located on banners and apparel. Another weakness of such systems is the limited number of logos that can be recognized which is often limited due to accuracy of both current feature detectors that use bag of words methods and learning methods such as neural network classifiers.

SUMMARY OF THE INVENTION

In one or more of its several aspects, the present invention addresses problems such as those described above. For example, a method for logo recognition in accordance with an aspect of the present invention may suitably use saliency analysis, segmentation techniques, and character stroke analysis as addressed further herein to segment likely logo regions. Saliency detection relies on the fact that logos have significant information content compared to the background. Multi-scale similarity comparison is performed to remove less interesting regions such as text strings within a sea of text or objects in large sets of objects, such as faces in a sea of faces. To achieve high robustness and accuracy of detection, multiple methods are used to recognize a log