US-12626435-B2 - Video playback recognition apparatus and method
Abstract
A method includes processing live image data captured by a camera associated with a first device. The live image data is displayed by the first device, and a video displayed by a second device is viewable within that live image data. The live image data is processed to identify, within the live image data, a spatial position of the displayed video with respect to the camera, and to identify a known frame of the video. The method also includes causing augmented imagery to be displayed by the first device, concurrently with the video displayed by the second device, in a display area of the first device within which the live image data is displayed. The augmented imagery is displayed in the display area based on the identified spatial position and in response to the known frame.
Inventors
- Mark Michael Gerhard
- Riaan Henning Hodgson
- David Gomberg
Assignees
- PlayFusion Limited
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2023-09-29
Claims (20)
- 1. A method, comprising: processing live image data captured by a camera associated with a first device, wherein the live image data is displayed by the first device, a video displayed by a second device is viewable within the live image data displayed by the first device, the live image data is processed to identify a spatial position of the video displayed with respect to the camera associated with the first device within the live image data captured by the camera, and the live image data is processed to identify a known frame of the video displayed by the second device; and causing augmented imagery to be displayed by the first device in a display area of the first device within which the live image data is displayed concurrently with the video displayed by the second device, wherein the augmented imagery is displayed in the display area of the first device based on the identified spatial position of the video displayed with respect to the camera associated with the first device within the live image data captured by the camera, and in response to the known frame.
- 2. The method of claim 1, wherein the augmented imagery is displayed by the first device so as to appear between the video displayed by the second device and the first device.
- 3. The method of claim 1, wherein the augmented imagery is displayed by the first device so as to appear behind the video displayed by the second device such that the video displayed by the second device is between the augmented imagery and the first device.
- 4. The method of claim 1, wherein the augmented imagery is displayed by the first device so as to appear as being on the video displayed by the second device.
- 5. The method of claim 1, wherein the processing of the live image data to identify the spatial position of the video displayed with respect to the camera associated with the first device within the live image data captured by the camera, comprises: identifying one or more points in a candidate frame of the video displayed by the second device; comparing the one or more points in the candidate frame to a plurality of points in a database of known points corresponding to known video clips comprising a plurality of known frames; and identifying the known frame within the video displayed by the second device based on a matching of the one or more points in the candidate frame of the video displayed by the second device and the plurality of points in the database of known points corresponding to the known video clips comprising the plurality of known frames.
- 6. The method of claim 5, further comprising: detecting a boundary of the candidate frame in the live image data captured by the camera; cropping the candidate frame from the live image data captured by the camera to generate a cropped frame image; and processing the cropped frame image to identify the known frame.
- 7. The method of claim 6, further comprising: de-warping the cropped frame image before processing the cropped frame image.
- 8. The method of claim 1, wherein the augmented imagery is further caused to be displayed in response to a playback position in the video displayed by the second device, wherein the playback position is based on the known frame.
- 9. The method of claim 8, wherein the playback position is an estimated playback position in the video displayed by the second device, the estimated playback position is based on a continuously transformed state of the video displayed by the second device, the continuously transformed state corresponds to a time in the playback of the video displayed by the second device, and the continuously transformed state is based on a plurality of past states in a table of past states ranging from an oldest past state to a newest past state.
- 10. The method of claim 9, wherein the table of past states is maintained outside a neural network that generates the continuously transformed state and the estimated playback position, and the method further comprises: causing one or more of the plurality of past states in the table of past states to be supplied to the neural network for determining a latest state based on the known frame; causing the neural network to generate the estimated playback position; and causing the latest state to be added to the table of past states as the newest past state.
- 11. An apparatus, comprising: a processor; and a memory having instructions stored thereon that, when executed by the processor, cause the apparatus to: process live image data captured by a camera associated with a first device, wherein the live image data is displayed by the first device, a video displayed by a second device is viewable within the live image data displayed by the first device, the live image data is processed to identify a spatial position of the video displayed with respect to the camera associated with the first device within the live image data captured by the camera, and the live image data is processed to identify a known frame of the video displayed by the second device; and cause augmented imagery to be displayed by the first device in a display area of the first device within which the live image data is displayed concurrently with the video displayed by the second device, wherein the augmented imagery is displayed in the display area of the first device based on the identified spatial position of the video displayed with respect to the camera associated with the first device within the live image data captured by the camera, and in response to the known frame.
- 12. The apparatus of claim 11, wherein the augmented imagery is displayed by the first device so as to appear between the video displayed by the second device and the first device.
- 13. The apparatus of claim 11, wherein the augmented imagery is displayed by the first device so as to appear behind the video displayed by the second device such that the video displayed by the second device is between the augmented imagery and the first device.
- 14. The apparatus of claim 11, wherein the augmented imagery is displayed by the first device so as to appear as being on the video displayed by the second device.
- 15. The apparatus of claim 11, wherein to process the live image data to identify the spatial position of the video displayed with respect to the camera associated with the first device within the live image data captured by the camera, the apparatus is caused to: identify one or more points in a candidate frame of the video displayed by the second device; compare the one or more points in the candidate frame to a plurality of points in a database of known points corresponding to known video clips comprising a plurality of known frames; and identify the known frame within the video displayed by the second device based on a matching of the one or more points in the candidate frame of the video displayed by the second device and the plurality of points in the database of known points corresponding to the known video clips comprising the plurality of known frames.
- 16. The apparatus of claim 15, wherein the apparatus is further caused to: detect a boundary of the candidate frame in the live image data captured by the camera; crop the candidate frame from the live image data captured by the camera to generate a cropped frame image; and process the cropped frame image to identify the known frame.
- 17. The apparatus of claim 16, wherein the apparatus is further caused to: de-warp the cropped frame image before processing the cropped frame image.
- 18. The apparatus of claim 11, wherein the augmented imagery is further caused to be displayed in response to a playback position in the video displayed by the second device, wherein the playback position is based on the known frame.
- 19. The apparatus of claim 18, wherein the playback position is an estimated playback position in the video displayed by the second device, the estimated playback position is based on a continuously transformed state of the video displayed by the second device, the continuously transformed state corresponds to a time in the playback of the video displayed by the second device, and the continuously transformed state is based on a plurality of past states in a table of past states ranging from an oldest past state to a newest past state.
- 20. A non-transitory computer readable medium having instructions stored thereon that, when executed by a processor, cause an apparatus to: process live image data captured by a camera associated with a first device, wherein the live image data is displayed by the first device, a video displayed by a second device is viewable within the live image data displayed by the first device, the live image data is processed to identify a spatial position of the video displayed with respect to the camera associated with the first device within the live image data captured by the camera, and the live image data is processed to identify a known frame of the video displayed by the second device; and cause augmented imagery to be displayed by the first device in a display area of the first device within which the live image data is displayed concurrently with the video displayed by the second device, wherein the augmented imagery is displayed in the display area of the first device based on the identified spatial position of the video displayed with respect to the camera associated with the first device within the live image data captured by the camera, and in response to the known frame.
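Claims 2-4 recite augmented imagery that appears between the video and the first device, behind the video, or on the video. The "on" case (claim 4) can be illustrated in two dimensions by warping an RGBA overlay into the detected screen quadrilateral and alpha-blending it over the live camera image. This is only a minimal sketch: the patent does not disclose a rendering method, the `screen_corners` input is assumed to come from the spatial-position step of claim 1, and the "between" and "behind" placements would further require a 3D pose and depth ordering that this sketch omits.

```python
# Minimal 2D compositing sketch (not the patented rendering method).
import cv2
import numpy as np

def composite_on_screen(live_frame, overlay_rgba, screen_corners):
    """Warp a 4-channel overlay onto the detected screen quadrilateral
    and alpha-blend it over the live camera image.

    screen_corners: four (x, y) pixel coordinates in the order
    top-left, top-right, bottom-right, bottom-left (assumed to come
    from the spatial-position step).
    """
    h, w = overlay_rgba.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(screen_corners)
    homography = cv2.getPerspectiveTransform(src, dst)
    out_size = (live_frame.shape[1], live_frame.shape[0])  # (width, height)
    warped = cv2.warpPerspective(overlay_rgba, homography, out_size)
    # Alpha-blend the warped overlay over the live image.
    alpha = warped[:, :, 3:4].astype(np.float32) / 255.0
    blended = (live_frame.astype(np.float32) * (1.0 - alpha)
               + warped[:, :, :3].astype(np.float32) * alpha)
    return blended.astype(np.uint8)
```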
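Claims 5 and 15 recite comparing one or more points in a candidate frame against a database of known points corresponding to known video clips. The patent does not name a feature detector or matcher; the sketch below assumes OpenCV ORB descriptors with brute-force Hamming matching as one plausible realization, with the descriptor database built offline from the known clips.

```python
# Illustrative point-matching sketch; ORB and the match threshold are
# assumptions, not details taken from the patent.
import cv2

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def identify_known_frame(candidate_bgr, known_frames, min_matches=20):
    """known_frames: iterable of (clip_id, frame_index, descriptors)
    tuples precomputed offline from the known video clips."""
    gray = cv2.cvtColor(candidate_bgr, cv2.COLOR_BGR2GRAY)
    _, descriptors = orb.detectAndCompute(gray, None)
    if descriptors is None:
        return None
    best = None
    for clip_id, frame_index, known_desc in known_frames:
        matches = matcher.match(descriptors, known_desc)
        if len(matches) < min_matches:
            continue
        # Lower average Hamming distance means a closer match.
        score = sum(m.distance for m in matches) / len(matches)
        if best is None or score < best[0]:
            best = (score, clip_id, frame_index)
    return None if best is None else (best[1], best[2])
```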
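Claims 6 and 7 (and 16 and 17) add boundary detection, cropping, and de-warping of the candidate frame before identification. A common realization, assumed here, is a perspective (homography) warp from the detected screen quadrilateral to an upright rectangle; the output could serve as the candidate image for the matching sketch above.

```python
# Illustrative crop-and-de-warp sketch; the corner coordinates are
# assumed to come from a screen-finder stage such as FIG. 4 describes.
import cv2
import numpy as np

def crop_and_dewarp(live_frame, screen_corners, out_w=1280, out_h=720):
    """screen_corners: four (x, y) pixel coordinates in the order
    top-left, top-right, bottom-right, bottom-left."""
    src = np.float32(screen_corners)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    homography = cv2.getPerspectiveTransform(src, dst)
    # Cropping and de-warping happen together: only pixels inside the
    # quadrilateral are mapped into the upright output rectangle.
    return cv2.warpPerspective(live_frame, homography, (out_w, out_h))
```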
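Claims 9 and 10 (and 19) describe an estimated playback position generated by a neural network from a continuously transformed state, with a table of past states maintained outside the network. The sketch below mirrors that loop: past states are supplied to the network, the network returns a latest state and an estimated position, and the latest state is appended as the newest table entry. `state_network` is a hypothetical stand-in, since the network's architecture is not disclosed, and the table depth is an assumption.

```python
# Illustrative sequence-identifier loop; `state_network` and the table
# depth are assumptions, not details taken from the patent.
from collections import deque

MAX_STATES = 32  # assumed table depth

# Oldest past state at the left, newest at the right, maintained
# outside the network as claim 10 recites.
past_states = deque(maxlen=MAX_STATES)

def estimate_playback_position(known_frame_features, state_network):
    """state_network: hypothetical callable that consumes the past
    states plus features of the newly identified known frame and
    returns (latest_state, estimated_playback_seconds)."""
    latest_state, estimated_seconds = state_network(
        list(past_states), known_frame_features)
    # The latest state becomes the newest entry in the table.
    past_states.append(latest_state)
    return estimated_seconds
```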
Description
BACKGROUND

Service providers and device manufacturers are continually challenged to deliver value and convenience to consumers by, for example, providing compelling network services. Video playback recognition processes often involve special watermarking, fingerprinting, and/or audio recognition to identify a playback position in a video that is being viewed.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. It is noted that, in accordance with the standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.

FIG. 1 is a diagram of a system for recognizing a video displayed by a playback device, in accordance with one or more embodiments.

FIG. 2 is a diagram of a video playback recognition platform, in accordance with one or more embodiments.

FIG. 3 is a diagram of user equipment with live image data being displayed, in accordance with one or more embodiments.

FIG. 4 is a diagram of a screen finder process, in accordance with one or more embodiments.

FIG. 5 is a loss constraint diagram, in accordance with one or more embodiments.

FIG. 6 is a graphical representation of a cropping and de-warping process, in accordance with one or more embodiments.

FIG. 7 is a diagram of a frame identifier process, in accordance with one or more embodiments.

FIG. 8 is a flow diagram of a sequence identifier process for generating an estimated playback position, in accordance with one or more embodiments.

FIG. 9 is a flow chart of a process of determining a video playback position and causing augmented imagery to be displayed, in accordance with one or more embodiments.

FIG. 10 is a functional block diagram of a computer or processor-based system upon which or by which an embodiment is implemented.

DETAILED DESCRIPTION

The following disclosure provides many different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components, values, operations, materials, arrangements, and the like are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. Other components, values, operations, materials, arrangements, and the like are contemplated. For example, the formation of a first feature over or on a second feature in the description that follows may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features may be formed between the first and second features, such that the first and second features may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. Further, the present disclosure may omit some operations, such as a “response” or “send receipt” that corresponds to a previous operation, for the purpose of simplicity and clarity. Further, spatially relative terms, such as “beneath,” “below,” “lower,” “above,” “upper,” and the like, may be used herein for ease of description to describe one element or feature's relationship to another element(s) or feature(s) as illustrated in the figures. The spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. The apparatus may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein may likewise be interpreted accordingly.

Game developers, toy manufacturers, media providers, advertisers, and others are continually challenged to develop new and interesting ways for users to interact with games, toys, television shows, movies, video clips, commercials, advertisements, music, or other consumable media.

FIG. 1 is a diagram of a system 100 for recognizing a video displayed by a playback device, in accordance with one or more embodiments. System 100 comprises user equipment (UE) 101 having connectivity to a video playback recognition platform 103 and a database 105. The UE 101, video playback recognition platform 103, and database 105 communicate by a wired or wireless communication connection, one or more networks, or a combination thereof. By way of example, the UE 101, video playback recognition platform 103, and database 105 communicate with each other using well-known, new, or still-developing protocols. In this context, a protocol includes a set of rules defining how the network nodes within a communication network interact with each other based on information sent over the communication links. The protocols