US-12625907-B2 - Entity cards including descriptive content relating to entities from a video

US12625907B2

Abstract

A computer-implemented method includes: receiving a video and implementing one or more machine-learned models to: identify a plurality of entities from the video, rank the plurality of entities, and generate, based on the rank of the plurality of entities, a first entity card which includes descriptive content relating to a first entity from among the plurality of entities. The first entity corresponds to one or more entities which are ranked highest among the plurality of entities. The method further includes: providing, for presentation on a display device, the video; and providing, for presentation on the display device while the video is being played, the first entity card, in response to the first entity being mentioned in the video.

Inventors

  • Jonathan Matthew Malmaud
  • Nicolas Paul-Stringall HIGUERA
  • Jeff Hsu
  • Sarah Fay Smith
  • Gabriel Rubow Culbertson
  • Runmin Zhao

Assignees

  • GOOGLE LLC

Dates

Publication Date
2026-05-12
Application Date
2024-07-17

Claims (20)

  1. A computer-implemented method, comprising: receiving a video; implementing one or more machine-learned models to: identify a plurality of entities from the video, wherein the one or more machine-learned models are trained to identify a training entity from a training video based on at least one of a number of times the training entity is mentioned in the training video, a relevance of the training entity to a topic in the training video, or a number of times the training entity appears across a corpus of training videos, rank the plurality of entities, wherein the one or more machine-learned models are trained to rank the training entity identified in the training video based on at least one of a likelihood that the training entity is to be searched for when the training video is viewed, the number of times the training entity is mentioned in the training video, the relevance of the training entity to the topic in the training video, or the number of times the training entity appears across the corpus of training videos, and generate, based on the rank of the plurality of entities, a first entity card which includes descriptive content relating to a first entity from among the plurality of entities, wherein the first entity corresponds to one or more entities which are ranked highest among the plurality of entities; providing, for presentation on a display device, the video; and providing, for presentation on the display device while the video is being played, the first entity card, in response to the first entity being mentioned in the video.
  2. The computer-implemented method of claim 1, wherein implementing the one or more machine-learned models to identify the plurality of entities includes: obtaining training data to train the one or more machine-learned models based on observational data of users conducting searches in response to viewing the video.
  3. The computer-implemented method of claim 1, wherein implementing the one or more machine-learned models to identify the plurality of entities further includes: identifying the plurality of entities from the video by associating text from a transcription of the video with a knowledge graph, and ranking the plurality of entities, based on one or more of: a relevance of each of the plurality of entities to a topic of the video, a relevance of each of the plurality of entities to one or more other entities among the plurality of entities, a number of mentions of each entity in the video among the plurality of entities, and a number of videos in which each entity among the plurality of entities appears across a corpus of videos stored in one or more databases.
  4. The computer-implemented method of claim 3, wherein implementing the one or more machine-learned models to identify the plurality of entities further includes: evaluating user interactions with the first entity card, and determining at least one adjustment to the one or more machine-learned models based on the evaluating.
  5. The computer-implemented method of claim 1, wherein the first entity is mentioned in the video at a first timepoint in the video, and the first entity card is provided for presentation on the display device at the first timepoint.
  6. The computer-implemented method of claim 1, wherein the one or more entities include a second entity, and the method further comprises: implementing the one or more machine-learned models to generate, based on the rank of the plurality of entities, a second entity card which includes descriptive content relating to the second entity; providing, for presentation on the display device while the video is being played and before the second entity is mentioned in the video, the second entity card in a contracted form, the second entity card in the contracted form referencing the second entity; and providing, for presentation on the display device while the video is being played and in response to the second entity being mentioned in the video, the second entity card in a fully expanded form, the second entity card in the fully expanded form including descriptive content relating to the second entity.
  7. The computer-implemented method of claim 1, wherein the one or more entities include a second entity, and the method further comprises: implementing the one or more machine-learned models to generate, based on the rank of the plurality of entities, a second entity card which includes descriptive content relating to the second entity; and providing, for presentation on the display device while the video is being played, the second entity card, by replacing the first entity card at a time when the second entity is mentioned in the video, the second entity card including descriptive content relating to the second entity.
  8. The computer-implemented method of claim 1, further comprising: providing, for presentation on the display device while the video is being played, a notification user interface element in response to the first entity being mentioned in the video, the notification user interface element indicating additional information relating to the first entity is available; and providing, for presentation on the display device while the video is being played, the first entity card, in response to the first entity being mentioned in the video and in response to receiving a selection of the notification user interface element.
  9. The computer-implemented method of claim 1, wherein the first entity card includes at least one of a textual summary providing information relating to the first entity or an image relating to the first entity.
  10. A computing device, comprising: a display device; one or more memories to store instructions; and one or more processors to execute the instructions stored in the one or more memories to perform operations, the operations including: receiving a video; implementing one or more machine-learned models to: identify a plurality of entities from the video, wherein the one or more machine-learned models are trained to identify a training entity from a training video based on at least one of a number of times the training entity is mentioned in the training video, a relevance of the training entity to a topic in the training video, or a number of times the training entity appears across a corpus of training videos, rank the plurality of entities, wherein the one or more machine-learned models are trained to rank the training entity identified in the training video based on at least one of a likelihood that the training entity is to be searched for when the training video is viewed, the number of times the training entity is mentioned in the training video, the relevance of the training entity to the topic in the training video, or the number of times the training entity appears across the corpus of training videos, and generate, based on the rank of the plurality of entities, a first entity card which includes descriptive content relating to a first entity from among the plurality of entities, wherein the first entity corresponds to one or more entities which are ranked highest among the plurality of entities; providing, for presentation on a display device, the video; and providing, for presentation on the display device while the video is being played, the first entity card, in response to the first entity being mentioned in the video.
  11. The computing device of claim 10, wherein the first entity is mentioned in the video at a first timepoint in the video, and the first entity card is provided for presentation on the display device at the first timepoint.
  12. The computing device of claim 10, wherein the one or more entities include a second entity, and the operations further include: implementing the one or more machine-learned models to generate, based on the rank of the plurality of entities, a second entity card which includes descriptive content relating to the second entity; providing, for presentation on the display device while the video is being played and before the second entity is mentioned in the video, the second entity card in a contracted form, the second entity card in the contracted form referencing the second entity; and providing, for presentation on the display device while the video is being played and in response to the second entity being mentioned in the video, the second entity card in a fully expanded form, the second entity card in the fully expanded form including descriptive content relating to the second entity.
  13. The computing device of claim 10, wherein the one or more entities include a second entity, and the operations further include: implementing the one or more machine-learned models to generate, based on the rank of the plurality of entities, a second entity card which includes descriptive content relating to the second entity; and providing, for presentation on the display device while the video is being played, the second entity card, by replacing the first entity card at a time when the second entity is mentioned in the video, the second entity card including descriptive content relating to the second entity.
  14. The computing device of claim 10, further comprising: providing, for presentation on the display device, one or more first entity search user interface elements that, when selected, are configured to perform a search relating to the first entity.
  15. The computing device of claim 10, wherein the one or more machine-learned models are trained based on observational data of users conducting searches in response to viewing the video.
  16. The computing device of claim 10, wherein the plurality of entities are ranked based on a prediction as to which entities among the plurality of entities are most likely to be searched for by a user viewing the video.
  17. The computing device of claim 16, wherein the one or more machine-learned models are configured to re-identify the plurality of entities from the video according to a preset schedule.
  18. The computing device of claim 10, wherein the operations further include: providing, for presentation on the display device while the video is being played, a notification user interface element in response to the first entity being mentioned in the video, the notification user interface element indicating additional information relating to the first entity is available; and providing, for presentation on the display device while the video is being played, the first entity card, in response to the first entity being mentioned in the video and in response to receiving a selection of the notification user interface element.
  19. The computing device of claim 10, wherein the first entity card includes a textual summary providing information relating to the first entity and/or an image relating to the first entity.
  20. A non-transitory computer readable medium storing instructions which, when executed by a processor, cause the processor to perform operations, the operations comprising: receiving a video; implementing one or more machine-learned models to: identify a plurality of entities from the video, wherein the one or more machine-learned models are trained to identify a training entity from a training video based on at least one of a number of times the training entity is mentioned in the training video, a relevance of the training entity to a topic in the training video, or a number of times the training entity appears across a corpus of training videos, rank the plurality of entities, wherein the one or more machine-learned models are trained to rank the training entity identified in the training video based on at least one of a likelihood that the training entity is to be searched for when the training video is viewed, the number of times the training entity is mentioned in the training video, the relevance of the training entity to the topic in the training video, or the number of times the training entity appears across the corpus of training videos, and generate, based on the rank of the plurality of entities, a first entity card which includes descriptive content relating to a first entity from among the plurality of entities, wherein the first entity corresponds to one or more entities which are ranked highest among the plurality of entities; providing, for presentation on a display device, the video; and providing, for presentation on the display device while the video is being played, the first entity card, in response to the first entity being mentioned in the video.
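The pipeline recited in independent claims 1, 10, and 20 can be sketched in a few lines. This is an illustrative sketch only: the dataclass fields, the linear scoring weights, and the card format are assumptions standing in for the machine-learned ranking the claims actually recite.

```python
from dataclasses import dataclass

# Hypothetical feature record for an entity identified in a video.
# Field names are illustrative; the patent does not specify a data model.
@dataclass
class Entity:
    name: str
    mentions_in_video: int      # times the entity is mentioned in this video
    topic_relevance: float      # relevance to the video's topic, in [0, 1]
    corpus_frequency: int       # videos across the corpus mentioning the entity
    search_likelihood: float    # predicted probability a viewer searches for it

def rank_entities(entities):
    """Order entities by a weighted score over the signals recited in claim 1.

    The weights are placeholders; in the claimed system one or more
    machine-learned models produce the ranking.
    """
    def score(e):
        return (0.5 * e.search_likelihood
                + 0.2 * e.topic_relevance
                + 0.2 * e.mentions_in_video / 10
                + 0.1 / (1 + e.corpus_frequency))  # rarer terms rank higher
    return sorted(entities, key=score, reverse=True)

def first_entity_card(entities):
    """Generate a card for the highest-ranked entity (claim 1's 'first entity')."""
    top = rank_entities(entities)[0]
    return {"entity": top.name,
            "summary": f"Descriptive content relating to {top.name}"}

# Toy example echoing the 'sarcophagus' scenario from the background section.
entities = [
    Entity("sarcophagus", mentions_in_video=8, topic_relevance=0.9,
           corpus_frequency=120, search_likelihood=0.7),
    Entity("pyramid", mentions_in_video=15, topic_relevance=0.95,
           corpus_frequency=9000, search_likelihood=0.3),
]
card = first_entity_card(entities)
```

Under these placeholder weights, "sarcophagus" outranks "pyramid" because the predicted search likelihood dominates the score, matching the ranking rationale of claim 16 (entities most likely to be searched for rank highest).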

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 18/091,837 having a filing date of Dec. 30, 2022, which is based on and claims priority to U.S. Provisional Application 63/341,674 having a filing date of May 13, 2022. Applicant claims priority to and the benefit of each of such applications and incorporates all such applications herein by reference in their entirety for all purposes.

FIELD

The disclosure relates generally to providing entity cards for a user interface in association with a video displayed on a display of a user computing device. More particularly, the disclosure relates to providing entity cards which assist in the understanding of the contents of the video and include descriptive content relating to an entity (e.g., a concept, a term, a topic, and the like) which is mentioned in the video.

BACKGROUND

When users watch a video, for example on a challenging or a new topic, there may be keywords or concepts that the user is not familiar with but that are helpful to understanding the content of the video. For example, in a video about the Egyptian pyramids, the term "sarcophagus" may be an important concept which is discussed extensively. However, a user not familiar with the term "sarcophagus" may not fully understand the content of the video. The user may pause the video and navigate to a search page to perform a search for the term "sarcophagus." In some instances, the user may have difficulty spelling the term which they wish to search for and may not obtain accurate search results or may experience inconvenience in searching. In other instances, the user may stop watching the video after finding the content of the video too difficult to understand.

SUMMARY

Aspects and advantages of embodiments of the disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the example embodiments.
In one or more example embodiments, a computer-implemented method for a server system includes obtaining a transcription of content from a video, applying a machine learning resource to identify one or more entities which are most likely to be searched for by a user viewing the video, based on the transcription of the content, generating one or more entity cards for each of the one or more entities, each of the one or more entity cards including descriptive content relating to a respective entity among the one or more entities, and providing a user interface, to be displayed on a respective display of one or more user computing devices, for: playing the video on a first portion of the user interface, and when the video is played and a first entity among the one or more entities is mentioned in the video, displaying a first entity card on a second portion of the user interface, the first entity card including descriptive content relating to the first entity. In some implementations, applying the machine learning resource to identify the one or more entities includes obtaining training data to train the machine learning resource based on observational data of users conducting searches in response to viewing only the video. In some implementations, applying the machine learning resource to identify the one or more entities includes identifying a plurality of candidate entities from the video by associating text from the transcription with a knowledge graph, and ranking the candidate entities to obtain the one or more entities, based on one or more of: a relevance of each of the candidate entities to a topic of the video, a relevance of each of the candidate entities to one or more other candidate entities among the plurality of candidate entities, a number of mentions of the candidate entity in the video, and a number of videos in which the candidate entity appears across a corpus of videos stored in one or more databases. 
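The candidate-identification step described above, associating transcript text with a knowledge graph and counting mentions, can be illustrated with a minimal sketch. The toy knowledge graph, the word-boundary matching rule, and the function name are assumptions; the patent claims a machine learning resource rather than any particular matching scheme.

```python
import re

# Toy knowledge graph mapping entity names to descriptive content
# (an illustrative stand-in for the knowledge graph referenced above).
KNOWLEDGE_GRAPH = {
    "sarcophagus": "A stone coffin, typically inscribed, used in ancient Egypt.",
    "pyramid": "A monumental tomb with a square base and sloping sides.",
    "obelisk": "A tall, four-sided stone pillar tapering to a pyramidal top.",
}

def identify_candidates(transcript):
    """Associate transcript text with knowledge-graph entries.

    Returns {entity_name: mention_count} for each knowledge-graph entity
    that appears at least once in the transcript.
    """
    text = transcript.lower()
    counts = {}
    for name in KNOWLEDGE_GRAPH:
        n = len(re.findall(r"\b" + re.escape(name) + r"\b", text))
        if n:
            counts[name] = n
    return counts

transcript = ("Inside the pyramid, archaeologists found a sarcophagus. "
              "The sarcophagus was carved from a single block of granite.")
candidates = identify_candidates(transcript)
```

The resulting mention counts are one of the ranking signals enumerated above (alongside topic relevance, relevance to other candidate entities, and corpus-wide frequency).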
In some implementations, applying the machine learning resource to identify the one or more entities includes evaluating user interactions with the user interface, and determining at least one adjustment to the machine learning resource based on the evaluation of the user interactions with the user interface. In some implementations, the first entity is mentioned in the video at a first timepoint in the video, and the first entity card is displayed on the second portion of the user interface at the first timepoint. In some implementations, the one or more entities include a second entity and the one or more entity cards include a second entity card, and the method further includes providing the user interface, to be displayed on the respective display of the one or more user computing devices, for: displaying, on a third portion of the user interface while continuing to play the video, the second entity card in a contracted form, the second entity card in the contracted form referencing the second entity to be mentioned in the video at a second timepoint in