US-12625906-B2 - System and methods for resolving query related to content


Abstract

Systems and methods are described for providing a reply to a query related to a media asset. A query may be received from a user while the media asset is being played on a first device, and in response to determining that the query is related to the media asset, a snapshot of the media asset may be captured, where the snapshot comprises a depiction of a first object and a second object, and the snapshot may be generated for display at a second device. In response to determining there is ambiguity whether the query is related to the first or second object, a disambiguating query based on the first and second objects of the snapshot may be generated for simultaneous output with the snapshot. In response to receiving a reply to the disambiguating query, a response to the query may be generated for output based on the reply.
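The flow summarized above can be illustrated with a minimal Python sketch. All names here are hypothetical: `DetectedObject`, `Snapshot`, the keyword-based relatedness and candidate tests, and the `capture_snapshot`/`ask_user`/`lookup` callbacks. The patent text does not specify how relatedness or ambiguity is determined, so naive keyword matching stands in for those steps.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DetectedObject:
    """An object depicted in a snapshot (hypothetical structure)."""
    label: str  # e.g. the character shown on screen
    kind: str   # e.g. "actor" or "car"


@dataclass
class Snapshot:
    """A captured frame of the media asset and the objects found in it."""
    timestamp_s: float
    objects: list[DetectedObject] = field(default_factory=list)


def _words(text: str) -> set[str]:
    """Lower-cased words of `text` with trailing punctuation removed."""
    return {w.strip("?.,!").lower() for w in text.split()}


def is_related_to_media(query: str, media_keywords: set[str]) -> bool:
    # Assumption: naive keyword overlap stands in for a real relatedness test.
    return bool(_words(query) & media_keywords)


def candidate_objects(query: str, snapshot: Snapshot) -> list[DetectedObject]:
    # Assumption: the query names an object kind (e.g. "actor") directly.
    kinds = _words(query)
    return [o for o in snapshot.objects if o.kind in kinds]


def disambiguating_query(candidates: list[DetectedObject]) -> str:
    """Build a prompt that names every candidate object."""
    return "Do you mean " + " or ".join(o.label for o in candidates) + "?"


def answer(
    query: str,
    media_keywords: set[str],
    capture_snapshot: Callable[[], Snapshot],
    ask_user: Callable[[Snapshot, str], str],
    lookup: Callable[[str, Optional[DetectedObject]], str],
) -> str:
    """Respond to a query, asking a disambiguating question when needed."""
    if not is_related_to_media(query, media_keywords):
        return lookup(query, None)  # not about the media asset
    snapshot = capture_snapshot()   # captured only for media-related queries
    candidates = candidate_objects(query, snapshot)
    if len(candidates) > 1:
        # Ambiguous: show the snapshot together with the prompt, await a reply.
        choice = ask_user(snapshot, disambiguating_query(candidates))
        target = next((o for o in candidates if o.label == choice), None)
    else:
        target = candidates[0] if candidates else None
    return lookup(query, target)
```

In this sketch, `ask_user` plays the role of outputting the disambiguating query together with the snapshot at the second device. The returned label stands for the user's reply selecting the first or second object.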

Inventors

Ankur Anil Aher
Jeffry Copps Robert Jose

Assignees

ADEIA GUIDES INC.

Dates

Publication Date: 2026-05-12
Application Date: 2023-11-29

Claims (20)

1. A method comprising: generating, for display at a particular device, a media asset; determining that an input, received while a particular portion of the media asset depicts a first object and a second object, is related to the media asset and is ambiguous as to whether the input is related to the first object or the second object; in response to determining that the input is related to the media asset and is ambiguous as to whether the input is related to the first object or the second object: generating, for simultaneous display with a current portion of the media asset at the particular device, a snapshot of the particular portion of the media asset; and generating, for simultaneous output with the snapshot and the current portion of the media asset, a disambiguating query based on the first object and the second object; and in response to receiving a reply to the disambiguating query, generating for output a response to the input based on the reply.
2. The method of claim 1, wherein the disambiguating query comprises a prompt to select the first object or the second object via a user interface of the particular device, and the reply comprises a selection of the first object or the second object.
3. The method of claim 1, further comprising: while generating, for simultaneous output with the snapshot and the current portion of the media asset, the disambiguating query, causing playing of the media asset to be paused at the current portion.
4. The method of claim 1, wherein generating for output the disambiguating query comprises: generating for display an overlay highlighting the first object and the second object; and generating for display a prompt to select the first object or the second object.
5. The method of claim 1, wherein generating for output the disambiguating query comprises: generating for display a zoomed-in view of the first object and the second object; and generating for display a prompt to select the first object or the second object.
6. The method of claim 1, wherein the current portion of the media asset comprises a set of contiguous frames of the media asset and each frame of the set of contiguous frames comprises at least one of the first object or the second object, and wherein the first object is a first actor, and the second object is a second actor that is different from the first actor.
7. The method of claim 1, further comprising: causing capture of the snapshot of the particular portion of the media asset in response to determining that simultaneous display of each of the first object and the second object in the media asset will cease within a threshold period of time.
8. The method of claim 1, wherein generating for output the disambiguating query comprises: identifying the first and second objects in the snapshot; determining at least one attribute of each of the first object and the second object; and generating the disambiguating query based on the attributes of each of the first object and the second object.
9. The method of claim 8, further comprising: identifying a type of the first object and a type of the second object based on respective attributes of the first object and the second object; querying a database to determine a disambiguation success rate associated with the type of the first object and a disambiguation success rate associated with the type of the second object; and in response to determining the type of the first object is associated with a greater disambiguation success rate than the type of the second object, generating the disambiguating query based on the first object.
10. The method of claim 8, further comprising: determining a confidence level associated with a classification of the first object, wherein the classification of the first object is based at least in part on the at least one attribute of the first object; determining a confidence level associated with a classification of the second object, wherein the classification of the second object is based at least in part on the at least one attribute of the second object; and in response to determining the first object is associated with a higher classification confidence level than the second object, generating the disambiguating query based on the first object.
11. A system comprising: non-transitory computer memory; control circuitry configured to: generate, for display at a particular device, a media asset; determine that an input, received while a particular portion of the media asset depicts a first object and a second object, is related to the media asset and is ambiguous as to whether the input is related to the first object or the second object, wherein the input is received from a user associated with a user profile stored in the computer memory; in response to determining that the input is related to the media asset and is ambiguous as to whether the input is related to the first object or the second object: generate, for simultaneous display with a current portion of the media asset at the particular device, a snapshot of the particular portion of the media asset; and generate, for simultaneous output with the snapshot and the current portion of the media asset, a disambiguating query based on the first object and the second object; and in response to receiving a reply to the disambiguating query, generate for output a response to the input based on the reply.
12. The system of claim 11, wherein the disambiguating query comprises a prompt to select the first object or the second object via a user interface of the particular device, and the reply comprises a selection of the first object or the second object.
13. The system of claim 11, wherein the control circuitry is configured to: while generating, for simultaneous output with the snapshot and the current portion of the media asset, the disambiguating query, cause playing of the media asset to be paused at the current portion.
14. The system of claim 11, wherein the control circuitry is configured to generate for output the disambiguating query by: generating for display an overlay highlighting the first object and the second object; and generating for display a prompt to select the first object or the second object.
15. The system of claim 11, wherein the control circuitry is configured to generate for output the disambiguating query by: generating for display a zoomed-in view of the first object and the second object; and generating for display a prompt to select the first object or the second object.
16. The system of claim 11, wherein the current portion of the media asset comprises a set of contiguous frames of the media asset and each frame of the set of contiguous frames comprises at least one of the first object or the second object, and wherein the first object is a first actor, and the second object is a second actor that is different from the first actor.
17. The system of claim 11, wherein the control circuitry is further configured to: cause capture of the snapshot of the particular portion of the media asset in response to determining that simultaneous display of each of the first object and the second object in the media asset will cease within a threshold period of time.
18. The system of claim 11, wherein the control circuitry is configured to generate for output the disambiguating query by: identifying the first and second objects in the snapshot; determining at least one attribute of each of the first object and the second object; and generating the disambiguating query based on the attributes of each of the first object and the second object.
19. The system of claim 18, wherein the control circuitry is further configured to: identify a type of the first object and a type of the second object based on respective attributes of the first object and the second object; query a database to determine a disambiguation success rate associated with the type of the first object and a disambiguation success rate associated with the type of the second object; and in response to determining the type of the first object is associated with a greater disambiguation success rate than the type of the second object, generate the disambiguating query based on the first object.
20. The system of claim 18, wherein the control circuitry is further configured to: determine a confidence level associated with a classification of the first object, wherein the classification of the first object is based at least in part on the at least one attribute of the first object; determine a confidence level associated with a classification of the second object, wherein the classification of the second object is based at least in part on the at least one attribute of the second object; and in response to determining the first object is associated with a higher classification confidence level than the second object, generate the disambiguating query based on the first object.
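Claims 9 and 10 describe two ways of choosing which object the disambiguating query should be built around: by the disambiguation success rate of each object's type, or by classification confidence. The minimal Python sketch below illustrates both. `Candidate`, `pick_by_success_rate`, `pick_by_confidence`, and `build_query` are hypothetical names, and a plain dict stands in for the database of success rates. The tie-breaking (preferring the second object) and the 0.0 default for unknown types are assumptions that the claims do not address.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A detected object (hypothetical structure)."""
    label: str
    kind: str          # object type derived from the object's attributes
    confidence: float  # confidence of the object's classification, 0..1
    attribute: str     # a distinguishing attribute, e.g. clothing or position


def pick_by_success_rate(a: Candidate, b: Candidate,
                         success_rate_by_kind: dict[str, float]) -> Candidate:
    """Claim 9 style: prefer the type with the higher disambiguation success
    rate. A dict stands in for the database; unknown types count as 0.0 and
    ties go to `b` (both are assumptions)."""
    rate_a = success_rate_by_kind.get(a.kind, 0.0)
    rate_b = success_rate_by_kind.get(b.kind, 0.0)
    return a if rate_a > rate_b else b


def pick_by_confidence(a: Candidate, b: Candidate) -> Candidate:
    """Claim 10 style: prefer the more confidently classified object
    (ties go to `b`, an assumption)."""
    return a if a.confidence > b.confidence else b


def build_query(target: Candidate) -> str:
    """Phrase the disambiguating query around the chosen object's attribute."""
    return f"Are you asking about the {target.kind} {target.attribute}?"
```

Both selectors return the object whose attributes anchor the prompt. The same `build_query` can then be applied whichever rule chose the object.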

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 17/345,217, filed Jun. 11, 2021, the disclosure of which is hereby incorporated by reference herein in its entirety.

BACKGROUND

This disclosure is directed to providing a reply to a user query related to a media asset while the media asset is being played. Specifically, techniques are disclosed for generating for simultaneous output a snapshot comprising a depiction of the media asset and a disambiguating query based on a first object and a second object included in the snapshot.

SUMMARY

Many users have become accustomed to interacting with digital assistant applications or applications providing digital assistant capabilities (e.g., voice-based, text-based, a combination thereof, etc.). For example, a digital assistant may receive a request from a user to play a movie, find local restaurants in his or her area, or provide a weather report. However, digital assistants often receive ambiguous queries that are difficult to answer (e.g., because of the limited information provided in the query). For example, if a digital assistant receives the query "Who is the actor in Harry Potter?", the digital assistant may determine that it is unclear which actor the user is asking about: the actor who plays Dumbledore, the actor who plays Harry Potter, the actor who plays Ron Weasley, etc. In one approach, the system may simply guess at what the user meant when providing a response (e.g., the system may guess that the user is referring to the actor who plays Harry Potter because he is more popular). This approach is deficient because the system does not give the user an opportunity to clarify whom he or she intended to reference by the query, and the system may merely provide information that the user already knows or is not interested in.
To overcome this problem, the system may prompt the user to reply to another query, such as "What other movies have you seen the actor in?", in an effort to clarify which actor was intended by the initial query. However, this approach also has problems. For example, the digital assistant may simply receive another unhelpful or ambiguous reply (e.g., "I don't know" or "He was also in that movie with the dog") from the user, and the dialog may continue (e.g., digital assistant: "How old does he look?"; user: "16"; etc.). That is, in such an approach, the digital assistant often fails to generate helpful prompts that quickly identify the intent of the initial query received from the user. Even if the digital assistant eventually returns the correct answer based on the intent of the initial query, such a prolonged dialogue unnecessarily consumes resources (e.g., memory and processing power) and may be time-consuming and frustrating for the user, to the point that the user loses interest in the answer by the end of the process. Indeed, the user may forget what the initial query was about, or what context (e.g., a scene of a current movie) prompted the user to input the query. To overcome these problems, systems and methods are provided herein for receiving a query from a user while a media asset is being played on a first device; in response to determining that the query is related to the media asset, causing capture of a snapshot of the media asset being played on the first device, wherein the snapshot comprises a depiction of a first object and a second object; and causing the captured snapshot to be generated for display on a second device.
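The capture step above matters only if the snapshot still shows both objects. Claim 7 recites capturing it when their simultaneous display will cease within a threshold period of time. The minimal Python sketch below makes that decision. It assumes that per-object on-screen intervals (for example, from scene metadata or an object detector) are available. `co_display_end`, `should_capture_now`, and the 2-second default threshold are hypothetical and not taken from the patent.

```python
from typing import Optional

Interval = tuple[float, float]  # (start_s, end_s) during which an object is on screen


def _end_of_visible_interval(intervals: list[Interval], now: float) -> Optional[float]:
    """Return the end of the interval containing `now`, or None if not visible."""
    for start, end in intervals:
        if start <= now < end:
            return end
    return None


def co_display_end(a: list[Interval], b: list[Interval], now: float) -> Optional[float]:
    """Time at which objects A and B stop appearing together, or None if
    they are not both on screen at `now`."""
    end_a = _end_of_visible_interval(a, now)
    end_b = _end_of_visible_interval(b, now)
    if end_a is None or end_b is None:
        return None
    return min(end_a, end_b)


def should_capture_now(a: list[Interval], b: list[Interval], now: float,
                       threshold_s: float = 2.0) -> bool:
    """Capture the snapshot now if co-display will cease within threshold_s."""
    end = co_display_end(a, b, now)
    return end is not None and end - now <= threshold_s
```

The sketch covers only the timing decision. What happens when co-display is not about to end (for example, capturing later or immediately anyway) is left open here, as it is in claim 7.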
In response to determining that the query is ambiguous as to whether the query is related to the first object or the second object of the snapshot, the systems and methods provided herein generate for simultaneous output with the snapshot a disambiguating query based on the first object and the second object of the snapshot, and in response to receiving a reply to the disambiguating query, generate for output a response to the query based on the reply. Such aspects enable a system to efficiently generate an optimal query to disambiguate a query received from a user based on one or more of a variety of factors, in order to minimize or avoid an extensive dialogue between the system and the user. For example, such systems and methods may analyze one or more frames of a media asset (related to the query and being played while the query is received) for attributes, and use those attributes to generate a disambiguating query that enables the user to quickly clarify the initial query (e.g., by referring to features that are the largest or most conspicuous on the screen and likely to be maximally disambiguating). In some embodiments, such aspects simultaneously provide (e.g., at a mobile device of a user) a snapshot of a portion of the media asset that relates to the query received from the user along with a disambiguating query, while the media asset continues playing at a first de