
US-12627865-B2 - Systems and methods for inserting emoticons within a media asset

US 12627865 B2

Abstract

Systems and methods are described herein for inserting emoticons within a media asset based on an audio portion of the media asset. Each audio portion of a media asset is associated with a respective part of speech, and an emotion corresponding to the audio portion of the media asset is determined. A corresponding emoticon is identified based on the emotion determined in the audio portion and is presented in the subtitles.

Inventors

  • Ankur Anil Aher
  • Charishma Chundi

Assignees

  • ADEIA GUIDES INC.

Dates

Publication Date
2026-05-12
Application Date
2024-11-14

Claims (20)

  1. A method comprising: providing media content for presentation, wherein a segment of the media content is in a first language; providing subtitles for the segment, wherein the subtitles represent a translation of the segment from the first language to a second language; determining at least one first emotion value associated with the segment in the first language; determining at least one second emotion value associated with the subtitles in the second language, wherein the at least one second emotion value corresponds to at least one keyword in the subtitles; determining a third emotion value present in the at least one first emotion value associated with the segment in the first language and absent from the at least one second emotion value associated with the subtitles in the second language; selecting an emoticon associated with the at least one keyword in the subtitles and corresponding to the third emotion value present in the at least one first emotion value associated with the segment in the first language and absent from the at least one second emotion value associated with the subtitles in the second language; and causing the segment, the subtitles, and the selected emoticon to be provided for presentation, wherein the selected emoticon is presented with the subtitles.
  2. The method of claim 1, wherein the segment comprises a plurality of segment attributes comprising at least one of a facial expression of actors in the media content, a body expression of actors in the media content, a dialogue in an audio portion of the media content, a tone of the dialogue in the audio portion of the media content, and background music in the audio portion of the media content.
  3. The method of claim 2, wherein determining the at least one first emotion value associated with the segment in the first language and the at least one second emotion value associated with the subtitles in the second language comprises: determining the at least one first emotion value corresponding to the segment based on analyzing at least one segment attribute of the plurality of segment attributes; and determining the at least one second emotion value corresponding to the subtitles based on analyzing subtitles data.
  4. The method of claim 1, wherein determining the third emotion value further comprises comparing the at least one first emotion value to the at least one second emotion value by: calculating a threshold value based on the at least one first emotion value; and comparing the at least one second emotion value to the calculated threshold value.
  5. The method of claim 4, further comprising selecting the emoticon associated with the at least one keyword in the subtitles that corresponds to the third emotion value absent from the at least one second emotion value based on: determining that the at least one second emotion value is below the calculated threshold value.
  6. The method of claim 1, wherein selecting the emoticon associated with the at least one keyword in the subtitles and corresponding to the third emotion value comprises: identifying a plurality of emoticons associated with the at least one keyword; generating a ranked list of the plurality of emoticons, wherein each emoticon of the plurality of emoticons is ranked based on a respective relevance value; and selecting the emoticon from the ranked list.
  7. The method of claim 6, wherein, for each emoticon of the plurality of emoticons, the respective relevance value is calculated based on: determining a similarity between the at least one keyword associated with the emoticon and the third emotion value; and generating the relevance value based on the determined similarity.
  8. The method of claim 6, wherein, for each emoticon of the plurality of emoticons, the respective relevance value is calculated based on: accessing user preference information; determining whether the emoticon corresponds to a user preference; and generating the relevance value based on the determination.
  9. The method of claim 1, wherein causing the segment, the subtitles, and the selected emoticon to be provided for presentation comprises: formatting at least one of the subtitles or the selected emoticon to make the emoticon visually distinguishable during presentation.
  10. The method of claim 1, wherein causing the segment, the subtitles, and the selected emoticon to be provided for presentation comprises: identifying a plurality of insertion spots for the selected emoticon; selecting an insertion spot from the plurality of insertion spots that best represents the third emotion value; and presenting the selected emoticon at the selected insertion spot.
  11. A system comprising: input/output circuitry; and control circuitry configured to: provide media content for presentation, wherein a segment of the media content is in a first language; provide subtitles for the segment, wherein the subtitles represent a translation of the segment from the first language to a second language; determine at least one first emotion value associated with the segment in the first language; determine at least one second emotion value associated with the subtitles in the second language, wherein the at least one second emotion value corresponds to at least one keyword in the subtitles; determine a third emotion value present in the at least one first emotion value associated with the segment in the first language and absent from the at least one second emotion value associated with the subtitles in the second language; select an emoticon associated with the at least one keyword in the subtitles and corresponding to the third emotion value present in the at least one first emotion value associated with the segment in the first language and absent from the at least one second emotion value associated with the subtitles in the second language; and cause the segment, the subtitles, and the selected emoticon to be provided for presentation, wherein the selected emoticon is presented with the subtitles.
  12. The system of claim 11, wherein the segment comprises a plurality of segment attributes comprising at least one of a facial expression of actors in the media content, a body expression of actors in the media content, a dialogue in an audio portion of the media content, a tone of the dialogue in the audio portion of the media content, and background music in the audio portion of the media content.
  13. The system of claim 12, wherein the control circuitry determines the at least one first emotion value associated with the segment in the first language and the at least one second emotion value associated with the subtitles in the second language by: determining the at least one first emotion value corresponding to the segment based on analyzing at least one segment attribute of the plurality of segment attributes; and determining the at least one second emotion value corresponding to the subtitles based on analyzing subtitles data.
  14. The system of claim 11, wherein the control circuitry is further configured to determine the third emotion value based on comparing the at least one first emotion value to the at least one second emotion value by: calculating a threshold value based on the at least one first emotion value; and comparing the at least one second emotion value to the calculated threshold value.
  15. The system of claim 14, wherein the control circuitry is further configured to select the emoticon associated with the at least one keyword in the subtitles that corresponds to the third emotion value absent from the at least one second emotion value based on: determining that the at least one second emotion value is below the calculated threshold value.
  16. The system of claim 11, wherein the control circuitry is configured to select the emoticon associated with the at least one keyword in the subtitles and corresponding to the third emotion value by: identifying a plurality of emoticons associated with the at least one keyword; generating a ranked list of the plurality of emoticons, wherein each emoticon of the plurality of emoticons is ranked based on a respective relevance value; and selecting the emoticon from the ranked list.
  17. The system of claim 16, wherein, for each emoticon of the plurality of emoticons, the control circuitry is configured to calculate a respective relevance value based on: determining a similarity between the at least one keyword associated with the emoticon and the third emotion value; and generating the relevance value based on the determined similarity.
  18. The system of claim 16, wherein, for each emoticon of the plurality of emoticons, the control circuitry is configured to calculate a respective relevance value based on: accessing user preference information; determining whether the emoticon corresponds to a user preference; and generating the relevance value based on the determination.
  19. The system of claim 11, wherein the control circuitry causes the segment, the subtitles, and the selected emoticon to be provided for presentation by: formatting at least one of the subtitles or the selected emoticon to make the emoticon visually distinguishable during presentation.
  20. The system of claim 11, wherein the control circuitry causes the segment, the subtitles, and the selected emoticon to be provided for presentation by: identifying a plurality of insertion spots for the selected emoticon; selecting an insertion spot from the plurality of insertion spots that best represents the third emotion value; and presenting the selected emoticon at the selected insertion spot.
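
Independent claims 1 and 11 above describe one pipeline: detect the emotions carried by the original-language segment, detect the emotions carried by the translated subtitles, find an emotion present in the former but absent from the latter (the "third emotion value"), and select a keyword-linked emoticon for it. The following Python sketch illustrates that flow only in outline; it is not the patented implementation. The emotion detectors are abstracted into precomputed sets, and EMOTICON_DB is a hypothetical stand-in for the emoticon database the description mentions.

```python
# A minimal sketch of the flow in claims 1 and 11, under the
# assumptions stated above. Not the patented implementation.
from dataclasses import dataclass

# Hypothetical (keyword, emotion) -> emoticon lookup table.
EMOTICON_DB = {
    ("thank you", "sarcasm"): "🙄",
    ("thank you", "joy"): "😊",
}

@dataclass
class Segment:
    audio_emotions: set    # first emotion values, from the original-language segment
    subtitle_emotions: set # second emotion values, from the translated subtitles
    keywords: list         # keywords found in the subtitles

def select_emoticon(segment: Segment):
    """Return an emoticon for a 'third' emotion: one present in the
    source segment but absent from the translated subtitles."""
    lost_emotions = segment.audio_emotions - segment.subtitle_emotions
    for emotion in lost_emotions:
        for keyword in segment.keywords:
            emoticon = EMOTICON_DB.get((keyword, emotion))
            if emoticon is not None:
                return emoticon
    return None  # nothing lost in translation, or no matching emoticon

# A sarcastic "Gracias" translated to a flat "thank you":
seg = Segment(
    audio_emotions={"sarcasm"},  # tone/gesture analysis found sarcasm
    subtitle_emotions=set(),     # the plain translation carries no emotion
    keywords=["thank you"],
)
print(select_emoticon(seg))      # 🙄, presented alongside the subtitle
```

Claim 11 wraps the same logic in control circuitry rather than a method; the set difference is the claims' "present in the first and absent from the second" condition in the simplest form.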

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 18/368,213, filed Sep. 14, 2023, which is a continuation of U.S. patent application Ser. No. 17/861,446, filed Jul. 11, 2022, now U.S. Pat. No. 11,792,489, which is a continuation of U.S. patent application Ser. Nos. 17/077,541 and 17/077,539, filed Oct. 22, 2020, now U.S. Pat. Nos. 11,418,850 and 11,418,849, respectively, the disclosures of which are hereby incorporated by reference herein in their entireties.

BACKGROUND

This disclosure relates to inserting emoticons within a media asset and, more particularly, to inserting emoticons within scenes of the media asset.

SUMMARY

With movies available in many different languages, users increasingly rely on subtitles while consuming them. Typically, subtitles are a direct translation of the dialogue, transcript, or screenplay into a language selected by the user. Such direct translations cannot convey the actor's emotion when presented briefly during consumption of the movie. For example, an actor gesturing while saying "Gracias" in Spanish may be translated to "thank you," and a subtitle may be presented. However, the translation and its subtitle lose the actor's gesture, for example a sarcastic, happy, or angry indication. Further, when consuming content with subtitles, the consumer often focuses on the location where the subtitles appear on the display and misses the actor's facial expressions or gestures during each scene of the media asset. Thus, the translation does not allow the consumer to appreciate the actor's reaction and diminishes the consumer's experience of the movie. The placement of the subtitles on the screen can also distract the user, who may then stop paying attention to the scenes and miss essential parts of the movie.

Systems and methods are disclosed herein for conveying emotion beyond what conventionally translated subtitles provide, by inserting an emoticon or emoji into a subtitle of a media asset and displaying it as part of that subtitle. To provide this improvement, a media guidance application identifies text, sounds, or facial expressions during the scene that relate to an emotion, whether mentioned in the subtitles or program annotations of the media asset or uttered by actors in the scene. The systems and methods select emoticons or emojis associated with that specific text or sound. Such an emoticon or emoji can then be inserted as part of the media asset's subtitle or into the display frame of the scene near the actor making the sound or conveying the emotion. As users generally focus on the subtitles when watching a media asset, an emoticon displayed by the media guidance application as part of the subtitle can improve a user's consumption of the media asset in an inexpensive and efficient way.

Specifically, a media guidance application may obtain an audio portion and subtitle data corresponding to a media asset and identify a keyword from the subtitle data that needs improvement or clarification via an emoticon. The media guidance application may then determine whether the identified keyword relates to an emotion corresponding to an emoticon by searching an emoticon database.
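
The keyword-identification step can be pictured as a scan of the subtitle data against an emotion lexicon. The sketch below makes that concrete under loud assumptions: EMOTION_LEXICON is invented for illustration, and a real system would presumably use richer natural-language analysis than exact token matching.

```python
# Toy emotion lexicon, invented for illustration only.
EMOTION_LEXICON = {
    "thanks": "gratitude",
    "sorry": "regret",
    "wow": "surprise",
}

def find_emotion_keywords(subtitle_text):
    """Return (keyword, emotion) pairs for subtitle words that carry an
    emotion per the lexicon; these are candidates for emoticon lookup."""
    pairs = []
    for token in subtitle_text.lower().split():
        word = token.strip(".,!?\"'")  # drop surrounding punctuation
        if word in EMOTION_LEXICON:
            pairs.append((word, EMOTION_LEXICON[word]))
    return pairs

print(find_emotion_keywords("Wow, thanks!"))
# [('wow', 'surprise'), ('thanks', 'gratitude')]
```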
In response to determining that the identified keyword relates to the emotion corresponding to the emoticon, the media guidance application may determine a location at which to insert the emoticon into the media asset. In some embodiments, the media guidance application may cause the subtitles and the emoticon to be presented together. The emotion may be determined based on various factors, such as facial and body expressions, words in the dialogue, tone of the dialogue, and background music. The emoticon corresponding to the emotion may likewise be determined based on various factors. For example, in one embodiment, the emoticon may be chosen based on the available display capacity at the subtitle region within a media asset's video frame. In another example, the emoticon may be selected from the emoticon database based on semantic context matching of the subtitle data, as further described below. The media guidance application may then generate for display, at the subtitle region or another location of the media asset's video frame, the subtitle data including the determined emoticon. By inserting emoticons based on the keywords and other factors, the media guidance application may convey the emotion more precisely than conventional systems that rely on simple translations in the subtitle data.

The media guidance application may insert emoticons into the subtitles of a live program or a previously stored media asset. Specifically, the media guidance application may obtain media guidance data indicating the availability of a plurality of media assets and dete
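
The selection among candidate emoticons described above, elaborated in claims 6 through 8, amounts to ranking by a relevance value that combines similarity to the missing emotion with user preferences. The following is a hedged sketch of such a ranking; the similarity metric and the preference bonus are both invented here, as the claims leave them open.

```python
# Rank candidate emoticons for a keyword by an assumed relevance score:
# similarity to the target ("third") emotion plus a user-preference bonus.
def similarity(emotion_a, emotion_b):
    # Toy metric: exact label match or nothing. A real system might
    # compare embeddings of the emotion labels instead.
    return 1.0 if emotion_a == emotion_b else 0.0

def rank_emoticons(candidates, target_emotion, preferred_emoticons):
    """Sort (emoticon, emotion_label) candidates by relevance, best first."""
    def relevance(item):
        emoticon, label = item
        score = similarity(label, target_emotion)
        if emoticon in preferred_emoticons:
            score += 0.5  # assumed bonus for user-preferred emoticons
        return score
    return sorted(candidates, key=relevance, reverse=True)

ranked = rank_emoticons(
    candidates=[("😊", "joy"), ("🙄", "sarcasm")],
    target_emotion="sarcasm",
    preferred_emoticons={"🙄"},
)
print(ranked[0][0])  # 🙄, to be placed at the chosen insertion spot
```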