US-12619660-B2 - Performing a semantic search on a digital media database


Abstract

The system obtains, from a database, multiple videos and multiple metadata corresponding to the multiple videos. The metadata includes tags generated by a large language model to describe a video. The system converts the metadata into a first vector in a multidimensional space. The first vector encodes information included in the tags as a numerical representation in the multidimensional space. A distance between the first vector and a second vector indicates similarity between the metadata represented by the first vector and a second metadata represented by the second vector. The system obtains a natural language query (NLQ) associated with the database, and converts the NLQ into a third vector in the multidimensional space. The system determines whether the first vector satisfies a distance threshold to the third vector. Upon determining that the first vector satisfies the distance threshold, the system presents the video as a result to the NLQ.
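The retrieval flow in the abstract can be illustrated with a minimal Python sketch. It uses a toy bag-of-words vectorizer in place of the learned embedding model a production system would use. It measures distance as cosine distance (1 minus cosine similarity), which matches the cosine similarity recited in claims 3 and 5. Following claim 1, each stored vector combines a video's LLM tags with its transcript. The names `embed`, `build_index`, and `semantic_search`, the sample catalog, and the threshold of 0.5 are hypothetical and do not come from the patent.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words vector.

    Each distinct lowercase token is one dimension of the space; the value
    is how often the token occurs.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


def build_index(videos: dict) -> dict:
    """Store one vector per video, combining LLM tags and transcript text."""
    index = {}
    for video_id, meta in videos.items():
        text = " ".join(meta.get("tags", [])) + " " + meta.get("transcript", "")
        index[video_id] = embed(text)
    return index


def semantic_search(query: str, index: dict, threshold: float = 0.5) -> list:
    """Return ids of videos within `threshold` cosine distance, nearest first."""
    query_vec = embed(query)
    hits = []
    for video_id, vec in index.items():
        distance = cosine_distance(vec, query_vec)
        if distance <= threshold:
            hits.append((distance, video_id))
    return [video_id for _, video_id in sorted(hits)]


videos = {
    "v1": {"tags": ["space adventure", "robots", "sci-fi"],
           "transcript": "the robots fly to space"},
    "v2": {"tags": ["romantic comedy", "wedding"],
           "transcript": "they fall in love"},
}
index = build_index(videos)
print(semantic_search("space adventure with robots", index))  # ['v1']
```

A bag-of-words vector only matches shared words, so it misses synonyms such as "astronaut" versus "space". A learned sentence encoder is what places semantically related queries near each other, so treat this vectorizer as a placeholder.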

Inventors

Harshith BARIKI
Hardik Radia
Latchmana Kumar M R
Raghavendra Somanath

Assignees

DISH NETWORK TECHNOLOGIES INDIA PRIVATE LIMITED

Dates

Publication Date: 2026-05-05
Application Date: 2023-12-29

Claims (20)

1 . A non-transitory, computer-readable storage medium comprising instructions recorded thereon, wherein the instructions, when executed by at least one data processor of a system, cause the system to: obtain, from a database, multiple video data and multiple metadata associated with the multiple video data, wherein a video data among the multiple video data includes a metadata among the multiple metadata, wherein the video data is associated with a video, wherein the metadata among the multiple metadata includes multiple tags generated by a large language model to describe the video and a transcript associated with the video; enable a semantic search of the multiple video data to obtain multiple multidimensional vectors by performing steps for each video data among the multiple video data comprising: combining and converting the multiple tags and the transcript associated with the video into a first multidimensional vector in a multidimensional space, wherein the first multidimensional vector encodes information included in the multiple tags and the transcript associated with the video as a numerical representation in the multidimensional space, wherein a distance between the first multidimensional vector and a second multidimensional vector in the multidimensional space indicates similarity between the metadata associated with the first multidimensional vector and a second metadata represented by the second multidimensional vector; store the first multidimensional vector in the database, wherein the first multidimensional vector corresponds to the each video data stored in the database; obtain a natural language query associated with the database; convert the natural language query into a third multidimensional vector in the multidimensional space; perform the semantic search of the multiple video data by obtaining, among the multiple multidimensional vectors, a subset of multidimensional vectors satisfying a distance threshold to the third multidimensional vector, wherein the subset of multidimensional vectors corresponds to a subset of video data among the multiple video data; and present the subset of video data as a result to the natural language query.
2 . The non-transitory, computer-readable storage medium of claim 1 , comprising instructions to: obtain, from the database storing the multiple video data, the metadata associated with the video data, wherein the metadata includes at least three of: a title associated with the video data, a subtitle associated with the video data, a description associated with the video data, a genre associated with the video data, or a performer associated with the video data; extract, from the video, an audio and a closed caption data; provide the audio, the closed caption data, a prompt, and the metadata to the large language model, wherein the prompt requests the multiple tags based on the audio, the closed caption data, and the video, wherein a tag among the multiple tags includes a natural language text indicating a property associated with the video; and store the multiple tags in the database by adding the multiple tags to the metadata associated with the video.
3 . The non-transitory, computer-readable storage medium of claim 1 , comprising instructions to: determine multiple distances between each multidimensional vector among the multiple multidimensional vectors and the third multidimensional vector using a cosine similarity between each multidimensional vector among the multiple multidimensional vectors and the third multidimensional vector.
4 . The non-transitory, computer-readable storage medium of claim 1 , comprising instructions to: extract, from the video, an audio and a closed caption data; provide the audio, the closed caption data, a prompt, and the video to the large language model, wherein the prompt requests the multiple tags based on the audio, the closed caption data, and the video, wherein a tag among the multiple tags includes a natural language text indicating a property associated with the video; and store the multiple tags in the database by adding the multiple tags to the metadata associated with the video.
5 . The non-transitory, computer-readable storage medium of claim 1 , comprising instructions to: determine multiple distances between each multidimensional vector among the multiple multidimensional vectors and the third multidimensional vector using a cosine similarity between each multidimensional vector among the multiple multidimensional vectors and the third multidimensional vector; and rank the multiple multidimensional vectors using the multiple distances.
6 . The non-transitory, computer-readable storage medium of claim 1 , comprising instructions to: obtain a social media tag associated with the video; extract, from the video, an audio and a closed caption data; provide the audio, the closed caption data, the social media tag, and a prompt to the large language model, wherein the prompt requests the multiple tags based on the audio, the closed caption data, and the video, wherein a tag among the multiple tags includes a natural language text indicating a property associated with the video; and store the multiple tags in the database by adding the multiple tags to the metadata associated with the video.
7 . The non-transitory, computer-readable storage medium of claim 1 , comprising the database storing live television, streaming video content, transactional video on demand, and subscription video on demand.
8 . A method comprising: obtaining, from a database, multiple digital media and multiple metadata associated with the multiple digital media, wherein a digital medium among the multiple digital media includes a metadata among the multiple metadata, wherein the metadata among the multiple metadata includes multiple tags generated by a large language model to describe the digital medium and a transcript associated with the digital medium; enabling a semantic search of the multiple digital media to obtain multiple multidimensional vectors by: combining and converting the multiple tags and the transcript associated with the digital medium into a first multidimensional vector in a multidimensional space, wherein the first multidimensional vector encodes information included in the multiple tags and the transcript associated with the digital medium as a numerical representation in the multidimensional space, wherein a distance between the first multidimensional vector and a second multidimensional vector in the multidimensional space indicates similarity between the metadata associated with the first multidimensional vector and a second metadata represented by the second multidimensional vector; obtaining a natural language query associated with the database; converting the natural language query into a third multidimensional vector in the multidimensional space; performing the semantic search of the multiple digital media by: determining whether the first multidimensional vector satisfies a distance threshold to the third multidimensional vector; and upon determining that the first multidimensional vector satisfies the distance threshold, presenting the digital medium as a result to the natural language query.
9 . The method of claim 8 , comprising: obtaining, from the database storing the multiple digital media, the metadata associated with the digital medium, wherein the metadata includes at least three of: a title associated with the digital medium, a subtitle associated with the digital medium, a description associated with the digital medium, a genre associated with the digital medium, or a performer associated with the digital medium; extracting, from the digital medium, an audio and a closed caption data; providing the audio, the closed caption data, a prompt, and the metadata to the large language model, wherein the prompt requests the multiple tags based on the audio, the closed caption data, and the digital medium, wherein a tag among the multiple tags includes a natural language text indicating a property associated with the digital medium; and storing the multiple tags in the database by adding the multiple tags to the metadata associated with the digital medium.
10 . The method of claim 8 , comprising: determining a distance between the first multidimensional vector and the third multidimensional vector using a cosine similarity.
11 . The method of claim 8 , comprising: extracting, from the digital medium, an audio and a closed caption data; providing the audio, the closed caption data, a prompt, and the digital medium to the large language model, wherein the prompt requests the multiple tags based on the audio, the closed caption data, and the digital medium, wherein a tag among the multiple tags includes a natural language text indicating a property associated with the digital medium; and storing the multiple tags in the database by adding the multiple tags to the metadata associated with the digital medium.
12 . The method of claim 8 , comprising: obtaining a user profile associated with a database storing multiple digital media, wherein the user profile includes an indication of a digital medium for which a user viewed a short description, a digital medium of which the user viewed a portion, and a digital medium which the user viewed from beginning to end; and encoding the user profile and the natural language query into the third multidimensional vector.
13 . The method of claim 8 , comprising: obtaining a social media tag associated with the digital medium; extracting, from the digital medium, an audio and a closed caption data; providing the audio, the closed caption data, the social media tag, and a prompt to the large language model, wherein the prompt requests the multiple tags based on the audio, the closed caption data, and the digital medium, wherein a tag among the multiple tags includes a natural language text indicating a property associated with the digital medium; and storing the multiple tags in the database by adding the multiple tags to the metadata associated with the digital medium.
14 . The method of claim 8 , comprising the database storing live television, a streaming digital medium, a transactional digital medium on demand, and a subscription digital medium on demand.
15 . A system comprising: at least one hardware processor; and at least one non-transitory memory storing instructions, which, when executed by the at least one hardware processor, cause the system to: obtain, from a database, multiple digital media and multiple metadata associated with the multiple digital media, wherein a digital medium among the multiple digital media includes a metadata among the multiple metadata, wherein the metadata among the multiple metadata includes multiple tags generated by a large language model to describe the digital medium and a transcript associated with the digital medium; enable a semantic search of the multiple digital media to obtain multiple multidimensional vectors by: combining and converting the multiple tags and the transcript associated with the digital medium into a first multidimensional vector in a multidimensional space, wherein the first multidimensional vector encodes information included in the multiple tags and the transcript associated with the digital medium as a numerical representation in the multidimensional space, wherein a distance between the first multidimensional vector and a second multidimensional vector in the multidimensional space indicates similarity between the metadata associated with the first multidimensional vector and a second metadata represented by the second multidimensional vector; obtain a natural language query associated with the database; convert the natural language query into a third multidimensional vector in the multidimensional space; perform the semantic search of the multiple digital media by: determining whether the first multidimensional vector satisfies a distance threshold to the third multidimensional vector; and upon determining that the first multidimensional vector satisfies the distance threshold, presenting the digital medium as a result to the natural language query.
16 . The system of claim 15 , comprising instructions to: obtain, from the database storing the multiple digital media, the metadata associated with the digital medium, wherein the metadata includes at least three of: a title associated with the digital medium, a subtitle associated with the digital medium, a description associated with the digital medium, a genre associated with the digital medium, or a performer associated with the digital medium; extract, from the digital medium, an audio and a closed caption data; provide the audio, the closed caption data, a prompt, and the metadata to the large language model, wherein the prompt requests the multiple tags based on the audio, the closed caption data, and the digital medium, wherein a tag among the multiple tags includes a natural language text indicating a property associated with the digital medium; and store the multiple tags in the database by adding the multiple tags to the metadata associated with the digital medium.
17 . The system of claim 15 , comprising instructions to: determine a distance between the first multidimensional vector and the third multidimensional vector using a cosine similarity.
18 . The system of claim 15 , comprising instructions to: extract, from the digital medium, an audio and a closed caption data; provide the audio, the closed caption data, a prompt, and the digital medium to the large language model, wherein the prompt requests the multiple tags based on the audio, the closed caption data, and the digital medium, wherein a tag among the multiple tags includes a natural language text indicating a property associated with the digital medium; and store the multiple tags in the database by adding the multiple tags to the metadata associated with the digital medium.
19 . The system of claim 15 , comprising instructions to: obtain a social media tag associated with the digital medium; extract, from the digital medium, an audio and a closed caption data; provide the audio, the closed caption data, the social media tag, and a prompt to the large language model, wherein the prompt requests the multiple tags based on the audio, the closed caption data, and the digital medium, wherein a tag among the multiple tags includes a natural language text indicating a property associated with the digital medium; and store the multiple tags in the database by adding the multiple tags to the metadata associated with the digital medium.
20 . The system of claim 15 , comprising the database storing live television, a streaming digital medium, a transactional digital medium on demand, and a subscription digital medium on demand.
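Claims 5 and 12 add two refinements to the basic search. The first ranks every stored vector by cosine similarity to the query. The second folds a user profile, recording which titles the user only previewed, partly watched, or watched to the end, into the query vector. The sketch below assumes the profile is blended into the query as a weighted sum of the vectors of titles the user engaged with. The claims do not say how the profile is encoded, so the engagement weights, the `alpha` blend factor, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical weights for the three engagement levels named in claim 12.
ENGAGEMENT_WEIGHTS = {
    "viewed_description": 0.25,  # user only read the short description
    "viewed_portion": 0.5,       # user watched part of the title
    "viewed_fully": 1.0,         # user watched from beginning to end
}


def embed(text: str) -> dict:
    """Toy bag-of-words vector (stand-in for a learned text encoder)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {token: float(count) for token, count in counts.items()}


def normalize(vec: dict) -> dict:
    """Scale a sparse vector to unit length (empty vectors are copied as-is)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine_similarity(a: dict, b: dict) -> float:
    """Dot product of the two unit-length vectors."""
    a, b = normalize(a), normalize(b)
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def encode_query_with_profile(query: str, profile: dict, item_vectors: dict,
                              alpha: float = 0.3) -> dict:
    """Shift the query vector toward titles the user engaged with.

    `profile` maps an engagement level to a list of video ids; `alpha` is a
    hypothetical cap on how far the profile may pull the query.
    """
    combined = normalize(embed(query))
    for level, video_ids in profile.items():
        weight = alpha * ENGAGEMENT_WEIGHTS[level]
        for video_id in video_ids:
            for k, v in normalize(item_vectors[video_id]).items():
                combined[k] = combined.get(k, 0.0) + weight * v
    return combined


def rank(query_vec: dict, item_vectors: dict) -> list:
    """Order every video by cosine similarity to the query, best first."""
    scored = [(cosine_similarity(vec, query_vec), vid) for vid, vec in item_vectors.items()]
    return [vid for _, vid in sorted(scored, key=lambda s: (-s[0], s[1]))]


item_vectors = {
    "heist": embed("heist thriller bank robbery crew"),
    "space": embed("space thriller astronaut station"),
    "cooking": embed("cooking show baking competition"),
}
print(rank(embed("thriller"), item_vectors))  # ['space', 'heist', 'cooking']
profile = {"viewed_fully": ["heist"]}
print(rank(encode_query_with_profile("thriller", profile, item_vectors), item_vectors))
# ['heist', 'space', 'cooking']
```

In this toy run, the profile entry for a fully watched heist film is enough to move "heist" ahead of "space" for the same query. Weighting full viewings above partial viewings and description views is a design assumption, not something claim 12 requires.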

Description

BACKGROUND

Third-party providers, such as Gracenote, provide software and metadata to businesses that enable their users to manage and search digital media. Service providers, such as video streaming providers, can enable users to search the digital media, such as video, using the metadata provided by the third-party provider. However, the metadata provided by the third-party provider can be limited and may not encompass every aspect of the digital media.

BRIEF DESCRIPTION OF THE DRAWINGS

Implementations of the present invention will be described and explained in detail through the use of the accompanying drawings.

FIG. 1 is a block diagram of an example transformer.
FIG. 2 shows a system to enable a more accurate search of a digital media database.
FIG. 3 shows filtering of tags generated by a large language model.
FIG. 4 is a flowchart of a method to enable a more accurate search of a digital media database.
FIG. 5 shows a system to perform a semantic search on a digital media database.
FIG. 6 shows a system to answer a natural language input using multidimensional vectors.
FIG. 7 is a flowchart of a method to perform a semantic search on a digital media database.
FIG. 8 shows a system to proactively suggest videos to a user.
FIG. 9 shows an automatically generated ribbon of movie tiles.
FIG. 10 is a flowchart of a method to proactively suggest videos to a user.
FIG. 11 is a flowchart of a method to automatically generate the ribbon indicating movies of interest to a user.
FIG. 12 is a block diagram that illustrates an example of a computer system in which at least some operations described herein can be implemented.

The technologies described herein will become more apparent to those skilled in the art from studying the Detailed Description in conjunction with the drawings. Embodiments or implementations describing aspects of the invention are illustrated by way of example, and the same references can indicate similar elements.
While the drawings depict various implementations for the purpose of illustration, those skilled in the art will recognize that alternative implementations can be employed without departing from the principles of the present technologies. Accordingly, while specific implementations are shown in the drawings, the technology is amenable to various modifications.

DETAILED DESCRIPTION

Disclosed here are a system and method to enable a more accurate search of a video (e.g., movie) database. The system obtains, from a database storing multiple videos, a video including metadata associated with the video, where the metadata includes a title associated with the video, and where the database storing multiple videos is configured to support a traditional search using the metadata. The traditional search can be a text search that looks for matching strings. The system extracts audio and closed caption data from the video, and provides the audio, the closed caption data, the title associated with the video, and a prompt to a large language model. The prompt requests from the large language model multiple tags based on the audio, the closed caption data, and the title associated with the video. A tag among the multiple tags includes a natural language text indicating a property associated with the video. The system stores the multiple tags in the database by adding the multiple tags to the metadata associated with the video to obtain new metadata. Because the new metadata describes the video more completely, searching it enables a more accurate search of the multiple videos stored in the database.

Additionally, disclosed here is a system to perform a semantic search on a video (e.g., movie) database. The system obtains, from the database, multiple videos and multiple metadata associated with the multiple videos, where a video includes a metadata, and where the metadata includes multiple tags generated by a large language model to describe the video.
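The tag-generation flow described above, and the reason it helps a traditional string-matching search, can be sketched in Python. This minimal illustration rests on several assumptions. The audio is represented by an already-extracted transcript string. The large language model is any callable that takes a prompt string and returns text. The model replies with a comma-separated list. The prompt wording and the names `request_tags`, `add_tags`, `traditional_search`, and `fake_llm` are hypothetical.

```python
from typing import Callable

# Hypothetical prompt wording; the source does not give the actual prompt.
PROMPT = (
    "Given the title, audio transcript, and closed captions of a video, "
    "return a comma-separated list of short natural-language tags that "
    "describe its mood, setting, themes, and notable scenes."
)


def request_tags(metadata: dict, audio_transcript: str, closed_captions: str,
                 llm: Callable[[str], str]) -> list:
    """Send the prompt plus video signals to an LLM and parse the tags it returns."""
    message = (
        f"{PROMPT}\n\nTitle: {metadata['title']}\n"
        f"Audio transcript: {audio_transcript}\n"
        f"Closed captions: {closed_captions}"
    )
    reply = llm(message)
    # Assumes the model answers with a comma-separated list of tags.
    return [tag.strip().lower() for tag in reply.split(",") if tag.strip()]


def add_tags(metadata: dict, tags: list) -> dict:
    """Return new metadata with the tags appended, skipping duplicates."""
    merged = list(metadata.get("tags", []))
    for tag in tags:
        if tag not in merged:
            merged.append(tag)
    return {**metadata, "tags": merged}


def traditional_search(query: str, catalog: dict) -> list:
    """Case-insensitive substring match over each title and its tags."""
    needle = query.lower()
    return [
        video_id
        for video_id, meta in catalog.items()
        if any(needle in field.lower() for field in [meta["title"], *meta.get("tags", [])])
    ]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only for this example."""
    return "Heist, Rainy city, Betrayal"


catalog = {"m1": {"title": "The Last Job"}}
print(traditional_search("heist", catalog))  # []
tags = request_tags(catalog["m1"], "we split the money tonight", "[alarm ringing]", fake_llm)
catalog["m1"] = add_tags(catalog["m1"], tags)
print(traditional_search("heist", catalog))  # ['m1']
```

Before the tags are added, a substring search for "heist" finds nothing, because the title alone never contains that word. Once the tags are merged into the metadata, the same traditional search returns the video.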
The system performs the following step for each video among the multiple videos to obtain multiple multidimensional vectors. Specifically, the system converts the metadata into a single multidimensional vector in a multidimensional space, where the single multidimensional vector encodes information included in the multiple tags as a numerical representation in the multidimensional space. A distance between the single multidimensional vector and a multidimensional vector A in the multidimensional space indicates similarity between the metadata associated with the single multidimensional vector and a metadata represented by the multidimensional vector A. For example, a distance of zero indicates identity, while increasing distance indicates increasing dissimilarity. The system stores the single multidimensional vector in the database, where the single multidimensional vector corresponds to a single video stored in the database. The system obtains a natural language query associated with the database, and converts the natu