US-12621541-B2 - Generating a collaborative interleaved content series
Abstract
A collaborative content generation system uses machine learning to generate a script and content depicting the performance of the script. A director may use the system to generate the script and optionally, may involve one or more collaborators who perform portions of the script. The system may use machine learning to generate or modify a script, a storyboard to visualize the story, a narrator (e.g., the narrator's voice), characters, music, sound effects, etc. A director may assign portions of the script to certain collaborators and select which of their recordings are interleaved into the final collaborative interleaved content series. The collaborators may independently perform their portions and provide clips of their performances to the system, which may then interleave the clips to produce the finalized content.
Inventors
- Brent Hurley
- Chad Hurley
- Raina Plom
- Reed Martin
Assignees
- EyeTell, Inc.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2025-06-10
Claims (18)
- 1 . A non-transitory computer-readable medium comprising instructions, the instructions, when executed by a computer system, causing the computer system to perform operations including: accessing a structure parameter characterizing a genre of content; accessing descriptive parameters characterizing a plot of the content; generating a first prompt for a first machine learning model to request a script for the content, the first prompt specifying the accessed structure parameter and the descriptive parameters, wherein the first prompt further requests character descriptions for characters in the script, dialogue for the characters, and one or more transition scene descriptions; receiving, as an output from the first machine learning model, the script, wherein the script comprises the dialogue for the characters and the one or more transition scene descriptions; generating a second prompt for a second machine learning model to request digital augmentation for the characters, wherein the digital augmentation comprises one or more of a visual or audio enhancement; receiving as an output from the second machine learning model, the digital augmentation; transmitting, to collaborator client devices, the script and the digital augmentation, wherein: the script is transmitted to the collaborator client devices according to accessed staffing instructions for the script; and the transmitted script appears at the collaborator client devices with respective visual indicators indicating which portions of the script are assigned to a corresponding collaborator; receiving, from the collaborator client devices, recordings of collaborators performing the respective portions of the script, wherein the recordings include an overlay of the digital augmentation; and interleaving the recordings into a collaborative interleaved content series for transmission to a viewer client device.
- 2 . The non-transitory computer-readable medium of claim 1 , the operations further comprising: classifying, using a third machine learning model trained using previously generated scripts and corresponding parameters used to generate the previously generated scripts, the accessed structure parameter and the accessed descriptive parameters as sufficient to generate the first prompt for the first machine learning model to request the script for the content.
- 3 . The non-transitory computer-readable medium of claim 1 , the operations further comprising: generating a third prompt for the second machine learning model to request a sound effect for the content, wherein the third prompt includes at least a portion of the script in which the requested sound effect is featured.
- 4 . The non-transitory computer-readable medium of claim 3 , the operations further comprising: receiving audio of a user reading aloud the portion of the script; and detecting a manually produced sound effect in the audio; wherein the third prompt includes an instruction that the requested sound effect be based on the manually produced sound effect.
- 5 . The non-transitory computer-readable medium of claim 1 , wherein the output from the second machine learning model is a first output of the second machine learning model, the operations further comprising: receiving a shared video and a prior prompt used to generate the shared video; modifying the second prompt using the prior prompt, wherein the modified second prompt requests an updated digital augmentation for one of the characters to have an appearance of a character in the shared video; and receiving as a second output from the second machine learning model, the updated digital augmentation.
- 6 . The non-transitory computer-readable medium of claim 1 , the operations further comprising: generating a third prompt for a third machine learning model to request storyboard images, the third prompt specifying the character descriptions for characters in the script and the one or more transition scene descriptions; receiving, as an output from the third machine learning model, the storyboard images; and causing the storyboard images and corresponding dialogue for the characters to be displayed.
- 7 . The non-transitory computer-readable medium of claim 6 , the operations further comprising: receiving, from a client device, a selection of one of the storyboard images; and causing a portion of the third prompt to be displayed, wherein the portion of the third prompt caused the third machine learning model to generate the selected storyboard image.
- 8 . The non-transitory computer-readable medium of claim 1 , the operations further comprising: receiving a reference image of a desired character of the characters; and providing the second prompt and the reference image to the second machine learning model; wherein the digital augmentation received as the output from the second machine learning model includes an appearance of the desired character that is based upon the reference image.
- 9 . The non-transitory computer-readable medium of claim 1 , the operations further comprising: receiving cinematography instructions comprising a desired camera angle for depicting at least one of the characters, wherein the first prompt for the first machine learning model further requests a scene of the content be captured at the desired camera angle.
- 10 . The non-transitory computer-readable medium of claim 1 , wherein the digital augmentation for the characters requested in the second prompt includes an audio enhancement to a voice associated with a line of the script.
- 11 . The non-transitory computer-readable medium of claim 10 , wherein the voice is a voice of a computer-generated narrator.
- 12 . The non-transitory computer-readable medium of claim 10 , wherein the voice is a voice of a collaborator.
- 13 . The non-transitory computer-readable medium of claim 1 , the operations further comprising: accessing collaborator characteristics; applying the collaborator characteristics to a second machine learning model, the second machine learning model trained to determine a likelihood that a given collaborator is suited to perform a character in the script; and determining the staffing instructions for the script based on the output of the second machine learning model.
- 14 . The non-transitory computer-readable medium of claim 13 , the operations further comprising: accessing an external database of actor characteristics, the actor characteristics describing one or more of physical appearances of actors or filmography of the actors; accessing an external database of scripts, the actors assigned to characters in the scripts; creating a training set based on the actor characteristics and the scripts; and training the second machine learning model using the training set.
- 15 . The non-transitory computer-readable medium of claim 1 , the operations further comprising: generating recommendations for editing the recordings using a third machine learning model trained to determine a likelihood that the recordings satisfy preferences of a director; and generating an alternative version of one of the recordings using a diffusion model.
- 16 . The non-transitory computer-readable medium of claim 1 , the operations further comprising: receiving an image from a collaborator client device, the image depicting an environment surrounding the collaborator client device; and generating a prompt for a generative AI video model to request a non-collaborator video clip based on the script and the image, wherein the non-collaborator video clip depicts the environment.
- 17 . A system comprising: a computer system; and a non-transitory computer-readable medium comprising instructions, the instructions, when executed by the computer system, causing the computer system to perform operations including: accessing a structure parameter characterizing a genre of content; accessing descriptive parameters characterizing a plot of the content; generating a first prompt for a first machine learning model to request a script for the content, the first prompt specifying the accessed structure parameter and the descriptive parameters, wherein the first prompt further requests character descriptions for characters in the script, dialogue for the characters, and one or more transition scene descriptions; receiving, as an output from the first machine learning model, the script, wherein the script comprises the dialogue for the characters and the one or more transition scene descriptions; generating a second prompt for a second machine learning model to request digital augmentation for the characters; receiving as an output from the second machine learning model, the digital augmentation; transmitting, to collaborator client devices, the script and the digital augmentation, wherein: the script is transmitted to the collaborator client devices according to accessed staffing instructions for the script; and the transmitted script appears at the collaborator client devices with respective visual indicators indicating which portions of the script are assigned to a corresponding collaborator; receiving, from the collaborator client devices, recordings of collaborators performing the respective portions of the script, wherein the recordings include an overlay of the digital augmentation; and interleaving the recordings into a collaborative interleaved content series for transmission to a viewer client device.
- 18 . The system of claim 17 , wherein the output from the second machine learning model is a first output of the second machine learning model, the operations further comprising: receiving a shared video and a prior prompt used to generate the shared video; modifying the second prompt using the prior prompt, wherein the modified second prompt requests an updated digital augmentation for one of the characters to have an appearance of a character in the shared video; and receiving as a second output from the second machine learning model, the updated digital augmentation.
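For illustration, a minimal sketch of the first-prompt assembly recited in claims 1 and 17 follows. The `ScriptRequest` fields mirror the claimed structure parameter (genre) and descriptive parameters (plot); the `generate_text` stub is a hypothetical placeholder for whichever text-generation model the system uses, and nothing about the actual prompt format is implied.

```python
from dataclasses import dataclass, field

@dataclass
class ScriptRequest:
    structure_parameter: str                                          # genre, e.g. "film noir mystery"
    descriptive_parameters: list[str] = field(default_factory=list)   # plot details

def build_first_prompt(req: ScriptRequest) -> str:
    """Assemble the first prompt for the script-generation model."""
    plot = "; ".join(req.descriptive_parameters)
    return (
        f"Write a {req.structure_parameter} script.\n"
        f"Plot: {plot}\n"
        "Include: (1) a description of each character, "
        "(2) dialogue for each character, and "
        "(3) one or more transition scene descriptions between scenes."
    )

# Hypothetical model call -- substitute the first machine learning model actually used.
def generate_text(prompt: str) -> str:
    raise NotImplementedError("call the script-generation model here")

if __name__ == "__main__":
    request = ScriptRequest(
        structure_parameter="lighthearted heist comedy",
        descriptive_parameters=["two rival bakers", "a stolen sourdough starter"],
    )
    print(build_first_prompt(request))
```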
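Claims 13 and 14 recite a model trained on actor characteristics and historical casting to estimate how well a collaborator suits a character, and staffing instructions determined from its output. The sketch below uses scikit-learn's logistic regression purely as a stand-in classifier with toy, hand-encoded features; the feature set, library choice, and training data are assumptions, not the claimed implementation.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training data derived from an external database of actor
# characteristics and the scripts those actors were cast in.
# Each row: [comedic_experience_score, age_match_to_character]; label 1 = was cast.
X_train = [[0.9, 0.8], [0.2, 0.9], [0.7, 0.3], [0.1, 0.2]]
y_train = [1, 1, 0, 0]

model = LogisticRegression().fit(X_train, y_train)

def staffing_instruction(candidates: dict[str, list[float]]) -> str:
    """Return the collaborator id with the highest predicted suitability for a character."""
    scored = {cid: model.predict_proba([feats])[0][1] for cid, feats in candidates.items()}
    return max(scored, key=scored.get)

print(staffing_instruction({"collab-1": [0.8, 0.7], "collab-2": [0.3, 0.4]}))
```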
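Claim 1 also transmits the script with visual indicators marking which portions are assigned to each collaborator, per the staffing instructions. A small, assumption-laden sketch of how such a per-device payload might be built, with a `highlighted` flag standing in for whatever indicator the client renders:

```python
from dataclasses import dataclass

@dataclass
class ScriptLine:
    character: str
    text: str

def payload_for_collaborator(
    script: list[ScriptLine],
    staffing: dict[str, str],      # character name -> collaborator id
    collaborator_id: str,
) -> list[dict]:
    """Build the per-device view of the script: every line is included, and
    lines assigned to this collaborator carry a visual-indicator flag."""
    payload = []
    for i, line in enumerate(script):
        assigned_to = staffing.get(line.character)
        payload.append({
            "index": i,
            "character": line.character,
            "text": line.text,
            "highlighted": assigned_to == collaborator_id,   # visual indicator
        })
    return payload

script = [
    ScriptLine("DETECTIVE", "The starter was here an hour ago."),
    ScriptLine("BAKER", "And now it's gone. Along with my reputation."),
]
staffing = {"DETECTIVE": "collab-1", "BAKER": "collab-2"}
print(payload_for_collaborator(script, staffing, "collab-2"))
```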
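Finally, claim 1 interleaves the received recordings into the collaborative interleaved content series. One plausible, simplified ordering scheme, assuming each clip is tagged with the script position it performs and a director-selection flag:

```python
from dataclasses import dataclass

@dataclass
class Recording:
    collaborator_id: str
    script_index: int       # position of the performed portion in the script
    clip_uri: str           # location of the uploaded clip
    selected: bool = True   # whether the director selected this take

def interleave(recordings: list[Recording]) -> list[str]:
    """Order the director-selected clips by script position to form the
    playback sequence of the collaborative interleaved content series."""
    chosen = [r for r in recordings if r.selected]
    chosen.sort(key=lambda r: r.script_index)
    return [r.clip_uri for r in chosen]

takes = [
    Recording("collab-2", 1, "clips/baker_take3.mp4"),
    Recording("collab-1", 0, "clips/detective_take1.mp4"),
    Recording("collab-2", 1, "clips/baker_take1.mp4", selected=False),
]
print(interleave(takes))   # ['clips/detective_take1.mp4', 'clips/baker_take3.mp4']
```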
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 63/658,383, filed Jun. 10, 2024, which is incorporated by reference in its entirety.

TECHNICAL FIELD

The disclosure generally relates to digital media generation, and more specifically to generating collaborative, interleaved content series using machine learning.

BACKGROUND

The widespread accessibility of cameras and microphones has greatly increased content production. Anyone with a smartphone can become a director or actor. While the number of directors and actors has increased, conventional media generation systems targeted at home users lack the tools that traditional production studios possess. One conventional media generation system allows a user to record their own video of a movie scene and dub over the original actor's lines, while another facilitates karaoke for an existing song. Because these conventional systems lack sufficient content collaboration and production tools, they leave minimal room for users to exercise creative direction over the content they generate. Additionally, these conventional systems rely on existing content, without user customization, to create unique content.

SUMMARY

In some aspects, the techniques described herein relate to a non-transitory computer-readable medium including instructions, the instructions, when executed by a computer system, causing the computer system to perform operations including: accessing a structure parameter characterizing a genre of content; accessing descriptive parameters characterizing a plot of the content; generating a first prompt for a first machine learning model to request a script for the content, the first prompt specifying the accessed structure parameter and descriptive parameters, wherein the first prompt further requests character descriptions for characters in the script, dialogue for the characters, and one or more transition scene descriptions; receiving, as an output from the first machine learning model, the script, wherein the script includes the dialogue for the characters and the one or more transition scene descriptions; generating a second prompt for a second machine learning model to request digital augmentation for the characters, wherein the digital augmentation includes one or more of a visual or audio enhancement; receiving as an output from the second machine learning model, the digital augmentation; and transmitting the script and the digital augmentation to at least one client device.

In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, the operations further including: classifying, using a third machine learning model trained using previously generated scripts and corresponding parameters used to generate the previously generated scripts, the accessed structure parameter and the accessed descriptive parameters as sufficient to generate the first prompt for the first machine learning model to request the script for the content.

In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, the operations further including: generating a third prompt for the second machine learning model to request a sound effect for the content, wherein the third prompt includes at least a portion of the script in which the requested sound effect is featured.
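The classification aspect above checks, before any script prompt is generated, whether the accessed parameters are sufficient. The disclosure describes a model trained on previously generated scripts and their parameters; the stand-in below is only a rule-based sketch of the same gating step, not the trained classifier.

```python
def parameters_sufficient(structure_parameter: str, descriptive_parameters: list[str]) -> bool:
    """Stand-in sufficiency check: require a non-empty genre and at least two plot details."""
    has_genre = bool(structure_parameter.strip())
    has_plot = len([p for p in descriptive_parameters if p.strip()]) >= 2
    return has_genre and has_plot

print(parameters_sufficient("heist comedy", ["two rival bakers", "a stolen starter"]))  # True
print(parameters_sufficient("", ["a baker"]))                                           # False
```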
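A rough sketch of how the third prompt for a sound effect might be assembled, embedding the script excerpt in which the effect is featured and, optionally, a note about a sound effect the user produced while reading the excerpt aloud; the note format and function name are assumptions:

```python
def build_sound_effect_prompt(script_excerpt: str, manual_effect_note: str | None = None) -> str:
    """Assemble the third prompt: it embeds the script portion featuring the
    sound effect and may ask that the effect be based on a user-produced sound."""
    prompt = (
        "Generate a sound effect for the following script excerpt:\n"
        f"{script_excerpt}\n"
    )
    if manual_effect_note:
        prompt += f"Base the sound effect on this user-produced sound: {manual_effect_note}\n"
    return prompt

print(build_sound_effect_prompt(
    "The vault door groans open.",
    manual_effect_note="low creaking noise made by the reader at 00:12",
))
```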
In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, the operations further including: receiving audio of a user reading aloud the portion of the script; and detecting a manually produced sound effect in the audio; wherein the third prompt includes an instruction that the requested sound effect be based on the manually produced sound effect.

In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, wherein the output from the second machine learning model is a first output of the second machine learning model, the operations further including: receiving a shared video and a prior prompt used to generate the shared video; modifying the second prompt using the prior prompt, wherein the modified second prompt requests an updated digital augmentation for one of the characters to have an appearance of a character in the shared video; and receiving as a second output from the second machine learning model, the updated digital augmentation.

In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, the operations further including: generating a third prompt for a third machine learning model to request storyboard images, the third prompt specifying the character descriptions for characters in the script and the one or more transition scene descriptions; receiving, as an output from the third machine learning model, the storyboard images; and causing the storyboard images and corresponding dialogue for the characters to be displayed.
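For the storyboard aspect just described, one way the per-scene prompts could be assembled and the returned images paired with the dialogue they illustrate; the `Scene` structure and prompt wording are assumptions rather than the disclosed format.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    transition_description: str
    dialogue: list[str]

def build_storyboard_prompts(character_descriptions: dict[str, str], scenes: list[Scene]) -> list[str]:
    """One storyboard-image prompt per scene, each carrying the character
    descriptions and that scene's transition description."""
    cast = "; ".join(f"{name}: {desc}" for name, desc in character_descriptions.items())
    return [
        f"Storyboard frame. Characters: {cast}. Scene: {scene.transition_description}"
        for scene in scenes
    ]

def pair_for_display(storyboard_images: list[str], scenes: list[Scene]) -> list[tuple[str, list[str]]]:
    """Pair each returned storyboard image with the dialogue it illustrates."""
    return list(zip(storyboard_images, [s.dialogue for s in scenes]))

scenes = [Scene("Rainy alley outside the bakery at night.",
                ["DETECTIVE: The starter was here an hour ago."])]
print(build_storyboard_prompts({"DETECTIVE": "a trench-coated cat"}, scenes))
print(pair_for_display(["img_001.png"], scenes))
```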
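Earlier, the shared-video aspect described modifying the second prompt using the prior prompt that generated the shared video, so a character's augmentation matches the shared video's character. A minimal string-level sketch, with hypothetical prompt text:

```python
def modify_augmentation_prompt(second_prompt: str, prior_prompt: str, character: str) -> str:
    """Fold the prior prompt (used to generate the shared video) into the second
    prompt so the named character's augmentation matches the shared video."""
    return (
        f"{second_prompt}\n"
        f"Update the augmentation for {character} so the character's appearance "
        "matches the character generated by this earlier prompt:\n"
        f"{prior_prompt}"
    )

print(modify_augmentation_prompt(
    second_prompt="Generate visual augmentation overlays for DETECTIVE and BAKER.",
    prior_prompt="A trench-coated cat detective in a rain-soaked alley.",
    character="DETECTIVE",
))
```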