EP-4740129-A1 - TECHNICAL ARCHITECTURES FOR MEDIA CONTENT EDITING USING MACHINE LEARNING
Abstract
Examples are provided relating to media content editing architectures utilizing machine learning techniques. One aspect includes a method for media content editing, the method comprising: receiving a media content from a user; receiving an editing request for the media content from the user; and editing the media content based on the editing request to generate edited media content by: retrieving a prompt from a prompt pool, wherein the retrieved prompt is selected based on the editing request; parsing the retrieved prompt and the editing request using a large language model to generate one or more editing actions to be performed on the media content; and performing the one or more editing actions on the media content to generate the edited media content.
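The claimed pipeline — retrieve a prompt keyed to the editing request, have a large language model turn the prompt and request into concrete editing actions, then execute those actions — can be sketched as follows. All names here are hypothetical and the language-model call is stubbed with a canned response; this is an illustrative outline of the claimed steps, not the patented implementation.

```python
# Hypothetical prompt pool: maps a request category to a system prompt.
PROMPT_POOL: dict[str, str] = {
    "trim": "You are a video editor. Emit trim actions as 'trim <start> <end>'.",
    "caption": "You are a video editor. Emit caption actions as 'caption <text>'.",
}

def retrieve_prompt(editing_request: str) -> str:
    """Select a prompt from the pool by keyword match (stand-in for a real retriever)."""
    for category, prompt in PROMPT_POOL.items():
        if category in editing_request.lower():
            return prompt
    return "You are a general-purpose media editor."

def parse_with_llm(prompt: str, editing_request: str) -> list[str]:
    """Stub for the large-language-model call that parses a request into actions.

    A real system would send `prompt` and `editing_request` to an LLM and
    parse its reply; here we return a canned action for illustration.
    """
    if "trim" in editing_request.lower():
        return ["trim 0 10"]
    return []

def edit_media(media: dict, editing_request: str) -> dict:
    """Apply the three claimed steps: retrieve prompt, parse, perform actions."""
    prompt = retrieve_prompt(editing_request)
    actions = parse_with_llm(prompt, editing_request)
    edited = dict(media)  # leave the original media description untouched
    for action in actions:
        op, *args = action.split()
        if op == "trim":
            start, end = map(float, args)
            edited["duration"] = end - start
    return edited
```

For example, `edit_media({"duration": 60.0}, "Please trim to the first 10 seconds")` would return a media description whose duration reflects the trim, while the input dictionary is left unmodified.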
Inventors
- CHEN, Fan
- WONG, Kin Chung
Assignees
- Lemon Inc.
Dates
- Publication Date
- 20260513
- Application Date
- 20240628
Claims (20)
- 1. A method for media content editing, the method comprising: receiving a media content from a user; receiving an editing request for the media content from the user; and editing the media content based on the editing request to generate edited media content by: retrieving a prompt from a prompt pool, wherein the retrieved prompt is selected based on the editing request; parsing the retrieved prompt and the editing request using a large language model to generate one or more editing actions to be performed on the media content; and performing the one or more editing actions on the media content to generate the edited media content.
- 2. The method of claim 1, wherein performing the one or more editing actions comprises performing application programming interface calls provided by a back-end tool service comprising a plurality of editing tools, wherein each application programming interface call corresponds to a respective editing tool in the plurality of editing tools.
- 3. The method of claim 2, wherein each editing tool in the plurality of editing tools corresponds to one or more prompts in the prompt pool.
- 4. The method of claim 2, wherein the plurality of editing tools is organized into a plurality of groupings, and wherein the prompt pool is generated based at least in part on the plurality of groupings.
- 5. The method of claim 1, further comprising rendering and displaying the edited media content to the user; and receiving a second editing request.
- 6. The method of claim 5, wherein the second editing request comprises a request to revert the performed one or more editing actions.
- 7. The method of claim 1, further comprising storing contextual information relating to the editing of the media content.
- 8. The method of claim 7, wherein the contextual information comprises one or more of conversation history, editing context, or editing draft history.
- 9. The method of claim 8, further comprising refining the prompt pool based on the contextual information.
- 10. The method of claim 1, wherein editing the media content further comprises: providing a dialog reply to the user, wherein the dialog reply is generated by the large language model in response to the retrieved prompt and the editing request; and receiving a dialog response from the user in response to the dialog reply.
- 11. A computing device for media content editing, the computing device comprising: a processor and memory of a computing device, the processor being configured to execute a program using portions of the memory to: receive a media content from a user; receive an editing request for the media content from the user; and edit the media content based on the editing request to generate edited media content by: retrieving a prompt from a prompt pool, wherein the retrieved prompt is selected based on the editing request; parsing the retrieved prompt and the editing request using a large language model to generate one or more editing actions to be performed on the media content; and performing the one or more editing actions on the media content to generate the edited media content.
- 12. The computing device of claim 11, wherein performing the one or more editing actions comprises performing application programming interface calls provided by a back-end tool service comprising a plurality of editing tools, wherein each application programming interface call corresponds to a respective editing tool in the plurality of editing tools.
- 13. The computing device of claim 11, wherein: each editing tool in the plurality of editing tools corresponds to one or more prompts in the prompt pool; the plurality of editing tools is organized into a plurality of groupings; and the prompt pool is generated based at least in part on the plurality of groupings.
- 14. The computing device of claim 11, wherein the processor is further configured to store contextual information relating to the editing of the media content, wherein the contextual information comprises one or more of conversation history, editing context, or editing draft history.
- 15. The computing device of claim 11, wherein editing the media content further comprises: providing a dialog reply to the user, wherein the dialog reply is generated by the large language model in response to the retrieved prompt and the editing request; and receiving a dialog response from the user in response to the dialog reply.
- 16. A computing system for media content editing, the computing system comprising: a display; a back-end tool service comprising a prompt pool, a plurality of editing tools, and a plurality of application programming interfaces, each application programming interface corresponding to an editing tool in the plurality of editing tools; a processor and memory of a computing device, the processor being configured to execute a program using portions of the memory to: receive a media content from a user; receive an editing request for the media content from the user; edit the media content based on the editing request to generate edited media content by: retrieving a prompt from the prompt pool, wherein the retrieved prompt is selected based on the editing request; parsing the retrieved prompt and the editing request using one or more large language models to generate one or more editing actions to be performed on the media content; and performing the one or more editing actions on the media content by calling at least one application programming interface in the plurality of application programming interfaces to generate the edited media content; render and display the edited media content using the display through a dialog assisted editing interface.
- 17. The computing system of claim 16, wherein the one or more large language models comprises a plurality of large language models, each trained for at least one task, and wherein the processor is configured to select a large language model from the plurality of large language models to parse the retrieved prompt and the editing request.
- 18. The computing system of claim 16, wherein: each editing tool in the plurality of editing tools corresponds to one or more prompts in the prompt pool; the plurality of editing tools is organized into a plurality of groupings; and prompts in the prompt pool are generated based at least in part on the plurality of groupings.
- 19. The computing system of claim 16, wherein the processor is further configured to store contextual information relating to the editing of the media content, wherein the contextual information comprises one or more of conversation history, editing context, or editing draft history.
- 20. A non-transitory computer readable medium for media content editing, the non-transitory computer readable medium comprising instructions that, when executed by a computing device, cause the computing device to implement the method of claim 1.
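Claims 2, 12, and 16 describe a back-end tool service in which each editing tool is reached through its own application programming interface call. A minimal sketch of such a per-tool registry follows; the tool names, parameters, and return shapes are invented for illustration and are not part of the claimed system.

```python
from typing import Callable, Dict

class ToolService:
    """Hypothetical back-end tool service: each registered editing tool is
    exposed through one API entry point, mirroring the one-call-per-tool
    arrangement described in the claims."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., dict]] = {}

    def register(self, name: str, fn: Callable[..., dict]) -> None:
        """Add an editing tool under a unique API name."""
        self._tools[name] = fn

    def call(self, name: str, **kwargs) -> dict:
        """Perform the API call corresponding to one editing tool."""
        if name not in self._tools:
            raise KeyError(f"no such editing tool: {name}")
        return self._tools[name](**kwargs)

# Illustrative tools: each returns a description of the edit it would perform.
service = ToolService()
service.register("crop", lambda width, height: {"op": "crop", "w": width, "h": height})
service.register("speed", lambda factor: {"op": "speed", "factor": factor})
```

A revert of performed actions, as in claims 5 and 6, could be layered on top of such a registry by recording each call and re-rendering from the original media with the last recorded action omitted, in the non-destructive style the description attributes to non-linear editing systems.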
Description
TECHNICAL ARCHITECTURES FOR MEDIA CONTENT EDITING USING MACHINE LEARNING

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Application Ser. No. 18/346,727, filed July 3, 2023, and titled “TECHNICAL ARCHITECTURES FOR MEDIA CONTENT EDITING USING MACHINE LEARNING”, the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

[0002] Raw media content in its original recorded form is typically edited before publication to enhance its appeal for better viewer engagement. Editing media content (e.g., images, audio, videos, and other modalities) typically involves the use of software with editing capabilities provided in the form of editing tools. Edits to media content can include a wide range of manipulations and modifications. For example, in the context of video editing, edits can include trimming segments, re-sequencing segments, adjusting playback speed, embedding content such as special effects and caption text, adjusting audio, cropping, etc. Additionally, the use of powerful editing software enables non-linear editing (NLE) systems, where multiple edits are performed on raw media content in a non-destructive process such that the original data can be recovered, i.e., the edits can be reversed.

SUMMARY

[0003] Examples are provided relating to media content editing architectures utilizing machine learning techniques. 
One aspect includes a method for media content editing, the method comprising: receiving a media content from a user; receiving an editing request for the media content from the user; and editing the media content based on the editing request to generate edited media content by: retrieving a prompt from a prompt pool, wherein the retrieved prompt is selected based on the editing request; parsing the retrieved prompt and the editing request using a large language model to generate one or more editing actions to be performed on the media content; and performing the one or more editing actions on the media content to generate the edited media content.

[0004] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] FIG. 1 shows a block diagram model describing a general pipeline and various components of an example technical architecture for implementing a media content editing application.

[0006] FIG. 2 is a block diagram model illustrating an example back-end tool service for providing editing tools and editing capabilities, which can be implemented in the general pipeline described in FIG. 1.

[0007] FIG. 3 is a block diagram model illustrating an example use of contextual memory 302 in a media content editing architecture, which can be implemented in the general pipeline described in FIG. 1.

[0008] FIG. 4 is a block diagram model illustrating an example system evolving and refinement application for a media content editing architecture, which can be implemented in the general pipeline described in FIG. 1. 
[0009] FIG. 5 is a block diagram model illustrating an example media content editing model architecture with a system evolving and refinement process, which provides a detailed illustration of the general pipeline described in FIG. 1.

[0010] FIG. 6 is a flow chart illustrating an example method for a media content editing process using machine learning techniques, which can be implemented using the technical architecture of FIG. 1.

[0011] FIG. 7 is a flow chart illustrating an example method for refining a media content editing architecture, which can be implemented using the technical architecture of FIG. 1.

[0012] FIG. 8 schematically shows a non-limiting embodiment of a computing system that can enact one or more of the methods and processes described above.

DETAILED DESCRIPTION

[0013] Media content editing software capable of providing powerful editing tools is widely available for commercial and personal uses. Typically, content editing software involves the use of a user interface (UI) with various sections, menus, buttons, etc. for navigating and selecting the desired editing tool. These technologies have grown over time to provide a vast array of tools for performing numerous editing tasks. However, software with more powerful editing capabilities and functionalities will naturally result in more complexity. As a result, many features remain unexplored for the typical user. Complex UI navigation, a lack of knowledge in the software’s capabilities, and difficulty in utilizing said capabilities can all contribute to the underutilization of editing software. For examp