KR-102962773-B1 - Automatic video editing method with detecting shot transition points and recommending next camera video


Abstract

A method for automatic video editing through the detection of screen transition points and the recommendation of the camera video to connect is provided. The automatic video editing method according to an embodiment of the present invention detects screen transition points from multiple camera videos of the same scene and recommends the camera video that should make up the video from each detected screen transition point to the next one. In this way, screen transition points are detected automatically using artificial intelligence, and the camera video to be newly connected at each detected screen transition point is recommended automatically, which significantly reduces the effort and time an editor spends producing video content.
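Composing the video "from each detected screen transition point to the next one" amounts to splitting the timeline at the transition points and assigning one camera to each resulting span. The following is a minimal sketch under stated assumptions: transition points are frame indices, one camera has already been chosen per span, and `Segment` and `build_edit_list` are hypothetical names that do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Segment:
    """One cut in a hypothetical edit list: frames [start, end) from one camera."""
    start: int
    end: int
    camera: int


def build_edit_list(
    transition_points: Sequence[int],
    cameras: Sequence[int],
    total_frames: int,
) -> List[Segment]:
    """Pair each span between consecutive transition points with a camera.

    cameras[i] is the camera chosen for the i-th span; the first span starts
    at frame 0 and the last span runs up to total_frames.
    """
    bounds = [0, *sorted(transition_points), total_frames]
    if len(cameras) != len(bounds) - 1:
        raise ValueError("need exactly one camera per segment")
    return [
        Segment(start, end, cam)
        for start, end, cam in zip(bounds, bounds[1:], cameras)
    ]
```

For example, `build_edit_list([120, 300], [0, 2, 1], 450)` yields three segments: frames 0 to 120 from camera 0, 120 to 300 from camera 2, and 300 to 450 from camera 1.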

Inventors

송혁
고민수

Assignees

한국전자기술연구원

Dates

Publication Date: 20260511
Application Date: 20231211

Claims (12)

An automatic video editing method comprising: receiving multiple camera videos of the same scene; extracting features from the camera videos; detecting screen transition points from the extracted features; and recommending a camera video to compose the video from a detected screen transition point to the next screen transition point, wherein the recommending step recommends the camera video using a machine learning model trained to receive, as input, features extracted from the camera videos before a screen transition point and features extracted from the camera videos after the screen transition point, and to recommend a new camera video to connect at the screen transition point.
The automatic video editing method of claim 1, wherein the receiving step receives multiple camera videos of a filming site captured from different angles.
The automatic video editing method of claim 1, wherein the detecting step detects, for each camera video, whether a screen transition occurs, and detects a screen transition point when a screen transition is detected in more than a predetermined number of camera videos.
delete
The automatic video editing method of claim 1, wherein the detecting step detects whether a screen transition occurs using a machine learning model trained to estimate, from the extracted features, whether a screen transition occurs.
The automatic video editing method of claim 1, wherein the features include dialogue features, behavior features, reaction features, and background features.
delete
The automatic video editing method of claim 1, wherein the recommending step recommends multiple camera videos based on the accuracy of the machine learning model.
The automatic video editing method of claim 1, further comprising generating automatic editing information for the video content by combining the screen transition points detected over the entire video with the camera video information recommended at those screen transition points.
An automatic video editing system comprising: an input unit that receives multiple camera videos of the same scene; an extraction unit that extracts features from the input camera videos; a detection unit that detects screen transition points from the extracted features; and a recommendation unit that recommends a camera video to compose the video from a detected screen transition point to the next screen transition point, wherein the recommendation unit recommends the camera video using a machine learning model trained to receive, as input, features extracted from the camera videos before a screen transition point and features extracted from the camera videos after the screen transition point, and to recommend a new camera video to connect at the screen transition point.
An automatic video editing method comprising: detecting screen transition points from features extracted from multiple camera videos of the same scene; recommending a camera video to compose the video from a detected screen transition point to the next screen transition point; and generating automatic editing information by combining the detected screen transition points with the recommended camera video information, wherein the recommending step recommends the camera video using a machine learning model trained to receive, as input, features extracted from the camera videos before a screen transition point and features extracted from the camera videos after the screen transition point, and to recommend a new camera video to connect at the screen transition point.
An automatic video editing system comprising: a detection unit that detects screen transition points from features extracted from multiple camera videos of the same scene; a recommendation unit that recommends a camera video to compose the video from a detected screen transition point to the next screen transition point; and a generation unit that generates automatic editing information by combining the detected screen transition points with the recommended camera video information, wherein the recommendation unit recommends the camera video using a machine learning model trained to receive, as input, features extracted from the camera videos before a screen transition point and features extracted from the camera videos after the screen transition point, and to recommend a new camera video to connect at the screen transition point.
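The decision rule in claim 3, which declares a screen transition point when more than a predetermined number of camera videos each show a transition, can be sketched as a simple vote. This is a minimal sketch under stated assumptions: the per-camera transition flags are taken as already computed (in the description they come from a trained model), and the function names and the default "more than half" threshold, borrowed from the description's example, are illustrative rather than the patented implementation.

```python
from typing import List, Optional, Sequence


def is_transition(
    per_camera_flags: Sequence[bool], min_votes: Optional[float] = None
) -> bool:
    """Return True when MORE than `min_votes` cameras report a transition.

    With min_votes=None the threshold defaults to half the number of
    cameras, so a strict majority is required.
    """
    if not per_camera_flags:
        return False
    threshold = len(per_camera_flags) / 2 if min_votes is None else min_votes
    return sum(per_camera_flags) > threshold


def detect_transition_points(
    flags_over_time: Sequence[Sequence[bool]],
    min_votes: Optional[float] = None,
) -> List[int]:
    """Return the time indices at which the vote declares a transition.

    flags_over_time[t][c] is True if camera c shows a transition at time t.
    """
    return [
        t for t, flags in enumerate(flags_over_time)
        if is_transition(flags, min_votes)
    ]
```

With three cameras and the default threshold, two agreeing cameras are enough (2 > 1.5); passing `min_votes=2` instead requires all three to agree.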

Description

The present invention relates to video editing technology and, more specifically, to a video editing method for producing a single piece of video content by selectively connecting camera videos acquired by filming a shooting site from various angles.
To produce video content, the filming site is first captured from various angles to acquire multi-channel video; the editor then repeatedly reviews all of the acquired multi-channel videos, decides where a screen transition is needed, and selects the camera video to connect. Because screen transitions depend on context, such as changes in content, changes of speaker, and how long the main camera has been held, the editor must review and evaluate the footage carefully. This process demands considerable effort and time from the editor, making it a major obstacle in video content production, and the burden grows in proportion to the shooting time and the number of camera channels.
FIG. 1 shows an automatic video editing system according to an embodiment of the present invention, and FIG. 2 shows an automatic video editing method according to another embodiment of the present invention.
The present invention is described in more detail below with reference to the drawings.
An embodiment of the present invention presents an automatic video editing method based on screen transition point detection and recommendation of the camera video to connect. The technology uses artificial intelligence to analyze multi-channel camera videos, acquired by capturing a shooting site from various angles, so as to automatically detect screen transition points and automatically recommend the new camera video to connect at each detected screen transition point.
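The recommendation step recited in claims 1 and 8 (features from before and after a transition point go in, one or more ranked camera videos come out) can be sketched as follows. This is a minimal sketch under stated assumptions: `dot_scorer` is a hypothetical placeholder for the trained Connected Camera Recommendation Model, the "before" input is reduced to a single feature vector of the outgoing shot, and ranking by softmax confidence is an assumed reading of "based on the accuracy of a machine learning model".

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical stand-in for the trained model: maps (features before the
# transition, features of one candidate camera after it) to a raw score.
Scorer = Callable[[Vector, Vector], float]


def dot_scorer(before: Vector, after: Vector) -> float:
    """Placeholder scorer: a plain dot product, NOT the patented model."""
    return sum(b * a for b, a in zip(before, after))


def recommend_cameras(
    before: Vector,
    after_per_camera: Sequence[Vector],
    scorer: Scorer = dot_scorer,
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Rank candidate cameras for the span after a transition point.

    Returns (camera_index, confidence) pairs, highest confidence first,
    where confidences are a softmax over the scorer outputs.
    """
    if not after_per_camera:
        return []
    scores = [scorer(before, after) for after in after_per_camera]
    peak = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    ranked = sorted(
        ((cam, e / total) for cam, e in enumerate(exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]
```

Returning the top `k` candidates rather than a single camera leaves the final choice to the editor, which matches the idea of recommending multiple camera videos.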
Unlike the conventional method, in which the screen transition timing and the camera video to connect are determined solely by the editor's judgment, the embodiment of the present invention performs these tasks automatically, drastically reducing the effort and time the editor needs for video content production.
FIG. 1 is a diagram illustrating the configuration of an automatic video editing system according to an embodiment of the present invention. The system includes a video input unit (110), a video feature extraction unit (120), a screen transition detection unit (130), a connected camera recommendation unit (140), and an editing information generation unit (150).
The video input unit (110) receives, over multiple channels, camera videos of the shooting site captured from different angles.
The video feature extraction unit (120) extracts features from the multi-channel camera videos input from the video input unit (110). Feature extraction is performed per camera video: with N channels, features are extracted from the camera video of the first channel, from that of the second channel, ..., and from that of the Nth channel. The extracted features include dialogue features, behavior features, and reaction features, and background features may also be included as needed. Feature extraction can be performed using a machine learning model trained to extract features from input camera videos, and a separate model can be implemented for each feature; for instance, separate machine learning models can be configured for extracting dialogue features, behavior features, reaction features, and background features.
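The per-channel, per-feature extraction described above can be sketched as a loop over channels with one extractor per feature type. This is a minimal sketch under stated assumptions: each channel is reduced to a list of per-frame scalar measurements, and `_toy_stats`, `BASE_EXTRACTORS`, and `extract_features` are hypothetical names; the toy statistic merely stands in for the trained dialogue, behavior, reaction, and background models.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical input: each camera channel is a list of per-frame scalar
# measurements; real inputs would be decoded frames and audio.
Clip = Sequence[float]
FeatureVector = List[float]
Extractor = Callable[[Clip], FeatureVector]


def _toy_stats(clip: Clip) -> FeatureVector:
    """Placeholder for a trained model: mean and range of the clip."""
    return [sum(clip) / len(clip), max(clip) - min(clip)]


# One (hypothetical) model per feature type, as the description allows.
BASE_EXTRACTORS: Dict[str, Extractor] = {
    "dialogue": _toy_stats,
    "behavior": _toy_stats,
    "reaction": _toy_stats,
}
BACKGROUND_EXTRACTOR: Extractor = _toy_stats


def extract_features(
    channels: Sequence[Clip], include_background: bool = False
) -> List[Dict[str, FeatureVector]]:
    """Run every feature extractor on each channel (1..N) independently."""
    extractors = dict(BASE_EXTRACTORS)
    if include_background:  # background features are optional
        extractors["background"] = BACKGROUND_EXTRACTOR
    return [
        {name: model(clip) for name, model in extractors.items()}
        for clip in channels
    ]
```

Keeping one callable per feature type makes it easy to swap in, retrain, or drop a single model (for example, the optional background model) without touching the others.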
The screen transition detection unit (130) detects screen transition points based on the features extracted by the video feature extraction unit (120). Specifically, it detects whether a screen transition occurs in each of the multi-channel camera videos, and when a screen transition is detected in more than a predetermined number of camera videos (for example, more than half of them), it determines that point to be a screen transition point. Whether a screen transition occurs can be detected using a machine learning model trained to estimate it from the features extracted by the video feature extraction unit (120), received as a time-series input.
The connected camera recommendation unit (140) recommends the camera video that will compose the video from the screen transition point detected by the screen transition detection unit (130) to the next screen transition point; that is, it recommends the main camera video that continues the video content from that screen transition point. For this recommendation, a Connected Camera Recommendation Model can be used. This is a machine learning model trained to recommend the new camera video to connect at the point of screen transition, by receivi