WO-2026092380-A1 - VIDEO GENERATION METHOD AND APPARATUS, MEDIUM, ELECTRONIC DEVICE AND PROGRAM PRODUCT
Abstract
The present disclosure relates to a video generation method and apparatus, a medium, an electronic device and a program product. The method comprises: acquiring a first video and a target camera movement type, the target camera movement type carrying camera parameters corresponding to the video frames in the first video; determining a target camera parameter feature based on the camera parameters corresponding to the video frames in the first video, the target camera parameter feature representing both the camera parameters corresponding to the respective video frames in the first video and the temporal correlation of the camera parameters across different video frames; and generating, based on the first video and the target camera parameter feature, a second video involving a camera movement corresponding to the target camera movement type. Because the method considers not only the camera parameters of individual video frames in the first video but also the temporal correlation of the camera parameters across frames, it can accurately control the camera viewing angle during video generation and automatically produce camera movement videos with high-quality camera movement effects.
Inventors
- TU, Pengqi
Assignees
- 北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Dates
- Publication Date: 2026-05-07
- Application Date: 2025-10-27
- Priority Date: 2024-10-29
Claims (11)
- A video generation method, comprising: acquiring a first video and a target camera movement type, wherein the target camera movement type carries camera parameters corresponding to video frames in the first video; determining target camera parameter features based on the camera parameters corresponding to the video frames in the first video, wherein the target camera parameter features characterize both the camera parameters corresponding to the video frames in the first video and the temporal correlation of the camera parameters across different video frames; and generating, based on the first video and the target camera parameter features, a second video having a camera movement corresponding to the target camera movement type.
- The method according to claim 1, wherein determining the target camera parameter features based on the camera parameters corresponding to the video frames in the first video comprises: determining a camera parameter embedding representation based on the camera parameters corresponding to the video frames in the first video; and encoding the camera parameter embedding representation with an encoder based on a temporal attention mechanism to obtain the target camera parameter features.
- The method according to claim 2, wherein encoding the camera parameter embedding representation with the encoder based on the temporal attention mechanism to obtain the target camera parameter features comprises: encoding the camera parameter embedding representation with convolutional layers of different scales in the encoder to obtain first features of corresponding scales output by the convolutional layers, the first features characterizing the camera parameters corresponding to the video frames in the first video; encoding the first features with temporal attention layers corresponding to the convolutional layers in the encoder to obtain second features of corresponding scales output by the temporal attention layers, the second features characterizing the temporal correlation of the camera parameters across different video frames in the first video; and obtaining the target camera parameter features based on all of the first features and all of the second features.
- The method according to any one of claims 1-3, wherein generating the second video having the camera movement corresponding to the target camera movement type based on the first video and the target camera parameter features comprises: determining an image embedding vector corresponding to image features of the first video, and a camera parameter embedding vector corresponding to the target camera parameter features; and performing a diffusion process with a pre-generated video generation model based on the image embedding vector and the camera parameter embedding vector to generate the second video carrying the camera movement corresponding to the target camera movement type, wherein the diffusion process includes a noise-adding process and a denoising process, the camera parameter embedding vector is injected into a temporal attention layer involved in the denoising process of the video generation model in a manner based on a temporal attention mechanism, the noise-adding process adds noise to the image embedding vector, and the denoising process generates the denoised second video based on the camera parameter embedding vector and the noised image embedding vector.
- The method according to claim 4, wherein the video generation model is generated by: obtaining a training sample set including a plurality of sample video data, each sample video data including a sample camera movement video and sample camera parameters corresponding to sample video frames in the sample camera movement video, wherein the sample camera movement videos are either camera movement videos that include foreground motion images or camera movement videos that do not include foreground motion images; and training an initial model based on the training sample set to obtain the video generation model.
- The method according to claim 5, further comprising: deleting target sample video data from the training sample set to obtain an updated training sample set, wherein the sample camera movement video corresponding to the target sample video data is a camera movement video with sudden changes in scene content; and training the initial model based on the updated training sample set to obtain the video generation model.
- The method according to any one of claims 1-6, wherein the camera parameters include at least one of: camera extrinsic parameters; and camera intrinsic parameters.
- A video generation apparatus, comprising: an acquisition module configured to acquire a first video and a target camera movement type, wherein the target camera movement type carries camera parameters corresponding to video frames in the first video; a determining module configured to determine target camera parameter features based on the camera parameters corresponding to the video frames in the first video, wherein the target camera parameter features characterize both the camera parameters corresponding to the video frames in the first video and the temporal correlation of the camera parameters across different video frames; and a generation module configured to generate, based on the first video and the target camera parameter features, a second video having a camera movement corresponding to the target camera movement type.
- A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processing device, implements the steps of the method according to any one of claims 1-7.
- An electronic device, comprising: a storage device having a computer program stored thereon; and a processing device for executing the computer program in the storage device to implement the steps of the method according to any one of claims 1-7.
- A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
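The multi-scale encoder of claims 2 and 3 can be illustrated with a minimal, self-contained sketch. This is not the patented implementation: the function names (`encode_camera_params`, `conv1d_temporal`, `temporal_self_attention`), the fixed averaging kernels standing in for learned convolutions, and the choice of kernel sizes as "scales" are all hypothetical. The sketch only shows the data flow the claims describe: per-frame camera parameters are convolved at several temporal scales (first features), each scale is passed through temporal self-attention so every frame mixes information from every other frame (second features), and all first and second features are combined into the target camera parameter features.

```python
import numpy as np


def temporal_self_attention(x):
    # x: (T, d) per-frame features; attend across the time axis so each
    # frame's feature incorporates information from every other frame,
    # capturing the temporal correlation between camera parameters.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                     # (T, T) frame-to-frame similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # softmax over time
    return w @ x                                      # (T, d) temporally mixed features


def conv1d_temporal(x, kernel_size):
    # Depthwise 1-D convolution over time with 'same'-style edge padding.
    # Each kernel size plays the role of one "scale" in the encoder; a fixed
    # averaging kernel stands in for learned convolution weights.
    T, d = x.shape
    pad = kernel_size // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    w = np.full((kernel_size,), 1.0 / kernel_size)
    cols = [np.convolve(xp[:, j], w, mode="valid")[:T] for j in range(d)]
    return np.stack(cols, axis=1)                     # (T, d)


def encode_camera_params(params, scales=(1, 3, 5)):
    # params: (T, d) flattened per-frame camera parameters.
    # First features: one per convolutional scale (per-frame content).
    firsts = [conv1d_temporal(params, k) for k in scales]
    # Second features: temporal attention over each first feature
    # (cross-frame correlation at that scale).
    seconds = [temporal_self_attention(f) for f in firsts]
    # Target camera parameter features: combine all firsts and seconds.
    return np.concatenate(firsts + seconds, axis=-1)  # (T, d * 2 * len(scales))
```

With 4 frames, 3 parameters per frame, and three scales, the output is a `(4, 18)` feature array: three first features and three second features of width 3 each, concatenated per frame.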
Description
Video Generation Method and Apparatus, Medium, Electronic Device and Program Product

Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202411525491.1, filed on October 29, 2024, the disclosure of which is incorporated herein by reference in its entirety.

Technical Field
This disclosure relates to a video generation method, apparatus, medium, electronic device, and program product.

Background
Camera movements such as rotation and zoom-in are common in films, television shows, and short videos. Traditionally, a user holds the camera and controls its movement to track the subject, creating the desired effect. This shooting technique, often simply called camera movement, demands professional skill: the speed and stability of the camera must be controlled precisely to produce high-quality video with smooth movement. For users without such skills, capturing high-quality video with smooth camera movements is extremely difficult.

Summary of the Invention
This summary is provided to briefly introduce concepts that are described in detail in the detailed description below. It is not intended to identify key or essential features of the claimed technical solution, nor to limit the scope of the claimed technical solution.

In a first aspect, this disclosure provides a video generation method, including: acquiring a first video and a target camera movement type, wherein the target camera movement type carries camera parameters corresponding to video frames in the first video; determining target camera parameter features based on the camera parameters corresponding to the video frames in the first video, wherein the target camera parameter features characterize both the camera parameters corresponding to the video frames in the first video and the temporal correlation of the camera parameters across different video frames; and generating, based on the first video and the target camera parameter features, a second video having a camera movement corresponding to the target camera movement type.

In a second aspect, this disclosure provides a video generation apparatus, comprising: an acquisition module configured to acquire a first video and a target camera movement type, wherein the target camera movement type carries camera parameters corresponding to video frames in the first video; a determining module configured to determine target camera parameter features based on the camera parameters corresponding to the video frames in the first video, wherein the target camera parameter features characterize both the camera parameters corresponding to the video frames in the first video and the temporal correlation of the camera parameters across different video frames; and a generation module configured to generate, based on the first video and the target camera parameter features, a second video having a camera movement corresponding to the target camera movement type.

In a third aspect, this disclosure provides a computer-readable medium having a computer program stored thereon, which, when executed by a processing device, implements the steps of the method described in the first aspect.

In a fourth aspect, this disclosure provides an electronic device, comprising: a storage device having a computer program stored thereon; and a processing device for executing the computer program in the storage device to implement the steps of the method in the first aspect.

In a fifth aspect, this disclosure provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the method described in the first aspect.

Other features and advantages of this disclosure will be described in detail in the following detailed description section.
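The diffusion process of the first aspect (elaborated in claim 4) can be sketched as a toy denoising loop in which the camera parameter embedding is injected through a temporal-attention-style layer at each denoising step. Everything here is a hypothetical stand-in: `inject_camera_embedding`, the residual injection form, and the averaging "denoiser" replace the learned components of a real video generation model, and the array shapes are arbitrarily small. The sketch only shows the structure the text describes: noise is added to the image embedding of the first video, and the denoising steps condition on the camera parameter embedding.

```python
import numpy as np

rng = np.random.default_rng(0)


def inject_camera_embedding(latent, cam_embed):
    # Temporal-attention-style injection: each frame latent attends to the
    # camera parameter embedding (used here as keys and values), and the
    # attended result is added back residually.
    d = latent.shape[-1]
    scores = latent @ cam_embed.T / np.sqrt(d)        # (T, T)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # softmax over frames
    return latent + w @ cam_embed                     # (T, d)


def diffuse_and_denoise(image_embed, cam_embed, steps=4, noise_scale=0.1):
    # Noise-adding process: perturb the image embedding of the first video.
    z = image_embed + noise_scale * rng.standard_normal(image_embed.shape)
    # Denoising process: at each step, inject the camera parameter embedding
    # via the attention-style layer, then apply a toy denoiser that nudges
    # the latent back toward clean content (a learned network in practice).
    for _ in range(steps):
        z = inject_camera_embedding(z, cam_embed)
        z = 0.5 * (z + image_embed)
    return z                                          # latent of the second video
```

In a real system the denoiser would be a trained video diffusion model and the output latent would be decoded back into video frames; the point of the sketch is where the camera conditioning enters the loop.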
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and the components and elements are not necessarily drawn to scale.

In the drawings:
Figure 1 is a flowchart illustrating a video generation method according to an exemplary embodiment of the present disclosure.
Figure 2 is a schematic diagram illustrating the training process of a video generation model according to an exemplary embodiment of the present disclosure.
Figure 3 is a block diagram illustrating a video generation apparatus according to an exemplary embodiment of the present disclosure.
Figure 4 is a schematic diagram of the structure of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description
Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodi