CN-122027869-A - Video generation method, device and medium based on video script

CN122027869A

Abstract

The invention relates to the technical field of video generation, and in particular to a video generation method, device, and medium based on a video script. The method constructs a target shot feature library by acquiring shot parameter data (scene type, motion trajectory, shot duration, and transition mode) from a shot script; divides the shot material to be generated into four types according to shooting scene type: panoramic fixed-field shots, medium-scene narrative shots, close-up dialogue shots, and close-up emotion shots; analyzes, based on the feature library, each scene's degree of adaptation and narrative association with the different shot types to generate a shot combination matrix; assigns the optimal shot type to each scene according to the matrix; and determines each scene's editing timing weight. The method achieves intelligent matching and automatic arrangement from script to shot sequence, and improves the narrative continuity, rhythm control, and visual professionalism of the generated video.

Inventors

  • KANG XINXIN
  • QI GUIFANG
  • QI JINJU
  • PAN GUOJING

Assignees

  • 杭州致算科技有限责任公司

Dates

Publication Date
2026-05-12
Application Date
2026-03-09

Claims (10)

  1. A video generation method based on a video script, the method comprising: acquiring shot parameter data corresponding to a shot script to construct a corresponding target shot feature library, wherein the shot parameter data comprises a scene type, a motion trajectory, a shot duration, and a transition mode; dividing the shot material to be generated into corresponding shot types according to shooting scene type, wherein the shot types comprise a panoramic fixed-field shot, a medium-scene narrative shot, a close-up dialogue shot, and a close-up emotion shot; analyzing, based on the target shot feature library, the adaptation degree and the narrative association degree of each shooting scene's shot requirements with respect to the different shot types; generating a shot combination matrix corresponding to the adaptation degree and the narrative association degree; and assigning a corresponding shot type to each shooting scene according to the shot combination matrix, and determining the editing timing weight corresponding to each shot type.
  2. The video generation method based on a video script of claim 1, wherein analyzing, based on the target shot feature library, the adaptation degree and the narrative association degree of each shooting scene's shot requirements with respect to the different shot types comprises: extracting multidimensional features corresponding to the target shot feature library and the shot parameter data; analyzing spatio-temporal logical relationships among shots in the multidimensional features using a graph neural network model; determining, according to the spatio-temporal logical relationships, a visual narrative path for shot switching from scene establishment to emotional progression; and determining, according to the visual narrative path, the adaptation degree and the narrative association degree of each shooting scene's shot requirements with respect to the different shot types.
  3. The video generation method based on a video script of claim 2, wherein determining, according to the visual narrative path, the adaptation degree and the narrative association degree of each shooting scene's shot requirements with respect to the different shot types comprises: calculating, according to the visual narrative path, the shot adaptation weight of each shooting scene according to the scene's emotional tone; weighting and ranking each shooting scene's shot-type suitability according to the shot adaptation weight to determine the adaptation degree; analyzing, according to the shot adaptation weight, the visual fluency of the shot from scene switching to transition completion; and determining the narrative association degree of each shooting scene's shot types according to the visual fluency from scene switching to transition completion.
  4. The video generation method based on a video script of claim 1, wherein assigning a corresponding shot type to each shooting scene according to the shot combination matrix and determining the editing timing weight corresponding to each shot type comprises: analyzing a shot assignment priority sequence for each shooting scene according to the shot combination matrix; calculating, based on the shot assignment priority sequence, the narrative intensity index of each shooting scene for the different shot types, wherein the narrative intensity index is obtained as a weighted product of the adaptation degree and the narrative association degree; matching shot types to the corresponding shooting scenes in descending order of the narrative intensity index; and calculating the editing timing weight by nonlinear weighting according to the rhythm requirement data of the shooting scene and the narrative intensity index, wherein the rhythm requirement data is determined from the shot duration and transition mode in the shot parameter data.
  5. The video generation method based on a video script of claim 1, wherein generating a shot combination matrix corresponding to the adaptation degree and the narrative association degree comprises: normalizing the adaptation degree and the narrative association degree to generate a normalized adaptation score and a normalized association score; calculating, based on the normalized adaptation score and the normalized association score, a comprehensive editing coefficient of each shooting scene for each shot type, wherein the comprehensive editing coefficient is the harmonic mean of the adaptation score and the association score; arranging the comprehensive editing coefficients in a matrix by shooting scene and shot type to generate an initial editing matrix; and dynamically optimizing the initial editing matrix, eliminating editing logic conflicts based on a Markov chain model, to generate the shot combination matrix.
  6. The video generation method based on a video script of claim 1, wherein acquiring shot parameter data corresponding to a shot script to construct a corresponding target shot feature library comprises: acquiring the scene type, motion trajectory, shot duration, and transition mode of each shot in the shot script to generate an original shot data set; performing shot style clustering on the original shot data set based on a hierarchical clustering algorithm to determine a plurality of shot style clusters; extracting a style feature vector for each shot style cluster, wherein the style feature vector comprises the curvature distribution of the motion trajectory and the frequency ratio of the transition modes; and constructing the target shot feature library from the style feature vectors, wherein the target shot feature library comprises visual grammar tags and rhythm control parameters.
  7. The video generation method based on a video script of claim 6, wherein extracting a style feature vector for each of the shot style clusters comprises: separating a multimodal feature subset from the shot parameter data according to each shooting scene type, wherein the feature subset comprises scene change rate features, motion trajectory smoothness features, duration rhythm features, and transition logic features; structurally reconstructing the feature subset to generate a four-dimensional feature tensor, wherein the first dimension is the change gradient of the scene along the time axis, the second dimension is the spatial coordinate sequence of the motion trajectory, the third dimension is the beat distribution of the shot duration, and the fourth dimension is the type probability vector of the transition mode; applying a random forest algorithm to determine the contribution weight of each dimensional feature in the four-dimensional feature tensor to shot style grouping, wherein the contribution weights are obtained from the mutual information between the features and the style clusters; performing weighted fusion of the four-dimensional feature tensor based on the contribution weights to generate an initial style vector; and performing normalized dimensionality reduction on the initial style vector, retaining core visual features through a t-SNE algorithm, and outputting the style feature vector corresponding to the core visual features.
  8. The video generation method based on a video script according to claim 1, wherein dividing the shot material to be generated into corresponding shot types according to shooting scene type comprises: parsing metadata tags of the shot material to be generated, wherein the metadata tags comprise resolution parameters, dynamic range, lens focal length, and color style; matching the metadata tags with the visual requirements of the shooting scene types, wherein the visual requirements are determined from the scene descriptions in the shot script; and dividing the shot types based on the matching result and the narrative function of the shot material, and outputting the divided shot type set.
  9. A video generation device based on a video script, comprising a memory, a processor, and a video generation program based on a video script stored on the memory and executable on the processor, the program being configured to implement the steps of the video generation method based on a video script of any one of claims 1 to 8.
  10. A medium having stored thereon a video generation program based on a video script which, when executed by a processor, implements the steps of the video generation method based on a video script of any one of claims 1 to 8.
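To make claim 5 concrete, the following is a minimal sketch of building the shot combination matrix: normalize the adaptation and association scores, then take their harmonic mean per (scene, shot type) cell. All names, the per-column min-max normalization, and the sample values are illustrative assumptions, not from the patent; the Markov-chain conflict-elimination step is omitted.

```python
import numpy as np

# Columns correspond to the four shot types from claim 1 (assumed ordering).
SHOT_TYPES = ["panoramic_fixed", "medium_narrative", "closeup_dialogue", "closeup_emotion"]

def normalize(scores: np.ndarray) -> np.ndarray:
    """Min-max normalize raw scores to [0, 1] per shot-type column (an assumption;
    the patent does not specify the normalization axis)."""
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    rng = np.where(hi > lo, hi - lo, 1.0)
    return (scores - lo) / rng

def combination_matrix(fitness: np.ndarray, relevance: np.ndarray) -> np.ndarray:
    """Comprehensive editing coefficient: harmonic mean of the normalized
    adaptation score and the normalized association score (claim 5)."""
    f, r = normalize(fitness), normalize(relevance)
    eps = 1e-9  # guard against division by zero when both scores are 0
    return 2.0 * f * r / (f + r + eps)

# Two hypothetical scenes scored against the four shot types.
fitness = np.array([[0.9, 0.4, 0.2, 0.1],
                    [0.2, 0.8, 0.6, 0.3]])
relevance = np.array([[0.7, 0.5, 0.3, 0.2],
                      [0.1, 0.6, 0.9, 0.4]])
M = combination_matrix(fitness, relevance)
```

Because the harmonic mean is small whenever either input score is small, a scene must score well on both adaptation and narrative association for a shot type to receive a high coefficient.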

Description

Video generation method, device and medium based on video script

Technical Field

The present invention relates to the field of video generation technologies, and in particular to a video generation method, device, and medium based on a video script.

Background

With the rapid development of artificial intelligence and computer vision technologies, video content generation is gradually evolving from traditional manual shooting and editing toward automated, intelligent generation. In fields such as short video, previsualization, virtual production, and digital human content production in particular, demand for efficient, high-quality video generation is growing. Currently, most video generation systems rely on simple Text-to-Video models that generate isolated video segments based solely on textual descriptions and lack systematic planning of the overall narrative structure, so the generated videos fall significantly short in shot continuity, rhythm control, and plot consistency. The shot script, as the core upstream link of film and television creation, carries the director's detailed design of image composition, camera movement, rhythm arrangement, and emotional expression. However, few methods in the prior art can deeply parse the structured information in a shot script (e.g., shot scale, camera movement mode, duration, and transition logic) and effectively translate it into a shot sequence with professional narrative logic. Although some systems can identify basic elements in a script, they still rely on manual intervention for shot type matching, scene suitability judgment, and editing timing decisions; their degree of automation is low, and they struggle to meet the practical demands of large-scale, personalized video content production.
The foregoing is provided merely to facilitate understanding of the technical solutions of the present invention and does not constitute an admission that it is prior art.

Disclosure of Invention

The present invention mainly aims to provide a video generation method, device, and medium based on a video script, so as to solve the technical problem that existing video generation technology has difficulty deeply parsing the structured information in a shot script and achieving intelligent matching of shot types with narrative logic, with the result that the generated video lacks professional-level continuity and automatic editing capability. To achieve the above object, the present invention provides a video generation method based on a video script, the method comprising: acquiring shot parameter data corresponding to a shot script to construct a corresponding target shot feature library, wherein the shot parameter data comprises a scene type, a motion trajectory, a shot duration, and a transition mode; dividing the shot material to be generated into corresponding shot types according to shooting scene type, wherein the shot types comprise a panoramic fixed-field shot, a medium-scene narrative shot, a close-up dialogue shot, and a close-up emotion shot; analyzing, based on the target shot feature library, the adaptation degree and the narrative association degree of each shooting scene's shot requirements with respect to the different shot types; generating a shot combination matrix corresponding to the adaptation degree and the narrative association degree; and assigning a corresponding shot type to each shooting scene according to the shot combination matrix, and determining the editing timing weight corresponding to each shot type.
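The assignment step of the method (claims 1 and 4) can be sketched as follows: score each (scene, shot type) pair with a narrative intensity index defined as a weighted product of adaptation degree and narrative association degree, pick the highest-scoring type per scene, and derive an editing timing weight by nonlinear weighting of rhythm data. The exponent weights, the duration/transition-cost rhythm formula, and the sigmoid nonlinearity are illustrative assumptions; the patent does not disclose these specific forms.

```python
import math

SHOT_TYPES = ("panoramic_fixed", "medium_narrative", "closeup_dialogue", "closeup_emotion")

def narrative_intensity(fitness: float, relevance: float,
                        w_fit: float = 0.6, w_rel: float = 0.4) -> float:
    """Weighted product of adaptation degree and narrative association degree
    (claim 4); the exponent weights are hypothetical."""
    return (fitness ** w_fit) * (relevance ** w_rel)

def assign_shots(scenes: dict) -> list:
    """Assign each scene the shot type with the highest narrative intensity,
    then compute an editing timing weight from the scene's rhythm data."""
    plan = []
    for name, data in scenes.items():
        scored = {t: narrative_intensity(*data["scores"][t]) for t in SHOT_TYPES}
        best = max(scored, key=scored.get)
        # Rhythm requirement from shot duration and transition mode (claim 4);
        # this ratio and the sigmoid squash are illustrative, not from the patent.
        rhythm = data["duration"] / (1.0 + data["transition_cost"])
        timing_weight = 1.0 / (1.0 + math.exp(-(scored[best] * rhythm - 1.0)))
        plan.append((name, best, round(timing_weight, 3)))
    return plan

# Hypothetical scene: (fitness, relevance) per shot type, plus rhythm data.
scenes = {
    "opening": {"scores": {"panoramic_fixed": (0.9, 0.8),
                           "medium_narrative": (0.5, 0.4),
                           "closeup_dialogue": (0.2, 0.3),
                           "closeup_emotion": (0.1, 0.2)},
                "duration": 4.0, "transition_cost": 0.5},
}
plan = assign_shots(scenes)
```

The sigmoid keeps the timing weight in (0, 1), so scenes with both strong narrative intensity and a long, cheaply transitioned shot receive proportionally more weight in the edit without any single factor dominating linearly.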
Optionally, analyzing, based on the target shot feature library, the adaptation degree and the narrative association degree of each shooting scene's shot requirements with respect to the different shot types includes: extracting multidimensional features corresponding to the target shot feature library and the shot parameter data; analyzing spatio-temporal logical relationships among shots in the multidimensional features using a graph neural network model; determining, according to the spatio-temporal logical relationships, a visual narrative path for shot switching from scene establishment to emotional progression; and determining, according to the visual narrative path, the adaptation degree and the narrative association degree of each shooting scene's shot requirements with respect to the different shot types. Optionally, determining, according to the visual narrative path, the adaptation degree and the narrative association degree of each shooting scene's shot requirements with respect to the different shot types includes: calculating, according to the visual narrative path, the shot adaptation weight of each shooting scene according to the scene's emotional tone; weighting and ranking each shooting scene's shot-type suitability according to the shot adaptation weight to determine the adaptation degree; analyzing, according to the shot adaptation weight, the visual fluency of the shot from scene switching to transition completion
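The emotion-tone weighting described above (mirroring claim 3) can be sketched as a lookup of per-tone weights applied to base fitness scores before ranking. The tone table, weight values, and function names are assumptions for illustration only; the patent derives these weights from the visual narrative path rather than a fixed table.

```python
# Hypothetical per-tone weights over (panoramic, medium, dialogue, emotion) shots.
EMOTION_WEIGHTS = {
    "calm":     (1.2, 1.0, 0.8, 0.6),
    "tense":    (0.6, 0.9, 1.1, 1.3),
    "intimate": (0.5, 0.8, 1.3, 1.2),
}
SHOT_TYPES = ("panoramic_fixed", "medium_narrative", "closeup_dialogue", "closeup_emotion")

def rank_shot_types(tone: str, base_fitness: tuple) -> list:
    """Weight each shot type's base fitness by the scene's emotional tone,
    then rank shot types by the weighted suitability, descending."""
    weights = EMOTION_WEIGHTS[tone]
    weighted = {t: f * w for t, f, w in zip(SHOT_TYPES, base_fitness, weights)}
    return sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)

# A tense scene with uniform base fitness: the tone weights decide the ranking.
ranking = rank_shot_types("tense", (0.7, 0.7, 0.7, 0.7))
```

With uniform base fitness, the ranking is driven entirely by the tone weights, which matches the intuition that a tense scene favors close-up emotion shots over wide establishing views.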