CN-122002094-A - Video generation method and device, storage medium and electronic equipment

Abstract

The application discloses a video generation method and device, a storage medium, and electronic equipment. The method comprises: obtaining a video generation prompt word input by a user; inputting the video generation prompt word into a video generation large model; determining, by the video generation large model, a video text script based on the video generation prompt word; performing video generation processing based on the video text script to obtain a target video; and outputting the target video for the video generation prompt word. Because the corresponding video is generated automatically from the prompt word input by the user, the manual steps of traditional video production, such as script creation, video shooting, and video editing, are omitted, and video generation efficiency is greatly improved.

Inventors

YUAN LIANGLIANG
CHEN XIAOMING
WANG CHUANHAI

Assignees

北京三六零智领科技有限公司

Dates

Publication Date: 20260508
Application Date: 20241101

Claims (10)

1. A video generation method, comprising: acquiring a video generation prompt word input by a user; and inputting the video generation prompt word into a video generation large model, determining, by the video generation large model, a video text script based on the video generation prompt word, performing video generation processing based on the video text script to obtain a target video, and outputting the target video for the video generation prompt word.
2. The video generation method of claim 1, wherein the determining, by the video generation large model, a video text script based on the video generation prompt word comprises: performing video content editing processing based on the video generation prompt word through the video generation large model to obtain a video content text; performing video script generation processing based on the video content text to obtain a video script; and generating the video text script based on the video content text and the video script.
3. The video generation method according to claim 2, wherein the performing video script generation processing based on the video content text to obtain a video script comprises: dividing the video content text into at least one sub-mirror text, and determining a sub-mirror script for each sub-mirror text; and generating the video script based on all the sub-mirror scripts.
4. The video generation method of claim 1, wherein the performing video generation processing based on the video text script to obtain a target video comprises: parsing the video text script to obtain at least one sub-mirror text and a sub-mirror script corresponding to the sub-mirror text; performing picture generation based on the sub-mirror text to obtain sub-mirror pictures, and performing audio and video generation processing based on the sub-mirror pictures and the sub-mirror text to obtain sub-mirror audio and video; and merging all the sub-mirror audio and video to obtain the target video.
5. The video generation method according to claim 4, wherein the performing audio and video generation processing based on the sub-mirror pictures and the sub-mirror text to obtain sub-mirror audio and video comprises: performing video generation processing based on the sub-mirror pictures and the sub-mirror text to obtain a sub-mirror video, and performing dubbing processing for the sub-mirror video based on the sub-mirror text to obtain sub-mirror audio; and performing video synthesis on the sub-mirror video and the sub-mirror audio to obtain the sub-mirror audio and video.
6. The video generation method of claim 4, wherein the performing picture generation based on the sub-mirror text to obtain sub-mirror pictures comprises: determining sub-mirror scene text information based on the sub-mirror text, and performing scene text expansion processing based on the sub-mirror scene text information to obtain a plurality of extended scene texts; and performing picture generation on the plurality of extended scene texts to obtain a plurality of scene shot pictures, and taking the plurality of scene shot pictures as the sub-mirror pictures.
7. The video generation method of claim 1, wherein the video generation large model is trained according to the following steps: acquiring a basic large language model, a basic picture generation model, and a basic text-to-speech model, and creating an initial video generation large model for a video generation scenario based on the basic large language model, the basic picture generation model, and the basic text-to-speech model; acquiring a sample video generation prompt word, and labeling the sample video generation prompt word with a video text script label and a target video label; performing at least one round of model training on the initial video generation large model using the sample video generation prompt word; in the forward propagation stage of model training, determining, by the initial video generation large model, a predicted video text script based on the sample video generation prompt word, and performing video generation processing based on the predicted video text script to obtain a predicted target video; and in the back propagation stage of model training, determining a script prediction loss based on the predicted video text script and the video text script label, determining a video generation loss based on the predicted target video and the target video label, determining a comprehensive model loss based on the script prediction loss and the video generation loss, and adjusting model parameters of the initial video generation large model using the comprehensive model loss until the initial video generation large model completes model training, thereby obtaining the video generation large model.
8. A video generation apparatus, comprising: an acquisition module configured to acquire a video generation prompt word input by a user; and a video generation module configured to input the video generation prompt word into a video generation large model, determine, by the video generation large model, a video text script based on the video generation prompt word, perform video generation processing based on the video text script to obtain a target video, and output the target video for the video generation prompt word.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when run on a computer, causes the computer to perform the video generation method of any one of claims 1 to 7.
10. An electronic device comprising a processor and a memory, the memory storing a computer program, characterized in that the processor is adapted to perform the video generation method according to any of claims 1 to 7 by invoking the computer program.
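
The training procedure in claim 7 combines a script prediction loss and a video generation loss into a comprehensive model loss, but the claim does not state how the two are combined. The following is a minimal sketch that assumes a weighted sum; the `LossWeights` class, the `composite_loss` function, and the weight values are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical weights; the claims do not say how the losses are combined."""
    script: float = 1.0
    video: float = 1.0


def composite_loss(script_loss: float, video_loss: float,
                   weights: LossWeights = LossWeights()) -> float:
    """Combine the script prediction loss and the video generation loss.

    A weighted sum is an assumption made for illustration only.
    """
    if script_loss < 0 or video_loss < 0:
        raise ValueError("losses are expected to be non-negative")
    return weights.script * script_loss + weights.video * video_loss


if __name__ == "__main__":
    # Example: the script loss counts half as much as the video loss.
    print(composite_loss(0.8, 1.5, LossWeights(script=0.5, video=1.0)))
```

In a real training loop the two inputs would be differentiable tensors rather than floats, so that the combined value can drive the parameter adjustment described in the claim.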

Description

Video generation method and device, storage medium and electronic equipment

Technical Field

The present application relates to the field of video processing technologies, and in particular to a video generation method, a device, a storage medium, and electronic equipment.

Background

With the rapid development of artificial intelligence technology, particularly deep learning, generative adversarial networks (GANs), natural language processing (NLP), and computer vision, the field of video generation is changing rapidly. Traditional video production relies primarily on manual processes, including script writing, video shooting, and video editing, which are time-consuming and laborious. It is therefore important to explore an efficient video generation scheme.

Disclosure of Invention

The embodiments of the application provide a video generation method, a video generation device, a storage medium, and electronic equipment, which can improve video generation efficiency.

In a first aspect, an embodiment of the present application provides a video generation method, including: acquiring a video generation prompt word input by a user; and inputting the video generation prompt word into a video generation large model, determining, by the video generation large model, a video text script based on the video generation prompt word, performing video generation processing based on the video text script to obtain a target video, and outputting the target video for the video generation prompt word.
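
The two-stage flow of the first aspect (prompt word to video text script, then script to target video) can be sketched as follows. This is an illustrative outline only: `TargetVideo`, `generate_video`, and the two stand-in callables are hypothetical names, and the stand-ins return placeholder strings instead of calling any model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TargetVideo:
    prompt: str   # the prompt word the video answers
    script: str   # the video text script it was generated from
    uri: str      # placeholder location of the rendered video


def generate_video(prompt: str,
                   write_script: Callable[[str], str],
                   render: Callable[[str], str]) -> TargetVideo:
    """Prompt word -> video text script -> target video."""
    if not prompt.strip():
        raise ValueError("prompt word must not be empty")
    script = write_script(prompt)   # step 1: determine the video text script
    uri = render(script)            # step 2: video generation processing
    return TargetVideo(prompt=prompt, script=script, uri=uri)


# Stand-in callables (assumptions) so the sketch runs without any model.
def fake_write_script(prompt: str) -> str:
    return f"SCRIPT for: {prompt}"


def fake_render(script: str) -> str:
    return f"memory://video/{len(script)}"


video = generate_video("a cat learns to surf", fake_write_script, fake_render)
print(video.script)
```

Passing the two stages in as callables keeps the sketch independent of how the large model is actually built; the application describes a single model that performs both stages.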
In some embodiments, the determining, by the video generation large model, a video text script based on the video generation prompt word includes: performing video content editing processing based on the video generation prompt word through the video generation large model to obtain a video content text; performing video script generation processing based on the video content text to obtain a video script; and generating the video text script based on the video content text and the video script.

In some embodiments, the performing video script generation processing based on the video content text to obtain a video script includes: dividing the video content text into at least one sub-mirror text, and determining a sub-mirror script for each sub-mirror text; and generating the video script based on all the sub-mirror scripts.

In some embodiments, the performing video generation processing based on the video text script to obtain a target video includes: parsing the video text script to obtain at least one sub-mirror text and a sub-mirror script corresponding to the sub-mirror text; performing picture generation based on the sub-mirror text to obtain sub-mirror pictures, and performing audio and video generation processing based on the sub-mirror pictures and the sub-mirror text to obtain sub-mirror audio and video; and merging all the sub-mirror audio and video to obtain the target video.

In some embodiments, the performing audio and video generation processing based on the sub-mirror pictures and the sub-mirror text to obtain sub-mirror audio and video includes: performing video generation processing based on the sub-mirror pictures and the sub-mirror text to obtain a sub-mirror video, and performing dubbing processing for the sub-mirror video based on the sub-mirror text to obtain sub-mirror audio; and performing video synthesis on the sub-mirror video and the sub-mirror audio to obtain the sub-mirror audio and video.
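
The per-shot flow in the embodiments above can be sketched as follows, reading "sub-mirror" as a storyboard shot (an interpretation of the translated term, not something the text states). The `Shot` class and all function names are hypothetical, splitting on sentence boundaries is an assumption, and strings stand in for generated pictures, video, and audio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    text: str    # sub-mirror text: narration for this shot
    script: str  # sub-mirror script: directions for this shot


def split_into_shots(content_text: str) -> List[Shot]:
    """Split the video content text into shots, one per sentence (an assumption)."""
    sentences = [s.strip() for s in content_text.split(".") if s.strip()]
    return [Shot(text=s, script=f"shot {i + 1}: illustrate '{s}'")
            for i, s in enumerate(sentences)]


def make_pictures(shot: Shot) -> List[str]:
    # Placeholder for picture generation from the sub-mirror text.
    return [f"picture<{shot.text}>"]


def make_clip(shot: Shot) -> str:
    pictures = make_pictures(shot)
    video = f"video<{'+'.join(pictures)}>"   # pictures + text -> sub-mirror video
    audio = f"audio<{shot.text}>"            # dubbing from the sub-mirror text
    return f"clip<{video}|{audio}>"          # synthesis of video and audio


def build_target_video(content_text: str) -> List[str]:
    """Merge per-shot clips in shot order; a list stands in for the merged video."""
    return [make_clip(shot) for shot in split_into_shots(content_text)]


clips = build_target_video("A cat finds a surfboard. The cat rides a wave.")
print(len(clips))
```

Keeping each shot's text and script together in one record mirrors the parsing step, which yields a sub-mirror text together with its corresponding sub-mirror script.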
In some embodiments, the performing picture generation based on the sub-mirror text to obtain sub-mirror pictures includes: determining sub-mirror scene text information based on the sub-mirror text, and performing scene text expansion processing based on the sub-mirror scene text information to obtain a plurality of extended scene texts; and performing picture generation on the plurality of extended scene texts to obtain a plurality of scene shot pictures, and taking the plurality of scene shot pictures as the sub-mirror pictures.

In some embodiments, the performing video generation processing based on the sub-mirror pictures and the sub-mirror text to obtain a sub-mirror video includes: determining an arrangement order of the sub-mirror pictures according to the content of the sub-mirror text; and performing video generation processing on the sub-mirror pictures in the arrangement order to obtain the sub-mirror video.

In some embodiments, the video generation large model is trained according to the following steps: acquiring a basic large language model, a basic picture generation model, and a basic text-to-speech model, and creating an initial video generation large model for a video generation scenario based on the basic large language model, the basic picture generation model, and the basic text-to-speech model; obtaining a sample video generation prompt word, and labeling a video text script tag and a target video tag on the sample v