CN-122027865-A - Video generation method, apparatus, device, medium, and program product

Abstract

The present disclosure provides a video generation method, apparatus, device, medium, and program product, and relates to the technical field of large models, in particular to artificial intelligence fields such as natural language processing and computer vision. The method includes: obtaining a video generation requirement set and a target execution link set; obtaining a video generation task sequence for each video generation requirement; determining, based on the first target tasks currently executed by the target execution link set, the task identifiers of the second target tasks to be executed in the video generation task sequences; and determining, based on the task identifiers, the third target tasks to be executed next by the target execution link set, so as to obtain the target video file of each video generation requirement.

Inventors

WANG KAIYU

Assignees

北京百度网讯科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260129

Claims (16)

1. A video generation method, wherein the method comprises: acquiring a video generation requirement set and a target execution link set; acquiring a video generation task sequence of each video generation requirement, and determining, based on the first target task currently executed by each target execution link set, a task identifier of a second target task to be executed in each video generation task sequence; and determining, based on the task identifier, a third target task to be executed next by the target execution link set from the second target tasks to be executed, so as to obtain a target video file of each video generation requirement.
2. The method of claim 1, wherein the acquiring a video generation task sequence of each video generation requirement, and determining, based on the first target task currently executed by each target execution link set, a task identifier of a second target task to be executed in each video generation task sequence, comprises: invoking the model capability of a multimodal large model to perform task analysis on each video generation requirement, and determining the video generation tasks required to fulfill each video generation requirement, so as to obtain the video generation task sequence of each video generation requirement; acquiring a task execution end timestamp of each first target task; and for any first target task, acquiring the second target task that follows the first target task in the corresponding video generation task sequence, and obtaining the task identifier of the second target task based on the task execution end timestamp of the first target task.
3. The method according to claim 2, wherein the determining, based on the task identifier, a third target task to be executed next by the target execution link set from the second target tasks to be executed, so as to obtain a target video file of each video generation requirement, comprises: placing each video generation task sequence into a preset task buffer layer, and determining, based on the task identifier, the task priority of each video generation task included in each video generation task sequence; and acquiring, based on the task priority and the second target tasks in the task buffer layer, the third target task that each target execution link can execute next, so as to obtain the target video file of each video generation requirement.
4. The method of any of claims 1-3, wherein the video generation task sequence includes at least a script generation task, a text-to-image task, an image-to-video task, and an audio subtitle integration task.
5. The method of claim 4, wherein performing the script generation task comprises: invoking the model capability of a multimodal large model to analyze the video generation requirement corresponding to the script generation task, so as to determine script generation information required by the script generation task, wherein the script generation information at least comprises video style requirement information, video theme information, and video character setting information; invoking the model capability of the multimodal large model to generate a plurality of storyboard scripts based on the script generation information, wherein each storyboard script at least comprises storyboard scene description information, storyboard character appearance information, storyboard character action information, storyboard scene information, storyboard camera movement information, storyboard narration information, and storyboard ambient sound effect information, and the storyboard character appearance information is consistent across the storyboard scripts; and integrating the plurality of storyboard scripts to obtain a target script corresponding to the script generation task.
6. The method of claim 5, wherein performing the text-to-image task comprises: extracting, from the target script, key frame generation information required by the text-to-image task, wherein the key frame generation information at least comprises visual style information, character appearance information, lighting atmosphere information, and basic picture tone information; and invoking the model capability of the multimodal large model to generate, based on the key frame generation information, a storyboard key frame corresponding to the text-to-image task.
7. The method of claim 6, wherein performing the image-to-video task comprises: determining video segment generation information for each of a plurality of storyboard scripts included in the target script; invoking the model capability of the multimodal large model to obtain, based on each piece of video segment generation information, a storyboard video segment for each of the plurality of storyboard scripts, wherein each storyboard video segment uses the storyboard key frame as its first video frame; and integrating the storyboard video segments of the storyboard scripts to obtain a candidate video file corresponding to the image-to-video task.
8. The method of claim 7, wherein performing the audio subtitle integration task comprises: invoking the model capability of the multimodal large model to determine, based on the target script, an audio file and a subtitle file for the candidate video file, wherein the audio file at least comprises the background audio, character dialogue audio, and ambient sound effect audio required by the candidate video file; and integrating the candidate video file, the audio file, and the subtitle file to obtain the target video file of the video generation requirement corresponding to the video generation task sequence.
9. The method of claim 8, wherein the method further comprises: in response to the video generation task sequence not including the audio subtitle integration task, determining that the candidate video file is the target video file of the video generation requirement corresponding to the video generation task sequence.
10. The method of claim 1, wherein the acquiring a video generation requirement set and a target execution link set comprises: acquiring historical search terms corresponding to a current task period from a historical search term library, and performing intent analysis and semantic analysis on the historical search terms to extract the video generation requirement set for the current task period; and acquiring the video generation task type of each video generation requirement in the video generation requirement set, and determining, from candidate execution links in an idle state, the target execution links that match each video generation task type, so as to obtain the target execution link set.
11. The method of claim 10, wherein the method further comprises: in response to identifying that an unexecuted task remains from the previous task period, clearing the unexecuted task, and starting task execution for the current task period after the unexecuted task is cleared; and in response to identifying that no unexecuted task remains from the previous task period, starting task execution for the current task period.
12. The method of claim 1, wherein after obtaining the target video file of each video generation requirement, the method further comprises: obtaining the search terms matched with each target video file; and in response to any of the search terms being input in a search box, pushing and displaying the target video file corresponding to that search term on the search result display page corresponding to the search box.
13. A video generation apparatus, wherein the apparatus comprises: an acquisition module, configured to acquire a video generation requirement set and a target execution link set; a determining module, configured to acquire a video generation task sequence of each video generation requirement and determine, based on the first target task currently executed by each target execution link set, a task identifier of a second target task to be executed in each video generation task sequence; and a generation module, configured to determine, based on the task identifier, a third target task to be executed next by the target execution link set from the second target tasks to be executed, so as to obtain a target video file of each video generation requirement.
14. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
15. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-12.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-12.
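
The four-task sequence of claims 4 to 9 can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the disclosure: `Artifacts`, `run_sequence`, the stage functions, and `echo_model`, which merely stands in for a call to a multimodal large model. The sketch only shows how each task consumes the previous task's output and how the candidate video becomes the target video when the audio subtitle integration task is absent.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]  # hypothetical stand-in for a multimodal model call

@dataclass
class Artifacts:
    shots: list[str] = field(default_factory=list)       # storyboard scripts
    key_frames: list[str] = field(default_factory=list)  # one key frame per shot
    segments: list[str] = field(default_factory=list)    # one video segment per shot
    candidate_video: str = ""
    target_video: str = ""

def script_task(requirement: str, art: Artifacts, model: Model) -> None:
    # One character description is reused in every shot so appearance stays consistent.
    character = model(f"character for {requirement}")
    art.shots = [f"shot {i}: {character}" for i in (1, 2)]

def text_to_image_task(requirement: str, art: Artifacts, model: Model) -> None:
    art.key_frames = [model(f"key frame of {shot}") for shot in art.shots]

def image_to_video_task(requirement: str, art: Artifacts, model: Model) -> None:
    # Each segment is generated from its storyboard key frame (used as the first frame).
    art.segments = [model(f"video from {frame}") for frame in art.key_frames]
    art.candidate_video = " + ".join(art.segments)

def audio_subtitle_task(requirement: str, art: Artifacts, model: Model) -> None:
    audio = model(f"audio for {requirement}")
    subtitles = model(f"subtitles for {requirement}")
    art.target_video = f"{art.candidate_video} | {audio} | {subtitles}"

STAGES = {
    "script": script_task,
    "text_to_image": text_to_image_task,
    "image_to_video": image_to_video_task,
    "audio_subtitle": audio_subtitle_task,
}

def run_sequence(requirement: str, task_sequence: list[str], model: Model) -> Artifacts:
    art = Artifacts()
    for name in task_sequence:
        STAGES[name](requirement, art, model)
    # Claim 9 fallback: without audio/subtitle integration, the candidate is the target.
    if "audio_subtitle" not in task_sequence:
        art.target_video = art.candidate_video
    return art

def echo_model(prompt: str) -> str:
    """Placeholder model: returns the prompt in brackets instead of generating media."""
    return f"[{prompt}]"
```

Calling `run_sequence("cat video", ["script", "text_to_image", "image_to_video"], echo_model)` leaves `target_video` equal to `candidate_video`, which mirrors claim 9.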

Description

Video generation method, apparatus, device, medium, and program product

Technical Field

The present disclosure relates to the field of large model technology, and in particular to artificial intelligence fields such as natural language processing and computer vision.

Background

With the development of technology, video has become a mainstream carrier of information, and more and more people rely on videos to obtain the information they need. In the related art, video content is produced manually, so production depends heavily on manual labor.

Disclosure of Invention

The present disclosure proposes a video generation method, apparatus, device, medium, and program product.

According to a first aspect of the present disclosure, a video generation method is provided, which includes: obtaining a video generation requirement set and a target execution link set; obtaining a video generation task sequence of each video generation requirement, and determining, based on the first target task currently executed by each target execution link set, the task identifier of the second target task to be executed in each video generation task sequence; and determining, based on the task identifiers, a third target task to be executed next by each target execution link set from the second target tasks to be executed, so as to obtain the target video file of each video generation requirement.
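
The scheduling flow of the first aspect can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: `TaskSequence`, `pending_entry`, and `dispatch` are hypothetical names, the task identifier format is invented, and the rule that an earlier end timestamp means a higher priority is an assumption (the disclosure only states that the identifier is derived from the end timestamp and that priority is derived from the identifier).

```python
import heapq
from dataclasses import dataclass

@dataclass
class TaskSequence:
    requirement: str   # the video generation requirement this sequence serves
    tasks: list        # ordered task names
    current: int = 0   # index of the first target task (currently executing)

def pending_entry(seq, end_ts):
    """Build a buffer entry for the second target task (successor of the current task).

    The task identifier embeds the end timestamp of the first target task.
    Returns None when the sequence has no further task.
    """
    nxt = seq.current + 1
    if nxt >= len(seq.tasks):
        return None
    task_id = f"{end_ts:.3f}/{seq.requirement}/{seq.tasks[nxt]}"
    # Assumption: earlier end timestamp sorts first, i.e. has higher priority.
    return (end_ts, task_id, seq.requirement, seq.tasks[nxt])

def dispatch(buffer, idle_links):
    """Pick the third target task for each idle link from the priority buffer."""
    assignments = {}
    for link in idle_links:
        if not buffer:
            break
        _, _, requirement, name = heapq.heappop(buffer)
        assignments[link] = (requirement, name)
    return assignments

# Usage: two requirements whose script tasks ended at different times.
a = TaskSequence("req-A", ["script", "text_to_image", "image_to_video"])
b = TaskSequence("req-B", ["script", "text_to_image"])
buffer = []
for seq, end_ts in ((a, 12.0), (b, 10.0)):
    entry = pending_entry(seq, end_ts)
    if entry is not None:
        heapq.heappush(buffer, entry)
print(dispatch(buffer, ["link-1"]))  # {'link-1': ('req-B', 'text_to_image')}
```

A heap is used here only because it keeps the buffer ordered by priority cheaply; any priority structure would serve the same purpose.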
According to a second aspect of the present disclosure, a video generation apparatus is provided, which includes: an acquiring module configured to acquire a video generation requirement set and a target execution link set; a determining module configured to acquire a video generation task sequence of each video generation requirement and determine, based on the first target task currently executed by each target execution link set, the task identifier of the second target task to be executed in each video generation task sequence; and a generating module configured to determine, based on the task identifiers, a third target task to be executed next by each target execution link set from the second target tasks to be executed, so as to obtain the target video file of each video generation requirement.

According to a third aspect of the present disclosure, an electronic device is provided, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the video generation method set forth in the first aspect above.

According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions being used to cause a computer to execute the video generation method set forth in the first aspect above.

According to a fifth aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the video generation method set forth in the first aspect above.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure.
Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are provided for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a flowchart of a video generation method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a video generation method according to another embodiment of the present disclosure;
FIG. 3 is a flowchart of a video generation method according to another embodiment of the present disclosure;
FIG. 4 is a flowchart of a video generation method according to another embodiment of the present disclosure;
FIG. 5 is a flowchart of a video generation method according to another embodiment of the present disclosure;
FIG. 6 is a flowchart of a video generation method according to another embodiment of the present disclosure;
FIG. 7 is a flowchart of a video generation method according to another embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a video generation apparatus according to an embodiment of the present disclosure;
FIG. 9 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications