CN-122002102-A - Method, system and related equipment for synthesizing storyboard video

Abstract

The invention provides a method, a system and related equipment for synthesizing a storyboard video. The method first parses input script data to generate a structured storyboard script and associates, with each character in the script's global character list, a reference image that serves as that character's sole visual basis. Then, before generating the image for each storyboard shot, it performs a key conversion on the shot's initial text-to-image prompt in the storyboard script: the text description of each character in the initial prompt is replaced with a reference identifier pointing to that character's reference image. The converted target text-to-image prompt and the corresponding reference images are then input into an image generation model to generate the shot image, and finally the generated shot images are synthesized into a video in sequence. Because each character is strongly bound to a fixed reference image, the method fundamentally ensures that the same character looks highly consistent across different shots and significantly improves the stability of generating multi-character video content.

Inventors

Huang zong
JIANG YAMING
CHEN ZIWEN

Assignees

深圳麦风科技有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2026-02-06

Claims (10)

1. A method for synthesizing a storyboard video, characterized by comprising the following steps: acquiring input script data, and processing the script data with a large language model to generate a structured storyboard script; reading a global character list from the storyboard script, and associating with each character in the global character list a reference image serving as that character's sole visual basis; traversing the storyboard script and, for each storyboard shot, generating a target text-to-image prompt based on the initial text-to-image prompt in the storyboard script, wherein in the target text-to-image prompt the text description of each character appearing in the initial text-to-image prompt is replaced with a reference identifier of the reference image associated with that character; for each shot, inputting the target text-to-image prompt and the reference images of the characters involved in the shot into an image generation model to generate a shot image corresponding to the content of the shot; and synthesizing the plurality of shot images into a storyboard video according to the order of the storyboard script.
2. The method of claim 1, wherein acquiring the input script data and processing the script data with a large language model to generate the structured storyboard script comprises: using the large language model to convert the script data, according to a preset instruction template, into a structured data format containing a global character list, a shot index, a scene description, per-shot character information and an initial text-to-image prompt, thereby obtaining the storyboard script.
3. The method of claim 1, wherein the reference image of each character in the character list is determined by: generating it by invoking a text-to-image generation model according to the character description extracted from the character list; or matching it from, or directly designating it in, a preset character image library according to the character's information.
4. The method of claim 1, wherein the reference identifier is a specific placeholder or code that the image generation model can recognize, and the reference identifier instructs the model to render the associated character, in a designated region of the shot image, consistently with the reference character in the corresponding reference image.
5. The method of claim 1, wherein replacing, in the target text-to-image prompt, the text description of a character appearing in the initial text-to-image prompt with the reference identifier of the reference image associated with that character comprises: determining whether the currently processed shot contains at least two characters from the character list; and performing the replacement of the text description of the character with the reference identifier of the associated reference image only if the currently processed shot contains at least two characters from the character list.
6. The method of claim 1, wherein the target text-to-image prompt includes a style phrase defining the artistic style of the overall picture, so as to ensure that all shot images share a uniform style.
7. The method of claim 1, wherein synthesizing the plurality of shot images into a storyboard video according to the order of the storyboard script comprises: combining the generated shot images with the image-to-video prompts in the storyboard script that describe dynamic effects, and inputting them together into a video generation model to generate a sequence of video clips containing dynamic transition effects; and splicing the generated video clips to obtain the storyboard video.
8. A system for synthesizing a storyboard video, the system comprising: a script data processing module for acquiring input script data and processing the script data with a large language model to generate a structured storyboard script; a reference image determining module for reading a global character list from the storyboard script and associating with each character in the global character list a reference image serving as that character's sole visual basis; a prompt generation module for traversing the storyboard script and, for each storyboard shot, generating a target text-to-image prompt based on the initial text-to-image prompt in the storyboard script, wherein in the target text-to-image prompt the text description of each character appearing in the initial text-to-image prompt is replaced with a reference identifier of the reference image associated with that character; a shot image generation module for inputting, for each shot, the target text-to-image prompt and the reference images of the characters involved in the shot into an image generation model to generate a shot image corresponding to the content of the shot; and a storyboard video generation module for synthesizing the plurality of shot images into a storyboard video according to the order of the storyboard script.
9. A video generating device comprising a memory and at least one processor, the memory having instructions stored therein, and the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the video generating device to perform the method for synthesizing a storyboard video as claimed in any one of claims 1-7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for synthesizing a storyboard video according to any one of claims 1-7.
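
As an illustration of the conditions recited in claims 5 and 6 and of the ordering step at the end of claim 1, the following minimal Python sketch may help. It is not the applicant's implementation: the function names, the comma-joined style suffix, and the use of a dictionary keyed by shot index are all assumptions made for this example.

```python
def should_replace(shot_characters, global_characters):
    """Claim 5 condition: replace descriptions only when the shot
    contains at least two characters from the global character list."""
    listed = set(shot_characters) & set(global_characters)
    return len(listed) >= 2


def with_style(prompt, style_phrase):
    """Claim 6: add one shared style phrase to a target prompt.
    Joining with a comma is an assumption, not specified by the source."""
    return f"{prompt}, {style_phrase}"


def order_for_composition(images_by_index):
    """Last step of claim 1: arrange shot images in storyboard order.
    `images_by_index` maps a shot index to an image path (hypothetical layout)."""
    return [images_by_index[i] for i in sorted(images_by_index)]
```

Under these assumptions, a shot featuring only one listed character would keep its original text description, while every target prompt would still receive the same style phrase.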

Description

Method, system and related equipment for synthesizing storyboard video

Technical Field

The invention relates to the technical field of video editing, in particular to a method, a system and related equipment for synthesizing a storyboard video.

Background

With the development of generative artificial intelligence, techniques for generating video with generative models have gradually matured. Existing schemes fall mainly into two types: end-to-end video generation, in which text or a reference image is input directly and the model generates the video; and frame-by-frame image generation and splicing, in which each frame or key frame is obtained through text-to-image or image-to-image generation and the frames are then spliced in time order.

In practical applications such as film, animation and advertisement production, users usually want to control video content shot by shot. In scenarios where a character appears continuously across multiple storyboard shots, whether generation is end-to-end or frame-by-frame, the model re-imagines and re-renders the character's appearance on every generation when processing the prompts of different shots, even if the text description is exactly the same. Prior-art video generation methods therefore have obvious shortcomings in ensuring cross-shot consistency of characters and in precisely controlling multi-character interaction scenes, and a storyboard video generation scheme that solves the character-consistency problem in an automated, systematic way is urgently needed. Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention

The invention provides a method, a system and related equipment for synthesizing a storyboard video, and aims to solve the technical problems in the prior art.

The first aspect of the invention provides a method for synthesizing a storyboard video, comprising the following steps: acquiring input script data, and processing the script data with a large language model to generate a structured storyboard script; reading a global character list from the storyboard script, and associating with each character in the global character list a reference image serving as that character's sole visual basis; traversing the storyboard script and, for each storyboard shot, generating a target text-to-image prompt based on the initial text-to-image prompt in the storyboard script, wherein in the target text-to-image prompt the text description of each character appearing in the initial text-to-image prompt is replaced with a reference identifier of the reference image associated with that character; for each shot, inputting the target text-to-image prompt and the reference images of the characters involved in the shot into an image generation model to generate a shot image corresponding to the content of the shot; and synthesizing the plurality of shot images into a storyboard video according to the order of the storyboard script.

In an optional implementation of the first aspect of the invention, acquiring the input script data and processing the script data with a large language model to generate the structured storyboard script comprises: using the large language model to convert the script data, according to a preset instruction template, into a structured data format containing a global character list, a shot index, a scene description, per-shot character information and an initial text-to-image prompt, thereby obtaining the storyboard script.
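
The structured format and the prompt conversion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed by the applicant: the class and field names, the `[ref:N]` placeholder syntax, and the assumption that each character's description appears verbatim in the initial prompt are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    name: str
    description: str       # text description used in initial prompts (assumed field)
    reference_image: str   # path of the single reference image (assumed field)


@dataclass
class Shot:
    index: int             # shot index within the storyboard script
    scene: str             # scene description
    characters: list[str]  # names of the characters appearing in this shot
    initial_prompt: str    # initial text-to-image prompt


@dataclass
class StoryboardScript:
    characters: dict[str, Character]            # global character list
    shots: list[Shot] = field(default_factory=list)


def build_target_prompt(script, shot):
    """Return (target_prompt, reference_images) for one shot.

    Each character description found in the initial prompt is replaced
    with a placeholder pointing at that character's reference image.
    """
    prompt = shot.initial_prompt
    images = []
    for name in shot.characters:
        character = script.characters[name]
        if character.description not in prompt:
            continue  # nothing to replace for this character
        images.append(character.reference_image)
        ref_id = f"[ref:{len(images)}]"  # hypothetical placeholder syntax
        prompt = prompt.replace(character.description, ref_id)
    return prompt, images
```

In this sketch the placeholder number is the position of the reference image in the returned list, so the prompt and the images passed to an image generation model stay aligned. A real model would define its own identifier syntax.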
In an optional implementation of the first aspect of the invention, the reference image of each character in the character list is determined by: generating it by invoking a text-to-image generation model according to the character description extracted from the character list; or matching it from, or directly designating it in, a preset character image library according to the character's information.

In an optional implementation of the first aspect of the invention, the reference identifier is a specific placeholder or code that the image generation model can recognize, and the reference identifier instructs the model to render the associated character, in a designated region of the shot image, consistently with the reference character in the corresponding reference image.

In an optional implementation of the first aspect of the invention, replacing, in the target text-to-image prompt, the text description of a character appearing in the initial text-to-image prompt with the reference identifier of the reference image associated with that character inclu