CN-122023577-A - Drawing generation method, system, equipment and medium
Abstract
The invention provides a method, a system, a device and a medium for generating a drawing, belonging to the technical field of artificial intelligence. The method comprises: acquiring story conception information input by a user; inputting the story conception information into a content planning model to generate a standardized creation instruction; inputting the character description and the scene description corresponding to each sub-mirror into a multi-modal content generator to generate an initial character image and an initial scene background corresponding to the sub-mirror; inputting the sub-mirror script corresponding to each sub-mirror into the content planning model to generate an initial narrative text corresponding to the sub-mirror; optimizing the initial character image and the initial scene background based on the character description to generate a visually fused coordination foreground image; and generating a target drawing document from the coordination foreground image, the initial scene background and the initial narrative text corresponding to each sub-mirror according to the image-text corresponding relation. The invention automatically and efficiently generates a personalized drawing with a consistent image-text style and high quality, reducing the technical threshold and time cost of drawing creation.
Inventors
- Hu Anshu
- Liu Yuhan
- Zhang Ruili
Assignees
- 西藏珠穆雅鲁数字科技文化有限公司
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2025-12-31
Claims (10)
- 1. A method for generating a drawing, characterized by comprising the following steps: acquiring story conception information input by a user; inputting the story conception information into a content planning model to generate a standardized creation instruction, wherein the standardized creation instruction comprises a sub-mirror script logically arranged according to a narrative, and a character description, a scene description and an image-text corresponding relation for each sub-mirror; inputting the character description and the scene description corresponding to each sub-mirror into a multi-modal content generator to generate an initial character image and an initial scene background corresponding to the sub-mirror; inputting the sub-mirror script corresponding to each sub-mirror into the content planning model to generate an initial narrative text corresponding to the sub-mirror; optimizing the initial character image and the initial scene background based on the character description to generate a visually fused coordination foreground image; and generating a target drawing document from the coordination foreground image, the initial scene background and the initial narrative text corresponding to each sub-mirror according to the image-text corresponding relation.
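As a non-authoritative sketch of the claimed pipeline (the model calls below are hypothetical stubs, not the patent's actual content planning model or multi-modal generator, and the harmonization step is omitted), the flow of claim 1 can be outlined as:

```python
from dataclasses import dataclass, field

@dataclass
class SubMirror:
    """One storyboard unit from the standardized creation instruction."""
    script: str
    character_desc: str
    scene_desc: str
    layout: dict = field(default_factory=dict)  # image-text corresponding relation

def plan_content(story_idea):
    """Stub for the content planning model: story idea -> sub-mirrors."""
    return [SubMirror(script=s, character_desc="a small fox", scene_desc="a forest")
            for s in story_idea.split(". ") if s]

def generate_image(description):
    """Stub for the multi-modal content generator."""
    return {"prompt": description, "pixels": []}

def generate_drawing(story_idea):
    pages = []
    for sm in plan_content(story_idea):
        character = generate_image(sm.character_desc)   # initial character image
        background = generate_image(sm.scene_desc)      # initial scene background
        narrative = sm.script                           # initial narrative text (stub)
        pages.append({"foreground": character, "background": background,
                      "text": narrative, "layout": sm.layout})
    return pages

pages = generate_drawing("The fox wakes up. The fox finds a river. The fox goes home.")
```

One page per sub-mirror falls out of the loop; the later claims refine the two generation calls and add the harmonization and assembly stages.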
- 2. The method for generating a drawing according to claim 1, wherein inputting the character description and the scene description corresponding to each sub-mirror into the multi-modal content generator to generate the initial character image and the initial scene background corresponding to the sub-mirror comprises: encoding the scene description and the character description into a first text semantic vector and a second text semantic vector respectively; generating the initial scene background by the multi-modal content generator conditioned on the first text semantic vector; and generating the initial character image by the multi-modal content generator conditioned on the second text semantic vector.
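A minimal illustration of claim 2's encode-then-condition flow, assuming a toy hash-based encoder in place of a real text encoder (e.g. CLIP) and a stub conditional generator:

```python
import hashlib

def encode_text(description, dim=8):
    """Toy semantic encoder: a deterministic hash-based vector, standing in
    for a learned text encoder (an assumption, not the patent's method)."""
    digest = hashlib.sha256(description.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def generate_conditioned(semantic_vector, kind):
    """Stub generator conditioned on a text semantic vector."""
    return {"kind": kind, "condition": semantic_vector}

scene_vec = encode_text("a sunny meadow at dawn")        # first text semantic vector
char_vec = encode_text("a curious red fox cub")          # second text semantic vector
background = generate_conditioned(scene_vec, "scene")    # initial scene background
character = generate_conditioned(char_vec, "character")  # initial character image
```

The point of the two separate vectors is that background and character are generated under independent conditions, which is what later lets claim 4 swap in a character-specific model without touching scene generation.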
- 3. The method for generating a drawing according to claim 1, wherein optimizing the initial character image and the initial scene background based on the character description to generate the visually fused coordination foreground image comprises: generating, based on the character description, optimized character images with consistent characteristics for all the sub-mirrors containing the same initial character image; and obtaining visual attribute coordination parameters from the optimized character image of each sub-mirror and the corresponding initial scene background, and adjusting the optimized character image according to the visual attribute coordination parameters to generate the coordination foreground image.
- 4. The method for generating a drawing according to claim 3, wherein generating, based on the character description, optimized character images with consistent characteristics for all the sub-mirrors containing the same initial character image comprises: acquiring a target character image selected by the user based on the initial character image, and acquiring a target character description used for generating the target character image; constructing a sample set from the target character image and the target character description; adjusting the multi-modal content generator based on the sample set to obtain a character-specific generation model, wherein the adjusting comprises freezing the base weights of the multi-modal content generator and optimizing an introduced low-rank matrix; and generating, by the character-specific generation model, optimized character images consistent with the image characteristics of the target character for all the sub-mirrors whose standardized creation instruction contains the target character description.
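The tuning strategy in claim 4 (freeze the base weights, optimize only an introduced low-rank matrix) matches the general LoRA recipe. A toy rank-1 sketch on a single linear layer; the layer, data and learning rate are invented for illustration, and the zero-initialized factor `B` makes the adapted layer start out identical to the frozen base:

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Frozen base weights of the generator (toy 2x2 linear layer).
W = [[1.0, 0.0], [0.0, 1.0]]

# Introduced rank-1 factors: update = B @ A.  B starts at zero so the
# adapted layer initially reproduces the frozen base exactly.
A = [0.1, 0.1]   # 1 x 2
B = [0.0, 0.0]   # 2 x 1

def forward(x):
    s = sum(a * xi for a, xi in zip(A, x))  # A @ x (scalar, rank 1)
    base = matvec(W, x)                     # frozen path
    return [base[i] + B[i] * s for i in range(len(base))]

def loss(y, t):
    return sum((yi - ti) ** 2 for yi, ti in zip(y, t))

x, target = [1.0, 2.0], [2.0, 1.0]          # one toy "sample set" pair
loss_before = loss(forward(x), target)

lr = 0.05
for _ in range(300):                         # gradient descent on A and B only
    s = sum(a * xi for a, xi in zip(A, x))
    r = [yi - ti for yi, ti in zip(forward(x), target)]
    grad_B = [2.0 * ri * s for ri in r]
    rB = sum(ri * bi for ri, bi in zip(r, B))
    grad_A = [2.0 * rB * xi for xi in x]
    B = [bi - lr * g for bi, g in zip(B, grad_B)]
    A = [ai - lr * g for ai, g in zip(A, grad_A)]

loss_after = loss(forward(x), target)        # W is never updated
```

Only `A` and `B` move during training; `W` stays byte-for-byte identical, which is the "freezing the base weights" half of the claim.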
- 5. The method for generating a drawing according to claim 3, wherein obtaining the visual attribute coordination parameters from the optimized character image of each sub-mirror and the corresponding initial scene background, and adjusting the optimized character image according to the visual attribute coordination parameters to generate the coordination foreground image, comprises: inputting the optimized character image of the current sub-mirror, as a foreground image, together with the corresponding initial scene background into a feature extractor in an image coordination network to extract foreground depth features of the foreground image and background depth features of the initial scene background; fusing the foreground depth features and the background depth features, and inputting the fused features into a coordination controller in the image coordination network to obtain the visual attribute coordination parameters; and performing, according to the visual attribute coordination parameters, a pixel-level adjustment on the foreground image through a differentiable renderer in the image coordination network to generate the coordination foreground image visually fused with the initial scene background.
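The harmonization step of claim 5 can be caricatured with simple intensity statistics in place of learned depth features; the gain/offset "coordination parameters" here are an assumption for illustration, not the patent's coordination network:

```python
def mean(vals):
    return sum(vals) / len(vals)

def extract_features(pixels):
    """Toy 'depth feature': mean and spread of pixel intensities,
    standing in for the learned feature extractor."""
    m = mean(pixels)
    spread = mean([abs(p - m) for p in pixels]) or 1.0  # guard against zero spread
    return m, spread

def coordination_params(fg_feat, bg_feat):
    """Toy coordination controller: gain/offset matching fg stats to bg stats."""
    (fm, fs), (bm, bs) = fg_feat, bg_feat
    gain = bs / fs
    offset = bm - gain * fm
    return gain, offset

def render(fg_pixels, gain, offset):
    """Toy stand-in for the differentiable renderer: per-pixel affine adjustment."""
    return [gain * p + offset for p in fg_pixels]

foreground = [0.9, 0.8, 1.0, 0.7]    # bright character pixels
background = [0.2, 0.3, 0.25, 0.25]  # dim scene pixels
gain, offset = coordination_params(extract_features(foreground),
                                   extract_features(background))
harmonized = render(foreground, gain, offset)
```

After the adjustment the foreground statistics match the background's, which is the "visual fusion" the claim targets; a real system would match color, lighting and texture, not just a scalar intensity.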
- 6. The method for generating a drawing according to claim 1, wherein generating the target drawing document from the coordination foreground image, the initial scene background and the initial narrative text corresponding to each sub-mirror according to the image-text corresponding relation comprises: for each sub-mirror, parsing the image-text corresponding relation of the current sub-mirror in the standardized creation instruction to obtain an image-text logic association and a spatial layout instruction of the current sub-mirror; checking, based on the image-text logic association, whether the coordination foreground image, the initial scene background and the initial narrative text of the current sub-mirror match in narrative content; after the check passes, determining, according to the spatial layout instruction, a layer synthesis mode of the coordination foreground image and the initial scene background of the current sub-mirror, and positioning coordinates and style attributes of the initial narrative text in a synthesized page; synthesizing the coordination foreground image, the initial scene background and the initial narrative text of the current sub-mirror into a standard page according to the layer synthesis mode, the positioning coordinates and the style attributes; and assembling the standard pages corresponding to all the sub-mirrors according to the narrative logic sequence of the sub-mirror script, and packaging them into the target drawing document in a preset format.
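A rough sketch of the check-then-compose loop in claim 6, with strings standing in for real images and a keyword check standing in for the narrative-content verification (both are assumptions for illustration):

```python
def check_match(page_parts, logic):
    """Toy narrative-consistency check: every keyword in the image-text
    logic association must appear in the narrative text."""
    return all(k in page_parts["text"] for k in logic.get("keywords", []))

def compose_page(parts, layout):
    """Compose one standard page: layer order plus text placement."""
    return {
        "layers": [parts["background"], parts["foreground"]],  # background below foreground
        "text": parts["text"],
        "text_xy": layout.get("text_xy", (0, 0)),
        "style": layout.get("style", "default"),
    }

def assemble_drawing(sub_mirrors):
    pages = []
    for sm in sub_mirrors:  # already in narrative-logic order
        if not check_match(sm, sm.get("logic", {})):
            raise ValueError("image-text mismatch on a sub-mirror")
        pages.append(compose_page(sm, sm.get("layout", {})))
    return {"format": "pdf", "pages": pages}  # hypothetical preset format

doc = assemble_drawing([
    {"foreground": "fox.png", "background": "forest.png",
     "text": "The fox enters the forest.",
     "logic": {"keywords": ["fox", "forest"]},
     "layout": {"text_xy": (40, 300), "style": "storybook"}},
    {"foreground": "fox.png", "background": "river.png",
     "text": "The fox drinks at the river.",
     "logic": {"keywords": ["river"]}},
])
```

Failing the consistency check raises before any page is composed, mirroring the claim's "after the check passes" gating.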
- 7. The method for generating a drawing according to any one of claims 1 to 6, further comprising: if a defect area exists in the coordination foreground image, obtaining an image mask of the defect area, and the character description and the scene description of the sub-mirror corresponding to the defect area; generating a local redrawing text condition for the defect area according to the character description and the scene description corresponding to the defect area; and performing local content redrawing on the defect area in the coordination foreground image through a condition-controlled generation model according to the local redrawing text condition and the image mask, to obtain a corrected coordination foreground image.
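At its core, the mask-guided local redraw of claim 7 replaces only masked pixels and passes everything else through untouched; here a lambda stands in for the condition-controlled generation model (an assumption for illustration):

```python
def local_redraw(image, mask, redraw_fn):
    """Repaint only the masked defect pixels; unmasked pixels are
    returned unchanged (stand-in for a conditional inpainting model)."""
    return [redraw_fn(p) if m else p for p, m in zip(image, mask)]

image = [0.1, 0.9, 0.9, 0.2]   # two defective bright pixels in the middle
mask = [0, 1, 1, 0]            # 1 marks the defect area
fixed = local_redraw(image, mask, lambda p: 0.15)  # hypothetical redraw condition
```

A real inpainting model would be conditioned on the redraw text and the surrounding context, but the mask semantics, touch only the defect area, are the same.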
- 8. A drawing generation system, comprising: a conception acquisition module, used for acquiring story conception information input by a user; an instruction generation module, used for inputting the story conception information into a content planning model to generate a standardized creation instruction, wherein the standardized creation instruction comprises a sub-mirror script logically arranged according to a narrative, and a character description, a scene description and an image-text corresponding relation for each sub-mirror; a content generation module, used for inputting the character description and the scene description corresponding to each sub-mirror into a multi-modal content generator to generate an initial character image and an initial scene background corresponding to the sub-mirror; a collaborative optimization module, used for optimizing the initial character image and the initial scene background based on the character description to generate a visually fused coordination foreground image; and a drawing generation module, used for generating a target drawing document from the coordination foreground image, the initial scene background and the initial narrative text corresponding to each sub-mirror according to the image-text corresponding relation.
- 9. An electronic device, comprising a memory and a processor, wherein the memory is used for storing a program, and the processor, coupled to the memory, is configured to execute the program stored in the memory so as to implement the steps of the method for generating a drawing according to any one of claims 1 to 7.
- 10. A computer-readable storage medium storing a computer-readable program or instructions which, when executed by a processor, implement the steps of the method for generating a drawing according to any one of claims 1 to 7.
Description
Drawing generation method, system, equipment and medium

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a method, a system, a device and a medium for generating a drawing.

Background

In the fields of digital publishing and children's education, demand for personalized drawing creation is increasing. Parents, educators and content creators wish to quickly generate unique drawing content based on a particular topic, character or educational objective. However, traditional drawing creation is highly dependent on the collaborative work of professional illustrators and editors, and suffers from inherent bottlenecks such as long production cycles, high cost and inconvenient modification, making it difficult to meet personalized and immediate creation demands. To improve authoring efficiency, the prior art has attempted to introduce artificial intelligence aids. For example, a story outline or short narrative is generated from text prompts using a content planning model (e.g., the GPT series), or a single illustration is generated from an isolated description using a text-to-image generation model (e.g., Stable Diffusion, DALL-E). These techniques realize automatic content generation to a certain extent and provide basic material for the creator. However, the prior-art solutions described above still have significant limitations. First, the links are split: story generation and image generation are usually independent processes, so the graphic narrative is logically disjointed and inconsistent in style. Second, global consistency is lacking: character appearance, visual style and narrative coherence are difficult to keep uniform when generating multi-page content, producing a collage-like result.
Third, existing tools offer poor controllability and limited ability to adjust details of the generated content, making it difficult for users to intervene and optimize specific elements (such as correcting character deformities or coordinating foreground and background).

Disclosure of Invention

In view of the foregoing, it is necessary to provide a method, a system, a device and a medium for generating a drawing, which solve the technical problems in the prior art that, due to the lack of end-to-end collaborative generation and optimization capability, the generated drawing is disjointed, inconsistent in style, and difficult to keep consistent across multiple pages. To solve the above technical problems, in a first aspect, the present invention provides a method for generating a drawing, comprising: acquiring story conception information input by a user; inputting the story conception information into a content planning model to generate a standardized creation instruction, wherein the standardized creation instruction comprises a sub-mirror script logically arranged according to a narrative, and a character description, a scene description and an image-text corresponding relation for each sub-mirror; inputting the character description and the scene description corresponding to each sub-mirror into a multi-modal content generator to generate an initial character image and an initial scene background corresponding to the sub-mirror; inputting the sub-mirror script corresponding to each sub-mirror into the content planning model to generate an initial narrative text corresponding to the sub-mirror; optimizing the initial character image and the initial scene background based on the character description to generate a visually fused coordination foreground image; and generating a target drawing document from the coordination foreground image, the initial scene background and the initial narrative text corresponding to each sub-mirror according to the image-text corresponding relation. In one possible implementation, inputting the character description and the scene description corresponding to each sub-mirror into the multi-modal content generator to generate the initial character image and the initial scene background corresponding to the sub-mirror includes: encoding the scene description and the character description into a first text semantic vector and a second text semantic vector respectively; generating the initial scene background by the multi-modal content generator conditioned on the first text semantic vector; and generating the initial character image by the multi-modal content generator conditioned on the second text semantic vector. In one possible implementation, optimizing the initial character image and the initial scene background based on the character description to generate the visually fused coordination foreground image includes: generating, based on the character description, optimized character images with consistent characteristics for all the sub-mirrors containing the same initial character image; and obtaining visual attribut