CN-121999085-A - Image generation method and system based on multi LoRA fusion


Abstract

The application relates to the technical field of clothing image processing, and in particular to an image generation method and system based on multi-LoRA fusion. The method comprises: training a character LoRA model based on character label separation, training a clothing LoRA model based on clothing segmentation, and training a scene LoRA model based on head portrait masking; fusing the character LoRA model, the clothing LoRA model and the scene LoRA model according to preset weights to update a preset text-to-image model; and acquiring a target prompt word set and generating a target fused image based on the target prompt word set and the updated preset text-to-image model. With this technical scheme, guided training is carried out in the preset text-to-image model according to the target prompt word set, and the features of the three LoRA models can be fused clearly and without conflict when the preset text-to-image model performs inference, so that a target fused image with an accurate subject appearance, high clothing fidelity and a strong sense of scene atmosphere is generated.

Inventors

WANG JIANGUO
PENG JIQUN

Assignees

杭州霖润智能科技有限公司

Dates

Publication Date: 20260508
Application Date: 20260122

Claims (10)

1. An image generation method based on multi-LoRA fusion, characterized by comprising the following steps: training a character LoRA model based on character label separation, training a clothing LoRA model based on clothing segmentation, and training a scene LoRA model based on head portrait masking; fusing the character LoRA model, the clothing LoRA model and the scene LoRA model according to preset weights to update a preset text-to-image model; and acquiring a target prompt word set, and generating a target fused image based on the target prompt word set and the updated preset text-to-image model.
2. The image generation method based on multi-LoRA fusion according to claim 1, wherein training the character LoRA model based on character label separation comprises the steps of: collecting image data of a picture to be trained, and setting a feature activation word corresponding to the picture to be trained; calling a large visual understanding model to label the image data to generate a label feature data set, wherein the label feature data set comprises a plurality of groups of class-A feature labels and class-B feature labels; combining the class-A feature labels and the class-B feature labels to obtain a training description text corresponding to the picture to be trained; and performing standard training on the image data of the picture to be trained and the training description text to generate a character LoRA model in which the feature activation word is strongly associated with the class-A feature labels.
3. The image generation method based on multi-LoRA fusion according to claim 2, wherein the target fused image includes combined character features, and the combined character features are generated by the steps of: acquiring an activation matching word based on the target prompt word set, and determining the corresponding feature activation word based on the activation matching word; and performing guided training on the feature activation word in the character LoRA model to obtain a target feature picture, wherein the target feature picture comprises the class-A feature labels corresponding to the feature activation word.
4. The image generation method based on multi-LoRA fusion according to claim 1, wherein training the scene LoRA model based on head portrait masking comprises the steps of: collecting scene image data to be trained that matches a desired style, and identifying a head portrait region based on the scene image data to be trained; determining mask features according to the head portrait region, and masking the head portrait region based on the mask features to obtain a scene training data set; and training based on the scene training data set to generate the scene LoRA model.
5. The image generation method based on multi-LoRA fusion according to claim 4, wherein training based on the scene training data set to generate the scene LoRA model comprises the steps of: performing structured labeling of the background, model and clothing of the scene pictures in the scene image data to be trained to obtain a scene training text; and training based on the scene training data set and the scene training text to generate the scene LoRA model.
6. The image generation method based on multi-LoRA fusion according to claim 1, wherein training the clothing LoRA model based on clothing segmentation comprises the steps of: acquiring a clothing pose picture that meets requirements, and segmenting the clothing region of the clothing pose picture to obtain a clothing training picture; and labeling the clothing pose picture to obtain clothing text labels, and training on the clothing training picture and the clothing text labels to generate the clothing LoRA model.
7. The image generation method based on multi-LoRA fusion according to claim 1, wherein acquiring the target prompt word set and generating the target fused image based on the target prompt word set and the updated preset text-to-image model comprises the steps of: acquiring a model feature parameter set based on the target prompt word set, and updating the preset weights based on the model feature parameter set to obtain a comparison fused image set, wherein the comparison fused image set comprises comparison fused images and corresponding fusion feature indexes; and selecting, from the comparison fused image set, the comparison fused image with the highest fusion feature index as the target fused image.
8. The image generation method based on multi-LoRA fusion according to claim 4, wherein determining the mask features according to the head portrait region comprises the steps of: acquiring boundary tangent lines based on the head portrait region, extending the boundary tangent lines to obtain a closed region, and updating the head portrait region based on the closed region; acquiring a coordinate position based on the updated head portrait region, and defining coordinates of a mask region based on the coordinate position; and setting the pixel values within the mask region defined by the coordinates to 0 to generate the mask features.
9. The image generation method based on multi-LoRA fusion according to claim 2, wherein collecting the image data of the picture to be trained comprises the steps of: acquiring a picture training set, and screening the picture training set to obtain pictures to be trained that meet a preset condition.
10. An image generation system based on multi-LoRA fusion, which performs the image generation method based on multi-LoRA fusion according to any one of claims 1 to 9, comprising: a model training module, configured to train a character LoRA model based on character label separation, a clothing LoRA model based on clothing segmentation, and a scene LoRA model based on head portrait masking; a model loading module, configured to fuse the character LoRA model, the clothing LoRA model and the scene LoRA model according to preset weights to update a preset text-to-image model; and an image generation module, configured to acquire a target prompt word set and generate a target fused image based on the target prompt word set and the updated preset text-to-image model.
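
As a rough illustration of the masking step recited in claims 4 and 8, the following Python sketch sets the pixels inside a mask region to 0. The rectangular region, the `margin` padding that stands in for the boundary-tangent extension, and the names `expand_box` and `apply_head_mask` are all assumptions made for illustration. The claims do not state how the head portrait region is detected or how the closed region is computed.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive
Box = Tuple[int, int, int, int]


def expand_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow a head box by `margin` pixels per side, clipped to the image bounds."""
    left, top, right, bottom = box
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )


def apply_head_mask(image: List[List[int]], box: Box, margin: int = 1) -> List[List[int]]:
    """Return a copy of a grayscale image with the expanded head box set to 0."""
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = expand_box(box, margin, width, height)
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for y in range(top, bottom):
        for x in range(left, right):
            masked[y][x] = 0  # pixel value 0 inside the mask region
    return masked


# Toy 5x6 image filled with 9; the head box (hypothetical) covers x=2..3, y=1..2
image = [[9] * 6 for _ in range(5)]
masked = apply_head_mask(image, (2, 1, 4, 3), margin=1)
```

With a one-pixel margin the masked area grows to columns 1 to 4 and rows 0 to 3, so a strip around the detected head is also hidden from the scene training data.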

Description

Image generation method and system based on multi-LoRA fusion

Technical Field

The application relates to the technical field of clothing image processing, and in particular to an image generation method and system based on multi-LoRA fusion.

Background

In recent years, AI image generation techniques typified by diffusion models have made breakthrough progress. On this basis, Low-Rank Adaptation (LoRA) is widely used as an efficient fine-tuning method. LoRA allows a user to quickly train a lightweight model that can reproduce a specific concept (e.g., a specific character, clothing style or scene style) from a small number of pictures of that concept (typically 5-20).

In complex business applications, such as generating AI clothing model images, it is often necessary to combine multiple LoRA models to achieve fine control over characters, clothing and scenes. For example, a LoRA for a "specific character", a LoRA for a "specific garment" and a LoRA for a "specific scene" are loaded simultaneously at inference time. This approach is highly flexible and composable, enabling rapid generation of images that meet diverse requirements.

However, when the prior art combines multiple LoRA models directly, it generally suffers from serious "concept conflict" or "feature contamination" problems:

Inter-subject feature interference: in the source images used to train the character LoRA, the clothing worn by the model can interfere with and "contaminate" the clothing LoRA, which must be restored accurately, so that the generated clothing has the wrong style, color or texture.
Background penetration into subject features: if the source images used to train the scene LoRA contain other people, features such as their faces and hairstyles can "penetrate" into the main model subject, altering the main model's appearance and hairstyle so that consistency and accuracy cannot be maintained.

The main current approaches to resolving such conflicts are to counteract them with complex negative prompts, to manually adjust the weight of each LoRA, or to apply iterative "image-to-image" corrections. These methods are complex to operate, depend on manual experience, give unstable results, and can hardly solve the feature conflict problem at its root, which leads to a low image generation success rate, poor image fidelity and low workflow efficiency.

Disclosure of Invention

In order to realize "plug and play" among LoRA models of different concepts, reduce conflicts during inference, and improve the overall quality and controllability of the final generated image, the application provides an image generation method and system based on multi-LoRA fusion.

In a first aspect, the application provides an image generation method based on multi-LoRA fusion, which adopts the following technical scheme:

An image generation method based on multi-LoRA fusion comprises the following steps: training a character LoRA model based on character label separation, training a clothing LoRA model based on clothing segmentation, and training a scene LoRA model based on head portrait masking; fusing the character LoRA model, the clothing LoRA model and the scene LoRA model according to preset weights to update a preset text-to-image model; and acquiring a target prompt word set, and generating a target fused image based on the target prompt word set and the updated preset text-to-image model.
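
The fusion of three LoRA models according to preset weights can be sketched in Python as follows. The text does not give a merging formula, so this sketch assumes the common additive form in which each LoRA contributes a low-rank update B·A to a base weight matrix, scaled by its preset weight. The function names, the toy 2x2 layer and the weight values 0.5, 0.25 and 0.25 are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def fuse_loras(
    base: Matrix,
    loras: Dict[str, Tuple[Matrix, Matrix]],
    weights: Dict[str, float],
) -> Matrix:
    """Return base + sum over LoRAs of weight * (B @ A), leaving `base` unchanged."""
    fused = [row[:] for row in base]
    for name, (b, a) in loras.items():
        delta = matmul(b, a)  # low-rank update with the same shape as `base`
        w = weights[name]
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                fused[i][j] += w * value
    return fused


# Toy 2x2 base layer and three rank-1 adapters (all values are assumptions)
base = [[1.0, 0.0], [0.0, 1.0]]
loras = {
    "character": ([[1.0], [0.0]], [[1.0, 0.0]]),
    "clothing": ([[0.0], [1.0]], [[0.0, 1.0]]),
    "scene": ([[1.0], [1.0]], [[1.0, 1.0]]),
}
weights = {"character": 0.5, "clothing": 0.25, "scene": 0.25}
fused = fuse_loras(base, loras, weights)
```

Under this assumption, changing a preset weight only rescales one adapter's contribution, which is one way to read the "plug and play" goal stated above.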
By adopting this technical scheme, guided training is carried out in the preset text-to-image model according to the target prompt word set, and the features of the three LoRA models can be fused clearly and without conflict when the preset text-to-image model performs inference, so that a target fused image with an accurate subject appearance, high clothing fidelity and a strong sense of scene atmosphere is generated. In addition, the preset text-to-image model is updated by fusing the character LoRA model, the clothing LoRA model and the scene LoRA model according to the preset weights, so that "plug and play" among LoRA models of different concepts can be realized, conflicts during inference are reduced, and the overall quality and controllability of the final generated image are improved.

In some embodiments, training the character LoRA model based on character label separation comprises the steps of: collecting image data of a picture to be trained, and setting a feature activation word corresponding to the picture to be trained; calling a large visual understanding model to label the image data to generate a label feature data s