CN-121999073-A - Image generation method and device
Abstract
The disclosure provides an image generation method and device in the technical field of artificial intelligence. The method comprises: obtaining control features, wherein the control features are obtained by a dynamic mixing expert module performing different image task processing on a control image, and the control image is a structure description image corresponding to text features; and inputting first data and the control features into an image generation model to obtain a target image, wherein the first data comprises the text features, a plurality of time steps, and a noise sample, and the control features are used for guiding the noise sample to be restored to an image corresponding to the structure described by the control image.
Inventors
- CHEN YIZHOU
- ZHANG WEI
- YUAN JIN
Assignees
- Lenovo (Beijing) Limited
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-28
Claims (10)
- 1. An image generation method, comprising: obtaining control features, wherein the control features are obtained by a dynamic mixing expert module performing different image task processing according to a control image, and the control image is a structure description image corresponding to text features; and inputting first data and the control features into an image generation model to obtain a target image, wherein the first data comprises the text features, a plurality of time steps, and a noise sample, and the control features are used for guiding the noise sample to be restored to an image corresponding to the structure described by the control image.
- 2. The method of claim 1, wherein obtaining the control features comprises: inputting second data into a multi-condition control module to obtain a first feature, wherein the second data comprises a latent space feature, the text features, and a time step, the latent space feature is an encoding feature of the structure described by the control image, and the first feature is a fused representation of the latent space feature of the control image and the text features; inputting the first feature into a self-attention module to obtain a second feature; inputting the first feature into the dynamic mixing expert module to obtain a third feature; and determining the control feature corresponding to each time step according to the second feature and the third feature.
- 3. The method of claim 1 or 2, wherein the dynamic mixing expert module comprises a plurality of initial image expert models for enhancing different types of features of the control image, and the method further comprises: creating routing information corresponding to a first feature according to the first feature; determining at least one target image expert model corresponding to the first feature using the routing information; and processing the first feature using the at least one target image expert model to obtain a third feature.
- 4. The method of claim 3, wherein determining at least one target image expert model corresponding to the first feature using the routing information comprises: determining a candidate image task corresponding to the first feature using the routing information; and determining at least one target image expert model from the plurality of initial image expert models according to the candidate image task, task attribute information of the initial image expert models, and state information of the initial image expert models.
- 5. The method of claim 4, wherein determining at least one target image expert model from the plurality of initial image expert models according to the candidate image task, the task attribute information of the initial image expert models, and the state information of the initial image expert models comprises at least one of: if the candidate image task is the same as the task attribute information of a first initial image expert model and the learning state of the first initial image expert model satisfies a first condition, taking the first initial image expert model as a first target image expert model, wherein the first condition indicates that an initial image expert model is in an unconverged state; and if the candidate image task is the same as the task attribute information of the first initial image expert model, the learning state of the first initial image expert model satisfies a second condition, and a first number of the initial image expert models is smaller than a target threshold, adding a second candidate image expert model as a target image expert model, wherein the second condition indicates that an initial image expert model is in a converged state, and the target threshold is an upper limit on the number of image expert models accommodated in the dynamic mixing expert module.
- 6. The method of claim 4, wherein determining at least one target image expert model from the plurality of initial image expert models according to the candidate image task, the task attribute information of the initial image expert models, and the state information of the initial image expert models comprises: if the candidate image task is identical to the task attribute information of a first initial image expert model, the learning state of the first initial image expert model satisfies a second condition, and a first number of the initial image expert models is greater than or equal to a target threshold, performing at least one of: deleting a third initial image expert model from the dynamic mixing expert module and adding a third candidate image expert model as a target image expert model, wherein the target threshold is an upper limit on the number of image expert models contained in the dynamic mixing expert module, and the usage frequency of the third initial image expert model is smaller than a first threshold; and resetting parameters of a fourth initial image expert model to obtain a target image expert model, wherein the usage frequency of the fourth initial image expert model is smaller than the first threshold.
- 7. The method of claim 4, wherein determining at least one target image expert model from the plurality of initial image expert models according to the candidate image task, the task attribute information of the initial image expert models, and the state information of the initial image expert models comprises at least one of: if the candidate image task is different from the task attribute information of the plurality of initial image expert models, the learning states of the plurality of initial image expert models satisfy the second condition, and the first number of the initial image expert models is smaller than the target threshold, adding a fourth candidate image expert model as a target image expert model; and if the candidate image task is different from the task attribute information of the plurality of initial image expert models, the learning states of the plurality of initial image expert models satisfy the first condition, and the first number of the initial image expert models is smaller than the target threshold, adding the fourth candidate image expert model as a target image expert model.
- 8. The method of claim 4, wherein determining at least one target image expert model from the plurality of initial image expert models according to the candidate image task, the task attribute information of the initial image expert models, and the state information of the initial image expert models comprises at least one of: if the candidate image task is different from the task attribute information of the plurality of initial image expert models and the first number of the initial image expert models is greater than the target threshold, deleting a third initial image expert model from the dynamic mixing expert module and adding a fifth candidate image expert model as a target image expert model, wherein the usage frequency of the third initial image expert model is smaller than the first threshold; and if the candidate image task is different from the task attribute information of the plurality of initial image expert models and the first number of the initial image expert models is greater than the target threshold, resetting parameters of the third initial image expert model to obtain a target image expert model, wherein the usage frequency of the third initial image expert model is smaller than the first threshold.
- 9. The method of claim 5, further comprising at least one of: if the usage frequency of a third initial image expert model is greater than a second threshold and the third initial image expert model is used to process a target image task, fixing the parameters of the third initial image expert model so that its parameters are untrainable; and if the usage frequency of a fourth initial image expert model is greater than the second threshold and the fourth initial image expert model is used to process the target image task, keeping the learning rate of the fourth initial image expert model smaller than a third threshold.
- 10. An image generation apparatus, comprising: a dynamic mixing expert module, configured to perform different image task processing according to a control image to obtain control features, wherein the control image is a structure description image corresponding to text features; and an image generation module, capable of independently executing, or calling an image generation model to execute, the following operation: inputting first data and the control features into the image generation model to obtain a target image, wherein the first data comprises the text features, a plurality of time steps, and a noise sample, and the control features are used for guiding the noise sample to be restored to an image corresponding to the structure described by the control image.
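The expert selection and lifecycle rules of claims 3 through 9 can be sketched as plain routing logic: match the candidate image task against each expert's task attribute, then reuse, add, delete, or freeze experts depending on convergence state, module capacity, and usage frequency. The following is a minimal, hypothetical Python sketch; every name and threshold value (`Expert`, `route`, `MAX_EXPERTS`, and the numeric thresholds) is an illustrative assumption, not the patent's implementation.

```python
# Hypothetical sketch of the dynamic mixing expert module's routing rules
# (claims 3-9). Names and threshold values are illustrative assumptions.
from dataclasses import dataclass

MAX_EXPERTS = 4          # "target threshold": upper limit of experts in the module
LOW_USE_THRESHOLD = 2    # "first threshold": below this, an expert may be evicted
HIGH_USE_THRESHOLD = 10  # "second threshold": above this, a task expert is frozen

@dataclass
class Expert:
    task: str              # task attribute information (e.g. "edge", "depth")
    converged: bool = False  # learning state: False = first condition (unconverged)
    uses: int = 0          # usage frequency
    frozen: bool = False   # parameters fixed / untrainable (claim 9)

def route(task: str, experts: list[Expert]) -> Expert:
    """Pick (or create) the target expert for a candidate image task."""
    match = next((e for e in experts if e.task == task), None)
    if match is not None and not match.converged:
        # Claim 5, first branch: same task, unconverged -> reuse that expert.
        target = match
    elif len(experts) < MAX_EXPERTS:
        # Claim 5 second branch / claim 7: room left -> add a candidate expert.
        target = Expert(task)
        experts.append(target)
    else:
        # Claims 6/8: module is full -> evict the least-used expert if it is
        # rarely used, then add a fresh expert for the candidate task.
        victim = min(experts, key=lambda e: e.uses)
        if victim.uses < LOW_USE_THRESHOLD:
            experts.remove(victim)
        target = Expert(task)
        experts.append(target)
    target.uses += 1
    # Claim 9: a heavily used expert on the target task has its parameters fixed.
    if target.uses > HIGH_USE_THRESHOLD:
        target.frozen = True
    return target
```

Under these toy rules, repeated requests for the same task reuse the unconverged expert, while a converged expert triggers the addition of a new candidate, mirroring the branches enumerated in claims 5 through 8.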
Description
Image generation method and device

Technical Field

The disclosure relates to the technical field of artificial intelligence, and in particular to an image generation method and device.

Background

At present, image generation models cannot fully understand users' increasingly complex and personalized image generation requirements, which degrades the user experience.

Disclosure of Invention

In view of this, the present disclosure provides an image generation method and apparatus.

According to a first aspect of the present disclosure, there is provided an image generation method, including: obtaining control features, the control features being obtained by a dynamic mixing expert module performing different image task processing according to a control image, the control image being a structure description image corresponding to text features; and inputting first data and the control features into an image generation model to obtain a target image, the first data including the text features, a plurality of time steps, and a noise sample, the control features being used to guide the noise sample to be restored to an image corresponding to the structure described by the control image.
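The flow of the first aspect can be sketched numerically: the first data (text features, a plurality of time steps, a noise sample) enters the model, and at each time step the control feature pulls the sample toward the structure the control image describes. The following toy Python sketch uses a deliberately trivial stand-in update rule (a decaying pull toward the control feature vector) purely to illustrate the guidance idea; `generate`, its parameters, and the update rule are illustrative assumptions, not the patent's actual image generation model.

```python
# Toy numeric illustration of control-guided restoration: the noise sample is
# nudged toward the control feature at every time step. The update rule below
# is an assumed stand-in, not the patent's model.
def generate(noise: list[float], control: list[float],
             guidance_scale: float, num_steps: int) -> list[float]:
    """Restore a noise sample toward the structure encoded by `control`."""
    x = list(noise)
    for t in range(num_steps):
        # Step size decays over time steps, mimicking a denoising schedule.
        step = guidance_scale / (t + 1)
        x = [xi + step * (ci - xi) for xi, ci in zip(x, control)]
    return x

# With enough steps the sample converges toward the control feature, i.e. the
# "target image" recovers the structure the control image describes.
target = generate(noise=[1.0, -1.0, 0.5], control=[0.0, 0.0, 0.0],
                  guidance_scale=0.9, num_steps=50)
```

A sample that already matches the control feature is left unchanged, which is the degenerate case of the guidance: the control feature only steers what remains to be restored.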
According to a second aspect of the present disclosure, there is provided an image generation device comprising an obtaining module and an image generation module. The obtaining module is used for obtaining control features, the control features being obtained by a dynamic mixing expert module performing different image task processing according to a control image, the control image being a structure description image corresponding to text features. The image generation module can independently execute, or call an image generation model to execute, the following operation: inputting first data and the control features into the image generation model to obtain a target image, the first data comprising the text features, a plurality of time steps, and a noise sample, the control features being used for guiding the noise sample to be restored to an image corresponding to the structure described by the control image.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The above and other objects, features, and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario of an image generation method and apparatus according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow chart of an image generation method according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a dynamic mixing expert module according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates an image generation method according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates an effect diagram of an image generation method according to an embodiment of the present disclosure;

FIG. 6 schematically shows a block diagram of the structure of an image generation apparatus according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components. All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be const