CN-121999067-A - Image generation method, device, electronic equipment, storage medium and program product

Abstract

The present disclosure relates to an image generation method, apparatus, electronic device, storage medium, and program product. The method includes: acquiring an image to be processed and a preset object image that includes an object to be synthesized; and inputting the image to be processed and the preset object image into a preset image generation model, which generates a target synthesized image while mapping and correcting the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed. The target synthesized image is obtained by synthesizing the object to be synthesized into the region to be synthesized in the image to be processed. Embodiments of the disclosure help the model attend to high-frequency details of the object to be synthesized, such as texture details and edge contours, which improves detail fidelity, improves how faithfully details are restored in the synthesized image, and effectively improves the image synthesis effect.

Inventors

PENG YUXIN
ZHAO GUOHAO
Cao Jiajiong

Assignees

北京大学
北京达佳互联信息技术有限公司

Dates

Publication Date: 20260508
Application Date: 20251212

Claims (13)

1. An image generation method, comprising: acquiring an image to be processed and a preset object image comprising an object to be synthesized; and inputting the image to be processed and the preset object image into a preset image generation model, and generating a target synthesized image on the basis of mapping and correcting the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed, wherein the target synthesized image is an image obtained by synthesizing the object to be synthesized into the region to be synthesized in the image to be processed.
2. The image generation method according to claim 1, characterized in that the method further comprises: determining first position information of the region where the object to be synthesized is located in the preset object image and second position information of the region to be synthesized in the image to be processed; wherein the inputting the image to be processed and the preset object image into the preset image generation model, and generating the target synthesized image on the basis of performing mapping correction on the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed, comprises: inputting the image to be processed, the preset object image, the first position information and the second position information into the preset image generation model, and generating the target synthesized image on the basis of performing mapping correction on the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed.
3. The image generation method according to claim 2, wherein the preset image generation model includes a first image encoding network, a first position encoding network, at least one second image encoding network and a first image decoding network which are sequentially connected, the inputting the image to be processed, the preset object image, the first position information and the second position information into the preset image generation model, and the generating the target synthesized image includes: Inputting the image to be processed and the preset object image into the first image coding network for image coding processing to obtain a first image code; inputting the first position information and the second position information into the first position coding network, and performing position coding processing on the basis of correcting the position information of each object image block of the region where the object to be synthesized is located in the preset object image to the position information corresponding to the region to be synthesized of the object to be synthesized in the image to be processed, so as to obtain a corrected first position code of each object image block of the region where the object to be synthesized is located in the preset object image; Inputting a first current image code and the first position code into a first current coding network to perform image coding processing to obtain a first target image code, wherein the first current image code is the first image code under the condition that the first current coding network is a first second image coding network in the at least one second image coding network; in the case that the first current encoding network is a non-first second image encoding network of the at least one second image encoding network, the first current image encoding is the first target image encoding output by a second image encoding network preceding the first current encoding network; And inputting the 
first target image code output by the last second image coding network in the at least one second image coding network into the first image decoding network for decoding processing to obtain the target synthetic image.
4. The image generation method according to claim 3, wherein the preset image generation model further includes at least one first correction coding network sequentially connected, any one first correction coding network includes a second position coding network and a third image coding network sequentially connected, the inputting the image to be processed, the preset object image, the first position information and the second position information into the preset image generation model, and the generating the target synthesized image further includes, on the basis of performing mapping correction on the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed: Inputting a second current image code into the second position code network in the first current correction network, and on the basis of the similarity between each object image block corresponding to the preset object image and each image block of the region to be synthesized in the image to be processed, learning the position mapping relation between the region where the object to be synthesized is located and the region to be synthesized, performing position correction on each object image block to obtain a corrected second position code of each object image block; Inputting the second position code output by the second position code network in the first current correction network and the second current image code into the third image code network in the first correction code network to perform image code processing to obtain a second target image code, wherein the second current image code is the first target image code output by the last second image code network in the at least one second image code network when the first current correction network is the first correction code network in the at least one first correction code network; The step of inputting 
the first target image code output by the last second image code network in the at least one second image code network into the first image decoding network for decoding processing, and the step of obtaining the target synthetic image comprises the following steps: And inputting the second target image code output by the last first correction coding network in the at least one first correction coding network into the first image decoding network for decoding processing to obtain the target synthetic image.
5. The image generation method according to claim 3, wherein the preset image generation model further includes at least one second correction coding network and at least one fourth image coding network, which are sequentially connected, any one of the second correction coding networks including a fifth image coding network and a third position coding network, which are sequentially connected, the image to be processed, the preset object image, the first position information, and the second position information are input into the preset image generation model, and the generating of the target synthesized image further includes, on the basis of performing mapping correction on the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed: Inputting a third current image code and the first position code into the fifth image code network in a second current correction network to perform image code processing to obtain a third target image code, wherein the third current image code is the first target image code output by the last second image code network in the at least one second image code network when the second current correction network is the first second correction code network in the at least one second correction code network; Inputting the third target image code output by the fifth image code network in the second current correction network into the third position code network in the second current correction network, and learning the position mapping relation between the region where the object to be synthesized is located and the region to be synthesized on the basis of the similarity between each object image block corresponding to the preset object image and each image block of the region to be synthesized in the image to be processed, so as to obtain a third position code after the correction of each object 
image block; inputting a fourth current image code and a fourth position code into a second current coding network to perform image coding processing to obtain a fourth target image code, wherein the fourth current image code is the third target image code output by the last second correction coding network in the at least one second correction coding network when the second current coding network is the first fourth image coding network in the at least one fourth image coding network, the fourth current image code is the fourth target image code output by the fourth image coding network preceding the second current coding network when the second current coding network is a non-first fourth image coding network, and the fourth position code is the average value of the third position codes output by the at least one second correction coding network; and the inputting the first target image code output by the last second image coding network in the at least one second image coding network into the first image decoding network for decoding processing to obtain the target synthetic image comprises: inputting the fourth target image code output by the last fourth image coding network in the at least one fourth image coding network into the first image decoding network for decoding processing to obtain the target synthetic image.
6. The image generation method according to claim 1, wherein the preset image generation model includes a sixth image coding network, at least one third correction coding network and a second image decoding network which are sequentially connected, any one third correction coding network includes a fourth position coding network and a seventh image coding network which are sequentially connected, and the inputting the image to be processed and the preset object image into the preset image generation model, and generating the target synthesized image on the basis of performing mapping correction on the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed includes: inputting the image to be processed and the preset object image into the sixth image coding network for image coding processing to obtain a second image code; inputting a fifth current image code into the fourth position coding network in a third current correction network, and, on the basis of learning a position mapping relation between the region where the object to be synthesized is located and the region to be synthesized based on the similarity between each object image block of the region where the object to be synthesized is located in the preset object image and each image block of the region to be synthesized in the image to be processed, performing position correction on each object image block to obtain a corrected fifth position code of each object image block; inputting the fifth position code output by the fourth position coding network in the third current correction network and the fifth current image code into the seventh image coding network in the third current correction network to perform image coding processing to obtain a fifth target image code, wherein the fifth current image code is the second image
code when the third current correction network is the first third correction code network in the at least one third correction code network; in the case that the third current correction network is not the first third correction coding network of the at least one third correction coding network, the fifth current image code is the fifth target image code output by the third correction coding network preceding the third current correction network; And inputting the fifth target image code output by the last third correction coding network in the at least one third correction coding network into the second image decoding network for decoding processing to obtain the target synthetic image.
7. The image generation method according to claim 1, wherein the preset image generation model includes an eighth image coding network, at least one fourth correction coding network sequentially connected, at least one ninth image coding network sequentially connected, and a third image decoding network, any one fourth correction coding network includes a tenth image coding network and a fifth position coding network sequentially connected, and the inputting the image to be processed and the preset object image into the preset image generation model, and generating the target synthesized image on the basis of performing mapping correction on the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed includes: inputting the image to be processed and the preset object image into the eighth image coding network for image coding processing to obtain a third image code; inputting a sixth current image code into the tenth image coding network in a fourth current correction network to perform image coding processing to obtain a sixth target image code, wherein in the case that the fourth current correction network is the first fourth correction coding network of the at least one fourth correction coding network, the sixth current image code is the third image code, and in the case that the fourth current correction network is not the first fourth correction coding network of the at least one fourth correction coding network, the sixth current image code is the sixth target image code output by the fourth correction coding network preceding the fourth current correction network; inputting the sixth target image code output by the tenth image coding network in the fourth current correction network into the fifth position coding network in the fourth current correction network, and learning the position mapping relation between the region where the object to be synthesized is located and the region to be synthesized on the basis of the similarity between each object image block of the region where the object to be synthesized is located in the preset object image and each image block of the region to be synthesized in the image to be processed, so as to obtain a corrected sixth position code of each object image block; inputting a seventh current image code and a seventh position code into a third current coding network to perform image coding processing to obtain a seventh target image code, wherein the seventh current image code is the sixth target image code output by the last fourth correction coding network in the at least one fourth correction coding network when the third current coding network is the first ninth image coding network in the at least one ninth image coding network, the seventh current image code is the seventh target image code output by the ninth image coding network preceding the third current coding network when the third current coding network is a non-first ninth image coding network, and the seventh position code is the average value of the sixth position codes output by the at least one fourth correction coding network; and inputting the seventh target image code output by the last ninth image coding network in the at least one ninth image coding network into the third image decoding network for decoding processing to obtain the target synthetic image.
8. The image generation method according to any one of claims 1 to 7, characterized in that the method further comprises: Acquiring a sample image to be processed corresponding to a current training round, a sample object image comprising a sample synthesis object, a preset synthesis image and preset position information from preset training data, wherein the preset position information characterizes the sample image to be processed corresponding to the current training round, the sample object image comprising the sample synthesis object corresponding to the current training round, the preset synthesis image corresponding to the current training round and the preset position information corresponding to the current training round; Inputting the sample image to be processed and the sample object image into a to-be-trained image generation model, and generating a predicted synthesized image and a predicted position code on the basis of carrying out mapping correction on the position information of the region where the sample synthesized object is located in the sample object image based on the position information of the region to be synthesized in the sample image to be processed; determining a target loss based on the preset synthetic image, the preset position information, the predicted synthetic image and the predicted position code; And performing iterative training on the image generation model to be trained based on the target loss to obtain the preset image generation model.
9. The image generation method according to claim 8, characterized in that the method further comprises: acquiring, from the preset training data, third position information of the region where the sample synthesis object is located in the sample object image of the current training round and fourth position information of the region to be synthesized in the sample image to be processed of the current training round; wherein the inputting the sample image to be processed and the sample object image into the image generation model to be trained, and generating the predicted synthesized image and the predicted position code on the basis of performing mapping correction on the position information of the region where the sample synthesis object is located in the sample object image based on the position information of the region to be synthesized in the sample image to be processed, comprises: inputting the sample image to be processed, the sample object image, the third position information and the fourth position information into the image generation model to be trained, and generating the predicted synthesized image and the predicted position code on the basis of performing mapping correction on the position information of the region where the sample synthesis object is located in the sample object image based on the position information of the region to be synthesized in the sample image to be processed.
10. An image generating apparatus, comprising: an image acquisition module configured to acquire an image to be processed and a preset object image including an object to be synthesized; and a first image generation module configured to input the image to be processed and the preset object image into a preset image generation model, and generate a target synthesized image on the basis of mapping and correcting the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed, wherein the target synthesized image is an image obtained by synthesizing the object to be synthesized into the region to be synthesized in the image to be processed.
11. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the image generation method of any of claims 1 to 9.
12. A computer-readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image generation method of any one of claims 1 to 9.
13. A computer program product comprising computer program instructions which, when executed by a processor of a computer, cause the computer to perform the image generation method of any of claims 1 to 9.
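
Claims 4 to 7 describe position coding networks that correct the position of each object image block based on its similarity to the image blocks of the region to be synthesized, and claims 5 and 7 average the position codes produced by several correction networks. The claims do not state how the mapping is learned, so the following is only an illustrative sketch under stated assumptions: a softmax over dot-product similarity (a soft-argmax) stands in for the learned mapping, and the names `soft_position_correction`, `average_position_codes`, and `temperature` are hypothetical and do not come from the disclosure.

```python
import math

def soft_position_correction(obj_feats, tgt_feats, tgt_positions, temperature=1.0):
    """Assumed stand-in for a learned position mapping.

    For each object image block, return the similarity-weighted average of the
    (row, col) positions of the image blocks in the region to be synthesized.
    """
    corrected = []
    for f in obj_feats:
        # Dot-product similarity between this object block and every target block.
        sims = [sum(a * b for a, b in zip(f, g)) / temperature for g in tgt_feats]
        m = max(sims)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in sims]
        total = sum(weights)
        row = sum(w * p[0] for w, p in zip(weights, tgt_positions)) / total
        col = sum(w * p[1] for w, p in zip(weights, tgt_positions)) / total
        corrected.append((row, col))
    return corrected

def average_position_codes(codes_per_stage):
    """Element-wise mean of the position codes output by several correction stages."""
    n = len(codes_per_stage)
    num_blocks = len(codes_per_stage[0])
    return [
        tuple(sum(stage[k][d] for stage in codes_per_stage) / n for d in range(2))
        for k in range(num_blocks)
    ]

# One object block that matches two target blocks equally well lands midway.
stage_a = soft_position_correction([[1.0, 0.0]],
                                   [[1.0, 0.0], [1.0, 0.0]],
                                   [(0.0, 0.0), (2.0, 4.0)])
# A block that matches only the first target block is pulled to that position.
stage_b = soft_position_correction([[10.0, 0.0]],
                                   [[10.0, 0.0], [0.0, 10.0]],
                                   [(5.0, 5.0), (9.0, 9.0)])
averaged = average_position_codes([stage_a, stage_b])
```

In a real model the features would come from the image coding networks and the mapping would be trained end to end; the sketch only shows how similarity can be turned into corrected positions and how several stages can be averaged.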

Description

Image generation method, device, electronic equipment, storage medium and program product

Technical Field

The present disclosure relates to the field of computer vision, and in particular to an image generation method, an image generation device, an electronic device, a storage medium, and a program product.

Background

At present, in image synthesis tasks such as virtual try-on and portrait synthesis, given an image to be synthesized and an image that includes the target to be synthesized, a deep learning model can naturally fit the target into a designated synthesis region of the image to be synthesized. For example, clothing is naturally fitted to the clothing region of a person in the image to be synthesized, or a person is naturally fitted into a designated region of the image to be synthesized.

However, in these existing image synthesis processes, information about the object to be synthesized (such as the clothing) is generally injected into the image to be synthesized (such as the person image) by means such as in-context learning. Merely injecting object information often fails to make the model reliably attend to the high-frequency details of the object to be synthesized, so detail fidelity is often insufficient, which in turn leads to poor detail restoration in the synthesized image and a poor image synthesis effect.

Disclosure of Invention

The disclosure provides an image generation method, an image generation device, an electronic device, a storage medium and a program product, so as to at least solve the technical problems in the related art that high-frequency details of the object to be synthesized are difficult to attend to reliably, detail fidelity is often insufficient, detail restoration in the synthesized image is poor, and the image synthesis effect is poor.
The technical scheme of the present disclosure is as follows:

According to a first aspect of the embodiments of the present disclosure, there is provided an image generation method including: acquiring an image to be processed and a preset object image comprising an object to be synthesized; and inputting the image to be processed and the preset object image into a preset image generation model, and generating a target synthesized image on the basis of mapping and correcting the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed, wherein the target synthesized image is an image obtained by synthesizing the object to be synthesized into the region to be synthesized in the image to be processed.

According to a second aspect of the embodiments of the present disclosure, there is provided an image generating apparatus including: an image acquisition module configured to acquire an image to be processed and a preset object image including an object to be synthesized; and a first image generation module configured to input the image to be processed and the preset object image into a preset image generation model, and generate a target synthesized image on the basis of mapping and correcting the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed, wherein the target synthesized image is an image obtained by synthesizing the object to be synthesized into the region to be synthesized in the image to be processed.
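
To make "mapping and correcting the position information" concrete, the sketch below remaps the position of each object image block from the region where the object is located in the preset object image to the corresponding location inside the region to be synthesized. It is a minimal illustration under assumptions that the disclosure does not state: both regions are axis-aligned boxes given as `(top, left, bottom, right)` in pixels, the object region is split into a regular grid of blocks, and the mapping is a simple scale and shift. The function name `remap_patch_positions` is hypothetical.

```python
def remap_patch_positions(obj_box, target_box, grid_h, grid_w):
    """Map block centres from the object region into the region to be synthesized.

    obj_box and target_box are (top, left, bottom, right) boxes (an assumed format).
    Returns (original, corrected): two lists of (row, col) block centres.
    """
    o_top, o_left, o_bottom, o_right = obj_box
    t_top, t_left, t_bottom, t_right = target_box

    # Scale factors that stretch the object box onto the target box.
    scale_r = (t_bottom - t_top) / (o_bottom - o_top)
    scale_c = (t_right - t_left) / (o_right - o_left)

    # Size of one block inside the object box.
    step_r = (o_bottom - o_top) / grid_h
    step_c = (o_right - o_left) / grid_w

    original, corrected = [], []
    for i in range(grid_h):
        for j in range(grid_w):
            r = o_top + (i + 0.5) * step_r   # block centre in the object image
            c = o_left + (j + 0.5) * step_c
            original.append((r, c))
            # Same relative location, expressed in target-image coordinates.
            corrected.append((t_top + (r - o_top) * scale_r,
                              t_left + (c - o_left) * scale_c))
    return original, corrected

# A 64x64 object region mapped onto a 128x64 region whose top-left corner is (100, 200).
original, corrected = remap_patch_positions((0, 0, 64, 64), (100, 200, 228, 264), 2, 2)
```

The corrected coordinates could then be fed to whatever position coding scheme the model uses, so that each object block carries the position it should occupy in the image to be processed rather than its position in the object image.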
In an alternative embodiment, the apparatus further comprises: a first position information acquisition module configured to determine first position information of the region where the object to be synthesized is located in the preset object image and second position information of the region to be synthesized in the image to be processed; the first image generation module is further configured to input the image to be processed, the preset object image, the first position information and the second position information into the preset image generation model, and generate the target synthetic image on the basis of performing mapping correction on the position information of the region where the object to be synthesized is located in the preset object image based on the position information of the region to be synthesized in the image to be processed.

In an alternative embodiment, the preset image generation model comprises a first image coding network, a first position coding network, at least one second image coding network and a first image decoding network which are sequentially connected, wherein the first image generation module comprises: a first image coding unit configured to perform image coding processing by inp