KR-102960699-B1 - METHOD AND APPARATUS FOR TRAINING IMAGE RESTORATION MODEL

KR 102960699 B1

Abstract

A method and apparatus for training an image restoration model are disclosed. An image restoration model training method according to one embodiment includes: generating, using a first image, a first partially occluded image in which part of a first object included in the first image is occluded; training a first generative model to generate an estimated texture map for the first object from the first partially occluded image, using as ground truth a texture map generated based on a three-dimensional model corresponding to the first object; generating, using a second image, a second partially occluded image in which part of a second object included in the second image is occluded; generating a texture map for the second object from the second partially occluded image using the trained first generative model; generating an input image for a second generative model using the second partially occluded image and the texture map for the second object; and training the second generative model to generate an estimated image for the second image from the input image, using the second image as ground truth.
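The abstract outlines a two-stage pipeline: a first generative model learns to recover texture maps from occluded images, and a second generative model then uses those texture maps to restore the image itself. Below is a minimal PyTorch-style sketch of that training flow, not the patented implementation: the modules `g1` and `g2`, the occlusion masks, and the externally supplied ground-truth texture map (which claim 2 derives via a pre-trained extraction model) are all assumptions, and the L1 losses are illustrative choices.

```python
import torch
import torch.nn.functional as F

def occlude(image, mask):
    """Zero out the masked region to simulate a partially occluded image."""
    return image * (1.0 - mask)

def train_stage1(g1, opt1, first_images, masks, texture_gt):
    """Stage 1: train the first generative model g1 to estimate a texture map,
    with a texture map rendered from a fitted 3D model as ground truth."""
    occluded = occlude(first_images, masks)
    est_texture = g1(occluded)
    loss = F.l1_loss(est_texture, texture_gt)  # difference-based loss (cf. claim 3)
    opt1.zero_grad()
    loss.backward()
    opt1.step()
    return loss

def train_stage2(g1, g2, opt2, second_images, masks):
    """Stage 2: freeze the trained g1 and train the second generative model g2
    with the original (unoccluded) second image as ground truth."""
    occluded = occlude(second_images, masks)
    with torch.no_grad():                      # g1 is already trained
        texture = g1(occluded)
    # One possible way to build g2's input: channel-wise concatenation.
    model_input = torch.cat([occluded, texture], dim=1)
    restored = g2(model_input)
    loss = F.l1_loss(restored, second_images)
    opt2.zero_grad()
    loss.backward()
    opt2.step()
    return loss
```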

Inventors

  • 노영민
  • 오태현
  • 정용현
  • 유수연

Assignees

  • Samsung SDS Co., Ltd.
  • POSTECH Research and Business Development Foundation

Dates

Publication Date
2026-05-07
Application Date
2021-09-07

Claims (20)

  1. An image restoration model training method performed by an image restoration model training device, the method comprising: generating, using a first image, a first partially occluded image in which part of a first object included in the first image is occluded; training a first generative model to generate an estimated texture map for the first object from the first partially occluded image, using as ground truth a texture map generated based on a three-dimensional model corresponding to the first object; generating, using a second image, a second partially occluded image in which part of a second object included in the second image is occluded; generating a texture map for the second object from the second partially occluded image using the trained first generative model; generating an input image for a second generative model using the second partially occluded image and the texture map for the second object; and training the second generative model to generate an estimated image for the second image from the input image, using the second image as ground truth.
  2. The method of claim 1, wherein training the first generative model comprises: generating a three-dimensional model corresponding to the first object from the first image using a pre-trained extraction model; and generating a texture map for the first object from the generated three-dimensional model, wherein the extraction model receives a two-dimensional image as input and is pre-trained to generate a three-dimensional model corresponding to an object included in the input two-dimensional image.
  3. The method of claim 1, wherein training the first generative model comprises training the first generative model using a loss based on a difference between the texture map generated based on the three-dimensional model and the estimated texture map.
  4. The method of claim 1, wherein the second generative model comprises: a feature vector generation unit that generates a feature vector for the input image; a foreground generation unit that generates a foreground image using a first part of the feature vector; a background generation unit that generates a background image using a second part of the feature vector; and a synthesis unit that generates an estimated image for the second image using the foreground image and the background image.
  5. The method of claim 4, wherein training the second generative model comprises: generating the foreground image, the background image, and an estimated image for the second image from the input image using the second generative model; and training the second generative model using one or more losses based on the second image and at least one of the foreground image, the background image, and the estimated image for the second image.
  6. The method of claim 5, wherein the one or more losses include a loss based on a difference between the second image and the estimated image for the second image.
  7. The method of claim 5, wherein training the second generative model using the one or more losses comprises generating feature vectors for each of the second image and the foreground image using a pre-trained segmentation model, and wherein the one or more losses include a loss based on a difference between the feature vector for the second image and the feature vector for the foreground image.
  8. The method of claim 7, wherein the segmentation model is based on a convolutional neural network (CNN) including a plurality of convolution layers, and the feature vector for each of the second image and the foreground image is a vector output from a preset layer among the plurality of convolution layers for the corresponding image.
  9. The method of claim 5, wherein training the second generative model using the one or more losses comprises extracting one or more image patches from each of the second image and the background image using a sliding window, and wherein the one or more losses include a loss based on a difference between an image patch extracted from a specific region of the background image and an image patch extracted from a region of the second image corresponding to the specific region.
  10. The method of claim 9, wherein the image patch extracted from the corresponding region is: an image patch extracted from the region of the second image at the same location as the specific region, if that region is a background region; and an image patch extracted from the background region of the second image closest to the specific region, if the region at the same location contains at least part of the object.
  11. The method of claim 5, wherein training the second generative model using the one or more losses comprises generating a discrimination result for each of the second image and the estimated image for the second image using a discriminator that determines whether an image is fake, and wherein the one or more losses include a loss based on the discrimination results for the second image and the estimated image.
  12. An image restoration model training device comprising: a first learning unit that generates, using a first image, a first partially occluded image in which part of a first object included in the first image is occluded, and trains a first generative model to generate an estimated texture map for the first object from the first partially occluded image, using as ground truth a texture map generated based on a three-dimensional model corresponding to the first object; a texture map generation unit that generates, using a second image, a second partially occluded image in which part of a second object included in the second image is occluded, and generates a texture map for the second object from the second partially occluded image using the trained first generative model; and a second learning unit that generates an input image for a second generative model using the second partially occluded image and the texture map for the second object, and trains the second generative model to generate an estimated image for the second image from the input image, using the second image as ground truth.
  13. The device of claim 12, wherein the first learning unit generates a three-dimensional model corresponding to the first object from the first image using a pre-trained extraction model, and generates a texture map for the first object from the generated three-dimensional model, and wherein the extraction model receives a two-dimensional image as input and is pre-trained to generate a three-dimensional model corresponding to an object included in the input two-dimensional image.
  14. The device of claim 12, wherein the first learning unit trains the first generative model using a loss based on a difference between the texture map generated based on the three-dimensional model and the estimated texture map.
  15. The device of claim 12, wherein the second generative model comprises: a feature vector generation unit that generates a feature vector for the input image; a foreground generation unit that generates a foreground image using a first part of the feature vector; a background generation unit that generates a background image using a second part of the feature vector; and a synthesis unit that generates an estimated image for the second image using the foreground image and the background image.
  16. The device of claim 15, wherein the second learning unit generates the foreground image, the background image, and an estimated image for the second image from the input image using the second generative model, and trains the second generative model using one or more losses based on the second image and at least one of the foreground image, the background image, and the estimated image for the second image.
  17. The device of claim 16, wherein the one or more losses include a loss based on a difference between the second image and the estimated image for the second image.
  18. The device of claim 16, wherein the second learning unit generates feature vectors for each of the second image and the foreground image using a pre-trained segmentation model, and wherein the one or more losses include a loss based on a difference between the feature vector for the second image and the feature vector for the foreground image.
  19. The device of claim 18, wherein the segmentation model is based on a convolutional neural network (CNN) including a plurality of convolution layers, and the feature vector for each of the second image and the foreground image is a vector output from a preset layer among the plurality of convolution layers for the corresponding image.
  20. The device of claim 16, wherein the second learning unit extracts one or more image patches from each of the second image and the background image using a sliding window, and wherein the one or more losses include a loss based on a difference between an image patch extracted from a specific region of the background image and an image patch extracted from a region of the second image corresponding to the specific region.
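Claims 4 and 15 describe the second generative model as four units: a feature vector generator whose output is split between a foreground branch and a background branch, followed by a synthesis unit. The following PyTorch sketch mirrors only that structure; the layer configurations, channel counts, feature dimension, and the 1x1-convolution synthesis step are all illustrative assumptions, since the patent does not specify them.

```python
import torch
import torch.nn as nn

class SecondGenerativeModel(nn.Module):
    """Sketch of the claim-4 structure:
    feature vector -> (foreground, background) -> synthesized estimate."""

    def __init__(self, in_channels=6, feat_dim=512):
        super().__init__()
        # Feature vector generation unit (encoder); architecture is illustrative.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(128 * 4 * 4, feat_dim),
        )
        # Foreground / background generation units, each fed half of the feature vector.
        self.fg_decoder = self._decoder(feat_dim // 2)
        self.bg_decoder = self._decoder(feat_dim // 2)
        # Synthesis unit: here a 1x1 conv over the concatenated branches (one possible choice).
        self.synth = nn.Conv2d(6, 3, kernel_size=1)

    @staticmethod
    def _decoder(dim):
        return nn.Sequential(
            nn.Linear(dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        z_fg, z_bg = z.chunk(2, dim=1)      # first / second part of the feature vector
        fg = self.fg_decoder(z_fg)          # foreground image
        bg = self.bg_decoder(z_bg)          # background image
        out = self.synth(torch.cat([fg, bg], dim=1))  # estimated image for the second image
        return out, fg, bg
```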
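Claims 9, 10, and 20 describe a patch-based background loss: patches are cut from the generated background with a sliding window and compared against same-location patches of the real image, falling back to the nearest background patch when the same location is covered by the object. A rough sketch of that matching rule follows, assuming a binary object mask is available and batch size 1; the patch size, stride, and L1 comparison are illustrative choices.

```python
import torch
import torch.nn.functional as F

def patch_background_loss(bg_gen, real, obj_mask, patch=16, stride=16):
    """Sliding-window patch loss between the generated background and the real
    image (cf. claims 9-10). obj_mask is 1 where the object appears in `real`.
    Assumes batch size 1 for clarity."""
    losses = []
    _, _, h, w = real.shape
    # Locations whose patch in the real image is pure background (for the fallback).
    bg_positions = [
        (i, j)
        for i in range(0, h - patch + 1, stride)
        for j in range(0, w - patch + 1, stride)
        if obj_mask[..., i:i + patch, j:j + patch].max() == 0
    ]
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            gen_patch = bg_gen[..., i:i + patch, j:j + patch]
            if obj_mask[..., i:i + patch, j:j + patch].max() == 0:
                # Same location in the real image is background: compare directly.
                ref = real[..., i:i + patch, j:j + patch]
            elif bg_positions:
                # Same location contains the object: use the closest background patch.
                ci, cj = min(bg_positions,
                             key=lambda p: (p[0] - i) ** 2 + (p[1] - j) ** 2)
                ref = real[..., ci:ci + patch, cj:cj + patch]
            else:
                continue  # no background patch available in this image
            losses.append(F.l1_loss(gen_patch, ref))
    return torch.stack(losses).mean() if losses else bg_gen.new_zeros(())
```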
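Finally, claims 6 through 11 name four kinds of losses for the second generative model: a reconstruction loss, a feature loss from a pre-trained segmentation model's preset layer, the patch background loss above, and an adversarial loss from a discriminator. One way to combine them is sketched below; `seg_feat` and `disc` are hypothetical callables, the non-saturating GAN formulation and the loss weights are my assumptions, and the patent does not prescribe how the terms are weighted.

```python
import torch
import torch.nn.functional as F

def total_loss(est, fg, bg, real, seg_feat, disc, obj_mask):
    """Combine the losses of claims 6-11. `seg_feat` returns a preset CNN layer's
    output from a pre-trained segmentation model (claims 7-8); `disc` is the
    fake-image discriminator of claim 11. Uses patch_background_loss from the
    previous sketch. All weights are illustrative."""
    l_rec = F.l1_loss(est, real)                         # claim 6: reconstruction
    l_seg = F.l1_loss(seg_feat(fg), seg_feat(real))      # claims 7-8: feature difference
    l_patch = patch_background_loss(bg, real, obj_mask)  # claims 9-10: background patches
    logits = disc(est)                                   # claim 11: adversarial term
    l_adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return l_rec + l_seg + l_patch + 0.1 * l_adv
```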

Description

Method and Apparatus for Training an Image Restoration Model

The disclosed embodiments relate to image restoration technology for restoring occluded images.

Restoring an occluded image is called in-painting when the lost (or to-be-generated) region lies inside the image, and out-painting when it lies outside the image. Traditional in-painting techniques use generative models to fill in missing regions by referencing surrounding pixels. From the perspective of a generative model, however, it is difficult to distinguish objects from the background within an image. To address this, it is effective to supply the object's semantic information to the generative model as a condition. Existing conditional generation methods provided only limited semantic information, such as 2D segmentation masks or 2D keypoints, as conditions. With 2D segmentation masks, it is difficult to distinguish the detailed parts of an object, leading to ambiguity when generating detailed regions. With 2D keypoints, it is difficult to predict the object's volume, and the boundary between object and background becomes ambiguous. As a result, conventional techniques produce images with inconsistent detail and poor quality.

FIG. 1 is a configuration diagram of an image restoration model training device according to one embodiment.
FIG. 2 is a drawing for illustratively explaining a first generative model training process according to one embodiment.
FIG. 3 is a diagram illustrating the process of generating an estimated image for a second image using a second generative model according to one embodiment.
FIGS. 4 and 5 are drawings for illustratively explaining image patch extraction according to one embodiment.
FIG. 6 is a flowchart of an image restoration model training method according to one embodiment.
FIG. 7 is a flowchart illustrating a second generative model training process according to one embodiment.
FIG. 8 is a block diagram illustrating a computing environment including a computing device according to one embodiment.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to facilitate a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, this is merely illustrative, and the present invention is not limited thereto. In describing the embodiments of the present invention, detailed descriptions of known technologies related to the present invention are omitted where it is determined that they might unnecessarily obscure the essence of the present invention. Furthermore, the terms described below are defined in consideration of their functions within the present invention and may vary depending on the intentions or practices of the user or operator; their definitions should therefore be based on the content of this entire specification. The terms used in the detailed description are intended merely to describe the embodiments of the present invention and should not be limiting in any way. Unless explicitly stated otherwise, singular expressions include the plural.
In this description, expressions such as "include" or "comprise" indicate certain characteristics, numbers, steps, operations, elements, or parts thereof, or combinations thereof, and should not be interpreted as excluding the existence or possibility of one or more other characteristics, numbers, steps, operations, elements, or parts thereof, or combinations thereof.

FIG. 1 is a configuration diagram of an image restoration model training device according to one embodiment. Referring to FIG. 1, an image restoration model training device (100) according to one embodiment includes a first learning unit (110), a texture map generation unit (120), and a second learning unit (130). According to one embodiment, the first learning unit (110), the texture map generation unit (120), and the second learning unit (130) may each be implemented using one or more physically separated devices, or by one or more hardware processors or a combination of one or more hardware processors and software, and, unlike the illustrated example, they may not be clearly distinguished in their specific operation.

The image restoration model training device (100) trains an image restoration model to restore the occluded part of a partially occluded image, that is, an image in which part of an included object is occluded. Here, part of an object being occluded may mean that part of the object shown in the image is lost or obscured and not displayed, due to image damage, added noise, synthesized text, or the like. Meanwhile, the object included in the image may be, for example, a subject such as a person or an animal, but is not necessarily limited thereto.
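As a rough illustration of the three-unit structure just described, here is a Python skeleton whose method names mirror the reference numerals (110), (120), and (130) from FIG. 1; the constructor arguments and method signatures are hypothetical, since the specification does not prescribe an API.

```python
class ImageRestorationModelTrainer:
    """Skeleton mirroring FIG. 1: first learning unit (110),
    texture map generation unit (120), second learning unit (130).
    Signatures are illustrative, not from the specification."""

    def __init__(self, first_generative_model, second_generative_model, extraction_model):
        self.g1 = first_generative_model
        self.g2 = second_generative_model
        self.extraction_model = extraction_model  # pre-trained: 2D image -> 3D model

    def first_learning_unit(self, first_image, mask):
        """(110) Occlude part of the first object and train g1 against a
        texture map rendered from the fitted 3D model (ground truth)."""
        ...

    def texture_map_generation_unit(self, second_image, mask):
        """(120) Occlude part of the second object and produce its texture
        map with the already-trained g1."""
        ...

    def second_learning_unit(self, second_occluded, texture_map, second_image):
        """(130) Build g2's input from the occluded image and texture map,
        then train g2 with the second image as ground truth."""
        ...
```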