
EP-4200795-B1 - SYSTEMS AND METHODS FOR INPAINTING IMAGES AT INCREASED RESOLUTION


Inventors

  • KIM, Soo Ye
  • LIBA, Orly
  • GARG, Rahul
  • KANAZAWA, Noritsugu
  • WADHWA, Neal
  • ABERMAN, Kfir
  • CHANG, Huiwen

Dates

Publication Date
2026-05-06
Application Date
2021-10-14

Claims (15)

  1. A system comprising: a computing device (102), comprising: one or more processors (104); a memory (106); and a non-transitory computer readable medium having instructions stored thereon that when executed by a processor cause performance of a set of functions, wherein the set of functions comprises: receiving an input image (202), wherein the input image includes one or more masked regions to be inpainted; providing the input image to a first neural network (204), wherein the first neural network outputs a first inpainted image (206) at a first resolution, and wherein the one or more masked regions are inpainted in the first inpainted image; creating a second inpainted image (208) by increasing a resolution of the first inpainted image from the first resolution to a second resolution, wherein the second resolution is greater than the first resolution such that the one or more inpainted masked regions have an increased resolution; and providing the second inpainted image to a second neural network (210), wherein the second neural network outputs a first refined inpainted image (212) at the second resolution, and wherein the first refined inpainted image is a refined version of the second inpainted image.
  2. The system of claim 1, wherein the computing device (102), the first neural network (204), and the second neural network are part of a server system; or, wherein creating the second inpainted image (208) comprises providing the first inpainted image (206) to a super-resolution network, wherein the super-resolution network outputs the second inpainted image at the second resolution.
  3. The system of claim 1, the set of functions further comprising: downsampling the first refined inpainted image (212) to create a second refined inpainted image at the first resolution; and using the second refined inpainted image as an output image; or the set of functions further comprising: receiving a request for an output image from a second computing device; based on the request, determining to output the first refined inpainted image (212) rather than downsampling the first refined inpainted image to create a second refined inpainted image; and using the first refined inpainted image as the output image, wherein the output image is an inpainted version of the input image at an increased resolution relative to the input image (202).
  4. The system of claim 1, the set of functions further comprising: determining an operational context for the refined inpainted image (212); and based on the determined operational context, downsampling the first refined inpainted image to create a second refined inpainted image with a third resolution that is less than the second resolution; optionally wherein the operational context corresponds to a data processing threshold associated with a request from a second computing device.
  5. The system of claim 1, wherein a first mask defines the one or more masked regions, the set of functions further comprising: creating a second mask by increasing a resolution of the first mask from the first resolution to the second resolution; and while providing the second inpainted image (208) to the second neural network (210), providing the second mask to the second neural network.
  6. The system of claim 1, wherein the first neural network (204) corresponds to the first resolution and the second neural network (210) corresponds to the second resolution, the set of functions further comprising: training the first neural network and the second neural network simultaneously using the input image (202) and the second inpainted image (208).
  7. A method comprising: receiving, by a computing device (102), an input image (202), wherein the input image includes one or more masked regions to be inpainted; providing, by the computing device, the input image to a first neural network (204), wherein the first neural network outputs a first inpainted image (206) at a first resolution, and wherein the one or more masked regions are inpainted in the first inpainted image; creating, by the computing device, a second inpainted image (208) by increasing a resolution of the first inpainted image from the first resolution to a second resolution, wherein the second resolution is greater than the first resolution such that the one or more inpainted masked regions have an increased resolution; and providing, by the computing device, the second inpainted image to a second neural network (210), wherein the second neural network outputs a first refined inpainted image (212) at the second resolution, and wherein the first refined inpainted image is a refined version of the second inpainted image.
  8. The method of claim 7, wherein creating the second inpainted image (208) comprises providing the first inpainted image (206) to a super-resolution network, wherein the super-resolution network outputs the second inpainted image at the second resolution.
  9. The method of claim 7, further comprising: downsampling the first refined inpainted image (212) to create a second refined inpainted image at the first resolution; and using the second refined inpainted image as an output image; or further comprising: downsampling the first refined inpainted image to create a second refined inpainted image; receiving a request for an output image from a second computing device; and based on the request, using the first refined inpainted image as the output image rather than the second refined inpainted image as the output image.
  10. The method of claim 7, further comprising: determining an operational context for the refined inpainted image (212); and based on the determined operational context, downsampling the first refined inpainted image to create a second refined inpainted image with a third resolution that is less than the second resolution; optionally wherein the operational context corresponds to a data processing threshold associated with a request from a second computing device.
  11. The method of claim 7, wherein a first mask defines the one or more masked regions, the method further comprising: creating a second mask by increasing a resolution of the first mask from the first resolution to the second resolution; and while providing the second inpainted image (208) to the second neural network (210), providing the second mask to the second neural network.
  12. The method of claim 11, further comprising training the second neural network (210) using both the second inpainted image (208) and the second mask.
  13. The method of claim 11, wherein increasing the resolution of the first inpainted image (206) from the first resolution to the second resolution is performed differently than increasing the resolution of the first mask from the first resolution to the second resolution.
  14. The method of claim 7, wherein the first neural network (204) corresponds to the first resolution and the second neural network (210) corresponds to the second resolution, the method further comprising: training the first neural network and the second neural network simultaneously using the input image (202) and the second inpainted image (208).
  15. A non-transitory computer readable medium having instructions stored thereon that when executed by a processor (104) cause performance of a set of functions, wherein the set of functions comprises: receiving an input image (202), wherein the input image includes one or more masked regions to be inpainted; providing the input image to a first neural network (204), wherein the first neural network outputs a first inpainted image (206) at a first resolution, and wherein the one or more masked regions are inpainted in the first inpainted image; creating a second inpainted image (208) by increasing a resolution of the first inpainted image from the first resolution to a second resolution, wherein the second resolution is greater than the first resolution such that the one or more inpainted masked regions have an increased resolution; and providing the second inpainted image to a second neural network (210), wherein the second neural network outputs a first refined inpainted image (212) at the second resolution, and wherein the first refined inpainted image is a refined version of the second inpainted image.
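The pipeline recited in independent claims 1, 7, and 15 can be summarized as: inpaint the masked regions at a lower working resolution, increase the resolution of the result, and refine it at the higher resolution with a second network. The following is a minimal, non-authoritative sketch of that flow in Python with PyTorch. The module names (CoarseInpaintNet, RefinementNet), the placeholder convolutional bodies, and the specific upsampling modes are illustrative assumptions, not details taken from the patent; the claims only require that the image and the mask be upscaled (claims 5, 11, and 13 noting the two upscalings may differ) and that a super-resolution network may perform the image upscaling (claims 2 and 8).

```python
# Illustrative sketch only: network architectures and upsampling choices are
# assumptions, not taken from the patent. Requires: torch (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoarseInpaintNet(nn.Module):
    """Stand-in for the first neural network: inpaints at the first (lower) resolution."""
    def __init__(self, channels=32):
        super().__init__()
        # Input: RGB image with masked pixels zeroed, concatenated with the binary mask.
        self.body = nn.Sequential(
            nn.Conv2d(4, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, masked_image, mask):
        return self.body(torch.cat([masked_image, mask], dim=1))


class RefinementNet(nn.Module):
    """Stand-in for the second neural network: refines the upsampled inpainted image."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, upsampled_image, upsampled_mask):
        # Predict a residual so the network only corrects the upsampled estimate.
        return upsampled_image + self.body(
            torch.cat([upsampled_image, upsampled_mask], dim=1))


def inpaint_at_increased_resolution(image, mask, scale=4):
    """image: (N, 3, H, W) with masked pixels zeroed; mask: (N, 1, H, W), 1 = to inpaint."""
    coarse_net, refine_net = CoarseInpaintNet(), RefinementNet()

    # First neural network: first inpainted image at the first resolution.
    first_inpainted = coarse_net(image, mask)

    # Increase the resolution; a learned super-resolution network could replace this
    # interpolation (cf. claims 2 and 8). The mask is upsampled differently
    # (nearest-neighbor) so that it stays binary (cf. claim 13).
    second_inpainted = F.interpolate(first_inpainted, scale_factor=scale,
                                     mode="bicubic", align_corners=False)
    second_mask = F.interpolate(mask, scale_factor=scale, mode="nearest")

    # Second neural network: first refined inpainted image at the second resolution.
    refined = refine_net(second_inpainted, second_mask)
    return refined
```

A caller could then either return the high-resolution result directly or downsample it back toward the first resolution, depending on an operational context such as a data-processing threshold associated with the requesting device (cf. claims 3, 4, 9, and 10).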
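Claims 6, 12, and 14 further recite training the two networks simultaneously. The sketch below shows one plausible way such a joint training step could look; the loss functions, optimizer usage, and the choice of reconstruction targets at both resolutions are assumptions for illustration only and are not specified by the patent.

```python
# Illustrative sketch only: losses, optimizer, and data handling are assumptions;
# the claims only recite simultaneous training of the two networks.
import torch
import torch.nn.functional as F

def joint_training_step(coarse_net, refine_net, optimizer, target, mask, scale=4):
    """target: (N, 3, H*scale, W*scale) ground-truth image at the second resolution;
    mask: (N, 1, H, W) binary mask of regions to hide at the first resolution."""
    # Build the low-resolution masked input from the ground truth.
    low_res_target = F.interpolate(target, scale_factor=1.0 / scale,
                                   mode="bicubic", align_corners=False)
    masked_input = low_res_target * (1.0 - mask)

    # Run the claimed pipeline end to end.
    first_inpainted = coarse_net(masked_input, mask)
    second_inpainted = F.interpolate(first_inpainted, scale_factor=scale,
                                     mode="bicubic", align_corners=False)
    second_mask = F.interpolate(mask, scale_factor=scale, mode="nearest")
    refined = refine_net(second_inpainted, second_mask)

    # One possible joint objective: reconstruction losses at both resolutions,
    # so gradients reach both networks in a single backward pass.
    loss = F.l1_loss(first_inpainted, low_res_target) + F.l1_loss(refined, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```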

Description

BACKGROUND

Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.

Images can include aspects of a scene that a system may automatically remove. For example, one or more aspects of the scene may distract a viewer of the image from an intended subject of the image (e.g., a person or a piece of artwork). The system may remove these aspects of the environment from the image, leaving blank areas to be inpainted. Inpainting the blank areas allows the image to appear cohesive while also omitting those aspects of the scene.

Reference is made to Yi Zili et al.: "Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting", which presents data-driven image inpainting methods. Reference is also made to O. Chakraborty et al.: "Deep image inpainting with region prediction at hierarchical scales", Computer Vision, Graphics and Image Processing, ACM, 18 December 2016, pages 1-8, which discloses a CNN-based method for image inpainting that utilizes the inpaintings generated at different hierarchical resolutions. First, the missing image region is predicted with larger contextual information at the lowest resolution using deconvolution layers. Second, the predicted regions are refined at greater hierarchical scales by imposing gradually reduced contextual information surrounding the predicted region, using different trained CNNs.

Patent literature WO 2020/108358 A1 (TENCENT TECH SHENZHEN CO LTD [CN]), 4 June 2020, discloses a method to repair specific regions of an image which have been altered. The first step consists in performing feature extraction on the non-repair area based on different receptive domains and spatial resolutions to obtain feature information at multiple scales. The second step consists in generating the texture of the region to be repaired according to the feature information at the multiple scales, and the last step consists in filling the generated texture into the area to be repaired to obtain the repaired image.

Patent literature US 2019/287283 A1 (LIN ZHE [US] ET AL), 19 September 2019, discloses a user-guided method for performing image completion tasks, which involves generating image content for insertion into one or more completion regions of an incomplete image (e.g., regions having missing content or content to be replaced). These completion tasks are performed with an image completion neural network that is trained to generate suitable image content for insertion based on one or more guidance inputs that a user has supplied for the completion region.

SUMMARY

A system according to claim 1 is provided. A method according to claim 7 is provided. A non-transitory computer readable medium according to claim 15 is provided. Other aspects, embodiments, and implementations will become apparent to those of ordinary skill in the art by reading the following detailed description, with reference where appropriate to the accompanying drawings.
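The background above refers to removing aspects of a scene and leaving blank, masked areas to be inpainted, and the claims take as input an image with one or more masked regions. Purely as an illustration (not part of the claimed subject matter), the sketch below shows one conventional way such a masked input could be prepared; the array layout and the convention "1 = pixel to be inpainted" are assumptions rather than requirements of the description.

```python
# Illustrative sketch only: one conventional way to build a masked input for
# inpainting. Conventions here are assumptions, not taken from the patent.
import numpy as np

def prepare_masked_input(image, boxes):
    """image: (H, W, 3) uint8 array; boxes: iterable of (top, left, bottom, right)
    regions to remove. Returns the image with those regions blanked out and a
    binary mask marking the regions to be inpainted."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for top, left, bottom, right in boxes:
        mask[top:bottom, left:right] = 1          # mark the region to inpaint
    masked_image = image.copy()
    masked_image[mask == 1] = 0                   # blank out the removed content
    return masked_image, mask
```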
BRIEF DESCRIPTION OF THE FIGURES

Figure 1 is a block diagram of a system, according to an example embodiment.

Figure 2A is a flow chart of a method for image inpainting implemented by a system, according to an example embodiment.

Figure 2B is a portion of the flow chart of a method for image inpainting implemented by a system, according to an example embodiment.

Figure 2C is a portion of the flow chart of a method for image inpainting implemented by a system, according to an example embodiment.

Figure 2D is a portion of the flow chart of a method for image inpainting implemented by a system, according to an example embodiment.

Figure 2E is a portion of the flow chart of a method for image inpainting implemented by a system, according to an example embodiment.

Figure 3 is a block diagram of a method, according to an example embodiment.

DETAILED DESCRIPTION

Example methods, devices, and systems are described herein. It should be understood that the words "example" and "exemplary" are used herein to mean "serving as an example, instance, or illustration." Any embodiment or feature described herein as being an "example" or "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or features. Thus, the example embodiments described herein are not meant to be limiting. Aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are contemplated herein.

Further, unless context suggests otherwise, the features illustrated in each of the figures may be used in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment.