US-12626423-B2 - One-click image extension with quick mask adjustment

Abstract

Systems and methods for image processing (e.g., image extension or image uncropping) using neural networks are described. One or more aspects include obtaining an image (e.g., a source image, a user provided image, etc.) having an initial aspect ratio, and identifying a target aspect ratio (e.g., via user input) that is different from the initial aspect ratio. The image may be positioned in an image frame having the target aspect ratio, where the image frame includes an image region containing the image and one or more extended regions outside the boundaries of the image. An extended image may be generated (e.g., using a generative neural network), where the extended image includes the image in the image region as well as generated image portions in the extended regions and the one or more generated image portions comprise an extension of a scene element depicted in the image.
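
The framing step described above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the `align_x`/`align_y` placement parameters, and the convention that 1 marks a pixel to be generated are all assumptions made for illustration. The sketch computes the smallest frame of the target aspect ratio that contains the image, places the image in it, and derives a binary mask of the extended regions.

```python
def frame_size(width, height, ratio_w, ratio_h):
    """Smallest frame of aspect ratio ratio_w:ratio_h that contains the image.

    Hypothetical helper; integer ceiling division avoids float rounding.
    """
    if width * ratio_h < height * ratio_w:
        # Image is narrower than the target: keep the height, widen the frame.
        return -(-height * ratio_w // ratio_h), height
    # Image is as wide as or wider than the target: keep the width, add height.
    return width, -(-width * ratio_h // ratio_w)


def place_and_mask(width, height, ratio_w, ratio_h, align_x=0.5, align_y=0.5):
    """Position the image in the frame and mark the extended regions.

    align_x / align_y in [0, 1] are assumed placement controls
    (0 = left/top, 1 = right/bottom). In the returned mask, 1 marks a
    pixel in an extended region and 0 marks a pixel of the original image.
    """
    frame_w, frame_h = frame_size(width, height, ratio_w, ratio_h)
    left = int((frame_w - width) * align_x)
    top = int((frame_h - height) * align_y)
    mask = [
        [
            0 if (left <= x < left + width and top <= y < top + height) else 1
            for x in range(frame_w)
        ]
        for y in range(frame_h)
    ]
    return (frame_w, frame_h), (left, top), mask
```

For example, `place_and_mask(9, 9, 16, 9)` places a 9x9 image in a 16x9 frame at offset (3, 0), leaving extended regions on the left and right. In a full system the mask and the framed image would then be passed to the generative neural network, which this sketch does not model.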

Inventors

Yuqian ZHOU
Elya Shechtman
Zhe Lin
Krishna Kumar Singh
Jingwan Lu
Connelly Stuart Barnes
Sohrab Amirghodsi

Assignees

ADOBE INC.

Dates

Publication Date: 20260512
Application Date: 20240320

Claims (20)

1. A method comprising: obtaining an image having an initial aspect ratio and depicting a scene element; identifying, using an aspect ratio selection element, a target aspect ratio different from the initial aspect ratio from among a plurality of candidate aspect ratios corresponding to a fixed set of output aspect ratios of a generative neural network; positioning the image in an image frame having the target aspect ratio, wherein the image frame includes an image region containing the image and one or more extended regions outside the boundaries of the image; automatically creating a mask based on the target aspect ratio and the positioning of the image, wherein the mask indicates the one or more extended regions; and generating, using the generative neural network, an extended image having the target aspect ratio based on the image and the mask, wherein the extended image depicts the image in the image region and one or more generated image portions in the one or more extended regions, respectively, and wherein the one or more generated image portions comprise an extension of the scene element depicted in the image.
2. The method of claim 1, further comprising: providing the plurality of candidate aspect ratios to a user; and receiving an aspect ratio input from the user, wherein the target aspect ratio is identified from the plurality of candidate aspect ratios based on the aspect ratio input.
3. The method of claim 1, further comprising: receiving a positioning input from a user, wherein the image is positioned in the image frame based on the positioning input.
4. The method of claim 3, wherein the extended image has a same dimension as the image and a different dimension than the image.
5. The method of claim 1, further comprising: dilating the image to obtain a dilated image, wherein the dilated image is positioned in the image frame, and wherein the extended image is based on the dilated image.
6. The method of claim 1, further comprising: receiving a text prompt, wherein the extended image is generated based on the text prompt.
7. The method of claim 1, further comprising: receiving an image prompt depicting an object, wherein the extended image includes the object in the one or more extended regions.
8. The method of claim 1, wherein: the extended image is generated to include a plurality of generated image portions on multiple sides of the image, respectively, using a one-click process.
9. An apparatus for image processing, comprising: at least one processor unit; and at least one memory unit storing instructions and in electronic communication with the at least one processor unit; an aspect ratio selection element configured to identify a target aspect ratio different from an initial aspect ratio of an image from among a plurality of candidate aspect ratios corresponding to a fixed set of output aspect ratios of a generative neural network; an image framing component configured to position the image in an image frame having the target aspect ratio, wherein the image frame includes an image region containing the image and one or more extended regions outside the boundaries of the image; a mask generation component configured to create a mask based on the target aspect ratio and the position of the image in the image frame, wherein the mask indicates the one or more extended regions; and the generative neural network comprising parameters stored in the at least one memory unit and trained to generate an extended image based on the image, the mask, and the image frame, wherein the extended image includes the image in the image region and one or more generated image portions in the one or more extended regions, respectively.
10. The apparatus of claim 9, the at least one memory unit further including instructions to: provide the plurality of candidate aspect ratios to a user; and receive an aspect ratio input from the user, wherein the target aspect ratio is identified from the plurality of candidate aspect ratios based on the aspect ratio input.
11. The apparatus of claim 10, the at least one memory unit further including instructions to: receive a positioning input from the user, wherein the image is positioned in the image frame based on the positioning input.
12. The apparatus of claim 11, wherein: the extended image has a same dimension as the image and a different dimension than the image.
13. The apparatus of claim 9, the at least one memory unit further including instructions to: dilate the image to obtain a dilated image, wherein the dilated image is positioned in the image frame, and wherein the extended image is based on the dilated image.
14. The apparatus of claim 9, the at least one memory unit further including instructions to: receive a text prompt, wherein the extended image is generated based on the text prompt.
15. The apparatus of claim 14, wherein: the one or more generated image portions of the extended image are generated based on the text prompt.
16. The apparatus of claim 9, the at least one memory unit further including instructions to receive an image prompt depicting an object, wherein the extended image includes the object in the one or more extended regions.
17. A method comprising: providing a user interface including an aspect ratio selection element; receiving an aspect ratio input via the user interface, wherein the aspect ratio input indicates a target aspect ratio from among a plurality of candidate aspect ratios corresponding to a fixed set of output aspect ratios of a generative neural network; positioning an image in an image frame having the target aspect ratio, wherein the image frame includes an image region containing the image and one or more extended regions outside the boundaries of the image; automatically creating a mask based on the target aspect ratio and the positioning of the image, wherein the mask indicates the one or more extended regions; generating an extended image using the generative neural network based on the mask, wherein the extended image includes the image in the image region and one or more generated image portions in the one or more extended regions, respectively; and displaying the extended image via the user interface in response to the received aspect ratio input.
18. The method of claim 17, wherein: the extended image is generated to include a plurality of generated image portions on multiple sides of the image based on a one-click input of the aspect ratio input.
19. The method of claim 17, further comprising: providing the plurality of candidate aspect ratios to a user via the aspect ratio selection element as preview images including the one or more extended regions to be generated, wherein the plurality of candidate aspect ratios comprises the aspect ratio input.
20. The method of claim 17, further comprising: receiving a positioning input via the user interface, wherein the image is positioned in the image frame based on the positioning input.
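
The aspect ratio selection recited in the claims can be sketched as follows. This is an illustration only, not the claimed implementation: the contents of `SUPPORTED_RATIOS` are hypothetical (the source does not enumerate the fixed set of output aspect ratios), and both function names are invented here. The sketch offers only those supported ratios that differ from the image's own ratio and validates a single user choice against that list.

```python
from fractions import Fraction

# Hypothetical fixed set of output aspect ratios (width, height).
# The source does not list the ratios the generative network supports.
SUPPORTED_RATIOS = [(1, 1), (4, 3), (3, 4), (16, 9), (9, 16)]


def candidate_ratios(width, height, supported=SUPPORTED_RATIOS):
    """Return the supported ratios that differ from the image's initial ratio."""
    initial = Fraction(width, height)
    return [(w, h) for (w, h) in supported if Fraction(w, h) != initial]


def select_target(width, height, choice, supported=SUPPORTED_RATIOS):
    """Validate one user selection against the candidate list and return it."""
    if choice not in candidate_ratios(width, height, supported):
        raise ValueError(
            f"{choice} is not a valid target for a {width}x{height} image"
        )
    return choice
```

Comparing exact fractions rather than floats means an 800x600 image is recognized as 4:3 and that option is left out of the candidates, so every remaining choice leads to at least one extended region.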

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority under 35 USC § 119(a) to U.S. Patent Application No. 63/493,836 filed on Apr. 3, 2023, the disclosure of which is incorporated by reference herein in its entirety.

BACKGROUND

The following relates generally to image processing, and more specifically to image extension using neural networks. Image processing or digital image processing refers to the use of a computer to edit a digital image (e.g., or synthesize an image) using an algorithm or a processing network. Image processing technologies have become increasingly important in various fields including photography, video processing, computer vision, and more. Image extension is a subfield of image processing. In some cases, a neural network or a machine learning model may be used to generate or modify an image. In some cases, the generated content is based on a text prompt or a source image.

SUMMARY

The present disclosure describes systems and methods for image processing. Embodiments of the present disclosure include an image processing system configured to generate an extended image (e.g., an extended representation of a user provided image based on a target aspect ratio provided by a user). For example, an image processing system may perform content editing of an image to enable customization of the image, to make the image suitable for different applications, etc. In some examples, an image processing system may automatically fill in image content around an initial (e.g., source) image to perform image extension. As described in more detail herein, the content of a user-provided image may be extended without compromising the integrity and quality of the image generated by the image processing system. For instance, a method, apparatus, and non-transitory computer readable medium for image extension using neural networks are described.
One or more aspects of the method, apparatus, and non-transitory computer readable medium include obtaining an image (e.g., a source image, a user provided image, etc.) having an initial aspect ratio. A target aspect ratio may be identified that is different from the initial aspect ratio. The image may be positioned in an image frame having the target aspect ratio, where the image frame includes an image region containing the image and one or more extended regions outside the boundaries of the image. An extended image may be generated using a generative neural network, where the extended image includes the image in the image region and one or more generated image portions in the one or more extended regions, respectively, and wherein the one or more generated image portions comprise an extension of a scene element depicted in the image.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example of an image processing system according to aspects of the present disclosure.
FIG. 2 shows an example of an image processing system according to aspects of the present disclosure.
FIG. 3 shows an example of an image extension process according to aspects of the present disclosure.
FIG. 4 shows an example of image processing according to aspects of the present disclosure.
FIG. 5 shows an example of an image extension process based on aspect ratio selection according to aspects of the present disclosure.
FIG. 6 shows an example of an image extension process based on user image shifting according to aspects of the present disclosure.
FIG. 7 shows an example of an image extension process based on pixel dilation according to aspects of the present disclosure.
FIG. 8 shows an example of an image extension process based on text guided outpainting according to aspects of the present disclosure.
FIG. 9 shows an example of a method for image processing according to aspects of the present disclosure.
FIG. 10 shows an example of a guided latent diffusion model according to aspects of the present disclosure.
FIG. 11 shows an example of a computing device according to aspects of the present disclosure.

DETAILED DESCRIPTION

The present disclosure relates to image processing using machine learning. Some embodiments of the disclosure relate to extending images using a trained neural network. Despite the advancement of image processing technologies, many image processing systems are not user-friendly, or are inaccessible to non-expert users. For instance, many conventional image processing tools and software require specialized knowledge and skills, making it difficult for users with limited expertise to take full advantage of such image processing tools. As a result, users may struggle to achieve desired results, and the image processing tools may produce unsatisfactory outputs.

Specifically, image editing applications for extending an image (i.e., “uncropping” an image) sometimes require extensive skills and training to operate effectively. For example, these systems may use a large number of steps to extend an image. For example, multiple steps may be used to manually obtain