
EP-4156084-B1 - TECHNIQUES FOR REDUCING DISTRACTIONS IN AN IMAGE

EP 4156084 B1

Inventors

  • ABERMAN, Kfir
  • KNAAN, Yael Pritch
  • JACOBS, David Edward
  • LIBA, ORLY

Dates

Publication Date
2026-05-06
Application Date
2022-09-28

Claims (15)

  1. A computer-implemented method (500) for reducing a distractor object in a first image, the method comprising: accessing (502), by one or more computing devices, a mask (203) and the first image (202) having the distractor object, wherein the mask indicates a region of interest associated with the first image, and wherein the distractor object is inside the region of interest and has one or more pixels each with a respective original attribute; processing (504), using a machine-learned inpainting model (208), the first image and the mask to generate an inpainted image (212) in which pixels within the mask have been inpainted, wherein the one or more pixels in the inpainted image each has a respective inpainted attribute in one or more chromaticity channels; determining (506) a palette transform based on a comparison of the first image and the inpainted image, wherein the one or more pixels of the distractor object have a transform attribute in the one or more chromaticity channels, the transform attribute being different than the inpainted attribute; and processing (508) the first image to generate a recolorized image (220), wherein the one or more pixels of the distractor object in the recolorized image has a recolorized attribute based on the transform attribute of the determined palette transform.
  2. The computer-implemented method of claim 1, wherein processing the first image to generate the inpainted image includes: processing the first image and the mask to generate a masked image; and wherein the masked image is inputted into the machine-learned inpainting model to generate the inpainted image.
  3. The computer-implemented method of claim 1 or 2, wherein the one or more chromaticity channels comprise hue and saturation (HS) channels, and wherein a value attribute for each pixel in the original image, the inpainted image, and the recolorized image is kept constant.
  4. The computer-implemented method of claim 1, 2 or 3, wherein the recolorized attribute is different from the inpainted attribute.
  5. The computer-implemented method of any preceding claim, further characterized by at least one of: the palette transform is generated through performance of a voting technique; or the machine-learned inpainting model is trained using hue and saturation (HS) training data.
  6. The computer-implemented method of any preceding claim, wherein the distractor object includes a plurality of pixels with the original attribute, and wherein the one or more pixels of the distractor object is determined to have the transform attribute in the palette transform based on a plurality voting technique.
  7. The computer-implemented method of any preceding claim, wherein the palette transform is further determined based on a dilated mask, the dilated mask having an expanded region of interest associated with the first image, the expanded region of interest of the dilated mask being larger than the region of interest of the mask.
  8. The computer-implemented method of any preceding claim, further comprising: accessing a raw image, the raw image being in a red-green-blue (RGB) color space; and processing the raw image to generate the first image, wherein the first image is in hue-saturation (HS) channels, and wherein a value attribute for each pixel in the first image is kept constant when the raw image is processed to generate the first image.
  9. The computer-implemented method of claim 8, wherein the raw image is a high-resolution image, and a version of the first image that is processed by the machine-learned inpainting model is a low-resolution image.
  10. The computer-implemented method of claim 8, wherein the recolorized image is in the hue-saturation (HS) channels, the method further comprising: processing the recolorized image to generate a final image, wherein the final image is in a red-green-blue (RGB) color space, optionally wherein the recolorized image is a high-resolution image, and the inpainted image is a low-resolution image.
  11. A computing system, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: a machine-learned inpainting model, wherein the machine-learned inpainting model is configured to generate an inpainted image using a first image; and instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: accessing a mask and the first image having a distractor object, wherein the mask indicates a region of interest associated with the first image, and wherein the distractor object is inside the region of interest and has one or more pixels each with a respective original attribute; processing, using the machine-learned inpainting model, the first image and the mask to generate an inpainted image in which pixels within the mask have been inpainted, wherein the one or more pixels of the inpainted image each has a respective inpainted attribute in one or more chromaticity channels; determining a palette transform based on a comparison of the first image and the inpainted image, wherein the one or more pixels of the distractor object have a transform attribute in the one or more chromaticity channels, the transform attribute being different than the inpainted attribute; and processing the first image to generate a recolorized image, wherein the one or more pixels in the recolorized image has a recolorized attribute based on the transform attribute of the determined palette transform.
  12. The computer system of claim 11, the operations further comprising: processing the first image and the mask to generate a masked image; and wherein the masked image is inputted into the machine-learned inpainting model to generate the inpainted image.
  13. The computer system of claim 11 or 12, further characterized by at least one of: the one or more chromaticity channels comprise hue and saturation (HS) channels, and wherein a value attribute for each pixel in the original image, the inpainted image, and the recolorized image is kept constant; the recolorized attribute is different from the inpainted attribute; the distractor object includes a plurality of pixels with the original attribute, and wherein the one or more pixels of the distractor object is determined to have the transform attribute in the palette transform based on a plurality voting technique; or the machine-learned inpainting model is trained using hue and saturation (HS) training data.
  14. The computer system of claim 11, 12 or 13, the operations further comprising: accessing a raw image, the raw image being in a red-green-blue (RGB) color space; and processing the raw image to generate the first image, wherein the first image is in hue-saturation (HS) channels, and wherein a value attribute for each pixel in the first image is kept constant when the raw image is processed to generate the first image.
  15. One or more non-transitory computer-readable media that collectively store a machine-learned inpainting model, wherein the machine-learned inpainting model has been learned by performance of operations, the operations comprising: accessing a mask and a first image having a distractor object, wherein the mask indicates a region of interest associated with the first image, and wherein the distractor object is inside the region of interest and has one or more pixels each with a respective original attribute; processing, using the machine-learned inpainting model, the first image and the mask to generate an inpainted image in which pixels within the mask have been inpainted, wherein the one or more pixels in the inpainted image each has a respective inpainted attribute in one or more chromaticity channels; determining a palette transform based on a comparison of the first image and the inpainted image, wherein the one or more pixels of the distractor object have a transform attribute in the one or more chromaticity channels, the transform attribute being different than the inpainted attribute; and processing the first image to generate a recolorized image, wherein the one or more pixels in the recolorized image has a recolorized attribute based on the transform attribute of the determined palette transform.
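The pipeline recited in the claims can be sketched as follows. The patent does not disclose the inpainting network's architecture, so the sketch stands it in with a trivial mean-color fill (`inpaint_stub` is a hypothetical name, as are the other function names); the rest follows the claimed steps: convert to HSV, inpaint within the mask, derive a palette transform by plurality vote over the masked pixels (claims 5 and 6), and recolorize hue and saturation while keeping the value channel constant (claim 3).

```python
# Hypothetical sketch of the claimed recolorization pipeline; the
# machine-learned inpainting model is replaced by a naive mean-color fill.
import colorsys
from collections import Counter

import numpy as np

def rgb_to_hsv(img):
    """Convert an HxWx3 float RGB image (values in 0..1) to HSV."""
    out = np.empty_like(img)
    for idx in np.ndindex(img.shape[:2]):
        out[idx] = colorsys.rgb_to_hsv(*img[idx])
    return out

def hsv_to_rgb(img):
    """Convert an HxWx3 float HSV image back to RGB."""
    out = np.empty_like(img)
    for idx in np.ndindex(img.shape[:2]):
        out[idx] = colorsys.hsv_to_rgb(*img[idx])
    return out

def inpaint_stub(image, mask):
    """Stand-in for the machine-learned inpainting model: fill masked
    pixels with the mean color of the unmasked region."""
    filled = image.copy()
    filled[mask] = image[~mask].mean(axis=0)
    return filled

def palette_transform(original_hsv, inpainted_hsv, mask):
    """Map each original (H, S) color under the mask to the inpainted
    (H, S) color chosen by a plurality vote over the masked pixels."""
    votes = {}
    for idx in zip(*np.nonzero(mask)):
        src = tuple(np.round(original_hsv[idx][:2], 2))
        dst = tuple(np.round(inpainted_hsv[idx][:2], 2))
        votes.setdefault(src, Counter())[dst] += 1
    return {src: ctr.most_common(1)[0][0] for src, ctr in votes.items()}

def recolorize(image_rgb, mask):
    """Recolorize the masked distractor: hue/saturation follow the
    palette transform, while the value channel is kept constant."""
    hsv = rgb_to_hsv(image_rgb)
    inpainted_hsv = rgb_to_hsv(inpaint_stub(image_rgb, mask))
    transform = palette_transform(hsv, inpainted_hsv, mask)
    out = hsv.copy()
    for idx in zip(*np.nonzero(mask)):
        src = tuple(np.round(hsv[idx][:2], 2))
        if src in transform:
            out[idx][:2] = transform[src]  # value (index 2) untouched
    return hsv_to_rgb(out)
```

For example, a saturated red patch on a gray background is pushed toward the gray background's chromaticity while retaining its own brightness, so the patch's structure survives but no longer draws the eye.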

Description

FIELD

The present disclosure relates generally to reducing distractions in an image. More particularly, the present disclosure relates to techniques for harmonizing a distractor within an image while maintaining realism of the image.

BACKGROUND

An image (e.g., photograph, frame of a video) and other forms of image data often include a distraction that can capture the eye-gaze of a user. As one example, the distraction can correspond to a distracting object (e.g., clutter in the background of a room, a bright color of one section of a background object) that distracts from the main subject (e.g., main speaker participating in a video call). As another example, the unwanted distractor object can correspond to an unsightly object in an otherwise pristine portrait photograph of a user. Thus, distractor objects can correspond to objects which grab a user's visual attention away from the main subject of the image. In conventional systems, the distractor object can be removed from the image. However, replacing the distractor object can be a challenging problem. In some instances, it may not be possible to remove the distractor from the image without distorting the image or making the image look unrealistic. For example, if the distractor object is one section of a background object (e.g., a chair) that is distracting (e.g., distracting color, bright color, distracting pattern), the distractor object may not be easily removed without distorting the background object. The article "Personal Photo Enhancement via Saliency Driven Color Transfer", by Gao Yuqi et al., in Proceedings of the International Conference on Internet Multimedia Computing and Service, ICIMCS'16, 19 August 2016, pages 273-276, discloses a personal photo enhancement method using saliency driven color transfer.
SUMMARY

Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments. The scope of the invention is defined by the appended claims.

The present disclosure provides systems and methods for reducing the saliency (e.g., attention) of distractors in an image by using a machine-trained model to manipulate the colors of the distractors, while maintaining the structure and content of the distractors. For example, chromatic information of the distractor(s) can be manipulated (e.g., so as to reduce saliency) while luminance information can be maintained (e.g., so as to maintain visual structure). Distractors can be defined as the regions of an image that draw attention away from the main subjects and reduce the overall user experience. In some instances, the resulting effects can be achieved solely using a pretrained model with no additional user input.

One example aspect of the present disclosure is directed to a computer-implemented method for reducing a distractor object in a first image. The method can include accessing, by one or more computing devices, a mask and the first image having the distractor object. The mask can indicate a region of interest associated with the first image. The distractor object can be inside the region of interest and have one or more pixels with an original attribute. The method can further include processing, using a machine-learned inpainting model, the first image and the mask to generate an inpainted image. The one or more pixels of the distractor object can have an inpainted attribute in one or more chromaticity channels. Additionally, the method can include determining a palette transform based on a comparison of the first image and the inpainted image.
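The chromaticity-only edit described above can be illustrated in miniature (the function name and scale factor are illustrative, not from the patent): a pixel's saturation is reduced while its HSV value, used here as a stand-in for luminance, is left untouched.

```python
import colorsys

def mute_chroma(rgb, saturation_scale=0.3):
    """Reduce a pixel's saturation while keeping its HSV value
    unchanged, so brightness and structure are preserved."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, s * saturation_scale, v)

# A pure red pixel becomes a pale pink of identical brightness.
muted = mute_chroma((1.0, 0.0, 0.0), 0.5)  # -> (1.0, 0.5, 0.5)
```

Applied over every pixel of a distractor region, an edit of this shape changes what the region looks like without changing what it depicts.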
The one or more pixels of the distractor object can have a transform attribute in the one or more chromaticity channels, where the transform attribute is different than the inpainted attribute. Furthermore, the method can include processing the first image to generate a recolorized image. The one or more pixels of the distractor object in the recolorized image can have a recolorized attribute based on the transform attribute and the determined palette transform.

Another example aspect of the present disclosure is directed to a computing system comprising one or more processors and one or more non-transitory computer-readable media. The one or more non-transitory computer-readable media can collectively store a machine-learned inpainting model and instructions that, when executed by the one or more processors, cause the computing system to perform operations. The machine-learned inpainting model can be configured to generate an inpainted image using a first image. The operations can include accessing a mask and the first image having the distractor object. The mask can indicate a region of interest associated with the first image. The distractor object can be inside the region of interest and have one or more pixels with an original attribute. Additionally, the operations can include processing, using the machine-learned