EP-4200742-B1 - IMAGE PROCESSING BASED ON OBJECT CATEGORIZATION


Inventors

PINHASOV, ERAN
CHENG, SCOTT
SCHARAM, ERAN
GUREVICH, ANATOLY

Dates

Publication Date: 20260506
Application Date: 20210720

Claims (15)

An apparatus (100) for image processing, the apparatus (100) comprising: a memory (140); and one or more processors (150) coupled to the memory (140), the one or more processors (150) configured to: receive image data (210) captured by an image sensor (130); determine that a first object image region in the image data (210) depicts a first category of object of a plurality of categories of object; determine that a second object image region in the image data (210) depicts a second category of object of the plurality of categories of object; generate a category map (230) partitioning the image data (210) into a plurality of object image regions including the first object image region and the second object region, wherein each object region of the plurality of object image regions corresponds to one of the plurality of categories of object; generate a confidence map (235) partitioning the image data (210) into a plurality of confidence image regions and identify a plurality of confidence levels corresponding to the plurality of confidence image regions of the image data, wherein each confidence level of the plurality of confidence levels identifies a confidence associated with a categorization in the category map (230) that a corresponding confidence image region of the plurality of confidence image regions depicts one of the plurality of categories of object; generate, based on the category map (230) and the confidence map (235), a plurality of modifiers (545a, 545b), the plurality of modifiers (545a, 545b) identifying a first deviation from a default setting for an image signal processor, ISP, tuning parameter for the first object image region and a second deviation from the default setting for the ISP tuning parameter for the second object region; and generate an image (250) based on the image data (210) using an image capture process at least in part by applying different settings for the ISP tuning parameter to different portions of the image data (210), wherein the different settings for the ISP tuning parameter are based on the plurality of modifiers (545a, 545b), the different portions of the image data (210) being identified based on the first object image region, the second object image region, and the plurality of confidence image regions.
The apparatus of claim 1, wherein the one or more processors are configured to: adjust the one or more modifiers, including blending the one or more modifiers with a blending update that is based on the plurality of confidence levels corresponding to the plurality of confidence image regions, wherein blending the one or more modifiers with the blending update adjusts at least one of the first deviation and the second deviation in at least one area of the image data.
The apparatus of claim 1, wherein the image capture process includes processing the image data using an image signal processor (ISP) of the one or more processors, wherein the different settings for the image capture process are different tuning settings for the ISP.
The apparatus of claim 3, wherein the different tuning settings for the ISP include different strengths at which an ISP tuning parameter is applied during processing of the image data using the ISP, wherein the ISP tuning parameter is one of noise reduction, sharpening, color saturation, color mapping, color processing, and tone mapping.
The apparatus of claim 1, wherein the different settings include a setting associated with at least one of a lens position, a flash, a focus, an exposure, a white balance, an aperture size, a shutter speed, an ISO, an analog gain, a digital gain, a denoising, a sharpening, a tone mapping, a color saturation, a demosaicking, a color space conversion, a shading, an edge enhancement, an image combining for high dynamic range (HDR), a special effect, an artificial noise addition, an edge-directed upscaling, an upscaling, a downscaling, and an electronic image stabilization.
The apparatus of claim 1, wherein the one or more processors are configured to: process the image data including at least one of demosaicking the image data and converting the image data from a first color space to a second color space.
The apparatus of claim 1, wherein the one or more processors are configured to: receive a user input associated with at least one of the first object image region and the second object image region, wherein at least one of the different settings is defined based on the user input and corresponds to one of the first object image region and the second object image region.
The apparatus of claim 1, wherein the one or more processors include an image signal processor (ISP) that applies the different settings for the image capture process to the different portions of the image data.
The apparatus of claim 1, wherein the one or more processors include a classification engine that identifies at least the first object image region and the second object image region, wherein the classification engine is at least partially positioned on an integrated circuit chip.
The apparatus of claim 1, wherein the apparatus is one of a mobile device, a wireless communication device, and a camera.
The apparatus of claim 1, further comprising: the image sensor.
The apparatus of claim 1, further comprising: a display that displays the image.
A method of image processing, the method comprising: receiving image data (210) captured by an image sensor (130); determining that a first object image region in the image data (210) depicts a first category of object of a plurality of categories of object; determining that a second object image region in the image data (210) depicts a second category of object of the plurality of categories of object; generating a category map (230) partitioning the image data (210) into a plurality of object image regions including the first object image region and the second object region, wherein each object region of the plurality of object image regions corresponds to one of the plurality of categories of object; generating a confidence map (235) partitioning the image data (210) into a plurality of confidence image regions and identifying a plurality of confidence levels corresponding to the plurality of confidence image regions of the image data, wherein each confidence level of the plurality of confidence levels identifies a confidence associated with a categorization in the category map (230) that a corresponding confidence image region of the plurality of confidence image regions depicts one of the plurality of categories of object; generating, based on the category map (230) and the confidence map (235), a plurality of modifiers (545a, 545b), the plurality of modifiers (545a, 545b) identifying a first deviation from a default setting for an image signal processor, ISP, tuning parameter for the first object image region and a second deviation from the default setting for the ISP tuning parameter for the second object region; and generating an image (250) based on the image data (210) using the ISP tuning parameter including by applying different settings for the ISP tuning parameter to different portions of the image data (210), wherein the different settings for the ISP tuning parameter are based on the plurality of modifiers (545a, 545b), the different portions of the image data (210) identified based on the first object image region, the second object image region, and the plurality of confidence image regions.
The method of claim 13, further comprising: adjusting the one or more modifiers, including blending the one or more modifiers with a blending update that is based on the plurality of confidence levels corresponding to the plurality of confidence image regions, wherein blending the one or more modifiers with the blending update adjusts at least one of the first deviation and the second deviation in at least one area of the image data.
A non-transitory computer readable storage medium (140) having embodied thereon a program that is executable by a processor (150) to perform a method of image processing, the method comprising: receiving image data (210) captured by an image sensor (130); determining that a first object image region in the image data (210) depicts a first category of object of a plurality of categories of object; determining that a second object image region in the image data (210) depicts a second category of object of the plurality of categories of object; generating a category map (230) partitioning the image data (210) into a plurality of object image regions including the first object image region and the second object region, wherein each object region of the plurality of object image regions corresponds to one of the plurality of categories of object; generating a confidence map (235) partitioning the image data (210) into a plurality of confidence image regions and identifying a plurality of confidence levels corresponding to the plurality of confidence image regions of the image data (210), wherein each confidence level of the plurality of confidence levels identifies a confidence associated with a categorization in the category map (230) that a corresponding confidence image region of the plurality of confidence image regions depicts one of the plurality of categories of object; generating, based on the category map (230) and the confidence map (235), a plurality of modifiers (545a, 545b), the plurality of modifiers (545a, 545b) identifying a first deviation from a default setting for an image signal processor, ISP, tuning parameter for the first object image region and a second deviation from the default setting for the ISP tuning parameter for the second object region; and generating an image (250) based on the image data (210) using the ISP tuning parameter including by applying different settings for the ISP tuning parameter to different portions of the image data (210), wherein the different settings for the ISP tuning parameter are based on the plurality of modifiers (545a, 545b), the different portions of the image data (210) identified based on the first object image region, the second object image region, and the plurality of confidence image regions.

Description

FIELD

This application is related to image capture and image processing. More specifically, this application relates to systems and methods of automatically guiding image processing of a photograph based on categorization of objects in a photographed scene.

BACKGROUND

Image capture devices capture images by first receiving light from a scene using an image sensor with an array of photodiodes. An image signal processor (ISP) then processes the raw image data captured by the photodiodes of the image sensor into an image that can be stored and viewed by a user. How the scene is depicted in the image depends in part on capture settings that control how much light is received by the image sensor, such as exposure time settings and aperture size settings. How the scene is depicted in the image also depends on how the ISP is tuned to process the photodiode data captured by the image sensor into an image.

Traditionally, an ISP of an image capture device is only tuned once, during manufacturing. The tuning of the ISP affects how every image is processed in that image capture device, and affects every pixel of every image. Users typically expect image capture devices to capture high quality images regardless of what scene is photographed. To avoid situations where an image capture device cannot properly photograph certain types of scenes, the tuning of ISPs is generally selected to work reasonably well for as many types of scenes as possible. Because of this, however, the tuning of traditional ISPs is generally not optimal for photographing all types of scenes.

Changyung Lee et al.: "An algorithm for automatic skin smoothing in digital portraits" discusses an automatic method for beautifying digital portraits by smoothing the skin of the face. The method builds on existing face detection and face feature alignment technology to automatically segment the face and neck areas to be smoothed. The smoothing filter is then applied to these areas.
Chiang Holly et al.: "Multiple Object Recognition with Focusing and Blurring" discusses use of CNNs to identify significant objects in a scene for applications in photo or video editing. The problem is tackled with two different approaches: faster R-CNN and YOLO. Faster R-CNN is used to more accurately identify objects in a scene and then classify those regions to label object categories. YOLO is used to identify and classify videos that need to be processed in real time, trading off accuracy for speed.

SUMMARY

Systems and techniques are described herein for determining and applying different ISP settings for different image regions. An image capture and processing device can process raw image data captured by an image sensor using the different ISP settings for the different image regions. In some cases, a classification engine can partition the raw image data into the different object image regions based on detection of different types of objects within the different image regions in the raw image data. By applying different ISP settings for different regions in an image, the ISP is optimized for the types of objects depicted in the image.

In one illustrative example, the ISP can use an ISP setting that enhances sharpness in a region of an image depicting a person's hair, which can enhance texture clarity of the hair. Within the same image, the ISP can use a different ISP setting that reduces sharpness and enhances noise reduction in a region of the image depicting a person's skin, which can result in a processed image depicting smoother skin. Different confidence regions of the image data can identify different degrees of confidence in the classifications. The settings can further be modified based on confidence.
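
The region-dependent settings described in this summary can be illustrated with a minimal sketch. It is not the claimed implementation: the category names, the deviation values, the default strength, and the helper names `modifier_map` and `strength_map` are hypothetical. It also assumes that each per-category deviation from the default is scaled linearly by a confidence between 0 and 1, which is only one possible way to let confidence modify the settings.

```python
# Hypothetical per-category deviations from the default strength of one ISP
# tuning parameter (sharpening). The values are illustrative only.
SHARPEN_DEVIATION = {"hair": 0.5, "skin": -0.5, "background": 0.0}
DEFAULT_SHARPEN = 1.0


def modifier_map(category_map, confidence_map, deviations):
    """Return per-pixel deviations from the default setting.

    Assumption: each category's deviation is scaled linearly by the
    confidence (0.0 to 1.0) of the categorization at that pixel, so a
    low-confidence categorization moves the setting less.
    """
    return [
        [deviations[cat] * conf for cat, conf in zip(cat_row, conf_row)]
        for cat_row, conf_row in zip(category_map, confidence_map)
    ]


def strength_map(modifiers, default):
    """Add each modifier to the default to get the per-pixel setting."""
    return [[default + m for m in row] for row in modifiers]


# Toy 2x3 category map and confidence map for the same image data.
category_map = [["hair", "hair", "skin"],
                ["skin", "skin", "background"]]
confidence_map = [[1.0, 0.5, 1.0],
                  [0.5, 0.0, 1.0]]

modifiers = modifier_map(category_map, confidence_map, SHARPEN_DEVIATION)
sharpen_strength = strength_map(modifiers, DEFAULT_SHARPEN)
```

Under these assumptions, a hair pixel categorized with full confidence receives a sharpening strength of 1.5, while a skin pixel categorized with zero confidence stays at the default of 1.0.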
The strength of a particular ISP parameter, such as noise reduction, sharpness, color saturation, or tone mapping, can be adjusted from a default value for a pixel based on the category of object depicted at the pixel and a confidence level of that categorization. For instance, an increase or decrease from the default value associated with a particular category of object can be tempered if the confidence level of that categorization is low, or magnified if the confidence level of that categorization is high.

In one example, an apparatus for data encoding is provided. The apparatus includes a memory and one or more processors (e.g., implemented in circuitry) coupled to the memory. The one or more processors are configured to and can: receive image data captured by an image sensor; determine that a first object image region in the image data depicts a first category of object of a plurality of categories of object; determine that a second object image region in the image data depicts a second category of object of the plurality of categories of object; identify a plurality of confidence levels corresponding to a plurality of confidence image regions of the image data, wherein each confidence level of the plurality of confidence levels identifies a confidence that a corresponding c