
EP-4343678-B1 - METHOD AND APPARATUS FOR PROCESSING ARRAY IMAGE


Inventors

  • HEO, JINGU
  • KANG, BYONG MIN
  • NAM, DONG KYUNG
  • CHO, YANG HO

Dates

Publication Date
2026-05-06
Application Date
2023-09-08

Claims (13)

  1. An image processing method comprising: receiving (1410) a plurality of sub images (131, 132, 133, 134) from an input array image (130) generated through an array lens (111), each of the plurality of sub images (131, 132, 133, 134) corresponding to different views; generating (1420) a plurality of temporary restored images (331, ..., 332) by demosaicing the plurality of sub images (131, 132, 133, 134) based on gradients surrounding a pixel position for which a pixel value needs to be determined; determining (1430) matching information based on an optical flow representing a difference between corresponding pixel locations based on a view difference between pixels of the plurality of sub images (131, 132, 133, 134) using a neural network model, the neural network model being pre-trained to estimate (810) the optical flow including the matching information (340) in response to an input of input data based on a view difference between the sub images (131, 132, 133, 134); performing a pixel-to-pixel matching (820) based on the matching information and providing a pixel distance representing a distance between pixels of matching pairs of the sub images (131, 132, 133, 134), and comparing (830, 1160) the pixel distance with a threshold; when the pixel distance of a matching pair is greater than the threshold, extracting (1440) the matching pair as one or more refinement targets; generating (1450) refined matching information by replacing at least one of the pixels in the extracted one or more refinement targets based on a local search (845, 1170) for geometric consistency refinement of the refinement targets in a region based on the pixel locations of the one or more refinement targets; and generating (1460) an output image (140, 350) of a single view by merging the plurality of temporary restored images based on the refined matching information.
  2. The image processing method of claim 1, wherein each of the plurality of sub images (131, 132, 133, 134) of the input array image (130) iteratively comprises image data in a 2*2 array type arranged in a first channel signal-a second channel signal-the second channel signal-a third channel signal format based on a 2*2 color filter array, CFA, and wherein the generating (1420) of the plurality of temporary restored images comprises: setting (450) a region of interest, ROI, based on first pixels in which the second channel signal is dominant among the first channel signal, the second channel signal, and the third channel signal of the sub images (131, 132, 133, 134); and based on the gradient between the neighboring pixels of the plurality of sub images (131, 132, 133, 134), performing the demosaicing (510, 710, 1120) by applying interpolation in a first gradient direction to second pixels comprised in the ROI and applying interpolation in a second gradient direction to third pixels not in the ROI, the second gradient direction being different from the first gradient direction.
  3. The image processing method of claim 2, wherein the setting of the ROI comprises determining a first gradient value based on an interpolation result using the second channel signal around a first pixel of a first sub image of the plurality of sub images (131, 132, 133, 134) and a second gradient value based on an interpolation result using the third channel signal and the first channel signal around the first pixel, and setting the ROI based on the first pixel based on a difference between the first gradient value and the second gradient value being less than a threshold value, or wherein the performing of the demosaicing comprises performing interpolation in the first gradient direction indicating a smaller gradient of a vertical direction and a horizontal direction of a first pixel of the ROI, and performing interpolation in the second gradient direction indicating a larger gradient of the vertical direction and the horizontal direction of a second pixel outside the ROI.
  4. The image processing method of one of claims 1 to 3, wherein the generating of the plurality of temporary restored images comprises: generating color data by performing the demosaicing on raw data of the plurality of sub images (131, 132, 133, 134) using edge information based on the gradient between the neighboring pixels of each of the plurality of sub images (131, 132, 133, 134); and generating the plurality of temporary restored images based on the plurality of sub images (131, 132, 133, 134) by performing upsampling using the edge information, wherein the generating of the plurality of temporary restored images further comprises: determining a sharpening filter using the edge information; applying the sharpening filter to the plurality of temporary restored images based on a sharpening parameter; and adjusting the sharpening parameter based on a difference between a sharpening result and a target image.
  5. The image processing method of one of claims 1 to 4, wherein the refining of the matching information comprises: selecting a first refinement target from the one or more refinement targets, the first refinement target comprising a first pixel of a first temporary restored image and a second pixel of a second temporary restored image from among the plurality of temporary restored images; determining a corresponding pixel, in a real world, to the first pixel by performing undistortion (841) on the first pixel and reprojection to the real world based on a first calibration parameter; determining a temporary pixel of the second temporary restored image by performing reprojection (843) to the second temporary restored image and distortion on the corresponding pixel based on a second calibration parameter; determining a new second pixel of the second temporary restored image by performing the local search (845) based on a location of the temporary pixel in the second temporary restored image; and updating a matching target of the first pixel to the new second pixel.
  6. The image processing method of one of claims 1 to 5, wherein the generating the output image (140, 350) comprises generating the output image (140, 350) based on a weighted sum of each pixel of a reference image of the plurality of temporary restored images and a matching pixel of one or more other images of the temporary restored images based on the refined matching information, wherein a weighted sum of a first pixel of the reference image and a second pixel of the one or more other images is determined based on a first weight based on a difference between an intensity of the first pixel and an intensity of the second pixel, a second weight based on a pixel distance between the first pixel and the second pixel, and a third weight based on whether the first pixel and the second pixel correspond to raw data.
  7. An image processing apparatus comprising: a memory (1220) configured to store one or more instructions; and a processor (1210) configured to execute the one or more instructions to: receive (1410) a plurality of sub images (131, 132, 133, 134) from an input array image (130) generated through an array lens (111), each of the plurality of sub images (131, 132, 133, 134) corresponding to different views; generate (1420) a plurality of temporary restored images (331, ..., 332) by demosaicing the plurality of sub images (131, 132, 133, 134) based on gradients surrounding a pixel position for which a pixel value needs to be determined; determine (1430) matching information based on an optical flow representing a difference between corresponding pixel locations based on a view difference between pixels of the plurality of sub images (131, 132, 133, 134) using a neural network model, the neural network model being pre-trained to estimate (810) the optical flow including the matching information (340) in response to an input of input data based on a view difference between the sub images (131, 132, 133, 134); perform a pixel-to-pixel matching (820) based on the matching information and provide a pixel distance representing a distance between pixels of matching pairs of the sub images (131, 132, 133, 134), and compare (830, 1160) the pixel distance with a threshold; when the pixel distance of a matching pair is greater than the threshold, extract (1440) the matching pair as one or more refinement targets; generate (1450) refined matching information by replacing at least one of the pixels in the extracted one or more refinement targets based on a local search (845, 1170) for geometric consistency refinement of the refinement targets in a region based on the pixel locations of the one or more refinement targets; and generate (1460) an output image (140, 350) of a single view by merging the plurality of temporary restored images based on the refined matching information.
  8. The image processing apparatus of claim 7, wherein each of the plurality of sub images (131, 132, 133, 134) of the input array image (130) iteratively comprises image data in a 2*2 array type arranged in a first channel signal-a second channel signal-the second channel signal-a third channel signal format based on a 2*2 color filter array, CFA, and wherein the processor (1210) is further configured to: set (450) a region of interest, ROI, based on first pixels in which the second channel signal is dominant among the first channel signal, the second channel signal, and the third channel signal of the sub images (131, 132, 133, 134), and based on the gradient between the neighboring pixels of the plurality of sub images (131, 132, 133, 134), perform the demosaicing (510, 710, 1120) to generate the plurality of temporary restored images by applying interpolation in a first gradient direction to pixels comprised in the ROI and applying interpolation in a second gradient direction to pixels not comprised in the ROI, the second gradient direction being different from the first gradient direction.
  9. The image processing apparatus of claim 7 or 8, wherein the processor (1210) is further configured to: perform interpolation in the first gradient direction indicating a smaller gradient of a vertical direction and a horizontal direction of a first pixel of the ROI; and perform interpolation in the second gradient direction indicating a larger gradient of the vertical direction and the horizontal direction of a second pixel outside the ROI.
  10. The image processing apparatus of one of claims 7 to 9, wherein the processor (1210) is further configured to: select a first refinement target from the one or more refinement targets, the first refinement target comprising a first pixel of a first temporary restored image and a second pixel of a second temporary restored image from among the plurality of temporary restored images; determine a corresponding pixel, in a real world, to the first pixel by performing undistortion on the first pixel and reprojection to the real world based on a first calibration parameter, determine a temporary pixel of the second temporary restored image by performing reprojection to the second temporary restored image and distortion on the corresponding pixel based on a second calibration parameter, determine a new second pixel of the second temporary restored image by performing a local search based on a location of the temporary pixel in the second temporary restored image, and update a matching target of the first pixel to the new second pixel.
  11. The image processing apparatus of one of claims 7 to 10, wherein the processor (1210) is further configured to generate the output image (140, 350) based on a weighted sum of each pixel of a reference image of the plurality of temporary restored images and a matching pixel of one or more other images of the plurality of temporary restored images based on the refined matching information, wherein a weighted sum of a first pixel of the reference image and a second pixel of the one or more other images is determined based on a first weight based on a difference between an intensity of the first pixel and an intensity of the second pixel, a second weight based on a pixel distance between the first pixel and the second pixel, and a third weight based on whether the first pixel and the second pixel correspond to raw data.
  12. A computer-readable storage medium storing instructions that, when executed by a processor (1310) of an electronic device comprising an imaging device configured to generate an input array image (130) having a plurality of sub images (131, 132, 133, 134), each of the plurality of sub images (131, 132, 133, 134) corresponding to different views, instruct the processor (1310) to: generate (1420) a plurality of temporary restored images (331, ..., 332) by demosaicing the plurality of sub images (131, 132, 133, 134) based on gradients surrounding a pixel position for which a pixel value needs to be determined; determine (1430) matching information based on an optical flow representing a difference between corresponding pixel locations based on a view difference between pixels of the plurality of sub images (131, 132, 133, 134) using a neural network model, the neural network model being pre-trained to estimate (810) the optical flow including the matching information (340) in response to an input of input data based on a view difference between the sub images (131, 132, 133, 134); perform a pixel-to-pixel matching (820) based on the matching information and provide a pixel distance representing a distance between pixels of matching pairs of the sub images (131, 132, 133, 134), and compare (830, 1160) the pixel distance with a threshold; when the pixel distance of a matching pair is greater than the threshold, extract (1440) the matching pair as one or more refinement targets; generate (1450) refined matching information by replacing at least one of the pixels in the extracted one or more refinement targets based on a local search (845, 1170) for geometric consistency refinement of the refinement targets in a region based on the pixel locations of the one or more refinement targets; and generate (1460) an output image (140, 350) of a single view by merging the plurality of temporary restored images based on the refined matching information.
  13. The computer-readable storage medium of claim 12, wherein each of the plurality of sub images (131, 132, 133, 134) of the input array image (130) iteratively comprises image data in a 2*2 array type arranged in a first channel signal-a second channel signal-the second channel signal-a third channel signal format based on a 2*2 color filter array, CFA, and wherein the processor (1310) is further instructed to: set (450) a region of interest, ROI, based on first pixels in which the second channel signal is dominant among the first channel signal, the second channel signal, and the third channel signal of the sub images (131, 132, 133, 134), and based on the gradient between the neighboring pixels of the plurality of sub images (131, 132, 133, 134), perform the demosaicing to generate the plurality of temporary restored images by applying interpolation in a first gradient direction to pixels comprised in the ROI and applying interpolation in a second gradient direction to pixels not comprised in the ROI, the second gradient direction being different from the first gradient direction.
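The thresholding and extraction steps of claim 1 (operations 820, 830, and 1440) can be sketched as follows. The Euclidean distance metric and the tuple representation of matching pairs are illustrative assumptions, not requirements of the claim.

```python
import numpy as np

def extract_refinement_targets(matches, threshold):
    """Keep only the matching pairs whose pixel distance exceeds the
    threshold; these become the refinement targets of operation 1440."""
    targets = []
    for p, q in matches:
        dist = np.hypot(q[0] - p[0], q[1] - p[1])  # Euclidean pixel distance
        if dist > threshold:
            targets.append((p, q))
    return targets

# A 1-pixel disparity passes; a 10-pixel disparity is flagged for refinement.
matches = [((10, 10), (11, 10)), ((20, 20), (28, 26))]
print(extract_refinement_targets(matches, threshold=3.0))  # [((20, 20), (28, 26))]
```

Pairs that survive the threshold are merged directly; only the flagged pairs go through the local search of operation 845.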
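The gradient-directed interpolation of claims 2 and 3 can be illustrated for a single missing green sample: the interpolation direction follows the smaller of the horizontal and vertical gradients, so averaging happens along an edge rather than across it. The 3x3 neighborhood and the two-tap averages are simplifying assumptions for brevity.

```python
import numpy as np

def interpolate_green(raw, y, x):
    """Estimate a missing green value at (y, x) of a Bayer mosaic by
    interpolating along the direction of the smaller gradient."""
    gh = abs(float(raw[y, x - 1]) - float(raw[y, x + 1]))  # horizontal gradient
    gv = abs(float(raw[y - 1, x]) - float(raw[y + 1, x]))  # vertical gradient
    if gh <= gv:  # smoother horizontally -> interpolate horizontally
        return (float(raw[y, x - 1]) + float(raw[y, x + 1])) / 2.0
    return (float(raw[y - 1, x]) + float(raw[y + 1, x])) / 2.0

# Vertical edge: the horizontal gradient (90) exceeds the vertical one (20),
# so the vertical neighbors are averaged.
raw = np.array([[100, 60, 10],
                [100,  0, 10],
                [100, 40, 10]])
print(interpolate_green(raw, 1, 1))  # 50.0
```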
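The adaptive sharpening of claim 4 can be sketched as unsharp masking with a tunable strength that is adjusted against a target image. The box blur, the candidate grid for the parameter, and the L1 image difference are all illustrative assumptions standing in for the claimed edge-information-based filter.

```python
import numpy as np

def box_blur(img):
    """3x3 box blur with edge replication (minimal stand-in for a real blur)."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0

def sharpen(img, alpha):
    """Unsharp masking: add alpha times the high-frequency residual."""
    return img + alpha * (img - box_blur(img))

def tune_alpha(img, target, alphas):
    """Pick the alpha whose sharpening result is closest to the target,
    a crude stand-in for the claimed sharpening-parameter adjustment."""
    return min(alphas, key=lambda a: np.abs(sharpen(img, a) - target).sum())

img = np.arange(16, dtype=float).reshape(4, 4)
print(tune_alpha(img, sharpen(img, 0.5), [0.0, 0.5, 1.0]))  # 0.5
```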
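The geometric consistency refinement of claim 5 (operations 841 to 845) can be sketched with an undistorted pinhole model: a pixel of the first view is back-projected to the real world, projected into the second view, and the match is replaced by the nearest candidate there. The intrinsics K, the pose (R, t), the known depth, and the candidate set are illustrative assumptions; lens distortion and undistortion are omitted for brevity.

```python
import numpy as np

def predict_in_second_view(pixel, depth, K1, K2, R, t):
    """Back-project a pixel of the first view to 3-D at the given depth
    (the real-world point of operation 841), then project it into the
    second view (operation 843)."""
    p = np.array([pixel[0], pixel[1], 1.0])
    X = depth * (np.linalg.inv(K1) @ p)   # corresponding 3-D point
    x2 = K2 @ (R @ X + t)                 # project into the second view
    return x2[:2] / x2[2]

def local_search(predicted, candidates):
    """Operation 845: pick the candidate pixel nearest the predicted
    location as the new second pixel of the matching pair."""
    return min(candidates,
               key=lambda c: np.hypot(c[0] - predicted[0], c[1] - predicted[1]))

K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])   # second camera shifted laterally
pred = predict_in_second_view((50.0, 50.0), 1.0, K, K, R, t)
print(pred)                                            # [60. 50.]
print(local_search(pred, [(70.0, 50.0), (61.0, 49.0)]))  # (61.0, 49.0)
```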
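The merging step of claim 6 combines three weights: intensity similarity, pixel-distance proximity, and whether a pixel is backed by raw data. A minimal sketch follows; the Gaussian weight forms, the parameter values, and the multiplicative raw-data bonus are illustrative assumptions.

```python
import numpy as np

def merge_weight(i_ref, i_other, dist, is_raw,
                 sigma_i=10.0, sigma_d=2.0, raw_bonus=2.0):
    """Combine the three claimed weights: intensity similarity,
    pixel-distance proximity, and a bonus for raw-data pixels."""
    w_int = np.exp(-((i_ref - i_other) ** 2) / (2 * sigma_i ** 2))
    w_dist = np.exp(-(dist ** 2) / (2 * sigma_d ** 2))
    return w_int * w_dist * (raw_bonus if is_raw else 1.0)

def merge_pixel(i_ref, others):
    """Weighted sum of a reference pixel and its matched pixels from the
    other temporary restored images; others = [(intensity, dist, is_raw)]."""
    num, den = i_ref, 1.0  # the reference pixel contributes with weight 1
    for i_o, d, raw in others:
        w = merge_weight(i_ref, i_o, d, raw)
        num += w * i_o
        den += w
    return num / den

print(merge_pixel(100.0, [(100.0, 0.0, False)]))  # 100.0
```

A dissimilar or distant match receives a near-zero weight, so outlier pixels barely perturb the merged output.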

Description

BACKGROUND

1. Field

The disclosure relates to a method and apparatus for processing an array image.

2. Description of Related Art

Due to the development of optical technology and image processing technology, capturing devices are used in a wide range of fields such as multimedia content, security, and recognition. For example, a capturing device may be mounted on a mobile device, a camera, a vehicle, or a computer to capture an image, recognize an object, or obtain data for controlling a device. The volume of a capturing device is determined by the size of a lens, the focal length of the lens, and the size of a sensor. When the volume of the capturing device is limited, a long focal length may still be provided in the limited space by transforming the lens structure.

EP 4 050 553 A1 relates to a method and device for restoring an image obtained from an array camera. The image restoration method includes obtaining a plurality of images through lens elements included in the array camera, obtaining a global parameter of the plurality of images, generating first processed images by transforming a viewpoint of each of the plurality of images based on the obtained global parameter, obtaining a local parameter for each pixel corresponding to each of the first processed images, generating second processed images by transforming the first processed images based on the obtained local parameter, and generating a synthesized image of a target viewpoint by synthesizing the second processed images.

WO 2008/086037 A2 relates to color filter array interpolation. A CFA demosaicing algorithm uses the constant color difference rule to determine the green, red, and blue Bayer planes. Several high-level green plane interpolation algorithms are discussed, including a green plane interpolation algorithm that uses an inverse-gradient-weighted estimate. As an alternative, the green plane interpolation algorithms combine horizontal and vertical gradient estimates with the gradients computed as weights. Also disclosed is a one-dimensional transform-based green updating algorithm, which provides directional interpolation, thereby reducing the effect of zipper artifacts in the demosaiced image.

US 2014/055632 A1 relates to feature-based high-resolution motion estimation from low-resolution images captured using an array source. Systems and methods enable feature-based high-resolution motion estimation from low-resolution images captured using an array camera. The method also includes synthesizing high-resolution image portions, where the synthesized high-resolution image portions contain an identified plurality of detected features from a sequence of low-resolution images.

SUMMARY

The invention is claimed in the independent claims. Preferred embodiments are specified in the dependent claims.

According to one aspect of the present invention, there is provided an image processing method comprising: receiving a plurality of sub images from an input array image generated through an array lens, each of the plurality of sub images corresponding to different views; generating a plurality of temporary restored images by demosaicing the plurality of sub images based on gradients surrounding a pixel position for which a pixel value needs to be determined; determining matching information based on an optical flow representing a difference between corresponding pixel locations based on a view difference between pixels of the plurality of sub images using a neural network model, the neural network model being pre-trained to estimate the optical flow including the matching information in response to an input of input data based on a view difference between the sub images; performing a pixel-to-pixel matching based on the matching information and providing a pixel distance representing a distance between pixels of matching pairs of the sub images, and comparing the pixel distance with a threshold; when the pixel distance of a matching pair is greater than the threshold, extracting the matching pair as one or more refinement targets; generating refined matching information by replacing at least one of the pixels in the extracted one or more refinement targets based on a local search for geometric consistency refinement of the refinement targets in a region based on the pixel locations of the one or more refinement targets; and generating an output image of a single view by merging the plurality of temporary restored images based on the refined matching information.

According to another aspect of the present invention, there is provided an image processing apparatus comprising: a memory configured to store instructions; and a processor configured to execute the one or more instructions to: receive a plurality of sub images from an input array image generated through an array lens, each of the plurality of sub images corresponding to different views; generate a plurality of temporary restored images by demosaicing the plurality of sub images based on grad