
EP-3695375-B1 - IMAGE SIGNAL PROCESSOR FOR PROCESSING IMAGES

EP 3695375 B1

Inventors

  • HWANG, HAU
  • PANKAJ, TUSHAR SINHA
  • GUPTA, VISHAL
  • LEE, JISOO

Dates

Publication Date
2026-05-13
Application Date
2018-10-05

Claims (19)

  1. A computer-implemented method of processing image data using one or more neural networks, the method comprising: obtaining (1302) a patch of raw image data (621), the patch of raw image data including a subset of pixels of a frame of raw image data captured using one or more image sensors, wherein the patch of raw image data includes a single color component for each pixel of the subset of pixels; applying (1304) at least one neural network to the patch of raw image data to determine a plurality of color component values for one or more pixels of the subset of pixels, wherein applying the at least one neural network to the patch of raw image data comprises: applying a plurality of strided convolutional filters to the patch of raw image data to generate reduced resolution data representative of the patch of raw image data, the plurality of strided convolutional filters including: a first strided convolutional filter (624) having a first array of weights, wherein application of the first strided convolutional filter to the patch of raw image data generates a first set of weighted data representative of the patch of raw image data, the first set of weighted data having a first resolution; and a second strided convolutional filter (623) having a second array of weights, wherein application of the second strided convolutional filter to the patch of raw image data generates a second set of weighted data representative of the patch of raw image data, the second set of weighted data having a second resolution that is of a lower resolution than the first resolution; upscaling the second set of weighted data having the second resolution to the first resolution; and generating combined weighted data representative of the patch of raw image data by combining the upscaled second set of weighted data with the first set of weighted data having the first resolution; and generating (1306) a patch of output image data (630) based on application of the at least one neural 
network to the patch of raw image data, the patch of output image data including a subset of pixels of a frame of output image data and including the plurality of color component values for one or more pixels of the subset of pixels of the frame of output image data, wherein application of the at least one neural network causes the patch of output image data to include fewer pixels than the patch of raw image data.
  2. The method of claim 1, wherein the first strided convolutional filter comprises a first convolutional neural network, CNN, (624) for application to the patch of raw image data, and a second CNN (632) provided to process the output from the first CNN to generate the first set of weighted data, the first CNN being a strided CNN and the second CNN having a stride equal to 1; and wherein the second strided convolutional filter comprises a third CNN (623) for application to the patch of raw image data, and a fourth CNN (631) provided to process the output from the third CNN to generate the second set of weighted data, the third CNN being a strided CNN and the fourth CNN having a stride equal to 1.
  3. The method of claim 1, wherein the first strided convolutional filter comprises a first convolutional neural network, CNN, (625) for application to the patch of raw image data, and a second CNN (633) provided to process the output from the first CNN to generate the first set of weighted data, the first CNN being a strided CNN and the second CNN having a stride equal to 1; and wherein the second strided convolutional filter comprises a third CNN (624) for application to the patch of raw image data in parallel with a fourth CNN (623), a fifth CNN (632) provided to process the output from the third CNN (624), a sixth CNN (631) provided to process the output from the fourth CNN (623), and a seventh CNN (626) provided to process a combined output from the fifth CNN (632) and the sixth CNN (631), to generate the second set of weighted data, the third CNN (624) and the fourth CNN (623) being strided CNNs, and the fifth CNN, sixth CNN and seventh CNN having a stride equal to 1.
  4. The method of claim 1, further comprising: applying one or more convolutional filters to the combined weighted data to generate feature data representative of the patch of raw image data, each convolutional filter of the one or more convolutional filters including an array of weights.
  5. The method of claim 4, further comprising: upscaling the feature data to a full resolution; and generating combined feature data representative of the patch of raw image data by combining the upscaled feature data with full resolution feature data, the full resolution feature data being generated by applying a convolutional filter to a full resolution version of the patch of raw image data.
  6. The method of claim 5, wherein generating the patch of output image data includes: applying a final convolutional filter to the feature data or the combined feature data to generate the output image data.
  7. An apparatus for processing image data using one or more neural networks, comprising: a memory configured to store image data; and a processor configured to: obtain a patch of raw image data (621), the patch of raw image data including a subset of pixels of a frame of raw image data captured using one or more image sensors, wherein the patch of raw image data includes a single color component for each pixel of the subset of pixels; apply at least one neural network to the patch of raw image data to determine a plurality of color component values for one or more pixels of the subset of pixels, wherein applying the at least one neural network to the patch of raw image data comprises: applying a plurality of strided convolutional filters to the patch of raw image data to generate reduced resolution data representative of the patch of raw image data, the plurality of strided convolutional filters including: a first strided convolutional filter (624) having a first array of weights, wherein application of the first strided convolutional filter to the patch of raw image data generates a first set of weighted data representative of the patch of raw image data, the first set of weighted data having a first resolution; and a second strided convolutional filter (623) having a second array of weights, wherein application of the second strided convolutional filter to the patch of raw image data generates a second set of weighted data representative of the patch of raw image data, the second set of weighted data having a second resolution that is of a lower resolution than the first resolution; upscaling the second set of weighted data having the second resolution to the first resolution; and generating combined weighted data representative of the patch of raw image data by combining the upscaled second set of weighted data with the first set of weighted data having the first resolution; and generate a patch of output image data (630) based on application of the at least 
one neural network to the patch of raw image data, the patch of output image data including a subset of pixels of a frame of output image data and including the plurality of color component values for one or more pixels of the subset of pixels of the frame of output image data, wherein application of the at least one neural network causes the patch of output image data to include fewer pixels than the patch of raw image data.
  8. The apparatus of claim 7, wherein the first strided convolutional filter comprises a first convolutional neural network, CNN, (624) for application to the patch of raw image data, and a second CNN (632) provided to process the output from the first CNN to generate the first set of weighted data, the first CNN being a strided CNN and the second CNN having a stride equal to 1; and wherein the second strided convolutional filter comprises a third CNN (623) for application to the patch of raw image data, and a fourth CNN (631) provided to process the output from the third CNN to generate the second set of weighted data, the third CNN being a strided CNN and the fourth CNN having a stride equal to 1.
  9. The apparatus of claim 7, wherein the first strided convolutional filter comprises a first convolutional neural network, CNN, (625) for application to the patch of raw image data, and a second CNN (633) provided to process the output from the first CNN to generate the first set of weighted data, the first CNN being a strided CNN and the second CNN having a stride equal to 1; and wherein the second strided convolutional filter comprises a third CNN (624) for application to the patch of raw image data in parallel with a fourth CNN (623), a fifth CNN (632) provided to process the output from the third CNN (624), a sixth CNN (631) provided to process the output from the fourth CNN (623), and a seventh CNN (626) provided to process a combined output from the fifth CNN (632) and the sixth CNN (631), to generate the second set of weighted data, the third CNN (624) and the fourth CNN (623) being strided CNNs, and the fifth CNN, sixth CNN and seventh CNN having a stride equal to 1.
  10. The apparatus of claim 7, wherein the frame of raw image data includes image data from the one or more image sensors filtered by a color filter array.
  11. The apparatus of claim 10, wherein the color filter array includes a Bayer color filter array.
  12. The apparatus of claim 7, wherein each strided convolutional filter of the plurality of strided convolutional filters includes a plurality of channels, wherein each channel of the plurality of channels includes a different array of weights for generating a respective feature map array of weighted feature data values.
  13. The apparatus of claim 7, wherein the processor is further configured to: apply one or more convolutional filters to the combined weighted data to generate feature data representative of the patch of raw image data, each convolutional filter of the one or more convolutional filters including an array of weights.
  14. The apparatus of claim 13, wherein the processor is further configured to: upscale the feature data to a full resolution; and generate combined feature data representative of the patch of raw image data by combining the upscaled feature data with full resolution feature data, the full resolution feature data being generated by applying a convolutional filter to a full resolution version of the patch of raw image data.
  15. The apparatus of claim 14, wherein generating the patch of output image data includes: applying a final convolutional filter to the feature data or the combined feature data to generate the output image data.
  16. The apparatus of claim 7, wherein the processor is further configured to: obtain additional data for augmenting the obtained patch of raw image data, the additional data including one or more of tone data, radial distance data, or auto white balance (AWB) gain data.
  17. The apparatus of claim 7, wherein the at least one neural network includes a plurality of layers, and wherein the plurality of layers are connected with a high-dimensional representation of the patch of raw image data.
  18. The apparatus of claim 7, further comprising a camera for capturing pictures.
  19. A non-transitory computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to perform the method of any of claims 1 to 6.
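The multi-resolution pipeline recited in claim 1 can be sketched in code under several simplifying assumptions that are not from the patent itself: single-channel filters with random illustrative weights, nearest-neighbour upscaling, and element-wise addition as the combining step. The patented network would use learned multi-channel weights and may combine branches differently.

```python
import numpy as np

def conv2d(x, w, stride):
    """Valid 2-D convolution of a single-channel image x with kernel w."""
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i * stride:i * stride + kh,
                                 j * stride:j * stride + kw] * w)
    return out

def upscale_nearest(x, factor):
    """Nearest-neighbour upscaling, one simple choice for the upscaling step."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
raw_patch = rng.random((32, 32))            # single colour component per pixel

w_first = rng.standard_normal((2, 2))       # first strided filter, stride 2
w_second = rng.standard_normal((4, 4))      # second strided filter, stride 4

first = conv2d(raw_patch, w_first, stride=2)    # first resolution: 16x16
second = conv2d(raw_patch, w_second, stride=4)  # lower resolution: 8x8

# upscale the lower-resolution branch and combine with the first branch
combined = first + upscale_nearest(second, 2)   # combined weighted data, 16x16
```

Note that the combined data (16x16) has fewer pixels than the raw patch (32x32), matching the claim that the output patch contains fewer pixels than the input patch.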

Description

FIELD

The present disclosure generally relates to image processing, and more specifically to techniques and systems for performing image processing using an image signal processor. The application describes processing image data using one or more neural networks. MICHAËL GHARBI ET AL, "Deep joint demosaicking and denoising", ACM TRANSACTIONS ON GRAPHICS (TOG), ACM, US, (20161111), vol. 35, no. 6, pages 1-12, propose a joint denoising-demosaicking technique using a convolutional neural network. RUNJIE TAN ET AL, "COLOR IMAGE DEMOSAICKING VIA DEEP RESIDUAL LEARNING", IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), (20170710), pages 793-798, present a customised convolutional neural network which is trained in an end-to-end manner from natural color images to address color demosaicking.

BRIEF SUMMARY

In some examples, techniques and systems are described for performing image processing. Traditional image signal processors (ISPs) have separate discrete blocks that address the various partitions of the image-based problem space. For example, a typical ISP has discrete functional blocks that each apply a specific operation to raw camera sensor data to create a final output image. Such functional blocks can include blocks for demosaicing, noise reduction (denoising), color processing, and tone mapping, among many other image processing functions. Each of these functional blocks contains many hand-tuned parameters, resulting in an ISP with a large number of hand-tuned parameters (e.g., over 10,000) that must be re-tuned according to the tuning preference of each customer. Such hand-tuning is very time-consuming and expensive. A machine learning ISP is described herein that uses machine learning systems and methods to derive the mapping from raw image data captured by one or more image sensors to a final output image. In some examples, raw image data can include a single color or a grayscale value for each pixel location.
For example, a sensor with a Bayer pattern color filter array (or other suitable color filter array) with one of either red, green, or blue filters at each pixel location can be used to capture raw image data with a single color per pixel location. In some cases, a device can include multiple image sensors to capture the raw image data processed by the machine learning ISP. The final output image can contain processed image data derived from the raw image data. The machine learning ISP can use a neural network of convolutional filters (e.g., convolutional neural networks (CNNs)) for the ISP task. The neural network of the machine learning ISP can include several similar or repetitive blocks of convolutional filters with a high number of channels (e.g., an order of magnitude larger than the number of channels in an RGB or YCbCr image). The machine learning ISP functions as a single unit, rather than having individual functional blocks that are present in a traditional ISP. The neural network of the ISP can include an input layer, multiple hidden layers, and an output layer. The input layer includes the raw image data from one or more image sensors. The hidden layers can include convolutional filters that can be applied to the input data, or to the outputs from previous hidden layers to generate feature maps. The filters of the hidden layers can include weights used to indicate an importance of the nodes of the filters. In some cases, the neural network can have a series of many hidden layers, with early layers determining simple and low-level characteristics of the raw image input data, and later layers building up a hierarchy of more complex and abstract characteristics. The neural network can then generate the final output image (making up the output layer) based on the determined high-level features. The invention is defined in the appended independent claims. Optional features are defined in the dependent claims.
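To make "a single color per pixel location" concrete, the following sketch (a hypothetical helper, not code from the patent) samples an RGGB Bayer mosaic from a full RGB image, producing the kind of single-channel raw data the machine learning ISP takes as input:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Keep one colour sample per pixel following an RGGB Bayer pattern.

    `rgb` is an (H, W, 3) array; the result is a single-channel (H, W)
    raw mosaic, as produced by a sensor behind a Bayer color filter array.
    """
    h, w, _ = rgb.shape
    raw = np.zeros((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red at even rows, even cols
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green at even rows, odd cols
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green at odd rows, even cols
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue at odd rows, odd cols
    return raw

# a 4x4 image that is pure red everywhere: only the red filter sites survive
rgb = np.zeros((4, 4, 3))
rgb[..., 0] = 1.0
raw = bayer_mosaic(rgb)
```

Recovering full RGB values at every pixel from such a mosaic is the demosaicing problem that, in a traditional ISP, occupies its own functional block, and that the machine learning ISP handles jointly with the other processing stages.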
The foregoing, together with other features and embodiments, will become more apparent upon referring to the following specification, claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and the payment of the necessary fee. Illustrative embodiments of the present invention are described in detail below with reference to the following drawing figures:

FIG. 1 is a block diagram illustrating an example of an image signal processor, in accordance with some examples;
FIG. 2 is a block diagram illustrating an example of a machine learning image signal processor, in accordance with some examples;
FIG. 3 is a block diagram illustrating an example of a neural network, in accordance with some examples;
FIG. 4 is a diagram illustrating an example of training a neural network system of a machine learning image signal processor, in accordance with some examples;
FIG. 5 is a