
CN-122023111-A - Image processing method, image processing apparatus, electronic device, storage medium, and program product

CN 122023111 A

Abstract

Embodiments of the disclosure provide an image processing method, an image processing apparatus, an electronic device, a storage medium, and a program product. The image processing method comprises: acquiring an image to be processed, wherein the image to be processed is an image in Bayer format; and performing demosaicing and purple fringing correction on the image to be processed based on an image processing model to generate a target image. The image processing model comprises a first network and a second network: the first network is first preliminarily trained on a first mosaic image and a first reference image having a first correspondence, and the first network and the second network are then jointly trained on a second mosaic image and a second reference image having a second correspondence. During the joint training, the first network additionally transmits the multi-level feature maps it generates to the second network through long skip connections. This technical scheme enables end-to-end optimization from demosaicing to purple fringing correction and improves the quality of the target image.

Inventors

  • CHEN XIANFENG
  • YAN HU
  • WANG DONGJIAN

Assignees

  • 晶晨芯半导体(成都)有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-26

Claims (18)

  1. An image processing method, comprising: acquiring an image to be processed, wherein the image to be processed is an image in Bayer format; and performing demosaicing and purple fringing correction on the image to be processed based on an image processing model to generate a target image; wherein the image processing model comprises a first network and a second network, and is obtained by first performing preliminary training of the first network using a first mosaic image and a first reference image having a first correspondence, and then performing joint training of the first network and the second network using a second mosaic image and a second reference image having a second correspondence; and wherein, during the joint training, the first network further transmits the multi-level feature maps it generates to the second network through long skip connections.
  2. The image processing method according to claim 1, wherein, during the joint training: an input image is encoded and decoded by the first network to generate a demosaiced image, and the multi-level feature maps are generated in a first encoder and/or a first decoder of the first network; at least a part of the multi-level feature maps is input, through the long skip connections, into the second coding layer of a second encoder whose spatial resolution corresponds to that of the feature map of each level, so that the second coding layer performs feature fusion on the demosaiced image output by the first network and the multi-level feature maps introduced through the long skip connections; and wherein the first encoder and the first decoder are connected by first skip connections, and the second encoder and the second decoder are connected by second skip connections.
  3. The image processing method according to claim 2, wherein the first network comprises: a first input layer for transforming the image to be processed in the channel dimension to generate a first initial feature representation; the first encoder, comprising a plurality of cascaded first encoding layers, wherein each first encoding layer comprises a first downsampling module and a first feature extraction module, the first downsampling module is used for downsampling the first initial feature representation to generate a first encoding feature map, and the first feature extraction module is used for performing feature extraction on the first encoding feature map to generate a first depth feature map, which serves as the input of the next first encoding layer or of the first decoder; the first decoder, comprising a plurality of cascaded first decoding layers, wherein each first decoding layer comprises a first upsampling module and a second feature extraction module, the first upsampling module is used for upsampling the first depth feature map to generate a first reconstruction feature map, and the second feature extraction module is used for fusing the first reconstruction feature map with the first depth feature map from the corresponding level to generate a second depth feature map, which serves as the input of the next first decoding layer or of a first feature fusion layer; and the first feature fusion layer, for fusing the first initial feature representation with the output of the first decoder to generate the demosaiced image.
  4. The image processing method according to claim 2, wherein the second network comprises: a second input layer for transforming the demosaiced image in the channel dimension to generate a second initial feature representation; the second encoder, comprising a plurality of cascaded second coding layers, wherein each second coding layer comprises a second downsampling module and a third feature extraction module, the second downsampling module is used for downsampling to generate a second coding feature map, and the third feature extraction module is used for performing feature extraction on the second coding feature map to generate a third depth feature map, which serves as the input of the next second coding layer or of the second decoder; the second decoder, comprising a plurality of cascaded second decoding layers, wherein each second decoding layer comprises a second upsampling module and a fourth feature extraction module, the second upsampling module is used for upsampling the third depth feature map to generate a second reconstruction feature map, and the fourth feature extraction module is used for fusing the second reconstruction feature map with the third depth feature map from the corresponding level to generate a fourth depth feature map, which serves as the input of the next second decoding layer or of a second feature fusion layer; and the second feature fusion layer, for fusing the second initial feature representation with the output of the second decoder to generate the target image.
  5. The image processing method according to claim 1, wherein the first network and the second network are jointly trained based on a loss function comprising a first function for constraining the brightness and color accuracy of the restored image at the pixel level, and a second function for constraining the consistency of the predicted image and the true image in the color direction.
  6. The image processing method according to claim 5, wherein the first function is L₁ = (1/N)·Σᵢ √((Îᵢ − Iᵢ)² + ε²), wherein N represents the total number of pixels, Iᵢ represents the intensity value of the i-th pixel of the second reference image, Îᵢ represents the intensity value of the i-th pixel of the image predicted by the image processing model, and ε is a constant; the second function is L₂ = (1/N)·Σᵢ (1 − (Cᵢ·Ĉᵢ)/(‖Cᵢ‖‖Ĉᵢ‖)), wherein N represents the total number of pixels, Cᵢ represents the intensity vector of a set of color components of the i-th pixel of the second reference image, and Ĉᵢ represents the intensity vector of a set of color components of the i-th pixel of the image predicted by the image processing model; and the loss function is L = λ₁·L₁ + λ₂·L₂, wherein λ₁ represents the hyper-parameter of the first function and λ₂ represents the hyper-parameter of the second function.
  7. The image processing method according to claim 1, wherein the joint training further satisfies one or more of the following: a cosine annealing learning rate scheduling algorithm is used to adjust the learning rate according to a preset number of training cycles or iteration steps, so that within each training cycle or preset number of iteration steps the learning rate gradually decays from an initial learning rate to a minimum learning rate along a cosine curve; and the parameters of the first network and the second network are updated using an adaptive gradient optimization algorithm, wherein the adaptive gradient optimization algorithm determines the learning rates of the parameters of the first network and the second network based on first and second moment estimates of the parameter gradients, and updates the parameters of the first network and the second network accordingly.
  8. The image processing method according to claim 1, wherein the first mosaic image and the first reference image having the first correspondence are acquired by: decomposing the first reference image into a plurality of independent color channels, wherein each color channel corresponds to exactly one color and comprises a plurality of pixel blocks; and sampling each color channel using a sampling unit composed of pixel blocks of preset width and height to obtain the first mosaic image.
  9. The image processing method according to claim 8, wherein sampling each color channel using a sampling unit composed of pixel blocks of preset width and height to obtain the first mosaic image comprises: performing sampling operations on each color channel with the sampling unit, and determining, for each sampling, the sampling area formed on the color channel, wherein no two sampling operations on the same color channel cover the same pixel block; obtaining Bayer array basic pixel units from each sampling area according to a preset selection rule; and arranging the Bayer array basic pixel units to obtain the first mosaic image.
  10. The image processing method according to claim 8, wherein, before the first reference image is decomposed into a plurality of independent color channels, an edge detection operation is performed on the first reference image and the detected edge regions are blurred.
  11. The image processing method according to claim 10, wherein performing the edge detection operation on the first reference image and blurring the detected edge regions comprises: identifying variation regions in the first reference image based on the spatial variation characteristics of pixel intensities in the first reference image; determining, according to the spatial positions of the variation regions in the first reference image, the edge regions corresponding to the variation regions and their adjacent regions; and performing spatial smoothing on the first reference image in the edge regions and their adjacent regions.
  12. The image processing method according to claim 1, wherein the second mosaic image and the second reference image having the second correspondence are acquired by: performing chromatic aberration simulation of different degrees on different color channels of the second reference image to generate a degraded image; decomposing the degraded image into a plurality of independent color channels, wherein each color channel corresponds to exactly one color and comprises a plurality of pixel blocks; and sampling each color channel using a sampling unit composed of pixel blocks of preset width and height to obtain the second mosaic image.
  13. The image processing method according to claim 12, wherein performing chromatic aberration simulation of different degrees on different color channels of the second reference image to generate the degraded image comprises one or more of: taking the first color channel as a reference, applying blurring to the second color channel and the third color channel respectively, wherein the blurring intensities applied to the second and third color channels differ from each other and are both greater than the blurring intensity of the first color channel; taking the position of the first color channel as a reference position, applying geometric transformations to the second color channel and the third color channel respectively, wherein the geometric offset magnitude of the second color channel is greater than that of the third color channel; and synthesizing the processed first, second, and third color channels to obtain the degraded image.
  14. The image processing method according to claim 1, further comprising, before inputting the first mosaic image and/or the second mosaic image into the image processing model: extracting all R, Gr, Gb, and B pixels from the first mosaic image and/or the second mosaic image, and mapping the R, Gr, Gb, and B pixels to independent channels; and combining the independent channels into a Bayer tensor with 4 channels and a spatial resolution of one half of the original size in each spatial dimension.
  15. An image processing apparatus, comprising: an acquisition unit for acquiring an image to be processed, wherein the image to be processed is an image in Bayer format; and a processing unit for performing demosaicing and purple fringing correction on the image to be processed based on an image processing model to generate a target image; wherein the image processing model comprises a first network and a second network, and is obtained by first performing preliminary training of the first network using a first mosaic image and a first reference image having a first correspondence, and then performing joint training of the first network and the second network using a second mosaic image and a second reference image having a second correspondence; and wherein, during the joint training, the first network further transmits the multi-level feature maps it generates to the second network through long skip connections.
  16. An electronic device comprising a processor and a memory, wherein the memory stores a computer program capable of running on the processor, and the processor is configured to perform the image processing method according to any one of claims 1 to 14 when the computer program is run.
  17. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the image processing method according to any one of claims 1 to 14.
  18. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the image processing method according to any one of claims 1 to 14.
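Claim 14 describes packing the four Bayer sample types (R, Gr, Gb, B) into a 4-channel tensor at half the original spatial resolution before the model sees them. The following is a minimal sketch of that packing, assuming an RGGB layout (R at even rows/even columns), which the patent text does not specify:

```python
import numpy as np

def pack_bayer(raw):
    """Pack a single-channel Bayer mosaic of shape (H, W) into a
    (H/2, W/2, 4) tensor, one channel per sample type.

    Assumes an RGGB layout: R at even rows/even cols, Gr at even/odd,
    Gb at odd/even, B at odd/odd. The actual layout is an assumption
    here, not confirmed by the patent text.
    """
    assert raw.ndim == 2 and raw.shape[0] % 2 == 0 and raw.shape[1] % 2 == 0
    r = raw[0::2, 0::2]   # red samples
    gr = raw[0::2, 1::2]  # green samples on red rows
    gb = raw[1::2, 0::2]  # green samples on blue rows
    b = raw[1::2, 1::2]   # blue samples
    return np.stack([r, gr, gb, b], axis=-1)
```

This is the same space-to-depth rearrangement commonly used to present RAW data to convolutional networks: it halves each spatial dimension while quadrupling the channel count, so no samples are lost.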

Description

Image processing method, image processing apparatus, electronic device, storage medium, and program product

Technical Field

The embodiments of the disclosure relate to the field of image processing, and in particular to an image processing method, an image processing apparatus, an electronic device, a storage medium, and a program product.

Background

In modern digital imaging devices, the Image Signal Processor (ISP) plays a critical role. The ISP pipeline is responsible for converting the RAW data captured by the image sensor into a visible, high-quality color image. ISP pipelines typically contain a series of independent image processing modules, such as demosaicing, white balancing, denoising, color correction, and purple fringing correction, which are executed serially in a predetermined order. However, existing image processing approaches retain some drawbacks.

Disclosure of Invention

In view of this, embodiments of the present disclosure provide an image processing method, apparatus, electronic device, storage medium, and program product, which can implement end-to-end optimization from demosaicing to purple fringing correction and improve the quality of the target image. To achieve the above object, the embodiments of the present disclosure provide the following technical solutions.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including: acquiring an image to be processed, wherein the image to be processed is an image in Bayer format; and performing demosaicing and purple fringing correction on the image to be processed based on an image processing model to generate a target image; wherein the image processing model comprises a first network and a second network, and is obtained by first performing preliminary training of the first network using a first mosaic image and a first reference image having a first correspondence, and then performing joint training of the first network and the second network using a second mosaic image and a second reference image having a second correspondence; and during the joint training, the first network further transmits the multi-level feature maps it generates to the second network through long skip connections. Optionally, during the joint training, an input image is encoded and decoded by the first network to generate a demosaiced image, and the multi-level feature maps are generated in a first encoder and/or a first decoder of the first network; at least a part of the multi-level feature maps is input, through the long skip connections, into the second coding layer of a second encoder whose spatial resolution corresponds to that of the feature map of each level, so that the second coding layer can fuse the demosaiced image output by the first network with the multi-level feature maps introduced through the long skip connections; wherein the first encoder and the first decoder are connected by first skip connections, and the second encoder and the second decoder are connected by second skip connections.
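The joint training described above is driven (per claim 6) by a loss with two terms: a pixel-level intensity term involving a constant, and a term constraining the predicted and reference colors to agree in direction. The sketch below assumes a Charbonnier-style reading of the first term and a cosine-similarity reading of the second; the exact formulas are not reproduced in this text, so both forms and the hyper-parameter values are illustrative assumptions:

```python
import numpy as np

def charbonnier_loss(pred, ref, eps=1e-3):
    # Pixel-level intensity loss; eps plays the role of the constant in
    # claim 6 (Charbonnier-style form is an assumption, not confirmed).
    return np.mean(np.sqrt((pred - ref) ** 2 + eps ** 2))

def color_direction_loss(pred, ref, eps=1e-8):
    # pred, ref: (N, 3) arrays of per-pixel color vectors; penalizes the
    # angle between predicted and reference colors (cosine-similarity reading).
    dot = np.sum(pred * ref, axis=1)
    norms = np.linalg.norm(pred, axis=1) * np.linalg.norm(ref, axis=1) + eps
    return np.mean(1.0 - dot / norms)

def total_loss(pred, ref, lam1=1.0, lam2=0.5):
    # lam1, lam2 stand in for the two hyper-parameters of claim 6
    # (values here are arbitrary placeholders).
    flat_pred = pred.reshape(-1, 3)
    flat_ref = ref.reshape(-1, 3)
    return lam1 * charbonnier_loss(pred, ref) + lam2 * color_direction_loss(flat_pred, flat_ref)
```

The intensity term alone cannot distinguish a purple fringe from a small brightness error; the color-direction term specifically penalizes hue shifts, which is why a weighted combination of the two is plausible for joint demosaicing and purple fringing correction.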
Optionally, the first network includes: a first input layer for transforming the image to be processed in the channel dimension to generate a first initial feature representation; the first encoder, comprising a plurality of cascaded first encoding layers, wherein each first encoding layer comprises a first downsampling module and a first feature extraction module, the first downsampling module is used for downsampling the first initial feature representation to generate a first encoding feature map, and the first feature extraction module is used for performing feature extraction on the first encoding feature map to generate a first depth feature map, which serves as the input of the next first encoding layer or of the first decoder; the first decoder, comprising a plurality of cascaded first decoding layers, wherein each first decoding layer comprises a first upsampling module and a second feature extraction module, the first upsampling module is used for upsampling the first depth feature map to generate a first reconstruction feature map, and the second feature extraction module is used for fusing the first reconstruction feature map with the first depth feature map from the corresponding level to generate a second depth feature map, which serves as the input of the next first decoding layer or of a first feature fusion layer; and the first feature fusion layer, for fusing the first initial feature representation with the output of the first decoder to generate the demosaiced image. Optionally, the second network includes: a second input layer for transforming the demosaiced image in the channel dimension to generate a second initial feature representation; The second encoder comprises a plura