US-12620059-B2 - Method and device for deep guided filter processing
Abstract
A method of image processing includes: determining a first feature, wherein the first feature has a dimensionality D1; determining a second feature, wherein the second feature has a dimensionality D2 and is based on an output of a feature extraction network; generating a third feature by processing the first feature, the third feature having a dimensionality D3; generating a guidance by processing the second feature, the guidance having the dimensionality D3; generating a filter output by applying a deep guided filter (DGF) to the third feature using the guidance; generating a map based on the filter output; and outputting a processed image based on the map.
Inventors
- Qingfeng Liu
- Hai Su
- Mostafa El-Khamy
Assignees
- SAMSUNG ELECTRONICS CO., LTD.
Dates
- Publication Date: 2026-05-05
- Application Date: 2021-12-27
Claims (18)
- 1 . A method of image processing, comprising: determining a first feature, wherein the first feature has a dimensionality D1; determining a second feature, wherein the second feature has a dimensionality D2 and is based on an output of a feature extraction network; generating a third feature by processing the first feature, the third feature having a dimensionality D3; generating a guidance by processing the second feature, the guidance having the dimensionality D3 and indicating a criterion for smoothing; generating a filter output by applying a deep guided filter (DGF) to the third feature using the guidance as the criterion for smoothing; generating a map based on the filter output using depthwise separable convolutions; and outputting a processed image based on the map; further comprising determining the first feature based on an image to be processed such that the first feature encodes semantic information about the image to be processed, wherein the image includes a plurality of semantic areas, and the smoothing is applied in a lesser amount to a boundary between two or more of the plurality of semantic areas.
- 2 . The method of claim 1 , further comprising determining the second feature based on an image to be processed such that the second feature encodes boundary information.
- 3 . The method of claim 1 , wherein D1 is larger than D2.
- 4 . The method of claim 1 , wherein the first feature and the second feature are determined based on an image to be processed, the image to be processed has a dimensionality D4, and D4 is larger than each of D1, D2, and D3.
- 5 . The method of claim 1 , wherein: processing the second feature comprises applying a first convolution to the second feature to generate a first convolved feature, and outputting the processed image based on the filter output comprises aggregating the filter output with the first convolved feature.
- 6 . The method of claim 5 , wherein aggregating the filter output with the first convolved feature comprises concatenating the filter output with the first convolved feature.
- 7 . The method of claim 5 , wherein processing the second feature comprises applying a second convolution to the first convolved feature to generate a second convolved feature, and the guidance is based on the second convolved feature.
- 8 . The method of claim 5 , wherein applying the DGF to the third feature using the guidance comprises: generating a downsampled version of the guidance; obtaining coefficients using a filter process in which the downsampled version of the guidance is used; upsampling the coefficients to match a dimensionality of the guidance; and applying the coefficients to the guidance.
- 9 . The method of claim 1 , wherein applying the DGF to the third feature using the guidance comprises: generating a downsampled version of the guidance; obtaining coefficients using a filter process in which the downsampled version of the guidance is used; applying the coefficients to the downsampled version of the guidance to generate a result; and upsampling the result.
- 10 . A system, comprising: a processing circuit; and a memory used to store instructions executed by the processing circuit, wherein the instructions execute a method of image processing comprising: determining a first feature, wherein the first feature has a dimensionality D1; determining a second feature, wherein the second feature has a dimensionality D2 and is based on an output of a feature extraction network; generating a third feature by processing the first feature, the third feature having a dimensionality D3; generating a guidance by processing the second feature, the guidance having the dimensionality D3 and indicating a criterion for smoothing; generating a filter output by applying a deep guided filter (DGF) to the third feature using the guidance as the criterion for smoothing; generating a map based on the filter output using depthwise separable convolutions; and outputting a processed image based on the map; wherein when executed, the instructions further comprise determining the first feature based on an image to be processed such that the first feature encodes semantic information about the image to be processed, wherein the image includes a plurality of semantic areas, and the smoothing is applied in a lesser amount to a boundary between two or more of the plurality of semantic areas.
- 11 . The system of claim 10 , wherein when executed, the instructions further comprise determining the second feature based on an image to be processed such that the second feature encodes boundary information.
- 12 . The system of claim 10 , wherein D1 is larger than D2.
- 13 . The system of claim 10 , wherein in the method of image processing, the first feature and the second feature are determined based on an image to be processed, the image to be processed has a dimensionality D4, and D4 is larger than each of D1, D2, and D3.
- 14 . The system of claim 10 , wherein when executed, the instructions further comprise: processing the second feature comprises applying a first convolution to the second feature to generate a first convolved feature, and outputting the processed image based on the filter output comprises aggregating the filter output with the first convolved feature.
- 15 . The system of claim 14 , wherein when executed, the instructions further comprise: aggregating the filter output with the first convolved feature comprises concatenating the filter output with the first convolved feature.
- 16 . The system of claim 14 , wherein when executed, the instructions further comprise: processing the second feature comprises applying a second convolution to the first convolved feature to generate a second convolved feature, and the guidance is based on the second convolved feature.
- 17 . The system of claim 14 , wherein when executed, the instructions further comprise: applying the DGF to the third feature using the guidance comprises: generating a downsampled version of the guidance; obtaining coefficients using a filter process in which the downsampled version of the guidance is used; upsampling the coefficients to match a dimensionality of the guidance; and applying the coefficients to the guidance.
- 18 . The system of claim 10 , wherein when executed, the instructions further comprise: applying the DGF to the third feature using the guidance comprises: generating a downsampled version of the guidance; obtaining coefficients using a filter process in which the downsampled version of the guidance is used; applying the coefficients to the downsampled version of the guidance to generate a result; and upsampling the result.
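A minimal sketch of the DGF step recited in claims 8 and 9, assuming a classical guided filter computed on the downsampled guidance, a box-filter window of radius r, a regularization term eps, bilinear resampling, and a third feature that is already at the lower resolution; the two claims differ only in whether the coefficients or the filtered result are upsampled:

```python
import torch
import torch.nn.functional as F

def box_filter(x: torch.Tensor, r: int) -> torch.Tensor:
    """Per-channel mean over a (2r+1)x(2r+1) window."""
    return F.avg_pool2d(x, kernel_size=2 * r + 1, stride=1, padding=r,
                        count_include_pad=False)

def dgf_step(feature_lo, guidance_hi, r=1, eps=1e-4, upsample_coeffs=True):
    """Guided filtering of a low-resolution feature with a high-resolution guidance.

    feature_lo : (N, C, h, w)  third feature, assumed already at the lower resolution
    guidance_hi: (N, C, H, W)  guidance at full resolution
    upsample_coeffs=True  -> upsample the coefficients, apply to the guidance (claim 8)
    upsample_coeffs=False -> apply at low resolution, then upsample the result (claim 9)
    """
    # Downsampled version of the guidance, matched to the feature resolution.
    guidance_lo = F.interpolate(guidance_hi, size=feature_lo.shape[-2:],
                                mode='bilinear', align_corners=False)

    # Local statistics used to obtain the per-pixel linear coefficients a, b.
    mean_i = box_filter(guidance_lo, r)
    mean_p = box_filter(feature_lo, r)
    cov_ip = box_filter(guidance_lo * feature_lo, r) - mean_i * mean_p
    var_i = box_filter(guidance_lo * guidance_lo, r) - mean_i * mean_i
    a = cov_ip / (var_i + eps)
    b = mean_p - a * mean_i
    mean_a, mean_b = box_filter(a, r), box_filter(b, r)

    if upsample_coeffs:
        # Claim 8: upsample the coefficients to the guidance size, then apply them to the guidance.
        a_hi = F.interpolate(mean_a, size=guidance_hi.shape[-2:], mode='bilinear', align_corners=False)
        b_hi = F.interpolate(mean_b, size=guidance_hi.shape[-2:], mode='bilinear', align_corners=False)
        return a_hi * guidance_hi + b_hi

    # Claim 9: apply the coefficients to the downsampled guidance, then upsample the result.
    out_lo = mean_a * guidance_lo + mean_b
    return F.interpolate(out_lo, size=guidance_hi.shape[-2:], mode='bilinear', align_corners=False)
```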
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/161,827, filed Mar. 16, 2021, to U.S. Provisional Patent Application No. 63/190,128, filed May 18, 2021, and to U.S. Provisional Patent Application No. 63/224,312, filed Jul. 21, 2021, the contents of which are incorporated herein by reference.
FIELD
The present disclosure relates generally to methods and devices for deep guided filter (DGF) image processing, and to methods of training or optimizing the low-complexity DGF.
BACKGROUND
Semantic segmentation is a process used in certain computer vision tasks, among other things. Neural network based semantic segmentation may use a dense prediction network that aims to classify pixels (e.g., each pixel) in an input image into a category (e.g., a set category or predefined category). For some tasks, such as applying smoothing filters to an image, content-aware image signal processing (ISP), or autonomous driving, achieving accuracy in such classification in the semantic boundary region may be important. However, achieving such accuracy may, in some processes, come at the expense of increased computational complexity, which may involve a burdensome amount of time and/or computing resources.
SUMMARY
According to some embodiments, a method of image processing includes: determining a first feature, wherein the first feature has a dimensionality D1; determining a second feature, wherein the second feature has a dimensionality D2 and is based on an output of a feature extraction network; generating a third feature by processing the first feature, the third feature having a dimensionality D3; generating a guidance by processing the second feature, the guidance having the dimensionality D3; generating a filter output by applying a deep guided filter (DGF) to the third feature using the guidance; generating a map based on the filter output; and outputting a processed image based on the map.
According to some embodiments, a system includes a processing circuit configured to implement a method of image processing. The method includes: determining a first feature, wherein the first feature has a dimensionality D1; determining a second feature, wherein the second feature has a dimensionality D2 and is based on an output of a feature extraction network; generating a third feature by processing the first feature, the third feature having a dimensionality D3; generating a guidance by processing the second feature, the guidance having the dimensionality D3; generating a filter output by applying a deep guided filter (DGF) to the third feature using the guidance; generating a map based on the filter output; and outputting a processed image based on the map.
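A minimal sketch of the flow summarized above, assuming illustrative channel counts for D1, D2, and D3, a 1×1 convolution for each feature-reduction step, a single depthwise separable convolution to produce the map, and the dgf_step helper from the sketch following the claims:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class DGFHead(nn.Module):
    """Illustrative wiring of the summarized pipeline (channel counts are assumptions)."""
    def __init__(self, d1=256, d2=48, d3=64, num_classes=21):
        super().__init__()
        self.reduce_first = nn.Conv2d(d1, d3, kernel_size=1)        # first feature  -> third feature (D3)
        self.reduce_second = nn.Conv2d(d2, d3, kernel_size=1)       # second feature -> guidance (D3)
        self.classifier = DepthwiseSeparableConv(d3, num_classes)   # map via depthwise separable convolution

    def forward(self, first_feature, second_feature):
        third = self.reduce_first(first_feature)       # third feature, dimensionality D3
        guidance = self.reduce_second(second_feature)  # guidance, dimensionality D3 (smoothing criterion)
        filtered = dgf_step(third, guidance)           # deep guided filter (helper assumed from the earlier sketch)
        logits = self.classifier(filtered)             # map based on the filter output
        return logits                                  # processed image is derived from this map downstream
```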
BRIEF DESCRIPTION OF THE DRAWINGS
Certain aspects, features, and advantages of certain embodiments of the present disclosure will be readily apparent from the following detailed description and the accompanying drawings, in which:
FIG. 1 illustrates an example embodiment of a communication system 100 configured for electronic communication;
FIG. 2 illustrates a comparative example of an image processing method that uses a DGF;
FIG. 3 illustrates an example embodiment of an image processing method that uses a DGF;
FIG. 4 illustrates an example embodiment of the DGF used in the image processing method shown in FIG. 3;
FIG. 5 illustrates another example embodiment of an image processing method that uses a DGF;
FIG. 6 illustrates an example embodiment of a dual-resolution DGF;
FIG. 7 illustrates an example embodiment of a single-resolution DGF;
FIG. 8 illustrates an example embodiment of a DGF training process; and
FIG. 9 illustrates an example embodiment of a system configured to manage image processing that uses a DGF.
DETAILED DESCRIPTION
Certain embodiments described herein provide for improved image smoothing via use of an improved DGF and semantic segmentation, and may include a low-complexity DGF and pixel-level prediction. This image processing may be part of a larger image processing method or pipeline, or may be used independently. In some embodiments, a smoothing filter that smooths an image is applied. It may be desirable, in some embodiments, to apply weaker or less smoothing to boundaries between different semantic areas of the image, relative to other areas of the image, to help maintain a sharp distinction between those semantic areas (e.g., boundaries between grass and sky in the image). Semantic segmentation may be used to help identify or define such boundaries, and smoothing may then be performed accordingly. Certain comparative smoothing image processing techniques, such as those shown in FIG. 2 and described in detail below, involve inputting a “guidance” to a DGF. DGFs may include, for example, edge-preserving smoothing filters (smoothing filters that preserve sharp boundaries between semantic areas). The DGF may make use of a guidance that indicates, expl