US-12626330-B2 - Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium
Abstract
An image processing method, an image processing device, a training method of a neural network, an image processing method based on a combined neural network model, a constructing method of a combined neural network model, a neural network processor, and a storage medium are provided. The image processing method includes: obtaining, based on an input image, initial feature images of N stages with resolutions from high to low, N is a positive integer and N>2; performing, based on initial feature images of second to N-th stages, cyclic scaling processing on an initial feature image of a first stage, to obtain an intermediate feature image; and performing merging processing on the intermediate feature image to obtain an output image. The cyclic scaling processing includes hierarchically-nested scaling processing of N−1 stages, and scaling processing of each stage includes down-sampling processing, concatenating processing, up-sampling processing, and residual link addition processing.
Inventors
- Pablo Navarrete Michelini
- Wenbin Chen
- Hanwen Liu
- Dan Zhu
Assignees
- BOE TECHNOLOGY GROUP CO., LTD.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2023-12-27
- Priority Date
- 2019-10-18
Claims (18)
- 1 . An image processing method based on a combined neural network model, wherein the combined neural network model comprises a plurality of neural network models, the plurality of neural network models are configured to execute an identical image processing task, input images of the plurality of neural network models are provided with identical resolution, output images of the plurality of neural network models are provided with identical resolution, and any two of the plurality of neural network models are different in at least one of a structure or a parameter; and the image processing method based on the combined neural network model comprises: inputting an input image into the plurality of neural network models in the combined neural network model, to obtain outputs of the plurality of neural network models, respectively; and fusing the outputs of the plurality of neural network models to obtain an output of the combined neural network model, wherein the plurality of neural network models comprise a first neural network model, the first neural network model is configured to perform a first image processing method, and the first image processing method comprises: obtaining an input image; obtaining, based on the input image, initial feature images of N stages with resolutions from high to low, wherein N is a positive integer and N>2; performing, based on initial feature images of second to N-th stages, cyclic scaling processing on an initial feature image of a first stage, to obtain an intermediate feature image; and performing merging processing on the intermediate feature image to obtain an output image, and the image processing method further comprises: performing crop processing on the input image to obtain a plurality of sub-input images with an overlapping region; wherein obtaining, based on the input image, the initial feature images of the N stages with resolution from high to low, comprises: obtaining, based on each of the sub-input images, 
sub-initial feature images of N stages with resolutions from high to low, wherein N is a positive integer and N>2; wherein performing, based on the initial feature images of the second to N-th stages, the cyclic scaling processing on the initial feature image of the first stage to obtain the intermediate feature image, comprises: performing, based on sub-initial feature images of second to N-th stages, cyclic scaling processing on a sub-initial feature image of a first stage, to obtain a sub-intermediate feature image; and wherein performing the merging processing on the intermediate feature image to obtain the output image, comprises: performing merging processing on the sub-intermediate feature image to obtain a corresponding sub-output image, and stitching sub-output images corresponding to the plurality of sub-input images into the output image.
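The crop-process-stitch pipeline of claim 1 can be illustrated with a short sketch. Everything here is a simplified stand-in: `crop_overlapping` and `stitch` are hypothetical helper names, the per-patch neural-network processing is omitted, and overlaps are blended by plain averaging rather than the distance weighting of claim 7.

```python
import numpy as np

def crop_overlapping(img, patch, stride):
    """Crop an H x W image into overlapping patches on a regular grid."""
    h, w = img.shape[:2]
    patches, coords = [], []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(img[y:y + patch, x:x + patch])
            coords.append((y, x))
    return patches, coords

def stitch(patches, coords, shape):
    """Stitch processed patches back together, averaging where they overlap."""
    out = np.zeros(shape, dtype=float)
    weight = np.zeros(shape, dtype=float)
    for p, (y, x) in zip(patches, coords):
        ph, pw = p.shape[:2]
        out[y:y + ph, x:x + pw] += p
        weight[y:y + ph, x:x + pw] += 1.0
    return out / np.maximum(weight, 1.0)
```

With an identity "processing" step, stitching the cropped patches reproduces the input image exactly, which is a convenient sanity check for the overlap bookkeeping.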
- 2 . The image processing method based on the combined neural network model according to claim 1 , wherein fusing the outputs of the plurality of neural network models to obtain the output of the combined neural network model, comprises: averaging the outputs of the plurality of neural network models to obtain the output of the combined neural network model.
- 3 . The image processing method based on the combined neural network model according to claim 1 , wherein the cyclic scaling processing comprises hierarchically-nested scaling processing of N−1 stages, and scaling processing of each stage comprises down-sampling processing, concatenating processing, up-sampling processing, and residual link addition processing; down-sampling processing of an i-th stage performs, based on an input of scaling processing of the i-th stage, down-sampling to obtain a down-sampling output of the i-th stage, concatenating processing of the i-th stage performs, based on the down-sampling output of the i-th stage and an initial feature image of an (i+1)-th stage, concatenating to obtain a concatenating output of the i-th stage, up-sampling processing of the i-th stage obtains an up-sampling output of the i-th stage based on the concatenating output of the i-th stage, and residual link addition processing of the i-th stage performs residual link addition between the input of the scaling processing of the i-th stage and the up-sampling output of the i-th stage, to obtain an output of the scaling processing of the i-th stage, wherein i=1, 2, . . . , N−1; and scaling processing of a (j+1)-th stage is nested between down-sampling processing of a j-th stage and concatenating processing of the j-th stage, and an output of the down-sampling processing of the j-th stage serves as an input of the scaling processing of the (j+1)-th stage, wherein j=1, 2, . . . , N−2.
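The hierarchically-nested scaling processing of claim 3 is essentially a recursion: each stage down-samples its input, hands the result to the next stage, concatenates with that stage's initial feature image, up-samples, and adds a residual link. The sketch below is a toy rendering of that control flow only, assuming `(C, H, W)` feature images; a channel mean stands in for the learned layers that would normally follow concatenation.

```python
import numpy as np

def down2(x):
    """2x2 average pooling: (C, H, W) -> (C, H/2, W/2)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up2(x):
    """Nearest-neighbour 2x up-sampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def scale(i, x, feats, n):
    """Scaling processing of stage i (1-based, i = 1..n-1), nesting stage
    i+1 between its down-sampling and concatenating steps, as in claim 3."""
    down = down2(x)
    if i < n - 1:
        down = scale(i + 1, down, feats, n)        # nested (i+1)-th stage
    cat = np.concatenate([down, feats[i]], axis=0) # feats[i]: stage i+1 (0-based list)
    up = up2(cat.mean(axis=0, keepdims=True))      # stand-in for learned up-sampling
    return x + up                                  # residual link addition
```

For N = 3 the recursion visits stages 1 and 2 and returns a feature image at the stage-1 resolution.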
- 4 . The image processing method according to claim 1 , wherein among the initial feature images of the N stages, resolution of the initial feature image of the first stage is provided with a highest value, and the resolution of the initial feature image of the first stage is identical to resolution of the input image.
- 5 . The image processing method according to claim 1 , wherein for the N stages, resolution of an initial feature image of a former stage is an integer multiple of resolution of an initial feature image of a latter stage.
- 6 . The image processing method according to claim 1 , wherein obtaining, based on the input image, the initial feature images of the N stages with resolutions from high to low, comprises: concatenating the input image with a random noise image to obtain a concatenating input image; and performing analysis processing of N different stages on the concatenating input image, to obtain the initial feature images of the N stages with resolutions from high to low, respectively.
- 7 . The image processing method according to claim 1 , wherein the plurality of sub-input images are identical in size, centers of the plurality of sub-input images form a uniform and regular grid, an overlapping region of two adjacent sub-input images is provided with a constant size in both a row direction and a column direction, and a pixel value of each pixel point in the output image is expressed as: Y_p = (1 / Σ_{k=1}^{T} s_k) · Σ_{k=1}^{T} s_k · Y_{k,(p)}, wherein Y_p represents a pixel value of any pixel point p in the output image, T represents a count of sub-output images comprising the pixel point p, Y_{k,(p)} represents a pixel value of the pixel point p in a k-th sub-output image comprising the pixel point p, and s_k represents a distance between the pixel point p in the k-th sub-output image comprising the pixel point p and a center of the k-th sub-output image comprising the pixel point p.
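A sketch of the claim-7 blending rule, taking the weights s_k verbatim from the claim (distance of the pixel to the centre of the sub-output containing it). Note that a pixel sitting exactly at the centre of the only patch covering it gets zero total weight, which a real implementation would need to guard; `stitch_weighted` is an illustrative name.

```python
import numpy as np

def stitch_weighted(patches, coords, shape):
    """Blend overlapping sub-output images with the distance weights s_k
    of claim 7: Y_p = (sum_k s_k * Y_k(p)) / (sum_k s_k)."""
    num = np.zeros(shape, dtype=float)
    den = np.zeros(shape, dtype=float)
    for p, (y0, x0) in zip(patches, coords):
        ph, pw = p.shape
        cy, cx = y0 + (ph - 1) / 2.0, x0 + (pw - 1) / 2.0
        ys, xs = np.mgrid[y0:y0 + ph, x0:x0 + pw]
        s = np.hypot(ys - cy, xs - cx)   # distance of each pixel to patch centre
        num[y0:y0 + ph, x0:x0 + pw] += s * p
        den[y0:y0 + ph, x0:x0 + pw] += s
    # guard against zero total weight (e.g. an uncovered or exact-centre pixel)
    return num / np.maximum(den, 1e-12)
```

When all sub-outputs agree on a pixel value, the weighted average returns that value regardless of the weights.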
- 8 . A training method of a neural network, wherein the neural network comprises an analysis network, a cyclic scaling network, and a merging network, and the training method comprises: obtaining a first training input image; processing, by using the analysis network, the first training input image, to obtain training initial feature images of N stages with resolutions from high to low, wherein N is a positive integer and N>2; performing, by using the cyclic scaling network and based on training initial feature images of second to N-th stages, cyclic scaling processing on a training initial feature image of a first stage, to obtain a training intermediate feature image; performing, by using the merging network, merging processing on the training intermediate feature image to obtain a first training output image; calculating, based on the first training output image, a loss value of the neural network through a loss function; and modifying a parameter of the neural network according to the loss value of the neural network, wherein the cyclic scaling processing comprises hierarchically-nested scaling processing of N−1 stages, and scaling processing of each stage comprises down-sampling processing, concatenating processing, up-sampling processing, and residual link addition processing which are sequentially performed; down-sampling processing of an i-th stage performs, based on an input of scaling processing of the i-th stage, down-sampling to obtain a down-sampling output of the i-th stage, concatenating processing of the i-th stage performs, based on the down-sampling output of the i-th stage and an initial feature image of an (i+1)-th stage, concatenating to obtain a concatenating output of the i-th stage, up-sampling processing of the i-th stage obtains an up-sampling output of the i-th stage based on the concatenating output of the i-th stage, and residual link addition processing of the i-th stage performs residual link addition between the input of the scaling 
processing of the i-th stage and the up-sampling output of the i-th stage, to obtain an output of the scaling processing of the i-th stage, wherein i=1, 2, . . . , N−1; and scaling processing of a (j+1)-th stage is nested between down-sampling processing of a j-th stage and concatenating processing of the j-th stage, and an output of the down-sampling processing of the j-th stage serves as an input of the scaling processing of the (j+1)-th stage, wherein j=1, 2, . . . , N−2.
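The training procedure of claim 8 (forward pass through analysis, cyclic scaling, and merging; loss computation; parameter update) can be miniaturized to a toy that preserves only the loop structure. The three sub-networks are replaced by trivial stand-ins with a single scalar parameter `w`, and the gradient is taken numerically, so nothing here reflects the actual network internals.

```python
import numpy as np

def analysis(x, n):
    """Stand-in analysis network: initial feature images of n stages."""
    feats = [x]
    for _ in range(n - 1):
        feats.append(feats[-1][::2, ::2])   # naive down-sampling
    return feats

def forward(x, w, n):
    """Stand-in for cyclic scaling + merging: a single learnable gain w."""
    feats = analysis(x, n)
    return w * feats[0]

def l1_loss(y, target):
    return np.abs(y - target).mean()

# Training loop: forward pass, loss value, parameter modification.
x = np.random.default_rng(0).random((8, 8))
target = 2.0 * x                  # "training standard image"
w, lr, eps = 0.0, 0.2, 1e-6
for _ in range(200):
    g = (l1_loss(forward(x, w + eps, 3), target)
         - l1_loss(forward(x, w - eps, 3), target)) / (2 * eps)
    w -= lr * g                   # modify the parameter according to the loss
```

After training, `w` settles near the gain 2.0 that maps the input to the target.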
- 9 . The training method of the neural network according to claim 8 , wherein the loss function is expressed as: L(Y, X) = Σ_{k=1}^{N} E[|S_{k−1}(Y) − S_{k−1}(X)|], wherein L(Y, X) represents the loss function, Y represents the first training output image, X represents a first training standard image corresponding to the first training input image, S_{k−1}(Y) represents an output obtained by performing down-sampling processing of a (k−1)-th stage on the first training output image, S_{k−1}(X) represents an output obtained by performing the down-sampling processing of the (k−1)-th stage on the first training standard image, and E[ ] represents calculation of matrix energy.
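One plausible reading of the claim-9 loss, with S_0 as the identity, S_k as k rounds of 2x average down-sampling, and E[ ] interpreted as the mean absolute value (both interpretations are assumptions, not pinned down by the claim):

```python
import numpy as np

def down2(x):
    """2x2 average pooling on a single-channel image."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_l1(y, x, n):
    """Sum over n scales of the mean absolute difference between the
    output image y and the standard image x."""
    loss = 0.0
    for _ in range(n):
        loss += np.abs(y - x).mean()
        y, x = down2(y), down2(x)
    return loss
```

A constant offset of 1 between the two images contributes 1 at every scale, so the loss equals n.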
- 10 . The training method of the neural network according to claim 8 , wherein processing, by using the analysis network, the first training input image to obtain the training initial feature images of the N stages with resolutions from high to low, comprises: concatenating the first training input image with a random noise image to obtain a training concatenating input image; and performing, by using the analysis network, analysis processing of N different stages on the training concatenating input image, to obtain the training initial feature images of the N stages with resolutions from high to low, respectively.
- 11 . The training method of the neural network according to claim 10 , wherein calculating, based on the first training output image, the loss value of the neural network through the loss function, comprises: processing the first training output image by using a discriminative network, and calculating the loss value of the neural network based on an output of the discriminative network corresponding to the first training output image.
- 12 . The training method of the neural network according to claim 11 , wherein the discriminative network comprises: down-sampling sub-networks of M−1 stages, discriminative sub-networks of M stages, a merging sub-network, and an activation layer; the down-sampling sub-networks of the M−1 stages are configured to perform down-sampling processing of different stages on an input of the discriminative network, so as to obtain outputs of the down-sampling sub-networks of the M−1 stages; the input of the discriminative network and the outputs of the down-sampling sub-networks of the M−1 stages serve as inputs of the discriminative sub-networks of the M stages, respectively; the discriminative sub-network of each stage comprises a brightness processing sub-network, a first convolution sub-network, and a second convolution sub-network which are sequentially connected; an output of a second convolution sub-network in a discriminative sub-network of a t-th stage and an output of a first convolution sub-network in a discriminative sub-network of a (t+1)-th stage are concatenated as an input of a second convolution sub-network in the discriminative sub-network of the (t+1)-th stage, wherein t=1, 2, . . . , M−1; the merging sub-network is configured to perform merging processing on an output of a second convolution sub-network in a discriminative sub-network of an M-th stage, to obtain a discriminative output image; and the activation layer is configured to process the discriminative output image to obtain a value indicating quality of the input of the discriminative network.
- 13 . The training method of the neural network according to claim 12 , wherein the brightness processing sub-network comprises a brightness feature extraction sub-network, a normalization sub-network, and a translation correlation sub-network, the brightness feature extraction sub-network is configured to extract a brightness feature image, the normalization sub-network is configured to perform normalization processing on the brightness feature image to obtain a normalized brightness feature image, and the translation correlation sub-network is configured to perform multiple image translation processing on the normalized brightness feature image to obtain a plurality of shift images, and is configured to generate a plurality of correlation images according to correlation between the normalized brightness feature image and each of the shift images.
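A minimal sketch of the translation-correlation step of claim 13, assuming "correlation" means an elementwise product between the normalized brightness feature image and each shifted copy (the claim does not pin down the exact operation), and using periodic shifts via `np.roll` for brevity:

```python
import numpy as np

def shift_correlations(feat, shifts=((0, 1), (1, 0), (0, -1), (-1, 0))):
    """Translate a normalized brightness feature image by each offset and
    form one correlation image (elementwise product) per shift."""
    corr = []
    for dy, dx in shifts:
        shifted = np.roll(feat, (dy, dx), axis=(0, 1))
        corr.append(feat * shifted)
    return corr
```

For a constant feature image every correlation image is that constant squared, since each shifted copy is identical to the original.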
- 14 . The training method of the neural network according to claim 12 , wherein the loss function is expressed as: L(Y, X) = λ_1·L_G(Y_{W=1}) + λ_2·L_{L1}(S_M(Y_{W=1}), S_M(X)) + λ_3·L_cont(Y_{W=1}, X) + λ_4·L_{L1}(Y_{W=0}, X) + λ_5·L_{L1}(S_M(Y_{W=0}), S_M(X)), wherein L(Y, X) represents the loss function, Y represents the first training output image, Y comprises Y_{W=1} and Y_{W=0}, X represents a first training standard image corresponding to the first training input image, L_G(Y_{W=1}) represents a generative loss function, Y_{W=1} represents a first training output image obtained in a case where a noise amplitude of the random noise image is not zero, L_{L1}(S_M(Y_{W=1}), S_M(X)) represents a first contrast loss function, L_cont(Y_{W=1}, X) represents a content loss function, L_{L1}(Y_{W=0}, X) represents a second contrast loss function, Y_{W=0} represents a first training output image obtained in a case where the noise amplitude of the random noise image is zero, L_{L1}(S_M(Y_{W=0}), S_M(X)) represents a third contrast loss function, S_M( ) represents performing down-sampling processing of an M-th stage, and λ_1, λ_2, λ_3, λ_4, and λ_5 represent preset weight values, respectively; the generative loss function L_G(Y_{W=1}) is expressed as: L_G(Y_{W=1}) = −E[log(Sigmoid(C(Y_{W=1}) − C(X)))], wherein C(Y_{W=1}) represents a discriminative output image obtained in the case where the noise amplitude of the random noise image is not zero, and C(X) represents a discriminative output image obtained by taking the first training standard image as the input of the discriminative network; the first contrast loss function, the second contrast loss function, and the third contrast loss function are respectively expressed as: L_{L1}(S_M(Y_{W=1}), S_M(X)) = E[|S_M(Y_{W=1}) − S_M(X)|], L_{L1}(Y_{W=0}, X) = E[|Y_{W=0} − X|], and L_{L1}(S_M(Y_{W=0}), S_M(X)) = E[|S_M(Y_{W=0}) − S_M(X)|], wherein E[ ] represents calculation of matrix energy; the content loss function L_cont(Y_{W=1}, X) is expressed as: L_cont(Y_{W=1}, X) = (1/(2·S_1)) · Σ_{ij} (F_{ij} − P_{ij})², wherein S_1 is a constant, F_{ij} represents a value of a j-th position in a first content feature image of the first training output image extracted by an i-th convolution kernel in a content feature extraction module, and P_{ij} represents a value of a j-th position in a second content feature image of the first training standard image extracted by the i-th convolution kernel in the content feature extraction module.
- 15 . The training method of the neural network according to claim 11 , further comprising: training the discriminative network based on the neural network; and alternately performing a training process of the discriminative network and a training process of the neural network to obtain a trained neural network, wherein training the discriminative network based on the neural network, comprises: obtaining a second training input image; processing, by using the neural network, the second training input image to obtain a second training output image; calculating, based on the second training output image, a discriminative loss value through a discriminative loss function; and modifying a parameter of the discriminative network according to the discriminative loss value.
- 16 . The training method of the neural network according to claim 15 , wherein the discriminative loss function is expressed as: L_D(V_{W=1}) = −E[log(Sigmoid(C(U) − C(V_{W=1})))], wherein L_D(V_{W=1}) represents the discriminative loss function, U represents a second training standard image corresponding to the second training input image, V_{W=1} represents a second training output image obtained in a case where a noise amplitude of the random noise image is not zero, C(U) represents a discriminative output image obtained by taking the second training standard image as the input of the discriminative network, and C(V_{W=1}) represents a discriminative output image obtained in the case where the noise amplitude of the random noise image is not zero.
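The generative loss of claim 14 and the discriminative loss of claim 16 are mirror images of a relativistic GAN objective. A direct transcription, with the discriminator scores C( ) supplied as arrays rather than computed by a network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def generative_loss(c_fake, c_real):
    """L_G = -E[log(Sigmoid(C(Y_W=1) - C(X)))]: the generator is rewarded
    when the discriminator scores its output above the standard image."""
    return -np.log(sigmoid(c_fake - c_real)).mean()

def discriminative_loss(c_real, c_fake):
    """L_D = -E[log(Sigmoid(C(U) - C(V_W=1)))]: the mirror objective,
    rewarding the discriminator for ranking the standard image higher."""
    return -np.log(sigmoid(c_real - c_fake)).mean()
```

When the two score arrays are equal, the sigmoid evaluates to 0.5 and both losses equal log 2, the equilibrium value of this objective.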
- 17 . The training method of the neural network according to claim 8 , further comprising: previous to training, performing crop processing and decode processing on each sample image in a training set, to obtain a plurality of sub-sample images in binary data format; and during training, training the neural network based on the plurality of sub-sample images in the binary data format.
- 18 . The training method of the neural network according to claim 17 , wherein the plurality of sub-sample images are identical in size.
Description
This application is a continuation application of U.S. application Ser. No. 17/419,350, filed on Jun. 29, 2021, which is a U.S. National Phase Entry of International Application No. PCT/CN2020/120586, filed on Oct. 13, 2020, designating the United States of America and claiming priority to Chinese Patent Application No. 201910995755.2, filed on Oct. 18, 2019. The present application claims priority to and the benefit of the above-identified applications, which are incorporated by reference herein in their entirety.

TECHNICAL FIELD

Embodiments of the present disclosure relate to an image processing method, an image processing device, a training method of a neural network, an image processing method based on a combined neural network model, a constructing method of a combined neural network model, a neural network processor, and a storage medium.

BACKGROUND

Currently, deep learning technology based on artificial neural networks has made great progress in fields such as image classification, image capture and search, facial recognition, and age and voice recognition. The advantage of deep learning lies in its ability to solve very different technical problems with a relatively similar system by using a general architecture. A convolutional neural network (CNN) is a kind of artificial neural network that has been developed in recent years and has attracted wide attention. The CNN is a special method of image recognition and a highly effective feedforward network. At present, the application of the CNN is not limited to the field of image recognition; it can also be applied in other directions, such as face recognition, text recognition, and image processing.
SUMMARY

At least one embodiment of the present disclosure provides an image processing method, and the image processing method comprises: obtaining an input image; obtaining, based on the input image, initial feature images of N stages with resolutions from high to low, where N is a positive integer and N>2; performing, based on initial feature images of second to N-th stages, cyclic scaling processing on an initial feature image of a first stage, to obtain an intermediate feature image; and performing merging processing on the intermediate feature image to obtain an output image, where the cyclic scaling processing comprises hierarchically-nested scaling processing of N−1 stages, and scaling processing of each stage comprises down-sampling processing, concatenating processing, up-sampling processing, and residual link addition processing; down-sampling processing of an i-th stage performs, based on an input of scaling processing of the i-th stage, down-sampling to obtain a down-sampling output of the i-th stage, concatenating processing of the i-th stage performs, based on the down-sampling output of the i-th stage and an initial feature image of an (i+1)-th stage, concatenating to obtain a concatenating output of the i-th stage, up-sampling processing of the i-th stage obtains an up-sampling output of the i-th stage based on the concatenating output of the i-th stage, and residual link addition processing of the i-th stage performs residual link addition between the input of the scaling processing of the i-th stage and the up-sampling output of the i-th stage, to obtain an output of the scaling processing of the i-th stage, where i=1, 2, . . . , N−1; and scaling processing of a (j+1)-th stage is nested between down-sampling processing of a j-th stage and concatenating processing of the j-th stage, and an output of the down-sampling processing of the j-th stage serves as an input of the scaling processing of the (j+1)-th stage, where j=1, 2, . . . , N−2.
For example, the concatenating processing of the i-th stage performing, based on the down-sampling output of the i-th stage and the initial feature image of the (i+1)-th stage, concatenating to obtain the concatenating output of the i-th stage, comprises: taking the down-sampling output of the i-th stage as an input of scaling processing of the (i+1)-th stage, to obtain an output of the scaling processing of the (i+1)-th stage; and concatenating the output of the scaling processing of the (i+1)-th stage with the initial feature image of the (i+1)-th stage to obtain the concatenating output of the i-th stage. For example, scaling processing of at least one stage is continuously performed a plurality of times, and an output of a former scaling processing serves as an input of a latter scaling processing. For example, the scaling processing of each stage is continuously performed twice. For example, among the initial feature images of the N stages, resolution of the initial feature image of the first stage is provided with a highest value, and the resolution of the initial feature image of the first stage is identical to resolution of the input image. For example, resolution of an initial feature image of a former stage is an integer multiple of resolution of an initial feature image of a latter stage.