EP-4339836-B1 - NETWORK MODEL COMPRESSION METHOD, APPARATUS AND DEVICE, IMAGE GENERATION METHOD, AND MEDIUM
Inventors
- WU, Jie
- LI, Shaojie
- XIAO, Xuefeng
Dates
- Publication Date
- 20260513
- Application Date
- 20220919
Claims (12)
- A computer-implemented image generation method, comprising: inputting a random noise signal into a second generator to enable the second generator to generate a false image according to the random noise signal (S410); and inputting the false image into a second discriminator to enable the second discriminator to discriminate that the false image is true and then output the false image (S420), wherein the second generator and the second discriminator are obtained by using a network model compression method, a network model to be compressed comprises a first generator and a first discriminator, characterized in that the network model compression method comprises: performing pruning processing on the first generator to obtain the second generator (S110) (S210) (S310); and configuring states of convolution kernels in the first discriminator to enable a part of the convolution kernels to be in an activated state and the other part of the convolution kernels to be in a suppressed state, so as to obtain the second discriminator (S120), wherein a loss difference between the first generator and the first discriminator is a first loss difference, a loss difference between the second generator and the second discriminator is a second loss difference, and an absolute value of a difference value between the first loss difference and the second loss difference is less than a first preset threshold, wherein configuring the states of the convolution kernels in the first discriminator to enable a part of the convolution kernels to be in an activated state and the other part of the convolution kernels to be in a suppressed state, so as to obtain the second discriminator (S120), comprises: freezing a retention factor corresponding to each convolution kernel in the second discriminator, and determining a first weight parameter of the second discriminator, wherein the retention factor is used for characterizing importance of the convolution kernel corresponding to the retention factor (S250);
freezing the first weight parameter of the second discriminator and a second weight parameter of the second generator, and determining respective retention factors (S260); and repeatedly performing operations of determining the first weight parameter of the second discriminator and determining the respective retention factors until the absolute value of the difference value between the first loss difference and the second loss difference is less than the first preset threshold, wherein determining the respective retention factors comprises: determining the respective retention factors according to an objective function of the respective retention factors; in response to a retention factor being less than a second preset threshold, determining the retention factor to be 0; and in response to a retention factor being greater than or equal to the second preset threshold, determining the retention factor to be 1 (S370); wherein before determining the respective retention factors according to an objective function of the respective retention factors, the network model compression method further comprises: determining the objective function of the respective retention factors according to an objective function of the second generator, an objective function of the second discriminator, a loss function of the second discriminator with respect to false pictures, an objective function of the first generator, and a loss function of the first discriminator with respect to false pictures (S240) (S350).
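The alternating scheme of claim 1 can be illustrated with a hedged sketch. All function names, the `train_step` callback, and the toy control flow below are illustrative assumptions, not the patented training procedure; only the two-phase structure (S250/S260), the binarization rule (S370), and the stopping criterion on the loss-difference gap are taken from the claim.

```python
# Hedged sketch of claim 1's alternating compression loop (names assumed).

def binarize_retention_factors(factors, second_threshold):
    """S370: a retention factor below the second preset threshold becomes 0
    (kernel suppressed); otherwise it becomes 1 (kernel activated)."""
    return [0.0 if f < second_threshold else 1.0 for f in factors]

def compress_until_matched(first_loss_diff, train_step, factors,
                           first_threshold, second_threshold, max_iters=1000):
    """Repeat both phases of claim 1 until the absolute difference between the
    teacher's loss difference (first) and the student's (second) falls below
    the first preset threshold.

    train_step(factors) stands in for one round of:
      - S250: retention factors frozen, first weight parameter updated;
      - S260: weights frozen, retention factors updated;
    and returns (second_loss_diff, updated_factors).
    """
    second_loss_diff = float("inf")
    for _ in range(max_iters):
        second_loss_diff, factors = train_step(factors)
        factors = binarize_retention_factors(factors, second_threshold)
        if abs(first_loss_diff - second_loss_diff) < first_threshold:
            break
    return second_loss_diff, factors
```

In use, `train_step` would wrap the actual gradient updates of the second discriminator's weights and the retention factors; the sketch only fixes the orchestration around it.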
- The image generation method according to claim 1, wherein the first weight parameter comprises weight parameters corresponding to elements in the second discriminator other than the respective retention factors.
- The image generation method according to claim 1, wherein the second weight parameter comprises weight parameters corresponding to elements in the second generator.
- The image generation method according to claim 1, wherein determining a first weight parameter of the second discriminator comprises: determining the first weight parameter of the second discriminator according to an objective function of the second discriminator; and the network model compression method further comprises: determining the second weight parameter of the second generator according to an objective function of the second generator.
- The image generation method according to claim 4, wherein before determining the second weight parameter of the second generator according to an objective function of the second generator, the network model compression method further comprises: determining the objective function of the second generator according to a loss function of the second generator (S220); and determining the objective function of the second discriminator according to a loss function of the second discriminator with respect to real pictures and a loss function of the second discriminator with respect to false pictures (S230) (S340).
- The image generation method according to claim 5, wherein before determining the objective function of the second generator according to a loss function of the second generator (S220), the network model compression method further comprises: taking the first generator and the first discriminator as a teacher generative adversarial network, and taking the second generator and the second discriminator as a student generative adversarial network (S320); and determining the objective function of the second generator according to a loss function of the second generator (S220) comprises: determining the objective function of the second generator according to a distillation objective function between the teacher generative adversarial network and the student generative adversarial network, and the loss function of the second generator (S330).
- The image generation method according to claim 6, wherein determining the objective function of the second generator according to a distillation objective function between the teacher generative adversarial network and the student generative adversarial network, and the loss function of the second generator (S330), comprises: summing, according to weights, the distillation objective function and an objective function component determined according to the loss function of the second generator, to determine the objective function of the second generator; and before determining the objective function of the second generator according to a distillation objective function between the teacher generative adversarial network and the student generative adversarial network, and the loss function of the second generator (S330), the network model compression method further comprises: determining a first similarity metric function according to a similarity between intermediate feature maps of at least one layer in the first generator and the second generator; inputting false pictures generated by the first generator into the first discriminator to obtain a first intermediate feature map of at least one layer in the first discriminator; inputting false pictures generated by the second generator into the first discriminator to obtain a second intermediate feature map of at least one layer in the first discriminator; determining a second similarity metric function according to a similarity between the first intermediate feature map of the at least one layer and the second intermediate feature map of the at least one layer; and determining the distillation objective function according to the first similarity metric function and the second similarity metric function.
- The image generation method according to claim 7, wherein determining a first similarity metric function according to a similarity between intermediate feature maps of at least one layer in the first generator and the second generator, comprises: inputting an intermediate feature map of an i-th layer in the first generator and an intermediate feature map of an i-th layer in the second generator into a similarity metric function to obtain a first sub-similarity metric function corresponding to the i-th layer, wherein i is a positive integer, i takes a value from 1 to M, and M is a total number of layers of the first generator and the second generator; and determining the first similarity metric function according to first sub-similarity metric functions corresponding to respective layers; and wherein determining a second similarity metric function according to a similarity between the first intermediate feature map of the at least one layer and the second intermediate feature map of the at least one layer, comprises: inputting a first intermediate feature map and a second intermediate feature map corresponding to a j-th layer into a similarity metric function to obtain a second sub-similarity metric function corresponding to the j-th layer, wherein j is a positive integer, j takes a value from 1 to N, and N is a total number of layers of the first discriminator; and determining the second similarity metric function according to second sub-similarity metric functions corresponding to respective layers.
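A hedged sketch of the layer-wise distillation objective of claims 7 and 8. Cosine similarity as the per-layer metric, summation over layers, and the simple weighted combination are all illustrative assumptions; the claims specify the structure (per-layer sub-similarities, a first similarity over generator feature maps, a second similarity over first-discriminator feature maps, and a weighted sum in S330) but not the concrete functions or weights.

```python
import math

def cosine_similarity(a, b):
    """Illustrative per-layer similarity metric on flattened feature maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_metric(teacher_maps, student_maps):
    """Aggregate per-layer sub-similarity metric functions (claim 8):
    layer i of the teacher against layer i of the student."""
    return sum(cosine_similarity(t, s)
               for t, s in zip(teacher_maps, student_maps))

def distillation_objective(gen_maps_teacher, gen_maps_student,
                           disc_maps_teacher_fakes, disc_maps_student_fakes):
    """Claim 7: first similarity (teacher vs. student generator feature maps)
    combined with second similarity (first-discriminator feature maps for
    false pictures generated by the first and second generators)."""
    first = similarity_metric(gen_maps_teacher, gen_maps_student)
    second = similarity_metric(disc_maps_teacher_fakes, disc_maps_student_fakes)
    return first + second

def generator_objective_s330(distill_obj, gen_loss_component, w_d, w_g):
    """S330: weighted sum of the distillation objective and the component
    determined from the second generator's loss function."""
    return w_d * distill_obj + w_g * gen_loss_component
```

Feeding the same fake pictures through the frozen first discriminator for both generators, as claim 7 requires, is what makes the second similarity comparable across teacher and student.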
- An image generation apparatus, comprising: a second generator, configured to receive a random noise signal and generate a false image according to the random noise signal; and a second discriminator, configured to receive the false image, discriminate that the false image is true, and then output the false image, wherein the second generator and the second discriminator are obtained by using a network model compression apparatus (500), a network model to be compressed comprises a first generator and a first discriminator, characterized in that the network model compression apparatus comprises a pruning module (510) and a configuration module (520); wherein the pruning module (510) is configured to perform pruning processing on the first generator to obtain the second generator; the configuration module (520) is configured to configure states of convolution kernels in the first discriminator to enable a part of the convolution kernels to be in an activated state and the other part of the convolution kernels to be in a suppressed state, so as to obtain the second discriminator; and a loss difference between the first generator and the first discriminator is a first loss difference, a loss difference between the second generator and the second discriminator is a second loss difference, and an absolute value of a difference value between the first loss difference and the second loss difference is less than a first preset threshold, wherein the configuration module (520) is configured to configure the states of the convolution kernels in the first discriminator, so as to obtain the second discriminator, by: freezing a retention factor corresponding to each convolution kernel in the second discriminator, and determining a first weight parameter of the second discriminator, wherein the retention factor is used for characterizing
importance of the convolution kernel corresponding to the retention factor (S250); freezing the first weight parameter of the second discriminator and a second weight parameter of the second generator, and determining respective retention factors (S260); and repeatedly performing operations of determining the first weight parameter of the second discriminator and determining the respective retention factors until the absolute value of the difference value between the first loss difference and the second loss difference is less than the first preset threshold, wherein determining the respective retention factors comprises: determining the respective retention factors according to an objective function of the respective retention factors; in response to a retention factor being less than a second preset threshold, determining the retention factor to be 0; and in response to a retention factor being greater than or equal to the second preset threshold, determining the retention factor to be 1 (S370); wherein before determining the respective retention factors according to the objective function of the respective retention factors, the configuration module (520) is further configured for: determining the objective function of the respective retention factors according to an objective function of the second generator, an objective function of the second discriminator, a loss function of the second discriminator with respect to false pictures, an objective function of the first generator, and a loss function of the first discriminator with respect to false pictures (S240) (S350).
- An image generation device, comprising: a memory, storing a computer program; and a processor, configured to execute the computer program, wherein the computer program, when executed by the processor, causes the processor to perform the image generation method according to any one of claims 1 to 8.
- A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the image generation method according to any one of claims 1 to 8.
- A computer program product, comprising a computer program carried on a non-transitory computer-readable medium, wherein the computer program comprises program code for performing the image generation method according to any one of claims 1 to 8.
Description
TECHNICAL FIELD
The present disclosure relates to the field of computer technology and, in particular, to a network model compression method, apparatus and device, an image generation method, and a medium.
BACKGROUND
A Generative Adversarial Network (GAN) is a deep learning model and one of the most promising methods for unsupervised learning on complex distributions in recent years; it is widely used in various image synthesis tasks, such as image generation, image resolution, and super-resolution.
The non-patent document "Slimmable Generative Adversarial Networks" introduces slimmable GANs (SlimGANs), which can flexibly switch the width of the generator to accommodate various quality-efficiency trade-offs at runtime. Specifically, multiple discriminators that share partial parameters are leveraged to train the slimmable generator. To facilitate consistency between generators of different widths, a stepwise inplace distillation technique that encourages narrow generators to learn from wide ones is presented. For class-conditional generation, a sliceable conditional batch normalization that incorporates the label information into different widths is proposed. The methods are validated, both quantitatively and qualitatively, by extensive experiments and a detailed ablation study.
The non-patent document "Teachers Do More Than Teach: Compressing Image-to-Image Models" addresses the issue that Generative Adversarial Networks (GANs) suffer from low efficiency due to tremendous computational cost and bulky memory usage in generating high-fidelity images. This is realized by introducing a teacher network that provides a search space in which efficient network architectures can be found, in addition to performing knowledge distillation. First, the search space of generative models is revisited, introducing an inception-based residual block into generators.
Second, to achieve a target computation cost, a one-step pruning algorithm that searches a student architecture from the teacher model and substantially reduces searching cost is proposed. It requires no l1 sparsity regularization and its associated hyper-parameters, simplifying the training procedure. Finally, it proposes to distill knowledge through maximizing feature similarity between teacher and student via an index named Global Kernel Alignment (GKA). The compressed networks achieve similar or even better image fidelity (FID, mIoU) than the original models with much-reduced computational cost, e.g., MACs.
SUMMARY
At least one embodiment of the present disclosure provides a network model compression method, apparatus and device, an image generation method, and a medium, which may solve one or more problems in the art. The object is achieved by the features of the respective independent claims. Further embodiments are defined in the respective dependent claims.
BRIEF DESCRIPTION OF DRAWINGS
The drawings herein are incorporated into and form a part of the specification, illustrate the embodiments consistent with the present disclosure, and are used in conjunction with the specification to explain the principles of the present disclosure. In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings to be used in the description of the embodiments or the prior art will be briefly described below; it will be obvious to those ordinarily skilled in the art that other drawings can be obtained on the basis of these drawings without inventive work.
FIG. 1 is a flowchart of a network model compression method according to an embodiment of the present disclosure; FIG. 2 is a flowchart of a network model compression method according to an embodiment of the present disclosure; FIG. 3 is a flowchart of a network model compression method according to an embodiment of the present disclosure; FIG. 4 is a flowchart of an image generation method according to an embodiment of the present disclosure; FIG. 5 is a schematic diagram of a structure of a network model compression apparatus according to an embodiment of the present disclosure; and FIG. 6 is a schematic diagram of a structure of a network model compression device according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
In order to understand the above objects, features and advantages of the present disclosure more clearly, the solutions of the present disclosure will be further described below. It should be noted that, in case of no conflict, the features in one embodiment or in different embodiments can be combined. Many specific details are set forth in the following description to fully understand the present disclosure, but the present disclosure can also be implemented in other ways different from those described here; obviously, the embodiments in the specification are a part but not all of the embodiments of the present disclosure. As a GAN with a larger model usually consumes more computing resources, when it is ap