US-12620099-B2 - Picture quality-sensitive semantic segmentation for use in training image generation adversarial networks
Abstract
A method includes training a semantic segmentation network to generate semantic segmentation maps having class-wise probability values. The method also includes generating a semantic segmentation map using the trained semantic segmentation network. The method further includes utilizing the semantic segmentation map during training of an image generation network as part of a loss function that includes multiple losses. The semantic segmentation network may be trained to be sensitive to picture quality of an output image generated by the image generation network during the training of the image generation network such that increased degradation of the picture quality of the output image results in decreased prediction confidence by the semantic segmentation network. The semantic segmentation network may be trained to vary the class-wise probability values based on the picture quality.
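The abstract's core mechanism — scaling class-wise probability values by a confusion factor that is inversely related to picture quality, so that lower quality yields lower prediction confidence — can be sketched as follows. This is a minimal illustration only, assuming a quality score normalized to (0, 1] and realizing the scaling as temperature-style softening of the probabilities; the function names and the exact form of the confusion factor are hypothetical, not taken from the patent.

```python
import numpy as np

def confusion_factor(quality: float, eps: float = 1e-6) -> float:
    """Hypothetical confusion factor: inversely related to picture quality.
    quality is assumed normalized to (0, 1]; lower quality -> larger factor."""
    return 1.0 / max(quality, eps)

def scale_probabilities(probs: np.ndarray, quality: float) -> np.ndarray:
    """Soften class-wise probabilities so lower picture quality yields
    lower confidence. One way to realize the claimed scaling is
    temperature scaling, with the confusion factor as the temperature."""
    t = confusion_factor(quality)                    # higher t = more confusion
    logits = np.log(np.clip(probs, 1e-12, 1.0))      # back to log-space
    scaled = np.exp(logits / t)                      # flatten as t grows
    return scaled / scaled.sum(axis=-1, keepdims=True)
```

At quality 1.0 the factor is 1 and the probabilities pass through unchanged; as quality drops, the distribution flattens toward uniform, which is one concrete way a class-wise probability can "indicate higher confusion."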
Inventors
- Tien C. Bau
- Hrishikesh Deepak Garud
Assignees
- SAMSUNG ELECTRONICS CO., LTD.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2022-08-02
Claims (20)
- 1. A method comprising: training a semantic segmentation network to generate semantic segmentation maps comprising class-wise probability values; generating a semantic segmentation map using the trained semantic segmentation network; and utilizing the semantic segmentation map during training of an image generation network as part of a loss function that comprises multiple losses; wherein training the semantic segmentation network comprises training the semantic segmentation network to be sensitive to picture quality of an output image generated by the image generation network during the training of the image generation network such that increased degradation of the picture quality of the output image results in decreased prediction confidence by the semantic segmentation network; wherein training the semantic segmentation network to be sensitive to picture quality comprises training the semantic segmentation network to vary the class-wise probability values based on the picture quality; and wherein training the semantic segmentation network to vary the class-wise probability values based on the picture quality comprises scaling each class-wise probability value using a confusion factor having an inverse relationship with the picture quality such that each class-wise probability value indicates higher confusion when the picture quality is lower.
- 2. The method of claim 1, further comprising: deploying the trained image generation network without the semantic segmentation network.
- 3. The method of claim 1, wherein the multiple losses of the loss function comprise a perceptual loss provided by the semantic segmentation network during the training of the image generation network.
- 4. The method of claim 1, wherein the multiple losses of the loss function comprise a reconstruction loss.
- 5. The method of claim 1, further comprising: determining the confusion factor based on a degree of degradation of the output image generated by the image generation network.
- 6. The method of claim 1, wherein the multiple losses of the loss function comprise a semantic segmentation loss provided by the semantic segmentation network, a pixel loss, and a generative adversarial network (GAN) loss provided by a discriminator network.
- 7. The method of claim 6, wherein the loss function further comprises a perceptual loss provided by a pre-trained perceptual neural network.
- 8. The method of claim 1, wherein the image generation network comprises a super-resolution neural network or an image simulation network.
- 9. An electronic device comprising: at least one memory configured to store instructions; and at least one processing device configured when executing the instructions to: train a semantic segmentation network to generate semantic segmentation maps comprising class-wise probability values; generate a semantic segmentation map using the trained semantic segmentation network; and train an image generation network based on the semantic segmentation map as part of a loss function that comprises multiple losses; wherein the at least one processing device is configured to train the semantic segmentation network to be sensitive to picture quality of an output image generated by the image generation network during the training of the image generation network based on a confusion factor, the confusion factor having an inverse relationship with the picture quality; wherein, to train the semantic segmentation network to be sensitive to picture quality, the at least one processing device is configured to train the semantic segmentation network to vary the class-wise probability values based on the picture quality; and wherein, to train the semantic segmentation network to vary the class-wise probability values based on the picture quality, the at least one processing device is configured to scale each class-wise probability value using the confusion factor having the inverse relationship with the picture quality such that each class-wise probability value indicates higher confusion when the picture quality is lower.
- 10. The electronic device of claim 9, wherein increased degradation of the picture quality of the output image results in decreased prediction confidence by the semantic segmentation network.
- 11. The electronic device of claim 9, wherein the multiple losses of the loss function comprise a perceptual loss provided by the semantic segmentation network during the training of the image generation network.
- 12. The electronic device of claim 9, wherein the multiple losses of the loss function comprise a reconstruction loss.
- 13. The electronic device of claim 9, wherein the multiple losses of the loss function comprise a semantic segmentation loss provided by the semantic segmentation network, a pixel loss, and a generative adversarial network (GAN) loss provided by a discriminator network.
- 14. The electronic device of claim 13, wherein the loss function further comprises a perceptual loss provided by a pre-trained perceptual neural network.
- 15. The electronic device of claim 9, wherein the image generation network comprises a super-resolution neural network or an image simulation network.
- 16. A non-transitory machine-readable medium containing instructions that when executed cause at least one processor of an electronic device to: train a semantic segmentation network to generate semantic segmentation maps comprising class-wise probability values; generate a semantic segmentation map using the trained semantic segmentation network; and train an image generation network based on the semantic segmentation map as part of a loss function that comprises multiple losses; wherein the instructions that when executed cause the at least one processor to train the semantic segmentation network comprise instructions that when executed cause the at least one processor to train the semantic segmentation network to be sensitive to picture quality of an output image generated by the image generation network during the training of the image generation network based on a confusion factor, the confusion factor having an inverse relationship with the picture quality; wherein the instructions that when executed cause the at least one processor to train the semantic segmentation network to be sensitive to picture quality comprise instructions that when executed cause the at least one processor to train the semantic segmentation network to vary the class-wise probability values based on the picture quality; and wherein the instructions that when executed cause the at least one processor to train the semantic segmentation network to vary the class-wise probability values based on the picture quality comprise instructions that when executed cause the at least one processor to scale each class-wise probability value using the confusion factor having the inverse relationship with the picture quality such that each class-wise probability value indicates higher confusion when the picture quality is lower.
- 17. The non-transitory machine-readable medium of claim 16, wherein increased degradation of the picture quality of the output image results in decreased prediction confidence by the semantic segmentation network.
- 18. The non-transitory machine-readable medium of claim 16, wherein the multiple losses of the loss function comprise a perceptual loss provided by the semantic segmentation network during the training of the image generation network.
- 19. The non-transitory machine-readable medium of claim 16, wherein the multiple losses of the loss function comprise a reconstruction loss.
- 20. The non-transitory machine-readable medium of claim 16, wherein the multiple losses of the loss function comprise a semantic segmentation loss provided by the semantic segmentation network, a pixel loss, and a generative adversarial network (GAN) loss provided by a discriminator network.
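Claims 6, 7, 13, and 20 name the individual terms of the multi-loss objective: a semantic segmentation loss, a pixel loss, a GAN loss from a discriminator, and optionally a perceptual loss from a pre-trained network. A minimal sketch of how such terms might be combined for the generator is shown below; the loss weights, the specific loss forms (L1 pixel loss, soft-target cross-entropy, non-saturating GAN loss, MSE feature loss), and all function and argument names are illustrative assumptions, not specified by the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def generator_loss(sr_img, hr_img, seg_logits_sr, seg_probs_hr, disc_fake,
                   perc_feat_sr, perc_feat_hr,
                   w_pix=1.0, w_seg=0.1, w_gan=0.01, w_perc=0.1):
    """Illustrative combination of the losses named in the claims.
    Weights (w_*) and loss forms are assumptions for this sketch."""
    pixel = np.abs(sr_img - hr_img).mean()                    # pixel (L1) loss
    log_p = np.log(softmax(seg_logits_sr, axis=1) + 1e-12)
    seg = -(seg_probs_hr * log_p).sum(axis=1).mean()          # segmentation loss vs. soft targets
    gan = np.log1p(np.exp(-disc_fake)).mean()                 # non-saturating GAN loss on fake logits
    perc = ((perc_feat_sr - perc_feat_hr) ** 2).mean()        # perceptual (feature) loss
    return w_pix * pixel + w_seg * seg + w_gan * gan + w_perc * perc
```

Note that the segmentation term takes soft class-wise probability targets rather than hard labels, which is what allows the confusion-scaled probabilities of claim 1 to reduce the training signal when the generated image is degraded.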
Description
CROSS-REFERENCE TO RELATED APPLICATION AND PRIORITY CLAIM
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/244,988 filed on Sep. 16, 2021, which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
This disclosure relates generally to imaging systems. More specifically, this disclosure relates to a system and method for picture quality-sensitive semantic segmentation for use in training image generation adversarial networks.
BACKGROUND
Image generation algorithms typically create new images from scratch by learning abstract contextual information of real-life objects, such as cars, trees, mountains, clouds, and the like. Image generation algorithms are useful for multiple applications, such as training data generation, super-resolution, and simulation. Typically, machine learning models are trained using special methods and loss functions to achieve desired results. For example, generative adversarial network (GAN)-based super-resolution algorithms often try to generate the most realistic high-resolution images with the aid of perceptual loss and discriminator loss. Most of these algorithms generate details that are plausible but not realistic, meaning one can easily tell on close inspection that they are artificially generated.
SUMMARY
This disclosure provides a system and method for picture quality-sensitive semantic segmentation for use in training image generation adversarial networks. In a first embodiment, a method includes training a semantic segmentation network to generate semantic segmentation maps having class-wise probability values. The method also includes generating a semantic segmentation map using the trained semantic segmentation network. The method further includes utilizing the semantic segmentation map during training of an image generation network as part of a loss function that includes multiple losses.
In a second embodiment, an electronic device includes at least one memory configured to store instructions. The electronic device also includes at least one processing device configured when executing the instructions to train a semantic segmentation network to generate semantic segmentation maps having class-wise probability values. The at least one processing device is also configured when executing the instructions to generate a semantic segmentation map using the trained semantic segmentation network. The at least one processing device is further configured when executing the instructions to utilize the semantic segmentation map during training of an image generation network as part of a loss function that includes multiple losses.

In a third embodiment, a non-transitory machine-readable medium contains instructions that when executed cause at least one processor of an electronic device to train a semantic segmentation network to generate semantic segmentation maps having class-wise probability values. The medium also contains instructions that when executed cause the at least one processor to generate a semantic segmentation map using the trained semantic segmentation network. The medium further contains instructions that when executed cause the at least one processor to utilize the semantic segmentation map during training of an image generation network as part of a loss function that includes multiple losses.

Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.

Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The terms “transmit,” “receive,” and “communicate,” as well as derivatives thereof, encompass both direct and indirect communication. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation.
The term “or” is inclusive, meaning and/or. The phrase “associated with,” as well as derivatives thereof, means to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like. Moreover, various functions described below can be implemented or supported by one or more computer programs, each of which is formed from computer readable program code and embodied in a computer readable medium. The terms “application” and “program” refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in a suitable computer readable program code. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any