US-12620201-B2 - Translating images based on semantic information
Abstract
In implementations of techniques for translating images based on semantic information, a computing device implements a translation system to receive an input image in a first format, encoded semantic information describing a domain of the input image, and a selection of a second format. The translation system decodes the encoded semantic information using a machine learning model. The translation system then generates an output image in the second format by translating the input image from the first format to the second format using the machine learning model, the machine learning model guided by the decoded semantic information. Finally, the translation system displays the output image in the second format in a user interface.
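As a rough sketch of the flow the abstract describes, the following wires the steps together with placeholder functions; decode_semantics, translate, and the formats shown are illustrative assumptions, not the patented implementation.

```python
# Sketch of the abstract's flow: receive an input image plus encoded
# semantic information and a target format, decode the semantics,
# translate, and hand the result to a UI. All names are placeholders.
import numpy as np

def decode_semantics(encoded: np.ndarray) -> np.ndarray:
    return encoded  # placeholder: a real model decodes the embedding

def translate(image: np.ndarray, semantics: np.ndarray,
              target_format: str) -> np.ndarray:
    # Placeholder translation guided by the decoded semantics; a real
    # system runs a learned image-to-image model here.
    if target_format == "height":
        return image.mean(axis=-1, keepdims=True)
    return image

def display(image: np.ndarray, fmt: str) -> None:
    print(f"showing {fmt} image with shape {image.shape}")

input_image = np.random.rand(256, 256, 3)  # first format: RGB
encoded_info = np.random.rand(64)          # encoded semantic information
second_format = "height"                   # user-selected second format

out = translate(input_image, decode_semantics(encoded_info), second_format)
display(out, second_format)                # -> shape (256, 256, 1)
```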
Inventors
- Arthur Jules Martin Roullier
- Tamy Boubekeur
- Rosalie Noémie Raphaëlle Martin
- Romain Pierre Rouffet
- Adrien Michel Paul Kaiser
Assignees
- ADOBE INC.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2023-11-22
Claims (20)
- 1. A method comprising: receiving, by a processing device, an input image in a first format, a selection of a second format, and semantic information describing a type of content depicted in the input image and selected from an index of multiple types of content; generating, by the processing device, an output image depicting the content in the second format by translating the input image from the first format to the second format using a machine learning model, the machine learning model guided by the semantic information; and displaying, by the processing device, the output image in the second format in a user interface.
- 2. The method of claim 1, wherein the input image is one of multiple layered channel images that form an overall image.
- 3. The method of claim 1, wherein the semantic information corresponds to a classification label in the index.
- 4. The method of claim 3, wherein the classification label in the index is manually selected.
- 5. The method of claim 3, wherein the classification label in the index is automatically selected based on automated classification.
- 6. The method of claim 3, wherein the semantic information is an embedding vector that corresponds to the classification label in the index.
- 7. The method of claim 6, wherein the machine learning model generates the embedding vector based on data from the classification label in the index.
- 8. The method of claim 6, wherein the embedding vector is adjusted to meet a threshold input vector size for input to the machine learning model.
- 9. The method of claim 1, wherein the machine learning model is trained using conditional dropout.
- 10. A system comprising: a memory component; and a processing device coupled to the memory component, the processing device to perform operations comprising: receiving an embedding vector at a connection between an encoding path and a decoding path of a machine learning model, the embedding vector including encoded semantic information describing a type of content depicted in a digital image and selected from an index of multiple types of content; and training the machine learning model on the type of the content of the digital image while decoding the encoded semantic information of the embedding vector using the decoding path of the machine learning model.
- 11. The system of claim 10, wherein the encoded semantic information corresponds to a classification label in the index.
- 12. The system of claim 11, wherein the classification label in the index is manually selected.
- 13. The system of claim 11, wherein the classification label in the index is automatically selected based on automated classification.
- 14. The system of claim 10, wherein the digital image is one of multiple layered channel images of an overall image.
- 15. The system of claim 10, wherein the machine learning model is trained using conditional dropout.
- 16. The system of claim 10, wherein the embedding vector is adjusted to meet a threshold input vector size for input to the machine learning model.
- 17. A non-transitory computer-readable storage medium storing executable instructions which, when executed by a processing device, cause the processing device to perform operations comprising: receiving an input image in a first format, a selection of a second format, and encoded semantic information describing a type of content depicted in the input image and selected from an index of multiple types of content; decoding the encoded semantic information using a machine learning model; generating an output image depicting the content in the second format by translating the input image from the first format to the second format using the machine learning model, the machine learning model guided by the decoded semantic information; and displaying the output image in the second format in a user interface.
- 18. The non-transitory computer-readable storage medium of claim 17, wherein the encoded semantic information describing the input image corresponds to a classification label in the index.
- 19. The non-transitory computer-readable storage medium of claim 18, wherein the encoded semantic information describing the input image includes an embedding vector that corresponds to the classification label in the index.
- 20. The method of claim 1, wherein the content depicted in the input image is a surface of a material, and the multiple types of content in the index correspond to different types of materials.
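Claims 3, 6, 8, and 9 recite the conditioning mechanism in compact form. The following is a minimal sketch of one plausible reading, in which a classification label selected from the index is mapped to an embedding vector, padded to a threshold input size, and occasionally dropped during training; the index contents, dimensions, and dropout rate are assumptions for illustration.

```python
# Sketch of the semantic conditioning recited in claims 3, 6, 8, and 9.
import torch
import torch.nn as nn

INDEX = ["ceramic", "wood", "metal", "fabric"]  # index of content types
EMBED_DIM, THRESHOLD_DIM = 48, 64               # assumed sizes

embedding_table = nn.Embedding(len(INDEX), EMBED_DIM)

def encode_label(label: str) -> torch.Tensor:
    # Claim 6: an embedding vector corresponding to the label's entry
    # in the index.
    vec = embedding_table(torch.tensor(INDEX.index(label)))
    # Claim 8: adjust (here, zero-pad) the vector to meet the model's
    # threshold input vector size.
    return nn.functional.pad(vec, (0, THRESHOLD_DIM - EMBED_DIM))

def maybe_drop(vec: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    # Claim 9: conditional dropout -- during training, the conditioning
    # vector is occasionally zeroed so the model also learns to
    # translate without semantic guidance.
    return torch.zeros_like(vec) if torch.rand(()) < p else vec

conditioning = maybe_drop(encode_label("ceramic"))
print(conditioning.shape)  # torch.Size([64])
```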
Description
BACKGROUND

A U-Net is a type of convolutional neural network (CNN) architecture that is commonly used for image segmentation and analysis. The U-Net architecture is characterized by its U-shaped structure, formed from an encoder and a decoder, and by skip connections that join the encoder and the decoder at multiple levels. The skip connections help preserve fine-grained details and spatial information, allowing the network to produce precise segmentation maps and image filters. U-Nets are widely applied to tasks such as biomedical image segmentation (including cell and tissue segmentation) and image-to-image translation. However, some applications of U-Nets result in visual inaccuracies, computational inefficiencies, and increased power consumption in real-world scenarios.
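As a concrete illustration of the U-shaped encoder-decoder described above, the sketch below is a minimal U-Net in PyTorch; the two-level depth, channel widths, and layer choices are illustrative assumptions rather than the architecture used by the translation system.

```python
# Minimal U-Net sketch: encoder, bottleneck, decoder, skip connections.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)  # 128 = 64 upsampled + 64 skip
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)   # 64 = 32 upsampled + 32 skip
        self.head = nn.Conv2d(32, out_ch, 1)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        e1 = self.enc1(x)               # encoder level 1
        e2 = self.enc2(self.pool(e1))   # encoder level 2
        b = self.bottleneck(self.pool(e2))
        # Skip connections: concatenate same-resolution encoder
        # features with the upsampled decoder features at each level.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)

image = torch.randn(1, 3, 64, 64)
print(TinyUNet()(image).shape)  # torch.Size([1, 3, 64, 64])
```

The torch.cat calls are the skip connections: each decoder level sees both the upsampled coarse features and the matching encoder features, which is what preserves fine spatial detail.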
SUMMARY

Techniques and systems for translating images based on semantic information are described. In an example, a translation system receives an input image in a first format, encoded semantic information describing a domain of the input image, and a selection of a second format. For example, the input image is one of multiple layered channel images that form an overall image, and the encoded semantic information corresponds to a classification label in an index of multiple classification labels. In some examples, the classification label in the index is manually selected. In other examples, the classification label is automatically selected based on automated classification. Additionally or alternatively, the encoded semantic information is an embedding vector that corresponds to the classification label in the index. The translation system decodes the encoded semantic information using a machine learning model. The translation system then generates an output image in the second format by translating the input image from the first format to the second format using the machine learning model, the machine learning model guided by the decoded semantic information. Finally, the translation system displays the output image in the second format in a user interface.

This Summary introduces a selection of concepts in a simplified form that are further described below in the Detailed Description. As such, this Summary is not intended to identify essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description is described with reference to the accompanying figures. Entities represented in the figures are indicative of one or more entities, and thus reference is made interchangeably to single or plural forms of the entities in the discussion.

- FIG. 1 is an illustration of a digital medium environment in an example implementation that is operable to employ the techniques and systems for translating images based on semantic information described herein.
- FIG. 2 depicts a system in an example implementation showing operation of a translation module for translating images based on semantic information.
- FIG. 3 depicts an example of encoding semantic information.
- FIG. 4 depicts a chart comparing material channel images generated by translating images based on semantic information to channel images generated using conventional techniques.
- FIG. 5 depicts a procedure in an example implementation of translating images based on semantic information.
- FIG. 6 depicts a procedure in an additional example implementation of translating images based on semantic information.
- FIG. 7 illustrates an example system including various components of an example device that can be implemented as any type of computing device as described and/or utilized with reference to FIGS. 1-6 to implement embodiments of the techniques described herein.

DETAILED DESCRIPTION

Overview

Image translation involves translating a channel image into a different type of channel image to support a variety of tasks, including image segmentation, style transfer, depth estimation, and material estimation in digital images. A channel image is a component or layer that makes up an image. For example, an RGB digital image includes a red channel image, a green channel image, and a blue channel image layered together. Some image translation tasks involve translating the RGB channel images into a different collection of channels. In particular, image translation of RGB channel images that depict materials supports physically-based rendering (“PBR”) of three-dimensional (3D) materials in gaming, architecture, design, fashion, film, and other applications. For example, an RGB image is captured that depicts a ceramic tile wall, with the intention of applying the ceramic tile wall to a 3D wall in a virtual environment. To accomplish this, the channel images of the RGB image are translated into the channel images of a material map, including a height channel, a base color channel, and a normal channel. However, trans
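To make the channel-image framing of the overview concrete, the following is a minimal sketch of an RGB image as layered channel images and of the material-map channel set named above; the array shapes and the placeholder translate_channels function are assumptions for illustration, not the described system's implementation.

```python
# Sketch: an RGB image as layered channel images, translated into a
# material map with height, base color, and normal channels.
import numpy as np

rgb = np.random.rand(256, 256, 3)  # stand-in for a captured photo

# The RGB image is three channel images layered together.
red, green, blue = (rgb[..., i] for i in range(3))

def translate_channels(rgb_image, semantic_label):
    # Placeholder for the learned translation guided by semantic
    # information (e.g., the material type "ceramic"); a real system
    # would run an image-to-image model such as a U-Net here.
    h, w, _ = rgb_image.shape
    return {
        "height": np.zeros((h, w)),      # per-pixel surface height
        "base_color": rgb_image.copy(),  # albedo, free of lighting
        "normal": np.zeros((h, w, 3)),   # per-pixel surface normals
    }

material_map = translate_channels(rgb, semantic_label="ceramic")
print({name: ch.shape for name, ch in material_map.items()})
```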