US-12620204-B2 - Synthetic digital image generation
Abstract
There are provided mechanisms for rendering a synthetic digital image. A method is performed by an image processing device. The method comprises obtaining (S102) an original digital image. The original digital image represents a depiction of a visual scene that comprises at least one object. The method comprises identifying (S104) a first object in the original digital image. The first object is bounded by a bounding box in the original digital image. The method comprises generating (S106) a synthetic object by a Generative Adversarial Network processing the first object and a second object. The synthetic object has a shape defined by a binary segmentation mask as applied to the first object. The synthetic object has a texture and colour based on the second object. The method comprises rendering (S108) the synthetic digital image by, in the original digital image, replacing the first object in the bounding box with the synthetic object.
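For illustration only, the S102–S108 pipeline above can be sketched in Python. The function name, the `(x, y, w, h)` bounding-box convention, and the `gan` callable are assumptions made for this sketch; the callable merely stands in for the Generative Adversarial Network and this is not the patented implementation.

```python
import numpy as np

def render_synthetic_image(original, first_bbox, second_obj, gan):
    """Illustrative sketch of steps S102-S108 (not the patented implementation).

    original   : H x W x 3 uint8 array (the original digital image, S102)
    first_bbox : (x, y, w, h) bounding the identified first object (S104)
    second_obj : image patch supplying texture and colour
    gan        : callable standing in for the Generative Adversarial Network
    """
    x, y, w, h = first_bbox
    first_obj = original[y:y + h, x:x + w]   # crop the first object (S104)
    synthetic = gan(first_obj, second_obj)   # shape from first, texture/colour from second (S106)
    out = original.copy()
    out[y:y + h, x:x + w] = synthetic        # replace the first object within its bounding box (S108)
    return out
```

In practice the `gan` callable would be a trained generator; here any function returning a patch of the same shape as the cropped first object suffices to exercise the control flow.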
Inventors
- Volodya Grancharov
- Ludwig THAUNG
Assignees
- TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2020-10-19
Claims (18)
- 1. A method for rendering a synthetic digital image, the method being performed by an image processing device, the method comprising: obtaining an original digital image, the original digital image representing a depiction of a visual scene that comprises at least one object; identifying a first object in the original digital image, the first object being bounded by a bounding box in the original digital image; generating a synthetic object by a Generative Adversarial Network processing the first object and a second object, wherein i) the synthetic object has a shape defined by a binary segmentation mask as applied to the first object, and ii) the synthetic object has a texture and colour based on the second object; and rendering the synthetic digital image by, in the original digital image, replacing the first object in the bounding box with the synthetic object.
- 2. The method of claim 1, wherein the second object is obtained from outside the original digital image.
- 3. The method of claim 2, wherein the second object is obtained from any of: a production stencil, a three-dimensional, 3D, Computer Aided Design, CAD, drawing, a digital blueprint, a digital image different from the original digital image.
- 4. The method of claim 1, wherein the visual scene comprises at least two objects, and wherein the second object is obtained from the original digital image.
- 5. The method of claim 1, wherein the bounding box of the first object is a first bounding box, wherein the second object is bounded by a second bounding box, and wherein at least one of the first object and the second object is scaled such that the second bounding box is of same size as the first bounding box.
- 6. The method of claim 1, wherein the first object is delimited by a contour, and wherein the binary segmentation mask follows the contour of the first object.
- 7. The method of claim 1, wherein the first object is delimited by a contour, and wherein the binary segmentation mask when applied to the first object deviates from the contour of the first object.
- 8. The method of claim 7, wherein the binary segmentation mask extends beyond the contour of the first object, and wherein the texture and colour in the part of the synthetic object that extends beyond the contour of the first object is based on at least one of: the texture and colour of the second object, texture and colour of a third object, context-based texture and colour as provided by the Generative Adversarial Network.
- 9. The method of claim 1, wherein part of the texture and colour of the first object is preserved when being processed by the Generative Adversarial Network.
- 10. The method of claim 9, wherein the first object is delimited by a contour, wherein the binary segmentation mask when applied to the first object at least partly is confined within the contour of the first object, and wherein the part of the texture and colour of the first object that is not at least partly covered by the binary segmentation mask is preserved when being processed by the Generative Adversarial Network.
- 11. The method of claim 1, wherein the binary segmentation mask applied to the first object is a first binary segmentation mask, wherein a second binary segmentation mask is applied to the second object when being processed by the Generative Adversarial Network.
- 12. The method of claim 1, wherein the method further comprises: feeding the synthetic digital image as training data to a visual object detector.
- 13. The method of claim 1, wherein the first object represents a depiction of a piece of industrial equipment in the visual scene.
- 14. The method of claim 13, wherein the piece of industrial equipment is part of any of: a telecommunication system, an electric power grid, an oil and/or gas production facility, an industrial process site.
- 15. An image processing device for generating a synthetic digital image, the image processing device comprising processing circuitry, the processing circuitry being configured to cause the image processing device to: obtain an original digital image, the original digital image representing a depiction of a visual scene that comprises at least one object (120:150); identify a first object in the original digital image, the first object being bounded by a bounding box in the original digital image; generate a synthetic object by a Generative Adversarial Network processing the first object and a second object, wherein the synthetic object has a shape defined by a binary segmentation mask as applied to the first object, and wherein the synthetic object has a texture and colour based on the second object; and render the synthetic digital image by, in the original digital image, replacing the first object in the bounding box with the synthetic object.
- 16. The image processing device of claim 15, wherein the second object is obtained from outside the original digital image.
- 17. The image processing device of claim 16, wherein the second object is obtained from: a production stencil, a three-dimensional Computer Aided Design drawing, a digital blueprint, or a digital image different from the original digital image.
- 18. A non-transitory computer readable storage medium storing a computer program for generating a synthetic digital image, the computer program comprising computer code which, when run on processing circuitry of an image processing device, causes the image processing device to: obtain an original digital image, the original digital image representing a depiction of a visual scene that comprises at least one object (120:150); identify a first object in the original digital image, the first object being bounded by a bounding box in the original digital image; generate a synthetic object by a Generative Adversarial Network processing the first object and a second object, wherein the synthetic object has a shape defined by a binary segmentation mask as applied to the first object, and wherein the synthetic object has a texture and colour based on the second object; and render the synthetic digital image by, in the original digital image, replacing the first object in the bounding box with the synthetic object.
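Two of the claimed operations lend themselves to short sketches: the scaling of claim 5 (rescale so the second bounding box matches the first) and the mask behaviour of claims 1 and 6 (shape from a binary segmentation mask applied to the first object, texture and colour from the second). The functions below are illustrative assumptions only; in particular, `mask_composite` is a naive per-pixel composite standing in for what the claimed Generative Adversarial Network would produce, not the GAN itself.

```python
import numpy as np

def scale_to_bbox(second_obj, target_hw):
    """Nearest-neighbour rescale so the second object's bounding box matches
    the first object's bounding box (cf. claim 5). Illustrative only."""
    th, tw = target_hw
    sh, sw = second_obj.shape[:2]
    rows = np.arange(th) * sh // th   # map each target row to a source row
    cols = np.arange(tw) * sw // tw   # map each target column to a source column
    return second_obj[rows[:, None], cols]

def mask_composite(first_obj, second_obj, mask):
    """Naive stand-in for the claimed GAN output: shape from a binary
    segmentation mask applied to the first object (claims 1 and 6), texture
    and colour from the equally sized second object. Pixels outside the mask
    keep the first object's appearance. Illustrative only."""
    m = mask.astype(bool)[..., None]  # H x W mask, broadcastable over channels
    return np.where(m, second_obj, first_obj)
```

A real system would use a learned generator so that the inserted texture blends with the scene; the composite above only shows which pixels each claimed input governs.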
Description
CROSS REFERENCE TO RELATED APPLICATIONS
This application is a 35 U.S.C. § 371 National Stage of International Patent Application No. PCT/EP2020/079349, filed 2020 Oct. 19.
TECHNICAL FIELD
Embodiments presented herein relate to a method, an image processing device, a computer program, and a computer program product for generating a synthetic digital image.
BACKGROUND
Applications of visual object detectors can be found, for example, in recognizing objects on the road for self-driving vehicles, detecting objects in smart factories, creating automatic inventories of power grid installations or telecommunication installations, etc. To be able to detect an object, the visual object detector first needs to be trained on data annotated with the correct location and class of the object. Historically, data annotation has been achieved by manually labeling the objects of interest in a set of images or video frames. This is an expensive and time-consuming procedure. Since some visual object detectors are based on Convolutional Neural Networks (CNNs) with many (such as millions of) training parameters, large amounts of training data are needed. One approach to make visual object detectors trained with limited amounts of annotated data more robust is to perform some type of data augmentation of the training data. Such data augmentation could involve simple manipulations of the training data, such as shifting, applying geometric transforms, or introducing perturbations on the colour channels. Such data augmentation could be regarded as artificial and completely disconnected from the specific domain where the visual object detector would be applied, and therefore brings only limited improvement to the training procedure. More advanced methods for generating domain-relevant augmented data for training visual object detectors have been proposed.
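The "simple" augmentations mentioned in the background (shifting, geometric transforms, colour-channel perturbation) can be sketched as below. This is a generic illustration of prior-art style augmentation, not part of the claimed method; the parameter names and values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def augment(img, max_shift=4, noise_scale=8):
    """Toy examples of the simple augmentations discussed above:
    a random shift, a horizontal flip, and colour-channel noise."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(img, (dy, dx), axis=(0, 1))    # random spatial shift
    flipped = shifted[:, ::-1]                       # geometric transform (mirror)
    noise = rng.normal(0, noise_scale, img.shape)    # per-channel perturbation
    out = np.clip(flipped.astype(np.int16) + noise, 0, 255)
    return out.astype(np.uint8)
```

As the background notes, such transforms are disconnected from the deployment domain, which is the gap the GAN-based method of this disclosure targets.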
These methods are based on rendering synthetic objects (with different poses) on top of a random background, thus enabling large amounts of annotated data to be generated; the location of each rendered synthetic object is readily available at the rendering engine. This approach, however, is not directly applicable for generating training data, in terms of synthetic digital images, for visual object detectors that are to operate in scenarios, domains, or applications where there are strict constraints on the connections and spatial relations between the objects to be detected. Hence, there is still a need for efficient generation of such synthetic digital images.
SUMMARY
An object of embodiments herein is to provide efficient rendering of large amounts of synthetic digital images where there are strict constraints on the connections and spatial relations between objects. According to a first aspect there is presented a method for rendering a synthetic digital image. The method is performed by an image processing device. The method comprises obtaining an original digital image. The original digital image represents a depiction of a visual scene that comprises at least one object. The method comprises identifying a first object in the original digital image. The first object is bounded by a bounding box in the original digital image. The method comprises generating a synthetic object by a Generative Adversarial Network processing the first object and a second object. The synthetic object has a shape defined by a binary segmentation mask as applied to the first object. The synthetic object has a texture and colour based on the second object. The method comprises rendering the synthetic digital image by, in the original digital image, replacing the first object in the bounding box with the synthetic object. According to a second aspect there is presented an image processing device for generating a synthetic digital image.
The image processing device comprises processing circuitry. The processing circuitry is configured to cause the image processing device to obtain an original digital image. The original digital image represents a depiction of a visual scene that comprises at least one object. The processing circuitry is configured to cause the image processing device to identify a first object in the original digital image. The first object is bounded by a bounding box in the original digital image. The processing circuitry is configured to cause the image processing device to generate a synthetic object by a Generative Adversarial Network processing the first object and a second object. The synthetic object has a shape defined by a binary segmentation mask as applied to the first object. The synthetic object has a texture and colour based on the second object. The processing circuitry is configured to cause the image processing device to render the synthetic digital image by, in the original digital image, replacing the first object in the bounding box with the synthetic object. According to a third aspect there is presented an image processing device for generating a synthetic