US-12626436-B2 - Using stable diffusion to produce background-free images conforming to target color

Abstract

Generating custom team emblems using stable diffusion based on input text describing a desired image. A circle is overlaid in the center of a pure-color background representing each team's “color” and used as the input to stable diffusion img2img to produce emblems. This produces high-quality emblem outputs that generally match the input color.
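
The guidance image described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name `make_guidance_image`, the `radius_fraction` parameter, the image size, and the two example colors are all assumptions made for the example, and the image is held as plain rows of RGB tuples rather than in an image library.

```python
def make_guidance_image(size, background_rgb, circle_rgb, radius_fraction=0.35):
    """Return a size x size image (rows of RGB tuples) showing a filled
    circle centered on a solid-color background."""
    center = (size - 1) / 2.0          # geometric center of the pixel grid
    radius = radius_fraction * size    # hypothetical default: about a third of the width
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
            row.append(circle_rgb if inside else background_rgb)
        image.append(row)
    return image


team_red = (200, 30, 30)   # hypothetical team color used for the background
white = (255, 255, 255)    # hypothetical circle color
guidance = make_guidance_image(64, team_red, white)
```

In practice the rows would be converted to whatever image type the img2img pipeline expects before being passed in alongside the text prompt.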

Inventors

Sophia Zalewski

Assignees

Sony Interactive Entertainment LLC

Dates

Publication Date: 20260512
Application Date: 20231009

Claims (20)

1. A method comprising: using a stable diffusion (SD) model to produce images in at least one desired color at least in part by: inputting to the SD model at least one guidance image in the form of a shape representing a foreground within a background, the shape being in a first color and the background being in at least a second color; adding noise within the shape but not adding noise inside an inner boundary of the shape or along edges of the background to force colors in an image output by the SD model to match input colors of the guidance image; and presenting the image output by the SD model.
2. The method of claim 1, comprising removing the noise within the shape.
3. The method of claim 1, comprising removing background in the image output by the SD model.
4. The method of claim 1, wherein the image output by the SD model comprises an emblem.
5. The method of claim 1, wherein the shape comprises a circular ring.
6. The method of claim 1, wherein the shape comprises one and only one color.
7. The method of claim 1, wherein the shape comprises a plurality of colors.
8. The method of claim 1, wherein the shape comprises a pixelated color palette.
9. The method of claim 8, wherein the pixelated color palette has a resolution of at least four pixels and no more than one hundred twenty eight by one hundred twenty eight (128×128) pixels.
10. The method of claim 1, comprising: inputting to the SD model at least one strength parameter used by the SD model to generate the output image.
11. The method of claim 10, wherein the strength parameter is at least 0.9.
12. The method of claim 1, wherein the guidance image comprises an emblem.
13. The method of claim 1, wherein the image comprises an emblem.
14. The method of claim 1, wherein at least one of the first color or the second color comprises one and only one color.
15. One or more non-transitory computer storage media encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: using a stable diffusion (SD) model to produce images in at least one desired color at least in part by: inputting to the SD model at least one guidance image in the form of a shape representing a foreground within a background, the shape being in a first color and the background being in at least a second color; adding noise within the shape but not adding noise inside an inner boundary of the shape or along edges of the background to force colors in an image output by the SD model to match input colors of the guidance image; and presenting the image output by the SD model.
16. The media of claim 15, wherein the operations comprise removing the noise within the shape.
17. The media of claim 15, wherein the operations comprise removing background in the image output by the SD model.
18. The media of claim 15, wherein the image output by the SD model comprises an emblem.
19. The media of claim 15, wherein at least one of the first color or the second color comprises one and only one color.
20. A system comprising: one or more computers and one or more storage devices on which are stored instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: using a stable diffusion (SD) model to produce images in at least one desired color at least in part by: inputting to the SD model at least one guidance image in the form of a shape representing a foreground within a background, the shape being in a first color and the background being in at least a second color; adding noise within the shape but not adding noise inside an inner boundary of the shape or along edges of the background to force colors in an image output by the SD model to match input colors of the guidance image; and presenting the image output by the SD model.
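
The noise step recited in the independent claims (noise within the shape, none inside its inner boundary or along the background edges) might look like the following for the circular ring of claim 5. This is a sketch under stated assumptions: the claims do not say how or where the noise is generated, so pixel-space Gaussian noise is assumed here, and the names `ring_mask` and `add_noise_in_ring`, the radii, `sigma`, and the colors are all hypothetical. The region inside the ring is also assumed to carry the background color.

```python
import random


def ring_mask(size, inner_radius, outer_radius):
    """True where a pixel lies in the ring, i.e. its distance from the
    image center is between inner_radius and outer_radius."""
    center = (size - 1) / 2.0
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            d2 = (x - center) ** 2 + (y - center) ** 2
            row.append(inner_radius ** 2 <= d2 <= outer_radius ** 2)
        mask.append(row)
    return mask


def add_noise_in_ring(image, mask, sigma=40.0, seed=0):
    """Perturb only the masked pixels with Gaussian noise (assumed noise
    model); every other pixel is copied through unchanged."""
    rng = random.Random(seed)
    noisy = []
    for img_row, mask_row in zip(image, mask):
        out_row = []
        for pixel, in_ring in zip(img_row, mask_row):
            if in_ring:
                pixel = tuple(
                    min(255, max(0, round(c + rng.gauss(0.0, sigma))))
                    for c in pixel
                )
            out_row.append(pixel)
        noisy.append(out_row)
    return noisy


size = 64
ring_rgb, background_rgb = (30, 60, 200), (255, 255, 255)  # hypothetical colors
mask = ring_mask(size, inner_radius=12, outer_radius=24)
clean = [[ring_rgb if m else background_rgb for m in row] for row in mask]
noisy = add_noise_in_ring(clean, mask)
```

Keeping the mask separate from the noise function means the same routine would work for any other enclosed shape, provided a matching mask is supplied.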

Description

FIELD

The present application relates to technically inventive, non-routine solutions that are necessarily rooted in computer technology and that produce concrete technical improvements, and more specifically to using generative networks to produce images with colors conforming to input.

BACKGROUND

Generative AI is a general term that refers to a type of neural network, such as a large language model (LLM), for example a generative pre-trained transformer (GPTT), that can generate comparatively complex output based on comparatively terse input. An example of generative AI from the multi-domain realm is stable diffusion, which employs a series of neural networks to generate images from one or a few input words describing the desired image. As understood herein, improvements to stable diffusion are possible.

SUMMARY

As understood herein and with more particularity, the color palette of an image produced by stable diffusion (SD) may not precisely match the desired palette of the desired image. As an example, it may be desirable to generate custom in-game icons and emblems for a computer simulation such as a computer game based on different team colors, essentially meaning that SD should be fine-tuned for emblem generation.

Accordingly, an apparatus includes at least one processor assembly configured to overlay an image of an enclosed shape in a center of a background. The image of the enclosed shape and the background represent at least two desired colors of an image output by a stable diffusion (SD) model. The processor assembly is configured to input the image of the enclosed shape in the center of the background, along with the background, as an image to the SD model, and to present an output image from the SD model generated responsive to a text input and the image of the enclosed shape in the center of the background. In example embodiments, the enclosed shape includes a circular ring.
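
Supplying both a text input and the guidance image to the SD model corresponds to what open-source tooling calls img2img. The sketch below uses the Hugging Face `diffusers` class `StableDiffusionImg2ImgPipeline` purely as an assumption, since the source does not name an implementation. The helper names `validate_strength` and `generate_emblem` are hypothetical, the model identifier is left to the caller, and the 0.9 default follows the non-limiting strength value given in claim 11.

```python
def validate_strength(strength, minimum=0.9):
    """Check that strength is a valid img2img value in [0, 1] and is not
    below the example minimum (0.9 in the non-limiting example)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    if strength < minimum:
        raise ValueError(f"strength {strength} is below the example minimum {minimum}")
    return strength


def generate_emblem(model_id, prompt, guidance_image, strength=0.9):
    """Run text-plus-image generation and return the first output image.

    guidance_image is expected to be a PIL image; model_id is whatever
    Stable Diffusion checkpoint the caller has available.
    """
    # Imported lazily so validate_strength works without the library installed.
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id)
    result = pipe(
        prompt=prompt,
        image=guidance_image,
        strength=validate_strength(strength),
    )
    return result.images[0]
```

A high strength gives the model more freedom to depart from the guidance image's content, which is presumably why the guidance image is reduced to simple colored regions rather than a detailed drawing.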
The enclosed shape can be of one and only one color, or it may include plural colors, such as a pixelated multi-color palette. The pixelated color palette can have a resolution of at least four pixels and no more than one hundred twenty eight by one hundred twenty eight (128×128) pixels. In some embodiments, the processor assembly can be configured to input to the SD model at least one strength parameter used by the SD model to generate the output image. In non-limiting examples, the strength parameter is at least 0.9. As set forth further below, if desired, the processor assembly may be configured to add noise within the enclosed shape and not add noise to portions of the background. In some examples, the output image from the SD model includes an inner image and a background surrounding the inner image, and the processor assembly may be configured to remove the background surrounding the inner image in a post-processing step.

In another aspect, a method includes using a stable diffusion (SD) model to produce images in at least one desired color at least in part by inputting to the SD model at least one guidance image in the form of a shape representing a foreground within a background. The shape is in a first color and the background is in at least a second color. The method includes adding noise within the shape but not adding noise inside an inner boundary of the shape or to portions of the background to force colors in an image output by the SD model to match input colors of the guidance image.

In another aspect, an apparatus includes at least one computer medium that is not a transitory signal and that in turn includes instructions executable by at least one processor assembly to input at least one image to a stable diffusion (SD) model. The image includes an endless shape in a first color superimposed onto a background in at least a second color.
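
The post-processing step mentioned above, removing the background that surrounds the inner image, is not spelled out in this passage. One simple possibility, shown here only as an assumption, is to make every pixel close to the known background color fully transparent. The function name `remove_background`, the `tolerance` value, and the tiny 3×3 example image are hypothetical.

```python
def remove_background(image, background_rgb, tolerance=30):
    """Return rows of RGBA tuples in which pixels within `tolerance` of
    background_rgb on every channel become fully transparent."""
    def is_background(pixel):
        return all(abs(c - b) <= tolerance for c, b in zip(pixel, background_rgb))

    return [
        [(*pixel, 0) if is_background(pixel) else (*pixel, 255) for pixel in row]
        for row in image
    ]


background = (255, 255, 255)   # hypothetical known background color
emblem = (30, 60, 200)         # hypothetical emblem color
output = [
    [background, background, background],
    [background, emblem, (250, 252, 255)],  # near-white pixel from slight color drift
    [background, background, background],
]
cutout = remove_background(output, background)
```

A color threshold like this would also erase emblem pixels that happen to match the background color, so it only suits cases where the two colors are clearly distinct.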
The instructions are executable to add noise to the image only within boundaries of the endless shape and not to portions of the background. Further, the instructions are executable to receive from the SD model at least one output image having colors conforming to the first and second colors, and remove background portions from the output image to render a final image.

The details of the present disclosure, both as to its structure and operation, can be best understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example system consistent with present principles;
FIG. 2 illustrates example logic in example flow chart format consistent with present principles;
FIG. 3 illustrates an example blocked color palette with resulting images;
FIG. 4 illustrates an example pixelated patch color palette representation with resulting images;
FIG. 5 illustrates three example pixelated patch color palette representations;
FIG. 6 illustrates addition of a guidance image or portion thereof to example color palettes;
FIG. 7 illustrates a guidance image line to be input with color palett