CN-121986361-A - Generating palette-compliant images using stable diffusion
Abstract
Techniques are described for generating images, such as, but not limited to, custom team badges, from input text describing a desired image using Stable Diffusion (SD). A pixelated multi-color palette (e.g., 300, 400), optionally with a selectable guide image, is input (202) to the SD model to achieve a desired target color in the image generated in response to the brief input text description. The optional guide image may be used to obtain a more accurate output image that conforms to the desired foreground object and allows lower intensity values to be used.
Inventors
- S. Zalevsky
Assignees
- Sony Interactive Entertainment LLC
Dates
- Publication Date
- 20260505
- Application Date
- 20240927
- Priority Date
- 20231009
Claims (19)
- 1. An apparatus, comprising: at least one processor component configured to: input a palette representation as an input image with an intensity parameter to a Stable Diffusion (SD) model; input text to the SD model; and present an image output by the SD model in response to the input text conforming to the input palette.
- 2. The apparatus of claim 1, wherein the processor component is configured to establish the input palette representation at least in part by converting a block palette to a smaller pixelated tile palette representation.
- 3. The apparatus of claim 1, wherein the palette representation comprises a 16x16 pixelated "tile" palette.
- 4. The apparatus of claim 2, wherein the processor component is configured to convert the block palette to the palette representation at least in part by sampling a random pixel in the block palette for each pixel in the palette representation being created, thereby creating a random color tile without spatial dependence.
- 5. The apparatus of claim 1, wherein a resolution of the palette representation is at least four pixels.
- 6. The apparatus of claim 1, wherein a resolution of the palette representation is at least 8 pixels by 8 pixels.
- 7. The apparatus of claim 6, wherein a resolution of the palette representation does not exceed 128 by 128 (128 x 128) pixels.
- 8. The apparatus of claim 1, wherein the intensity parameter is at least 0.9.
- 9. The apparatus of claim 1, wherein the processor component is configured to add at least one line of an image of an object to the input palette to produce an image that is visually similar to the object and has colors related to the input palette.
- 10. The apparatus of claim 1, wherein the image comprises a badge.
- 11. A method, comprising: training a Stable Diffusion (SD) model to produce an image of at least one desired color at least in part by: inputting a tile representation of an input palette to the SD model, the tile representation having a resolution between 8x8 pixels and 256x256 pixels; inputting at least a portion of a guide image to the SD model along with the tile representation; using an intensity value between 0.9 and 1; and rendering an image output by the SD model at least partially in response to the tile representation, the portion of the guide image, and the intensity value.
- 12. The method of claim 11, wherein the image comprises a badge.
- 13. An apparatus, comprising: at least one computer medium that is not a transitory signal and that includes instructions executable by at least one processor component to: input a color representation as an input to a Stable Diffusion (SD) model; receive input text for the SD model; and render an image output by the SD model in response to the input text conforming to the input color representation.
- 14. The apparatus of claim 13, wherein the instructions are executable to establish the color representation at least in part by converting a block palette to a pixelated tile palette representation.
- 15. The apparatus of claim 13, wherein the color representation comprises a 16x16 pixelated "tile" palette.
- 16. The apparatus of claim 14, wherein the instructions are executable to convert the block palette to the color representation at least in part by sampling a random pixel in the block palette for each pixel in the color representation being created, thereby creating a random color tile.
- 17. The apparatus of claim 13, wherein the resolution of the color representation is at least four pixels and no more than 128 by 128 (128 x 128) pixels.
- 18. The apparatus of claim 13, wherein the instructions are executable to input an intensity parameter of at least 0.9 to the SD model along with the color representation.
- 19. The apparatus of claim 13, wherein the instructions are executable to add at least one line of an image of an object to the input color representation.
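The line-addition step recited in claims 9 and 19 can be sketched as overlaying an object-outline mask onto the palette image. This is a minimal illustration, not the patented implementation; the function name, the mask, and the line color are hypothetical choices made here for the example.

```python
import numpy as np

def add_guide_lines(palette_img, line_mask, line_color=(0, 0, 0)):
    """Overlay guide lines (a boolean mask marking the object's outline)
    onto the palette representation, leaving all other pixels untouched."""
    out = np.asarray(palette_img, dtype=np.uint8).copy()
    out[np.asarray(line_mask, dtype=bool)] = line_color
    return out

# Hypothetical example: a white diagonal line on a 16x16 dark tile
tile = np.zeros((16, 16, 3), dtype=np.uint8)
mask = np.eye(16, dtype=bool)
guided = add_guide_lines(tile, mask, line_color=(255, 255, 255))
```

Because the mask only replaces pixels it covers, the remainder of the palette representation still carries the target colors into the SD model.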
Description
Generating palette-compliant images using stable diffusion

Technical Field

The present application relates to a technically innovative, unconventional solution that is necessarily rooted in computer technology and that yields specific technical improvements, and more particularly to generating images whose colors conform to an input palette using a generative network.

Background

Generative AI is a generic term for a class of neural networks, such as Large Language Models (LLMs), e.g., Generative Pre-trained Transformers (GPT), that can generate relatively complex outputs from relatively compact inputs. A widely used example of generative AI is Stable Diffusion, which employs a series of neural networks to generate an image from one or a few input words describing the desired image. As understood herein, Stable Diffusion may be improved.

Disclosure of Invention

As understood herein, and more particularly, the palette of images produced by Stable Diffusion (SD) may be matched exactly to the desired palette of the desired images. As an example, for computer simulations, such as computer games based on different team colors, it may be necessary to generate custom in-game icons and logos, which in effect means that the SD model must be fine-tuned to generate them. Accordingly, an apparatus includes at least one processor component configured to input a palette representation as an input image, with an intensity parameter, to a Stable Diffusion (SD) model. The palette representation includes at least two colors. The processor component is configured to input text to the SD model and render an image output by the SD model in response to the input text conforming to the input palette. In an example embodiment, the processor component may be configured to establish the input palette representation at least in part by converting a block palette to a smaller pixelated tile palette representation.
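The block-palette-to-tile conversion described above can be sketched as follows. This is a minimal sketch under the assumption that the block palette is simply a list of RGB colors; the function name and the example team colors are illustrative, not from the patent.

```python
import numpy as np

def block_to_tile(block_palette, tile_size=16, seed=None):
    """Convert a block palette (a list of RGB colors) to a pixelated tile
    representation by drawing a random palette color for each pixel,
    yielding a color tile with no spatial dependence."""
    rng = np.random.default_rng(seed)
    palette = np.asarray(block_palette, dtype=np.uint8)   # shape (K, 3)
    idx = rng.integers(0, len(palette), size=(tile_size, tile_size))
    return palette[idx]                                   # shape (tile_size, tile_size, 3)

# Hypothetical two-color team palette, producing a 16x16 tile
tile = block_to_tile([(200, 30, 30), (240, 200, 40)], tile_size=16, seed=0)
print(tile.shape)  # (16, 16, 3)
```

Every pixel of the resulting tile is one of the palette colors, so the tile carries only color information, not spatial structure, into the model.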
In these embodiments, the processor component may be configured to create a random color tile without spatial dependence, at least in part by sampling a random pixel in the block palette for each pixel in the palette representation being created, to convert the block palette to the palette representation. In some examples, the palette representation may comprise a 16x16 pixelated "tile" palette. The palette representation may have a resolution of at least four pixels, e.g., at least 8 pixels by 8 pixels. In some examples, the palette representation has a resolution of no more than 128 by 128 (128 x 128) pixels. The intensity parameter may be at least 0.9. The intensity parameter determines how different the output image is from the input image in the SD model, with higher intensity indicating greater deviation. In an example implementation, the processor component may be configured to add at least one line of an image of an object to the input palette to produce an image that is visually similar to the object and has colors related to the input palette. 

In another aspect, a method includes training a Stable Diffusion (SD) model to generate an image of at least one desired color, at least in part by inputting a tile representation of an input palette to the SD model, wherein the tile representation has a resolution between 8x8 pixels and 256x256 pixels. The method includes inputting at least a portion of a guide image to the SD model along with the tile representation and using an intensity value between 0.9 and 1. The method includes rendering an image output by the SD model at least partially in response to the tile representation, the portion of the guide image, and the intensity value. 
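The hand-off of the tile representation and intensity value to the SD model can be sketched as below. The nearest-neighbor upscale and the [0.9, 1] clamp follow the ranges stated above; the pipeline usage shown in comments assumes the Hugging Face `diffusers` img2img API and an illustrative model id, neither of which is specified by the patent.

```python
import numpy as np

def upscale_tile(tile, size=512):
    """Nearest-neighbor upscale of the small tile palette to the SD input
    resolution, preserving the hard color blocks (no interpolation)."""
    tile = np.asarray(tile, dtype=np.uint8)
    factor = size // tile.shape[0]
    return np.repeat(np.repeat(tile, factor, axis=0), factor, axis=1)

def clamp_intensity(value, lo=0.9, hi=1.0):
    """Keep the intensity (img2img strength) in the disclosed [0.9, 1] range."""
    return min(max(value, lo), hi)

tile = np.zeros((16, 16, 3), dtype=np.uint8)
init_image = upscale_tile(tile)        # 512x512x3 array of palette colors
strength = clamp_intensity(0.95)

# Hypothetical hand-off to a Stable Diffusion img2img pipeline
# (pipeline class and model id are assumptions, not from the patent):
# from diffusers import StableDiffusionImg2ImgPipeline
# from PIL import Image
# pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# out = pipe(prompt="team badge", image=Image.fromarray(init_image),
#            strength=strength).images[0]
```

With strength near 1 the model deviates strongly from the input image's structure while the upscaled tile still biases the output toward the palette colors.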
In another aspect, an apparatus includes at least one computer storage medium that is not a transitory signal and that includes instructions executable by at least one processor component to input a color representation as an input to a Stable Diffusion (SD) model. The instructions are executable to receive input text for the SD model and render an image output by the SD model in response to the input text conforming to the input color. 

The details of the present disclosure, both as to its structure and operation, can best be understood with reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which: 

Drawings

FIG. 1 is a block diagram of an example system including examples consistent with the principles of the present invention; 
FIG. 2 illustrates example logic in an example flow chart format consistent with principles of the invention; 
FIG. 3 illustrates an example block palette and resulting image; 
FIG. 4 illustrates an example pixelated tile palette representation and resulting image; 
FIG. 5 illustrates three example pixelated tile palette representations; 
FIG. 6 illustrates adding a guide image or portion thereof to an example palette; 
FIG. 7 illustrates the input of a guide image line with a pale