CN-122029572-A - Sketch-to-image pipeline with automatic hinting
Abstract
This document describes techniques associated with a sketch-to-image pipeline with automatic hinting. These techniques include a method of generating an image in a two-stage process using an image-to-text model and a text-to-image model. The input image, which may be a drawn sketch, is processed to automatically generate a description of its content. The automatically generated description is combined with a style template to form the hint (that is, the prompt). The hint is used as an input to the text-to-image model. The input image is also used as an input to the text-to-image model to increase the similarity of the output to the input image. This approach simplifies image generation by eliminating the need for users of the pipeline to type prompts (while giving them the option to do so if they wish), while also giving them control over how closely the output follows the input sketch.
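The step of combining the automatically generated description with a style template can be pictured with a short sketch. This is only an illustration under stated assumptions, not the implementation described in this document: the template wording, the `STYLE_TEMPLATES` table, and the `build_prompt` helper are hypothetical names introduced for the example, and the description string stands in for the output of the image-to-text model.

```python
from typing import Optional

# Hypothetical style templates; the source does not specify their wording.
STYLE_TEMPLATES = {
    "watercolor": "A watercolor painting of {description}, soft washes, paper texture",
    "pixel_art": "Pixel art of {description}, 16-bit palette",
}


def build_prompt(description: str, style: str, user_edit: Optional[str] = None) -> str:
    """Combine a description with a style template to form the hint (prompt).

    `description` stands in for the image-to-text model output.
    `user_edit`, if given, replaces it, mirroring the option to type a prompt.
    """
    if style not in STYLE_TEMPLATES:
        raise ValueError(f"unknown style: {style!r}")
    # Trim whitespace and a trailing period so the text fits inside the template.
    text = (user_edit or description).strip().rstrip(".")
    return STYLE_TEMPLATES[style].format(description=text)


if __name__ == "__main__":
    print(build_prompt("a cat with a curled tail.", "watercolor"))
```

Keeping the style wording in a template table means the user only picks a style name, while the description remains editable text.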
Inventors
- William Roger Osborne
- Mick Clou Kose
- Mika Petrie Lanto
- Mora Elizabeth Lynch
Assignees
- Google LLC
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-05-05
- Priority Date: 2024-05-13
Claims (15)
- 1. A method for a sketch-to-image pipeline, the method comprising: receiving a sketch drawn by a user via a drawing application on an electronic device; automatically generating user-editable text describing the sketch using an image recognition model; receiving a user-selected style defining a desired style of an output image; constructing a prompt based on the user-editable text and the user-selected style; detecting edge information corresponding to the sketch; generating a plurality of stylized output images based on the edge information and the prompt using an image generation model; and providing a subset of the plurality of stylized output images for display and user selection.
- 2. The method of claim 1, further comprising preprocessing the sketch prior to automatically generating the user-editable text.
- 3. The method of claim 2, wherein the preprocessing comprises applying at least one of grayscale processing, color removal, color correction, resizing, cropping, feature extraction, segmentation, or image compression.
- 4. The method of any of claims 1-3, wherein the image recognition model and the image generation model are artificial intelligence (AI)-based models.
- 5. The method of any of claims 1-4, further comprising applying one or more filters to prevent illicit content from being recognized by the image recognition model or generated by the image generation model.
- 6. The method of any of claims 1-5, wherein the image generation model comprises a plurality of text-to-image diffusion models for controlling various input conditions and for fine tuning the image generation model to generate the output image according to the user-selected style.
- 7. The method of any one of claims 1 to 6, wherein: the edge information corresponds to edges of one or more objects and elements in the sketch; and generating the plurality of stylized output images includes generating the plurality of stylized output images each having edges similar to the edges of the one or more objects and elements in the sketch, such that each of the plurality of stylized output images is similar to the sketch.
- 8. The method of any one of claims 1 to 7, further comprising: receiving a user input, the user input selecting an output image from a displayed subset of the plurality of output images; and causing the selected output image to be automatically inserted into an application running on the electronic device separate from the drawing application.
- 9. The method of any one of claims 1 to 7, further comprising: receiving a user input, the user input selecting an output image from a displayed subset of the plurality of output images; and using the selected output image as a new input to the sketch-to-image pipeline.
- 10. The method of any of claims 1 to 9, further comprising processing one or more of the plurality of stylized output images using one or more image post-processing operations.
- 11. The method of claim 10, wherein the one or more image post-processing operations comprise at least one of color enhancement, color removal, contrast enhancement, noise reduction, sharpening, image compression, image restoration, background removal, magnification, blending, or cropping.
- 12. An electronic device, comprising: a display device for displaying content; and one or more processors configured to perform any of the methods of claims 1-11 using at least an image recognition model and an image generation model.
- 13. The electronic device of claim 12, wherein the one or more processors are further configured to execute the drawing application to generate the sketch drawn by the user via the drawing application.
- 14. The electronic device of claim 12 or claim 13, wherein the one or more processors are configured to implement a preprocessing operation that applies one or more operations to the sketch prior to automatically generating the user-editable text.
- 15. A computer program product comprising computer executable instructions which, when executed by a computing device, cause the computing device to perform any of the methods of claims 1 to 11.
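The steps recited in claim 1 can be read as a linear flow, pictured in the sketch below. It is a hedged illustration, not the claimed implementation. The recognition and generation models are stubs (`caption_sketch` and `generate_images` are hypothetical names), the edge detector is a simple gradient threshold swapped in for whatever detector an actual pipeline would use, the prompt format is invented, and the sketch is assumed to be a 2D list of grayscale values in the range 0-255.

```python
from typing import Callable, List, Optional

Image = List[List[int]]  # assumption: grayscale pixels, 0-255


def detect_edges(img: Image, threshold: int = 64) -> Image:
    """Mark pixels whose right or downward intensity difference exceeds threshold."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbor indices are clamped at the image border.
            gx = abs(img[y][min(x + 1, w - 1)] - img[y][x])
            gy = abs(img[min(y + 1, h - 1)][x] - img[y][x])
            edges[y][x] = 1 if max(gx, gy) > threshold else 0
    return edges


def caption_sketch(img: Image) -> str:
    """Stub standing in for the image recognition model."""
    return "a simple shape"


def generate_images(prompt: str, edges: Image, count: int) -> List[dict]:
    """Stub standing in for the image generation model; returns placeholders."""
    return [{"prompt": prompt, "edges": edges, "seed": s} for s in range(count)]


def run_pipeline(
    sketch: Image,
    style: str,
    edit_text: Optional[Callable[[str], str]] = None,
    count: int = 8,
    shown: int = 4,
) -> List[dict]:
    """Follow the order of claim 1: caption, prompt, edges, generate, subset."""
    text = caption_sketch(sketch)
    if edit_text is not None:  # the generated text is user-editable
        text = edit_text(text)
    prompt = f"{text}, in {style} style"  # hypothetical prompt format
    edges = detect_edges(sketch)
    candidates = generate_images(prompt, edges, count)
    return candidates[:shown]  # subset provided for display and selection
```

For example, `run_pipeline(sketch, "watercolor", edit_text=lambda t: t + " on a hill")` shows how a user edit could enter the flow before the prompt is constructed.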
Description
Sketch-to-image pipeline with automatic hinting
Background
Generative artificial intelligence (AI) models have been developed to assist users in creating images, such as text-to-image models, which create images based on written descriptions. However, some users may find it difficult to put into words the specific details of the final image they have in mind, such as the specific manner in which a cat's tail curls, the specific shape of flower petals, or the specific manner in which butterfly wings are colored. Without such concrete detail in the description, the final image created by the text-to-image model may not be what the user wanted. This results in a poor user experience and in further image-creation instructions/commands being sent from the user/user device to the AI model. Those repeated exchanges burden at least the network capacity required to transmit the instructions/commands to the AI model and the created images back to the user/user device, as well as the processing resources the AI model requires to repeatedly create images.
Disclosure of Invention
This document describes techniques associated with a sketch-to-image pipeline with automatic hinting. These techniques include a method of generating an image in a two-stage process using an image-to-text model and a text-to-image model. The input image (which may be a drawn sketch) is processed to automatically generate a description of its content. The automatically generated description is combined with a style template to form the hint. The hint is an instruction/command issued to the underlying computer system and is used as an input to the text-to-image model. The input image is also used as an input to the text-to-image model to increase the similarity of the output to the input image.
This approach simplifies image generation by eliminating the need for users of the pipeline to type prompts (while giving them the option to do so if they wish), while also giving them control over how closely the output follows the input sketch. This summary is provided to introduce simplified concepts of a sketch-to-image pipeline with automatic hinting that are further described below in the Detailed Description. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used to determine the scope of the claimed subject matter.
Drawings
Details of one or more aspects of a sketch-to-image pipeline with automatic hinting are described in this document with reference to the accompanying drawings. The same reference numbers are used throughout the drawings to reference like features and components:
FIG. 1 illustrates an example environment in which aspects of a sketch-to-image pipeline with automatic hints may be implemented;
FIG. 2 illustrates an example implementation of an electronic device in which automated management may be implemented;
FIG. 3 illustrates an example pipeline of a sketch-to-image pipeline with automatic hints;
FIG. 4 illustrates an example of sketch-to-image pipeline inputs and outputs with automatic hints as disclosed herein;
FIG. 5 depicts an example method for a sketch-to-image pipeline with automatic hints;
FIG. 6 illustrates an example wireless network apparatus that can be implemented in accordance with one or more aspects of the sketch-to-image pipeline with automatic hints described herein;
FIG. 7 illustrates an example system including example apparatus implementing aspects of a sketch-to-image pipeline with automatic hints as described with reference to the previous FIGS. 1-6;
FIG. 8 illustrates an example trainer for training a Large Language Model (LLM) for a sketch-to-image pipeline with automatic hinting;
FIG. 9 illustrates an example of a generic transducer;
FIG. 10 illustrates an example transformation of input tensor components in a language space;
FIG. 11 illustrates an example of a fine-tuning (FT) trainer; and
FIG. 12 illustrates an example of prompt engineering.
Detailed Description
SUMMARY
Many generative artificial intelligence (AI) models have been developed to assist users in creating images, such as text-to-image models that can create images based on written descriptions, or image-to-image models that require extensive user input to define numerous settings and parameters. Many users may be overwhelmed by tools that require a large amount of user input. Further, without an accurate representation, the final image generated by the tool may not be what the user wanted. On the one hand, this may result in a poor user experience; on the other hand, further instructions/commands for image creation from the user/user device to the AI model may be required, which burdens at least the network capacity required to transmit image-creation-related instructions/commands from the user/user device to the AI model and to transmit AI model-created images