US-20260127903-A1 - QUALIFYING LABELS AUTOMATICALLY ATTRIBUTED TO CONTENT IN IMAGES

Abstract

A method for image generation. The method including identifying a plurality of features of an image. The method including classifying each of the plurality of features using an artificial intelligence (AI) model trained to identify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels, wherein the image is provided as input to the AI model. The method including receiving feedback for a label, wherein the feedback is associated with a user. The method including modifying a label based on the feedback. The method including updating the plurality of labels with the label that is modified. The method including providing as input the plurality of labels that is updated into an image generation artificial intelligence system configured for implementing latent diffusion to generate an updated image.

Inventors

Arran Green

Assignees

SONY INTERACTIVE ENTERTAINMENT INC.

Dates

Publication Date: 20260507
Application Date: 20251230

Claims (20)

1. A method for image generation, the method comprising: identifying a plurality of features of an image; classifying the plurality of features using an artificial intelligence (AI) model trained to classify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels; generating a representation of the plurality of labels, wherein the representation is formatted as a hierarchy comprising a first label representing an object and a second label representing a sub-feature nested within the first label; presenting the representation in a user interface on a display; receiving, via the user interface, a user input selecting the second label within the representation and defining a modification to the second label; updating the plurality of labels based on the modification to the second label to generate an updated set of labels; and providing, as input, the updated set of labels into an image generation artificial intelligence system configured for implementing latent diffusion to generate an updated image.
2. The method of claim 1, wherein the hierarchy comprises a hierarchical tree of labeled features, and wherein the second label is nested relative to the first label within the hierarchical tree.
3. The method of claim 1, wherein the sub-feature represented by the second label corresponds to a surface property of the object represented by the first label.
4. The method of claim 1, wherein receiving the user input defining the modification comprises receiving a change to a characteristic of the sub-feature from a first attribute to a second attribute.
5. The method of claim 1, wherein updating the plurality of labels further comprises: identifying a third label within the plurality of labels that is contextually dependent on the second label; and automatically modifying the third label to maintain semantic consistency with the modification to the second label.
6. The method of claim 5, wherein the second label represents an object type and the third label represents an environmental surface interacting with the object type.
7. The method of claim 1, wherein the image generation artificial intelligence system is configured to regenerate a region of the image corresponding to the second label while maintaining pixel integrity of a remainder of the image corresponding to the first label.
8. The method of claim 1, further comprising: displaying the updated image on the display concurrently with the representation formatted as the hierarchy; and highlighting a region of the updated image corresponding to the second label to indicate that the sub-feature is undergoing modification.
9. The method of claim 1, wherein the user interface is configured to receive the modification for the second label while maintaining the first label in an unmodified state.
10. A system comprising: one or more processors; and a non-transitory computer-readable medium containing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: identifying a plurality of features of an image; classifying the plurality of features using an artificial intelligence (AI) model trained to classify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels; generating a representation of the plurality of labels, wherein the representation is formatted as a hierarchy comprising a first label representing an object and a second label representing a sub-feature nested within the first label; presenting the representation in a user interface on a display; receiving, via the user interface, a user input selecting the second label within the representation and defining a modification to the second label; updating the plurality of labels based on the modification to the second label to generate an updated set of labels; and providing, as input, the updated set of labels into an image generation artificial intelligence system configured for implementing latent diffusion to generate an updated image.
11. The system of claim 10, wherein the representation is a hierarchical file system listing the plurality of labels as nested elements.
12. The system of claim 10, wherein the operations further comprise: analyzing the modification to the second label using a context analyzer; and automatically updating a third label in the hierarchy based on the context analyzer determining that the third label conflicts with the modification to the second label.
13. The system of claim 10, wherein defining the modification comprises providing a natural language descriptor into a text field associated with the second label in the hierarchy.
14. The system of claim 10, wherein the hierarchy includes a root level representing a scene context, a parent level representing the object, and a child level representing the sub-feature.
15. The system of claim 10, wherein the image generation artificial intelligence system utilizes a latent diffusion model to synthesize the updated image based on text embeddings derived from the updated set of labels.
16. A non-transitory computer-readable medium containing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: identifying a plurality of features of an image; classifying the plurality of features using an artificial intelligence (AI) model trained to classify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels; generating a representation of the plurality of labels, wherein the representation is formatted as a hierarchy comprising a first label representing an object and a second label representing a sub-feature nested within the first label; presenting the representation in a user interface on a display; receiving, via the user interface, a user input selecting the second label within the representation and defining a modification to the second label; updating the plurality of labels based on the modification to the second label to generate an updated set of labels; and providing, as input, the updated set of labels into an image generation artificial intelligence system configured for implementing latent diffusion to generate an updated image.
17. The non-transitory computer-readable medium of claim 16, wherein the hierarchy visually distinguishes between labels representing physical objects and labels representing stylistic attributes.
18. The non-transitory computer-readable medium of claim 16, wherein the operations further comprise: detecting a conflict between the modification to the second label and a third label within the hierarchy; automatically modifying the third label to resolve the conflict; and displaying the third label that is automatically modified in the user interface.
19. The non-transitory computer-readable medium of claim 16, wherein the sub-feature comprises a component part of the object.
20. The non-transitory computer-readable medium of claim 16, wherein the image is an initial image generated by the image generation artificial intelligence system based on an initial prompt.
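
Illustrative sketch (not part of the claims). The label hierarchy recited in claims 1 and 14, together with the dependent-label update of claims 5, 12 and 18, might be modeled roughly as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the LabelNode class, the DEPENDENCY_RULES table standing in for a context analyzer, the example label names and values, and the prompt format are all hypothetical. The image generation system is not invoked; the sketch stops at a text prompt that such a system might accept as input.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]


@dataclass
class LabelNode:
    """One label in the hierarchy: scene (root), object, or sub-feature."""
    name: str
    value: str
    children: List["LabelNode"] = field(default_factory=list)

    def find(self, path: Path) -> Optional["LabelNode"]:
        """Follow child names in `path`, starting below this node."""
        node = self
        for part in path:
            node = next((c for c in node.children if c.name == part), None)
            if node is None:
                return None
        return node


# Hypothetical stand-in for a context analyzer:
# (edited label path, new value) -> (dependent label path, consistent value).
DEPENDENCY_RULES: Dict[Tuple[Path, str], Tuple[Path, str]] = {
    (("vehicle", "type"), "boat"): (("ground", "surface"), "water"),
}


def apply_edit(root: LabelNode, path: Path, new_value: str) -> List[Path]:
    """Apply the user's edit, then any dependent update; return changed paths."""
    target = root.find(path)
    if target is None:
        raise KeyError("no label at " + "/".join(path))
    target.value = new_value
    changed = [path]
    rule = DEPENDENCY_RULES.get((path, new_value))
    if rule is not None:
        dep_path, dep_value = rule
        dependent = root.find(dep_path)
        if dependent is not None and dependent.value != dep_value:
            dependent.value = dep_value  # keep the scene semantically consistent
            changed.append(dep_path)
    return changed


def to_prompt(root: LabelNode) -> str:
    """Flatten the hierarchy depth-first into a single text prompt."""
    parts: List[str] = []

    def walk(node: LabelNode, prefix: str) -> None:
        parts.append(f"{prefix}{node.name}: {node.value}")
        for child in node.children:
            walk(child, f"{prefix}{node.name} > ")

    walk(root, "")
    return "; ".join(parts)


def build_example() -> LabelNode:
    """Root = scene context, parents = objects, children = sub-features."""
    return LabelNode("scene", "coastal town", [
        LabelNode("vehicle", "foreground object", [
            LabelNode("type", "car"),
            LabelNode("color", "red"),
        ]),
        LabelNode("ground", "environment", [
            LabelNode("surface", "road"),
        ]),
    ])


if __name__ == "__main__":
    scene = build_example()
    print(apply_edit(scene, ("vehicle", "type"), "boat"))
    print(to_prompt(scene))
```

In this sketch the user edits only the nested "type" sub-feature; the parent "vehicle" label and the sibling "color" label are left untouched, while the rule table updates the dependent "surface" label so the two remain consistent. A real context analyzer would presumably be learned or far richer than a lookup table.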

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 18/171,256, filed Feb. 17, 2023, the content of which is herein incorporated by reference in its entirety for all purposes.

TECHNICAL FIELD

The present disclosure is related to image generation, such as during development of a video game, and more specifically to enabling modification to an image using an image generation artificial intelligence model implementing latent diffusion techniques based on editing of one or more labeled features of the image.

BACKGROUND OF THE DISCLOSURE

Video games and/or gaming applications and their related industries (e.g., video gaming) are extremely popular and represent a large percentage of the worldwide entertainment market. Development of video games involves the generation of one or more images in sequential order, wherein the images are set within a scene of a particular video game. Developing each scene may involve many steps to identify and create objects within the scene, and may further involve movement of the objects within the scene throughout multiple images. Each object may include intricate characteristics that define that object. For example, characteristics of an object may include size, shape, color, surface makeup, etc.

After an image or images of a scene have been developed, making changes to the characteristics of an object may also involve many intricate steps. For instance, the developer may have to individually change the parameters of each characteristic of the object that is being changed, which can be time consuming, especially when those parameters are not readily available. Further, the change reflected in one object may not be consistent with other objects within the scene that are not changed. Additionally, it is difficult to make wholesale changes to a scene without redrawing the image from scratch.
For example, wholesale changes to a scene may include changing an entire environment of the scene or changing a characteristic of the environment that would affect the entire scene or all the objects within the scene. In those cases, making a change to the environment would require redeveloping one or more images for the scene.

It is in this context that embodiments of the disclosure arise.

SUMMARY

Embodiments of the present disclosure relate to image generation, such as during development of a video game, wherein a modified image is generated using an artificial intelligence (AI) model, such as an image generation artificial intelligence (IGAI) model, implementing latent diffusion techniques. More specifically, modifications to an image are enabled via editing of labeled features of the image, wherein the modified image is generated using an IGAI model implementing latent diffusion based on the labeled features of the image that are modified and/or unmodified.

In one embodiment, a method for image generation is disclosed. The method including identifying a plurality of features of an image. The method including classifying each of the plurality of features using an artificial intelligence (AI) model trained to classify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels, wherein the image is provided as input to the AI model. The method including receiving feedback for a label, wherein the feedback is associated with a user. The method including modifying a label based on the feedback. The method including updating the plurality of labels with the label that is modified. The method including providing as input the plurality of labels that is updated into an image generation artificial intelligence system configured for implementing latent diffusion to generate an updated image.

In another embodiment, a non-transitory computer-readable medium storing a computer program for implementing a method is disclosed.
The computer-readable medium including program instructions for identifying a plurality of features of an image. The computer-readable medium including program instructions for classifying each of the plurality of features using an artificial intelligence (AI) model trained to classify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels, wherein the image is provided as input to the AI model. The computer-readable medium including program instructions for receiving feedback for a label, wherein the feedback is associated with a user. The computer-readable medium including program instructions for modifying a label based on the feedback. The computer-readable medium including program instructions for updating the plurality of labels with the label that is modified. The computer-readable medium including program instructions for providing as input the plurality of labels that is updated into an image generation artificial intelligence system configured for implementing latent diffusion to generate an updated image. In still another embodimen