US-20260127718-A1 - BLENDED OUTPUT GENERATION IN GENERATIVE ARTIFICIAL INTELLIGENCE MODELS

Abstract

Techniques and apparatus for generating content with multiple specified attributes using a generative artificial intelligence model are described. An example method generally includes receiving a request to generate an output of a machine learning model, the request specifying a plurality of attributes of the output of the machine learning model. A set of intermediate outputs is generated via a plurality of adapters of the machine learning model. Each respective adapter of the plurality of adapters may be associated with a respective attribute of the specified plurality of attributes and include a respective mask in a low-rank dimension associated with the respective adapter. The set of intermediate outputs is merged into a combined output of the plurality of adapters of the machine learning model, and the output of the machine learning model is generated based on the combined output of the plurality of adapters.

Inventors

Aniket ROY
Shweta Mahajan
Shubhankar Mangesh BORSE
Shreya KADAMBI
Ankita NAYAK
Risheek GARREPALLI
Hyojin Park
Debasmit DAS
Munawar HAYAT
Fatih Murat PORIKLI

Assignees

QUALCOMM INCORPORATED

Dates

Publication Date: 20260507
Application Date: 20241107

Claims (20)

1. A processing system for machine learning, comprising: at least one memory having executable instructions stored thereon; and one or more processors configured to execute the executable instructions in order to cause the processing system to: receive a request to generate an output of a machine learning model, the request specifying a plurality of attributes of the output of the machine learning model; generate a set of intermediate outputs via a plurality of adapters of the machine learning model, each respective adapter of the plurality of adapters being associated with a respective attribute of the specified plurality of attributes and including a respective mask in a low-rank dimension associated with the respective adapter; merge the set of intermediate outputs into a combined output of the plurality of adapters of the machine learning model; and generate the output of the machine learning model based on the combined output of the plurality of adapters.
2. The processing system of claim 1, wherein the output of the machine learning model is an image output and wherein the plurality of attributes includes: an object to be depicted in the image output generated by the machine learning model; and a style of the image output generated by the machine learning model.
3. The processing system of claim 1, wherein the machine learning model comprises a model trained based on one or more of a content loss, a style loss, or a scaling factor associated with a similarity term.
4. The processing system of claim 1, wherein the respective mask associated with the respective adapter comprises a weighting vector with learnable weight values in the low-rank dimension and wherein a masked output associated with a first adapter of the plurality of adapters is orthogonal to a masked output associated with a second adapter of the plurality of adapters.
5. The processing system of claim 1, wherein the set of intermediate outputs comprises a plurality of images, each image of the plurality of images corresponding to an image conforming to an attribute from the specified plurality of attributes.
6. The processing system of claim 1, wherein a first adapter of the plurality of adapters is biased to operating on earlier layers over later layers in the machine learning model and a second adapter of the plurality of adapters is biased to operating on later layers over earlier layers of the machine learning model.
7. The processing system of claim 1, wherein each respective mask is based on a loss based on a rank constraint and a sparsity constraint.
8. The processing system of claim 1, wherein: a rank associated with a first attribute of the specified plurality of attributes exceeds a rank associated with a second attribute of the specified plurality of attributes in a first plurality of layers of the machine learning model; and a rank associated with the second attribute of the specified plurality of attributes exceeds a rank associated with the first attribute of the specified plurality of attributes in a second plurality of layers of the machine learning model, the second plurality of layers being layers subsequent to the first plurality of layers.
9. The processing system of claim 1, wherein: the plurality of adapters of the machine learning model comprise adapters configured based on a cycle-consistency loss between a first attribute of the plurality of attributes and a second attribute of the plurality of attributes; and a cycle associated with the cycle-consistency loss comprises an application and a removal of an attribute from the specified plurality of attributes to data complying with another attribute from the specified plurality of attributes.
10. The processing system of claim 9, wherein at least one of the adapters has frozen weights and an output mask with learnable weights associated with the cycle-consistency loss.
11. The processing system of claim 9, wherein at least one of the adapters has learnable weights associated with the cycle-consistency loss in the low-rank dimension of the at least one of the adapters.
12. A processing system for machine learning, comprising: at least one memory having executable instructions stored thereon; and one or more processors configured to execute the executable instructions in order to cause the processing system to: receive a first data set associated with a first attribute for which a machine learning model is to be trained; receive a second data set associated with a second attribute for which the machine learning model is to be trained; train a first adapter of the machine learning model to finetune outputs in accordance with the first attribute based on the first data set, the first adapter including a first mask in a low-rank dimension associated with the first adapter; train a second adapter of the machine learning model to finetune outputs in accordance with the second attribute based on the second data set, the second adapter including a second mask in a low-rank dimension associated with the second adapter, such that an output of the first adapter is orthogonal to an output of the second adapter; train a merged adapter comprising the trained first adapter and the trained second adapter; and deploy the machine learning model with the trained merged adapter.
13. The processing system of claim 12, wherein the machine learning model is configured to generate an image output and wherein the plurality of attributes includes: an object to be depicted in the image output generated by the machine learning model; and a style of the image output generated by the machine learning model.
14. The processing system of claim 12, wherein the merged adapter is trained based on a loss associated with the first adapter, a loss associated with the second adapter, and a scaling factor associated with a similarity term.
15. The processing system of claim 14, wherein the scaling factor is applied to a combination of the first mask and the second mask.
16. A processor-implemented method for machine learning, comprising: receiving a request to generate an output of a machine learning model, the request specifying a plurality of attributes of the output of the machine learning model; generating a set of intermediate outputs via a plurality of adapters of the machine learning model, each respective adapter of the plurality of adapters being associated with a respective attribute of the specified plurality of attributes and including a respective mask in a low-rank dimension associated with the respective adapter; merging the set of intermediate outputs into a combined output of the plurality of adapters of the machine learning model; and generating the output of the machine learning model based on the combined output of the plurality of adapters.
17. The method of claim 16, wherein the output of the machine learning model is an image output and wherein the plurality of attributes includes: an object to be depicted in the image output generated by the machine learning model; and a style of the image output generated by the machine learning model.
18. The method of claim 16, wherein the machine learning model comprises a model trained based on one or more of a content loss, a style loss, or a scaling factor associated with a similarity term.
19. The method of claim 16, wherein the respective mask associated with the respective adapter comprises a weighting vector with learnable weight values in the low-rank dimension and wherein a masked output associated with a first adapter of the plurality of adapters is orthogonal to a masked output associated with a second adapter of the plurality of adapters.
20. The method of claim 16, wherein: a rank associated with a first attribute of the specified plurality of attributes exceeds a rank associated with a second attribute of the specified plurality of attributes in a first plurality of layers of the machine learning model; and a rank associated with the second attribute of the specified plurality of attributes exceeds a rank associated with the first attribute of the specified plurality of attributes in a second plurality of layers of the machine learning model, the second plurality of layers being layers subsequent to the first plurality of layers.
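
The orthogonality of masked outputs recited in claims 4 and 19 can be illustrated with a small sketch. It is not taken from the disclosure: it assumes, purely for illustration, that two adapters share one set of orthonormal low-rank directions and use binary masks, in which case masks that select disjoint directions yield orthogonal outputs. The names `masked_output`, `columns`, `mask_a`, `mask_b`, and `mask_c` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def masked_output(columns, mask, z):
    """Sum of low-rank directions, each weighted by its mask entry and activation.

    columns: r direction vectors (assumed orthonormal in this sketch)
    mask:    r mask weights, one per low-rank dimension
    z:       r low-rank activations
    """
    dim = len(columns[0])
    out = [0.0] * dim
    for col, m, zk in zip(columns, mask, z):
        for i in range(dim):
            out[i] += m * zk * col[i]
    return out


# Assumption: three orthonormal directions shared by both adapters.
columns = [[0.6, 0.8, 0.0], [-0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
z = [1.0, 2.0, 3.0]

mask_a = [1.0, 0.0, 0.0]   # first adapter keeps direction 0
mask_b = [0.0, 1.0, 1.0]   # second adapter keeps directions 1 and 2 (disjoint)
mask_c = [1.0, 1.0, 0.0]   # overlaps with mask_a on direction 0

out_a = masked_output(columns, mask_a, z)
out_b = masked_output(columns, mask_b, z)
out_c = masked_output(columns, mask_c, z)

disjoint_overlap = dot(out_a, out_b)   # close to zero: outputs are orthogonal
shared_overlap = dot(out_a, out_c)     # clearly nonzero: outputs are not orthogonal
```

With learnable, non-binary mask weights, a training loss could penalize the squared inner product instead of requiring strictly disjoint masks; that choice is likewise an assumption of this sketch, not a statement of the claimed method.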

Description

INTRODUCTION

Aspects of the present disclosure relate to generative artificial intelligence models.

Generative artificial intelligence models can be used in various environments to generate a response to an input prompt (also referred to as a query or an input). For example, generative artificial intelligence models can be used in chatbot applications in which large language models (LLMs) are used to generate an answer, or at least a response, to an input prompt. Other examples include latent diffusion models, in which a model generates an image or stream of images (e.g., video content) from an input text description of the content of the desired image or stream of images; decision transformers, in which future actions are predicted based on sequences of prior actions within a given environment; and the like.

Generally, generative artificial intelligence models have many (e.g., millions or billions of) parameters, resulting in models that are large in size and incur a significant computational expense to train. Further, once trained, generative artificial intelligence models are often difficult (or impossible) to fine-tune, as the vast number of parameters makes overfitting (where the model fits too closely to the training data, resulting in loss of accuracy and generalization for runtime data) a major challenge (e.g., potentially relying on tremendous amounts of fine-tuning data to prevent overfitting).

To allow for generative artificial intelligence models to be fine-tuned or modified, smaller model adapters may be trained for large models. For example, adapters may be trained to improve or enable video generation based on desired appearances, movement, and the like.

BRIEF SUMMARY

Certain aspects of the present disclosure provide a method for generating content using a generative artificial intelligence model.
An example method generally includes receiving a request to generate an output of a machine learning model, the request specifying a plurality of attributes of the output of the machine learning model. A set of intermediate outputs is generated via a plurality of adapters of the machine learning model. Each respective adapter of the plurality of adapters may be associated with a respective attribute of the specified plurality of attributes and include a respective mask in a low-rank dimension associated with the respective adapter. The set of intermediate outputs is merged into a combined output of the plurality of adapters of the machine learning model, and the output of the machine learning model is generated based on the combined output of the plurality of adapters.

Certain aspects of the present disclosure provide a method for training a generative artificial intelligence model to generate content. An example method generally includes receiving a first data set associated with a first attribute for which a machine learning model is to be trained and a second data set associated with a second attribute for which the machine learning model is to be trained. A first adapter of the machine learning model is trained to finetune outputs in accordance with the first attribute based on the first data set, the first adapter including a first mask in a low-rank dimension associated with the first adapter. A second adapter of the machine learning model is trained to finetune outputs in accordance with the second attribute based on the second data set, the second adapter including a second mask in a low-rank dimension associated with the second adapter, such that an output of the first adapter is orthogonal to an output of the second adapter. A merged adapter is trained, with the merged adapter comprising the trained first adapter and the trained second adapter. The machine learning model is deployed with the trained merged adapter.
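
The generation flow summarized above (per-attribute adapters with a mask in the low-rank dimension, intermediate outputs, and a merge into a combined output) can be sketched for a single linear layer as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each adapter is assumed to be a low-rank pair of matrices, the merge is assumed to be a plain sum, and the names `masked_adapter`, `blended_forward`, `down`, `up`, `mask`, `content`, and `style` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Adapter = Tuple[Matrix, Matrix, Vector]  # (down, up, mask), all hypothetical


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def masked_adapter(down: Matrix, up: Matrix, mask: Vector, x: Vector) -> Vector:
    """Intermediate output of one adapter: up @ (mask * (down @ x)).

    down is r x d_in, up is d_out x r, and mask holds one weight per
    low-rank dimension (length r).
    """
    z = matvec(down, x)                       # project into the rank-r space
    z = [m * zi for m, zi in zip(mask, z)]    # weight each low-rank dimension
    return matvec(up, z)                      # project back to the output space


def blended_forward(base: Matrix, adapters: List[Adapter], x: Vector) -> Vector:
    """Base-layer output plus the merged adapter outputs."""
    intermediates = [masked_adapter(d, u, m, x) for d, u, m in adapters]
    # Merge step; summation is an assumption of this sketch.
    combined = [sum(vals) for vals in zip(*intermediates)]
    return [b + c for b, c in zip(matvec(base, x), combined)]


# Toy example: 2-d input and output, two rank-2 adapters.
identity = [[1.0, 0.0], [0.0, 1.0]]
content = (identity, identity, [1.0, 0.0])   # mask keeps low-rank dimension 0
style = (identity, identity, [0.0, 0.5])     # mask keeps dimension 1, scaled by 0.5
y = blended_forward(identity, [content, style], [2.0, 4.0])
```

In this toy case the content adapter contributes only along the first low-rank dimension and the style adapter only along the second, so the two intermediate outputs do not interfere when summed.
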

Certain aspects of the present disclosure provide a method for generating content using a generative artificial intelligence model. An example method generally includes receiving a request to generate an output of a machine learning model, the request specifying a plurality of attributes for the output of the machine learning model. A set of intermediate outputs is generated via a plurality of adapters of the machine learning model. Generally, each respective adapter of the plurality of adapters is associated with a respective attribute of the specified plurality of attributes and is trained based on a cycle-consistency loss between different attributes of the specified plurality of attributes. The set of intermediate outputs is merged into a combined output of the plurality of adapters of the machine learning model, and the output of the machine learning model is generated based on the combined output of the plurality of adapters.

Certain aspects of the present disclosure provide a method for training a generative artificial intelligence model to generate content. An example method generally inclu