CN-122003684-A - Training objectives for distilling guided diffusion models
Abstract
A method for training a diffusion model includes compressing the diffusion model by removing at least one of one or more model parameters or one or more giga multiply-add operations (GMACs). The method also includes performing a guidance conditioning to train the compressed diffusion model, the guidance conditioning combining the conditional output and the unconditional output from respective teacher models. The method further includes performing a step-wise distillation on the compressed diffusion model after the guidance conditioning.
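The guidance conditioning described above combines the two teacher outputs in the manner of classifier-free guidance. Below is a minimal sketch of that combination, assuming epsilon-prediction teachers and a scalar guidance weight; the function and argument names are illustrative, not taken from the patent:

```python
import torch

def guided_teacher_target(eps_cond: torch.Tensor,
                          eps_uncond: torch.Tensor,
                          guidance_scale: float) -> torch.Tensor:
    """Combine conditional and unconditional teacher noise predictions.

    eps_cond:       teacher output given the text string (conditional)
    eps_uncond:     teacher output given the null string (unconditional)
    guidance_scale: classifier-free guidance weight w
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

The compressed student can then be trained to match this combined target directly, so that a single forward pass of the compressed model stands in for the two teacher passes at inference time.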
Inventors
- R. Garrepalli
- S. M. Borse
- J. Zheng
- Q. Hou
- S. Kadambi
- M. Hayat
- F. M. Porikli
Assignees
- QUALCOMM Incorporated
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-08-29
- Priority Date: 2023-10-23
Claims (20)
- 1. An apparatus for training a diffusion model, the apparatus comprising: one or more processors; and one or more memories coupled with the one or more processors and storing instructions that, when executed by the one or more processors, cause the apparatus to: compress the diffusion model by removing at least one of one or more model parameters or one or more giga multiply-add operations (GMACs); perform a guidance conditioning to train the compressed diffusion model, the guidance conditioning combining conditional and unconditional outputs from respective teacher models; and perform a step-wise distillation on the compressed diffusion model after the guidance conditioning.
- 2. The apparatus of claim 1, wherein: the conditional output is based on a text string received at a first teacher model; and the unconditional output is based on a null string received at a second teacher model.
- 3. The apparatus of claim 1, wherein: the compressed diffusion model receives guidance values from a guidance embedding during the guidance conditioning; and the guidance value modulates an output of the compressed diffusion model.
- 4. The apparatus of claim 1, wherein: the step-wise distillation includes two sequential teacher models and the compressed diffusion model; and the step-wise distillation distills two steps associated with the two sequential teacher models into one step of the compressed diffusion model.
- 5. The apparatus of claim 1, wherein execution of the instructions further causes the apparatus to perform epsilon-to-velocity conversion prior to compressing the diffusion model.
- 6. The apparatus of claim 1, wherein the diffusion model comprises a UNet architecture.
- 7. A method for training a diffusion model, the method comprising: compressing the diffusion model by removing at least one of one or more model parameters or one or more giga multiply-add operations (GMACs); performing a guidance conditioning to train the compressed diffusion model, the guidance conditioning combining conditional and unconditional outputs from respective teacher models; and performing a step-wise distillation on the compressed diffusion model after the guidance conditioning.
- 8. The method of claim 7, wherein: the conditional output is based on a text string received at a first teacher model; and the unconditional output is based on a null string received at a second teacher model.
- 9. The method of claim 7, wherein: the compressed diffusion model receives guidance values from a guidance embedding during the guidance conditioning; and the guidance value modulates an output of the compressed diffusion model.
- 10. The method of claim 7, wherein: the step-wise distillation includes two sequential teacher models and the compressed diffusion model; and the step-wise distillation distills two steps associated with the two sequential teacher models into one step of the compressed diffusion model.
- 11. The method of claim 7, further comprising performing an epsilon-to-velocity conversion prior to compressing the diffusion model.
- 12. The method of claim 7, wherein the diffusion model comprises a UNet architecture.
- 13. An apparatus for training a diffusion model, the apparatus comprising: Means for compressing the diffusion model by removing at least one of one or more model parameters or one or more gigamultiply-add operations (GMAC); means for performing a guided conditioning to train the compressed diffusion model, the guided conditioning combining conditional and unconditional outputs from the respective teacher model, and Means for performing step-wise distillation on the compressed diffusion model after the guiding conditioning.
- 14. The apparatus of claim 13, wherein: the conditional output is based on text strings received at the first teacher model, and The unconditional output is based on the null string received at the second teacher model.
- 15. The apparatus of claim 13, wherein: the compressed diffusion model receives boot values from a boot-strap embedding during the boot-strap conditioning, and The pilot value modulates the output of the compressed diffusion model.
- 16. The apparatus of claim 13, wherein: The step number distillation includes two sequential teacher models and a compressed diffusion model, and The step number distillation includes one step of distilling two steps associated with the two sequential teacher models into a compressed diffusion model.
- 17. The apparatus of claim 13, further comprising means for performing an epsilon-to-velocity conversion prior to compressing the diffusion model.
- 18. The apparatus of claim 13, wherein the diffusion model comprises a UNet architecture.
- 19. A non-transitory computer-readable medium having program code recorded thereon for training a diffusion model, the program code being executed by a processor and comprising: program code for compressing the diffusion model by removing at least one of one or more model parameters or one or more giga multiply-add operations (GMACs); program code for performing a guidance conditioning to train the compressed diffusion model, the guidance conditioning combining conditional and unconditional outputs from respective teacher models; and program code for performing a step-wise distillation on the compressed diffusion model after the guidance conditioning.
- 20. The non-transitory computer-readable medium of claim 19, wherein: the conditional output is based on a text string received at a first teacher model; and the unconditional output is based on a null string received at a second teacher model.
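Claims 3, 9, and 15 recite a guidance embedding whose guidance value modulates the output of the compressed model. One plausible realization is sketched below, under the assumption that the guidance scale is embedded with the sinusoidal recipe commonly used for timestep embeddings in diffusion UNets and applied as a FiLM-style scale and shift; the class and function names are hypothetical, not from the patent:

```python
import math
import torch
import torch.nn as nn

class GuidanceEmbedding(nn.Module):
    """Embed the guidance scale w so the compressed model can condition on it."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.dim = dim
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, 2 * dim))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # Sinusoidal features of the guidance scale, mirroring the usual
        # diffusion timestep embedding.
        half = self.dim // 2
        freqs = torch.exp(
            -math.log(10_000.0) * torch.arange(half, dtype=w.dtype) / half)
        angles = w[:, None] * freqs[None, :]
        feats = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return self.mlp(feats)  # per-sample scale and shift parameters

def modulate(features: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
    """FiLM-style modulation: the guidance value scales and shifts features."""
    scale, shift = emb.chunk(2, dim=-1)
    return features * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
```

Here a batch of guidance scales `w` of shape `(batch,)` yields per-sample scale and shift vectors that modulate a `(batch, dim, H, W)` feature map inside the student, letting one compressed model reproduce teacher behavior across a range of guidance strengths.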
Description
Training objectives for distilling guided diffusion models

Cross Reference to Related Applications

The present application claims priority to U.S. patent application Ser. No. 18/492,492, filed on October 23, 2023, and entitled "TRAINING OBJECTIVES FOR DISTILLING GUIDED DIFFUSION MODELS," the disclosure of which is expressly incorporated by reference in its entirety.

Technical Field

Aspects of the present disclosure relate generally to improved training objectives for distilling guided diffusion models.

Background

An artificial neural network may include an interconnected group of artificial neurons (e.g., neuron models). An artificial neural network (ANN) may be a computing device or a method performed by a computing device. Convolutional neural networks (CNNs) are one type of feedforward ANN. A convolutional neural network may include a collection of neurons, where each neuron has a receptive field and the neurons collectively tile an input space. Convolutional neural networks, such as deep convolutional networks (DCNs), have numerous applications. In particular, these neural network architectures are used in technologies such as image recognition, speech recognition, acoustic scene classification, keyword spotting, autonomous driving, and other classification tasks.

In machine learning and data generation, diffusion refers to a method in which a generative model transforms data through a sequence of reversible transformations. Such generative models may be referred to as diffusion models. During the diffusion process, the diffusion model begins with a simple distribution (typically a Gaussian distribution) and gradually transforms it into the desired data distribution, thereby enabling tasks such as image synthesis and denoising. Diffusion models require significant computational resources, such as power, memory, and/or processor load, resulting in a tradeoff between training time and the quality of the generated data.

Disclosure of Invention

Some aspects of the present disclosure relate to a method for training a diffusion model, the method comprising compressing the diffusion model by removing one or more model parameters and/or one or more giga multiply-add operations (GMACs). The method further includes performing a guidance conditioning to train the compressed diffusion model, the guidance conditioning combining the conditional output and the unconditional output from respective teacher models. The method also includes performing a step-wise distillation on the compressed diffusion model after the guidance conditioning.

Some other aspects of the disclosure relate to an apparatus comprising means for compressing a diffusion model by removing one or more model parameters and/or one or more GMACs. The apparatus further includes means for performing a guidance conditioning to train the compressed diffusion model, the guidance conditioning combining the conditional output and the unconditional output from respective teacher models. The apparatus also includes means for performing a step-wise distillation on the compressed diffusion model after the guidance conditioning.

In some other aspects of the present disclosure, a non-transitory computer-readable medium having non-transitory program code recorded thereon is disclosed. The program code is executed by a processor and includes program code to compress a diffusion model by removing one or more model parameters and/or one or more GMACs. The program code also includes program code to perform a guidance conditioning to train the compressed diffusion model, the guidance conditioning combining the conditional output and the unconditional output from respective teacher models. The program code further includes program code to perform a step-wise distillation on the compressed diffusion model after the guidance conditioning.

Additionally, some other aspects of the disclosure relate to an apparatus having one or more processors and one or more memories coupled with the one or more processors and storing instructions that, when executed by the one or more processors, cause the apparatus to compress a diffusion model by removing one or more model parameters and/or one or more GMACs. Execution of the instructions also causes the apparatus to perform a guidance conditioning to train the compressed diffusion model, the guidance conditioning combining the conditional output and the unconditional output from respective teacher models. Execution of the instructions further causes the apparatus to perform a step-wise distillation on the compressed diffusion model after the guidance conditioning.

In some aspects of the disclosure, a method for training a diffusion model includes randomly selecting a teacher model from a set of teacher models for each iteration of a step-wise distillation training process. The method further includes applying a cropped input space within the step-wise distillation of the randomly selected teacher model.
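Claims 5, 11, and 17 add an epsilon-to-velocity conversion before compression, and the description closes with a step-wise distillation that randomly selects a teacher each iteration. The sketch below illustrates both under common progressive-distillation conventions (v = alpha_t * eps - sigma_t * x0, and two teacher denoising steps matched by one student step); the noise-schedule bookkeeping is omitted, a single sampled teacher stands in for the two sequential teacher applications, and every name is illustrative rather than taken from the patent:

```python
import random
import torch
import torch.nn.functional as F

def epsilon_to_v(eps, x_t, alpha_t, sigma_t):
    """Convert an epsilon prediction to a velocity (v) prediction,
    using v = alpha_t * eps - sigma_t * x0_hat with
    x0_hat = (x_t - sigma_t * eps) / alpha_t."""
    x0_hat = (x_t - sigma_t * eps) / alpha_t
    return alpha_t * eps - sigma_t * x0_hat

def distillation_iteration(student, teachers, x_t, t, cond, optimizer):
    """One step-wise distillation iteration: two sequential teacher
    denoising steps become the target for a single student step."""
    teacher = random.choice(teachers)  # random teacher per iteration
    with torch.no_grad():
        x_mid = teacher(x_t, t, cond)         # teacher step t -> t-1
        target = teacher(x_mid, t - 1, cond)  # teacher step t-1 -> t-2
    pred = student(x_t, t, cond)              # one student step covers both
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the teachers are frozen (gradients disabled), so each iteration halves the number of denoising steps the student needs relative to its teachers, which is the usual motivation for converting to the v-parameterization first: v-targets stay well conditioned as step counts shrink.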