
CN-122003683-A - Control neural network inference and training based on distilled guided diffusion models

CN122003683A

Abstract

A method for training a control neural network includes initializing a baseline diffusion model for training the control neural network, each stage of a control neural network training pipeline corresponding to an element of the baseline diffusion model. The method also includes training the control neural network in a step-wise manner, each stage of the control neural network training pipeline receiving input from a previous stage of the control neural network training pipeline and corresponding elements of the diffusion model.
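The stage-wise arrangement described in the abstract can be sketched in toy form as follows. All class names, weights, and the modulation rule here are illustrative assumptions for exposition, not taken from the patent:

```python
# Illustrative sketch (not from the patent text): each control-network
# stage receives the previous stage's output plus the output of the
# corresponding element of a frozen baseline diffusion model.

class BaselineStage:
    """Stands in for one element of the frozen baseline diffusion model."""
    def __init__(self, scale):
        self.scale = scale  # frozen parameter; never updated below

    def forward(self, x):
        return self.scale * x

class ControlStage:
    """One trainable stage of the control-network pipeline (toy weight)."""
    def __init__(self):
        self.weight = 1.0

    def forward(self, prev_out, baseline_out):
        # Each stage sees the previous stage's output and the
        # corresponding baseline element's output.
        return self.weight * (prev_out + baseline_out)

def run_pipeline(x, baseline_stages, control_stages):
    """Propagate an input through the stage-wise control pipeline."""
    prev = x
    for b_stage, c_stage in zip(baseline_stages, control_stages):
        prev = c_stage.forward(prev, b_stage.forward(x))
    return prev

baseline = [BaselineStage(s) for s in (0.5, 0.25)]
control = [ControlStage() for _ in baseline]
out = run_pipeline(1.0, baseline, control)
```

Training would update only the `ControlStage` weights, stage by stage, while the baseline parameters stay fixed, mirroring the step-wise scheme the abstract describes.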

Inventors

  • R. Gary Pali
  • S. M. Boser
  • J. Zheng
  • Qiqi Hou
  • S. Kadabi
  • M. Hayat
  • F. M. Policles

Assignees

  • Qualcomm Incorporated

Dates

Publication Date
2026-05-08
Application Date
2024-08-29
Priority Date
2023-10-23

Claims (20)

  1. An apparatus for training a control neural network, the apparatus comprising: one or more processors; and one or more memories coupled with the one or more processors and storing instructions that, when executed by the one or more processors, cause the apparatus to: initialize a baseline diffusion model for training the control neural network, each stage of a control neural network training pipeline corresponding to an element of the baseline diffusion model; and train the control neural network in a step-wise manner, each stage of the control neural network training pipeline receiving input from a previous stage of the control neural network training pipeline and from corresponding elements of the diffusion model.
  2. The apparatus of claim 1, wherein the control neural network training pipeline comprises: a control neural network architecture compression stage corresponding to a compressed UNet architecture of the baseline diffusion model; a guidance conditioning stage corresponding to a guidance-conditioned model of the baseline diffusion model; and a step-count distillation stage corresponding to a student model of step-count distillation of the baseline diffusion model.
  3. The apparatus of claim 1, wherein weights and parameters of the baseline diffusion model are maintained during the training of the control neural network.
  4. The apparatus of claim 1, wherein the control neural network is trained to simulate behavior of the baseline diffusion model in a single forward pass.
  5. The apparatus of claim 1, wherein: the control neural network receives an auxiliary input and a first input received at the baseline diffusion model; the control neural network generates a first output based on receiving the first input and the auxiliary input; and the first output modulates a second output of the baseline diffusion model.
  6. The apparatus of claim 1, wherein the baseline diffusion model is trained prior to training the control neural network.
  7. The apparatus of claim 1, wherein a baseline diffusion model training pipeline comprises at least a compression stage, a guidance conditioning stage, and a step-count distillation stage.
  8. A method for training a control neural network, the method comprising: initializing a baseline diffusion model for training the control neural network, each stage of a control neural network training pipeline corresponding to an element of the baseline diffusion model; and training the control neural network in a step-wise manner, each stage of the control neural network training pipeline receiving input from a previous stage of the control neural network training pipeline and from corresponding elements of the diffusion model.
  9. The method of claim 8, wherein the control neural network training pipeline comprises: a control neural network architecture compression stage corresponding to a compressed UNet architecture of the baseline diffusion model; a guidance conditioning stage corresponding to a guidance-conditioned model of the baseline diffusion model; and a step-count distillation stage corresponding to a student model of step-count distillation of the baseline diffusion model.
  10. The method of claim 8, wherein weights and parameters of the baseline diffusion model are maintained during the training of the control neural network.
  11. The method of claim 8, wherein the control neural network is trained to simulate behavior of the baseline diffusion model in a single forward pass.
  12. The method of claim 8, wherein: the control neural network receives an auxiliary input and a first input received at the baseline diffusion model; the control neural network generates a first output based on receiving the first input and the auxiliary input; and the first output modulates a second output of the baseline diffusion model.
  13. The method of claim 8, wherein the baseline diffusion model is trained prior to training the control neural network.
  14. The method of claim 8, wherein a baseline diffusion model training pipeline includes at least a compression stage, a guidance conditioning stage, and a step-count distillation stage.
  15. A non-transitory computer-readable medium having program code recorded thereon for training a control neural network, the program code being executed by a processor and comprising: program code to initialize a baseline diffusion model for training the control neural network, each stage of a control neural network training pipeline corresponding to an element of the baseline diffusion model; and program code to train the control neural network in a step-wise manner, each stage of the control neural network training pipeline receiving input from a previous stage of the control neural network training pipeline and from corresponding elements of the diffusion model.
  16. The non-transitory computer-readable medium of claim 15, wherein the control neural network training pipeline comprises: a control neural network architecture compression stage corresponding to a compressed UNet architecture of the baseline diffusion model; a guidance conditioning stage corresponding to a guidance-conditioned model of the baseline diffusion model; and a step-count distillation stage corresponding to a student model of step-count distillation of the baseline diffusion model.
  17. The non-transitory computer-readable medium of claim 15, wherein weights and parameters of the baseline diffusion model are maintained during the training of the control neural network.
  18. The non-transitory computer-readable medium of claim 15, wherein the control neural network is trained to simulate behavior of the baseline diffusion model in a single forward pass.
  19. The non-transitory computer-readable medium of claim 15, wherein: the control neural network receives an auxiliary input and a first input received at the baseline diffusion model; the control neural network generates a first output based on receiving the first input and the auxiliary input; and the first output modulates a second output of the baseline diffusion model.
  20. The non-transitory computer-readable medium of claim 15, wherein the baseline diffusion model is trained prior to training the control neural network.
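Claims 5, 12, and 19 describe a control network that takes the baseline model's input plus an auxiliary input and produces an output that modulates the frozen baseline's output. A minimal toy sketch, assuming additive modulation and made-up weights (none of these function names or values come from the patent):

```python
# Hypothetical illustration of the modulation claims: the baseline is
# frozen; only the control branch would be trained.

def baseline_model(x):
    """Frozen baseline; its weight (2.0) is never updated."""
    return 2.0 * x

def control_network(x, aux):
    """Trainable control branch conditioned on input and auxiliary input."""
    return 0.1 * (x + aux)  # toy weight 0.1

def modulated_output(x, aux):
    """The control output modulates (here, is added to) the baseline output."""
    return baseline_model(x) + control_network(x, aux)
```

With additive modulation, setting the control weights to zero recovers the baseline's behavior exactly, which is one common motivation for this arrangement (an assumption here, not a statement of the patent).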

Description

Control neural network inference and training based on a distilled guided diffusion model

Cross Reference to Related Applications

The present application claims priority from U.S. patent application Ser. No. 18/492,529, filed on October 23, 2023, entitled "CONTROL NEURAL NETWORK INFERENCE AND TRAINING BASED ON DISTILLED GUIDED DIFFUSION MODELS", the disclosure of which is expressly incorporated by reference in its entirety.

Technical Field

Aspects of the present disclosure relate generally to improving training and inference for control neural networks by incorporating a distilled guided diffusion model.

Background

An artificial neural network may include an interconnected set of artificial neurons (e.g., neuron models). An artificial neural network (ANN) may be a computing device or a method represented as being performed by a computing device. Convolutional neural networks (CNNs) are one type of feedforward ANN. A convolutional neural network may include a set of neurons, where each neuron has a receptive field and the neurons collectively tile an input space. Convolutional neural networks, such as deep convolutional neural networks (DCNs), have numerous applications. In particular, these neural network architectures are used in various technologies such as image recognition, speech recognition, acoustic scene classification, keyword spotting, autonomous driving, and other classification tasks.

In machine learning and data generation, diffusion refers to a class of generative models that transform data through a sequence of reversible transformations. These generative models may be referred to as diffusion models. During the diffusion process, a diffusion model begins with a simple distribution (typically a Gaussian distribution) and gradually transforms samples toward the desired data distribution, thereby facilitating tasks such as image synthesis and denoising.
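The forward diffusion process described above, which gradually moves data toward a Gaussian distribution, can be sketched as follows. The noise schedule values are illustrative assumptions, not parameters from the patent:

```python
# A minimal sketch of the forward (noising) diffusion process: each step
# mixes the current sample with fresh Gaussian noise. A trained model
# would learn to reverse this process for generation/denoising.
import math
import random

def diffuse_step(x, alpha, rng):
    """One noising step: x_t = sqrt(alpha) * x + sqrt(1 - alpha) * noise."""
    noise = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha) * x + math.sqrt(1.0 - alpha) * noise

def diffuse(x0, alphas, rng):
    """Apply the full noise schedule to a starting sample x0."""
    x = x0
    for a in alphas:
        x = diffuse_step(x, a, rng)
    return x

rng = random.Random(0)
x_t = diffuse(1.0, [0.99] * 10, rng)  # assumed 10-step toy schedule
```

Note that with `alpha = 1.0` each step is the identity, and as `alpha` decreases the sample is driven toward pure Gaussian noise, which is the sense in which the transformation sequence is gradual.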
Diffusion models require significant computational resources, such as power, memory, and/or processor load, resulting in a tradeoff between training time and the quality of the generated data.

Disclosure of Invention

Some aspects of the present disclosure relate to a method for training a diffusion model, the method comprising compressing the diffusion model by removing one or more model parameters and/or one or more giga multiply-accumulate operations (GMACs). The method further includes performing guidance conditioning to train the compressed diffusion model, the guidance conditioning combining the conditional output and the unconditional output from a respective teacher model. The method also includes performing step-count distillation on the compressed diffusion model after the guidance conditioning.

Some other aspects of the disclosure relate to an apparatus comprising means for compressing a diffusion model by removing one or more model parameters and/or one or more GMACs. The apparatus further includes means for performing guidance conditioning to train the compressed diffusion model, the guidance conditioning combining the conditional output and the unconditional output from a respective teacher model. The apparatus also includes means for performing step-count distillation on the compressed diffusion model after the guidance conditioning.

In some other aspects of the present disclosure, a non-transitory computer-readable medium having program code recorded thereon is disclosed. The program code is executed by a processor and includes program code to compress a diffusion model by removing one or more model parameters and/or one or more GMACs. The program code also includes program code to perform guidance conditioning to train the compressed diffusion model, the guidance conditioning combining the conditional output and the unconditional output from a respective teacher model.
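The guidance conditioning described above combines a teacher's conditional and unconditional outputs. This resembles the standard classifier-free-guidance combination; the sketch below assumes that formula and treats the guidance weight `w` as a hypothetical hyperparameter (the patent text does not specify the exact combination rule):

```python
# Assumed classifier-free-guidance-style combination of a teacher's
# conditional and unconditional outputs; a compressed student could be
# trained to reproduce this combined target in a single forward pass.

def guided_output(cond_out, uncond_out, w):
    """Combine outputs as uncond + w * (cond - uncond)."""
    return uncond_out + w * (cond_out - uncond_out)
```

Under this formula, `w = 0` returns the unconditional output, `w = 1` returns the conditional output, and `w > 1` extrapolates past the conditional output to strengthen the conditioning signal.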
The program code also includes program code to perform step-count distillation on the compressed diffusion model after the guidance conditioning.

Additionally, some other aspects of the disclosure relate to an apparatus having one or more processors and one or more memories coupled with the one or more processors and storing instructions that, when executed by the one or more processors, cause the apparatus to compress a diffusion model by removing one or more model parameters and/or one or more GMACs. Execution of the instructions also causes the apparatus to perform guidance conditioning to train the compressed diffusion model, the guidance conditioning combining the conditional output and the unconditional output from a respective teacher model. Execution of the instructions further causes the apparatus to perform step-count distillation on the compressed diffusion model after the guidance conditioning.

In some aspects of the disclosure, a method for training a diffusion model includes randomly selecting a teacher model of a set of teacher models for each iteration of a step-wi