CN-122023169-A - Efficient image generation method based on diffusion model

Abstract

The invention belongs to the technical field of image generation and provides an efficient image generation method based on a diffusion model. The method adopts a dual-flow neural network architecture and trains a noise estimate and a variance estimate simultaneously through a maximum likelihood estimation loss function, so that the mean and the variance of the noise are estimated separately. During generation, the denoising effect is judged by calculating the KL divergence between adjacent denoising steps, and denoising stops when the KL divergence falls below a preset threshold. This removes ineffective denoising steps and markedly shortens generation time while preserving image quality, providing an efficient solution for real-time scenarios such as image and video generation.

Inventors

WANG XU
KUANG GUOWEN
ZHOU PAN
Zheng Shuoxin
Zou Huisi
LIU JINGYI
DENG MEITING

Assignees

Shenzhen Polytechnic University (深圳职业技术大学)

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-31

Claims (10)

1. An efficient image generation method based on a diffusion model, characterized by comprising the following steps: in a training stage, adopting a dual-flow neural network architecture and simultaneously training a noise estimate and a variance estimate through a maximum likelihood estimation loss function; and in a generation stage, calculating the KL divergence between the images generated in adjacent steps of the denoising process, and applying early-stop control to the denoising process according to the KL divergence value.
2. The efficient image generation method based on a diffusion model according to claim 1, wherein the dual-flow neural network architecture simultaneously estimates the noise mean and the noise variance.
3. The efficient image generation method based on a diffusion model according to claim 1, wherein the generation stage specifically comprises: setting a designated number of image denoising steps N and starting denoising from t = N-1; at each time step t, estimating the corresponding noise estimate and variance estimate with the dual-flow neural network; performing a denoising operation based on the estimated value; calculating the KL divergence between adjacent steps; and terminating the denoising process early when the KL divergence is smaller than a preset threshold.
4. The efficient image generation method based on a diffusion model according to claim 3, wherein the estimated value is defined by a formula (not reproduced in this text) whose terms include a preset time-dependent coefficient and the noise-added signal at time t.
5. The efficient image generation method based on a diffusion model according to claim 3, wherein the maximum likelihood estimation loss function is given by a formula, or by an equivalent form of it (neither is reproduced in this text), expressed in terms of the mean estimate, the variance estimate, and the noise estimate.
6. The efficient image generation method based on a diffusion model according to claim 3, characterized in that the [expression not reproduced in this text].
7. The efficient image generation method based on a diffusion model according to claim 1, wherein the generation stage specifically comprises: setting a designated number of image denoising steps N and starting denoising from t = N-1; at each time step t, directly estimating the original image and the corresponding variance with the dual-flow neural network; calculating the KL divergence; and terminating the denoising process early when the KL divergence value is smaller than a preset threshold.
8. The efficient image generation method based on a diffusion model according to claim 7, wherein the loss function is: [formula not reproduced in this text].
9. The efficient image generation method based on a diffusion model according to claim 1, wherein the KL divergence is calculated from the difference between the probability distributions of the images generated in adjacent denoising steps.
10. The efficient image generation method based on a diffusion model according to claim 1, wherein the criterion for the early-stop control is whether the KL divergence value is smaller than a preset constant threshold.
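
The training objective named in claims 1 and 5 can be illustrated with a minimal sketch. The loss formula itself is not reproduced in this text, so the code below uses the standard Gaussian negative log-likelihood as a stand-in; this is an assumption, not the claimed formula. The function name `gaussian_nll` and the choice of a log-variance output for the second head are likewise hypothetical.

```python
import numpy as np


def gaussian_nll(eps_true, mean_pred, log_var_pred):
    """Mean per-element Gaussian negative log-likelihood.

    Assumption: one network head predicts the noise mean and the other
    predicts the log-variance (used here for numerical stability).
    This is the textbook heteroscedastic Gaussian NLL, not necessarily
    the loss formula of the patent.
    """
    inv_var = np.exp(-log_var_pred)
    squared_error = (eps_true - mean_pred) ** 2
    nll = 0.5 * (log_var_pred + squared_error * inv_var + np.log(2.0 * np.pi))
    return float(np.mean(nll))


# Toy usage with random arrays standing in for network outputs.
rng = np.random.default_rng(0)
eps = rng.standard_normal((2, 3, 8, 8))   # true noise added to a batch
mean_head = np.zeros_like(eps)            # hypothetical mean-head output
log_var_head = np.zeros_like(eps)         # hypothetical variance-head output
loss = gaussian_nll(eps, mean_head, log_var_head)
```

Under this form, minimizing the loss with respect to the log-variance drives the predicted variance toward the squared residual, which is one way a single objective can train both outputs at once.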

Description

Efficient image generation method based on diffusion model

Technical Field

The invention relates to the technical field of image generation, and in particular to an efficient image generation method based on a diffusion model.

Background

The sampling process of a diffusion probabilistic model can be viewed as step-by-step denoising from a pure Gaussian random variable to clean data. It can be modeled by discretizing a diffusion stochastic differential equation (SDE) or a diffusion ordinary differential equation (ODE), each defined by a parameterized noise prediction model or data prediction model. Guided sampling can be modeled by combining an unconditional model with a guidance model and controlling the guidance strength through a hyperparameter. The currently widely used guided sampling method is DDIM, which has been shown to be a first-order diffusion ODE solver; it usually requires 100 to 250 evaluations of a large neural network to converge, so its computational cost is high.

Existing diffusion models improve efficiency mainly through two classes of methods:

1. Improved sampling algorithms: dedicated solvers such as DPM-Solver++ and EDM reduce the number of steps required for generation from hundreds to tens or even a few steps through more accurate trajectory estimation.
2. Optimized model architecture and computation: the latent diffusion technique adopted by Stable Diffusion performs denoising in a low-dimensional space, greatly reducing the amount of computation.

These methods allow a diffusion model to maintain high-quality output while raising its inference speed by a factor of tens to hundreds, so that it can be used for image, video, and code generation.
However, none of the above methods explicitly establishes a relationship between the number of steps and the amount of image information, so they cannot provide a principled basis for deciding when to stop the denoising process early. There is therefore an urgent need for an efficient image generation method based on a diffusion model that overcomes these shortcomings in current practical applications.

Disclosure of Invention

The invention aims to provide an efficient image generation method based on a diffusion model that effectively solves the problems described in the background.

The invention is realized as follows. An efficient image generation method based on a diffusion model comprises the following steps: in a training stage, adopting a dual-flow neural network architecture and simultaneously training a noise estimate and a variance estimate through a maximum likelihood estimation loss function; and in a generation stage, calculating the KL divergence between the images generated in adjacent steps of the denoising process, and applying early-stop control to the denoising process according to the KL divergence value.

As a further scheme of the invention, the dual-flow neural network architecture simultaneously estimates the noise mean and the noise variance.

As a further scheme of the invention, the generation stage specifically comprises: setting a designated number of image denoising steps N and starting denoising from t = N-1; at each time step t, estimating the corresponding noise estimate and variance estimate with the dual-flow neural network; performing a denoising operation based on the estimated value; calculating the KL divergence between adjacent steps; and terminating the denoising process early when the KL divergence is smaller than a preset threshold.
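
The generation-stage steps described above can be sketched as a loop with an early-stop test. This is a minimal illustration under stated assumptions, not the patented procedure: the text does not reproduce the KL formula, so the sketch assumes each step is summarized by a diagonal Gaussian (mean and variance from the network) and uses the closed-form KL divergence between two such Gaussians. The callables `predict` and `denoise_step`, and all function and parameter names, are hypothetical placeholders for the dual-flow network and the sampler update.

```python
import numpy as np


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for diagonal Gaussians, summed over elements."""
    terms = np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    return float(0.5 * np.sum(terms))


def sample_with_early_stop(x, predict, denoise_step, num_steps, kl_threshold):
    """Run at most num_steps denoising steps, from t = num_steps - 1 down to 0.

    predict(x, t) -> (mean, var) and denoise_step(x, t, mean, var) -> x_next
    are hypothetical callables. The loop stops as soon as the KL divergence
    between the previous and current step falls below kl_threshold.
    Returns the final sample and the number of steps actually run.
    """
    previous = None
    steps_run = 0
    for t in range(num_steps - 1, -1, -1):
        mean, var = predict(x, t)
        x = denoise_step(x, t, mean, var)
        steps_run += 1
        if previous is not None:
            kl = kl_diag_gaussians(previous[0], previous[1], mean, var)
            if kl < kl_threshold:
                break  # adjacent steps barely differ: stop early
        previous = (mean, var)
    return x, steps_run
```

Because the threshold is a preset constant, the number of steps actually run adapts to each sample, while N remains an upper bound on the cost.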
As a further scheme of the invention, the estimated value is defined by a formula (not reproduced in this text) whose terms include a preset time-dependent coefficient and the noise-added signal at time t.

As a further scheme of the invention, the maximum likelihood estimation loss function is given by a formula, or by an equivalent form of it (neither is reproduced in this text), expressed in terms of the mean estimate, the variance estimate, and the noise estimate.

As a further scheme of the invention, the [expression not reproduced in this text].

As a further scheme of the invention, the generation stage specifically comprises: setting a designated number of image denoising steps N and starting denoising from t = N-1; at each time step t, directly estimating the original image and the corresponding variance with the dual-flow neural network; calculating the KL divergence; and terminating the denoising process early when the KL divergence value is smaller than a preset threshold.

As a further scheme of the invention, the loss function is: [formula not reproduced in this text].

As a further scheme of the invention, the KL divergence is calculated from the difference between the probability distributions of the images generated in adjacent denoising steps.

As a further scheme of the invention, the criterion for the early-stop control is whether the KL divergence value is smaller than a preset constant threshold.

Compared with the prior art, the invention has the bene