EP-4742146-A1 - DATA PROCESSING METHOD AND APPARATUS

Abstract

A data processing method is provided, and is applied to the field of artificial intelligence. The method includes: obtaining a plurality of first images and random noise, where different first images are denoised images predicted by using a denoising module in a diffusion model at different historical steps; and fusing the plurality of first images and the random noise, to obtain a denoised image at a current step. In this application, when a denoised image is predicted, the denoised images predicted at different historical steps are fused, so that the sampling error can be reduced at the same computational cost, thereby reducing the number of sampling steps required.

Inventors

XUE, Shuchen
YI, Mingyang
LUO, Weijian
ZHANG, Shifeng
SUN, Jiacheng

Assignees

Huawei Technologies Co., Ltd.

Dates

Publication Date: 2026-05-13
Application Date: 2024-07-26

Claims (20)

1. A data processing method, comprising: obtaining a plurality of first images and random noise, wherein different first images are denoised images predicted by using a denoising module in the diffusion model at different historical steps; and fusing the plurality of first images and the random noise, to obtain a denoised image at a current step.
2. The method according to claim 1, further comprising: determining a first weight corresponding to each historical step, wherein each first weight is related to a value obtained by mapping the historical step using a target mapping method, and the target mapping method is used to determine a magnitude of randomness at each step; and fusing the plurality of first images and the random noise comprises: fusing the plurality of first images and the random noise based on a plurality of first weights.
3. The method according to claim 2, wherein each first weight is specifically obtained by performing a stochastic Adams method on the value obtained by mapping the historical step using the target mapping method.
4. The method according to any one of claims 1 to 3, further comprising: determining a second weight corresponding to the random noise, wherein the second weight is related to the value obtained by mapping the historical step using the target mapping method, and the target mapping method is used to determine the magnitude of randomness at each step; and fusing the plurality of first images and the random noise comprises: fusing the plurality of first images and the random noise based on the second weight.
5. The method according to any one of claims 1 to 4, wherein fusing the plurality of first images and the random noise comprises: fusing a denoised image obtained at a latest step, the plurality of first images, and the random noise.
6. The method according to claim 5, further comprising: determining a third weight corresponding to the denoised image obtained at the latest step, wherein the third weight is related to a value obtained by mapping the latest step using the target mapping method, and the target mapping method is used to determine the magnitude of randomness at each step; and fusing the denoised image obtained at the latest step, the plurality of first images, and the random noise comprises: fusing the denoised image obtained at the latest step, the plurality of first images, and the random noise based on the third weight.
7. The method according to any one of claims 2 to 6, wherein the target mapping method is a piecewise constant function.
8. The method according to any one of claims 1 to 7, wherein the random noise is Gaussian random noise.
9. The method according to any one of claims 1 to 8, wherein fusing the plurality of first images and the random noise, to obtain the denoised image at the current step comprises: fusing the plurality of first images and the random noise, to obtain an initial value of the denoised image at the current step; processing the initial value by using a denoising module in the diffusion model, to obtain a processing result; and fusing the processing result, the plurality of first images, and the random noise, to obtain the denoised image at the current step.
10. The method according to claim 9, further comprising: determining a fourth weight corresponding to the processing result, wherein the fourth weight is related to a value obtained by mapping the current step using the target mapping method, and the target mapping method is used to determine the magnitude of randomness at each step; and fusing the processing result, the plurality of first images, and the random noise comprises: fusing the processing result, the plurality of first images, and the random noise based on the fourth weight.
11. A data processing apparatus, comprising: an obtaining module, configured to obtain a plurality of first images and random noise, wherein different first images are denoised images predicted by using a denoising module in the diffusion model at different historical steps; and a processing module, configured to fuse the plurality of first images and the random noise, to obtain a denoised image at a current step.
12. The apparatus according to claim 11, wherein the processing module is further configured to: determine a first weight corresponding to each historical step, wherein each first weight is related to a value obtained by mapping the historical step using a target mapping method, and the target mapping method is used to determine a magnitude of randomness at each step; and the processing module is specifically configured to: fuse the plurality of first images and the random noise based on a plurality of first weights.
13. The apparatus according to claim 12, wherein each first weight is specifically obtained by performing a stochastic Adams method on the value obtained by mapping the historical step using the target mapping method.
14. The apparatus according to any one of claims 11 to 13, wherein the processing module is further configured to: determine a second weight corresponding to the random noise, wherein the second weight is related to the value obtained by mapping the historical step using the target mapping method, and the target mapping method is used to determine the magnitude of randomness at each step; and the processing module is specifically configured to: fuse the plurality of first images and the random noise based on the second weight.
15. The apparatus according to any one of claims 11 to 14, wherein the processing module is specifically configured to: fuse a denoised image obtained at a latest step, the plurality of first images, and the random noise.
16. The apparatus according to claim 15, wherein the processing module is further configured to: determine a third weight corresponding to the denoised image obtained at the latest step, wherein the third weight is related to a value obtained by mapping the latest step using the target mapping method, and the target mapping method is used to determine the magnitude of randomness at each step; and the processing module is specifically configured to: fuse the denoised image obtained at the latest step, the plurality of first images, and the random noise based on the third weight.
17. The apparatus according to any one of claims 12 to 16, wherein the target mapping method is a piecewise constant function.
18. The apparatus according to any one of claims 11 to 17, wherein the random noise is Gaussian random noise.
19. The apparatus according to any one of claims 11 to 18, wherein the processing module is specifically configured to: fuse the plurality of first images and the random noise, to obtain an initial value of the denoised image at the current step; process the initial value by using a denoising module in the diffusion model, to obtain a processing result; and fuse the processing result, the plurality of first images, and the random noise, to obtain the denoised image at the current step.
20. The apparatus according to claim 19, wherein the processing module is further configured to: determine a fourth weight corresponding to the processing result, wherein the fourth weight is related to a value obtained by mapping the current step using the target mapping method, and the target mapping method is used to determine the magnitude of randomness at each step; and the processing module is specifically configured to: fuse the processing result, the plurality of first images, and the random noise based on the fourth weight.
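
The two-stage fusion recited in the claims (fuse to obtain an initial value, process it with the denoising module, then fuse again) may be illustrated by the following minimal Python sketch. It is not the implementation of this application: the function names, the argument names, and all weight values are hypothetical, the weights are passed in rather than derived from the target mapping method, and reusing one noise sample in both fusions is an assumption.

```python
import numpy as np


def fuse(history, hist_weights, noise, noise_weight, latest, latest_weight):
    """Weighted sum of historical denoised images, the latest image, and noise."""
    out = latest_weight * latest + noise_weight * noise
    for weight, image in zip(hist_weights, history):
        out = out + weight * image
    return out


def predictor_corrector_step(denoiser, latest, history, rng,
                             pred_w, corr_w, w_latest, w_noise, w_result):
    """One illustrative step: predict, evaluate the denoiser once, correct.

    All weights are hypothetical inputs; the application derives them from
    a mapping of the steps, which is not reproduced here.
    """
    # Gaussian random noise with the same shape as the image.
    noise = rng.standard_normal(latest.shape)
    # First fusion: initial value of the denoised image at the current step.
    initial = fuse(history, pred_w, noise, w_noise, latest, w_latest)
    # Process the initial value with the denoising module.
    result = denoiser(initial)
    # Second fusion: add the weighted processing result to the fused history
    # and noise (the same noise sample is reused, which is an assumption).
    final = fuse(history, corr_w, noise, w_noise, latest, w_latest)
    final = final + w_result * result
    return final, result
```

A caller would pass any callable as `denoiser`, for example a wrapper around a trained network, and would append `result` to `history` before the next step.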

Description

This application claims priority to Chinese Patent Application No. 202310950031.2, filed with the China National Intellectual Property Administration on July 28, 2023 and entitled "DATA PROCESSING METHOD AND APPARATUS THEREOF", which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

This application relates to the field of artificial intelligence, and in particular, to a data processing method and an apparatus thereof.

BACKGROUND

Artificial intelligence (AI) is a theory, a method, a technology, or an application system that simulates, extends, and expands human intelligence by using a digital computer or a machine controlled by a digital computer, to perceive an environment, obtain knowledge, and achieve an optimal result based on the knowledge. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, so that the machines have perception, inference, and decision-making functions.

A stable diffusion model (which may be referred to as a diffusion model for short) is a generative model used to generate high-fidelity multimedia data, such as images, speech, and video. The diffusion model generates an image through a diffusion process. The model performs a plurality of diffusion and reverse diffusion operations on noise, to complete training and inference of the model. This makes the generation process of the diffusion model more stable and less prone to issues such as mode collapse. A generative model is a model that simulates a data distribution in order to generate new data that approximately follows a target distribution. The diffusion model is among the best-performing generative models of recent years.
The diffusion model defines a forward Markov chain to gradually add noise to data, and then learns the reverse process to transform noise back into data. To ensure that the stationary distribution approximates the noise distribution, the diffusion model requires a sufficient quantity of iterations T (typically, T=1000). The original sampling method of the diffusion model is equivalent to performing T-step reverse sampling of a Markov chain, which is highly time-consuming and hinders widespread application of the diffusion model in downstream tasks. Therefore, a fast and high-quality sampling method for the diffusion model is urgently required.

SUMMARY

This application provides a data processing method, to reduce the sampling error at the same computational cost, thereby reducing the number of sampling steps required.

According to a first aspect, this application provides a data processing method. The method includes: obtaining a plurality of first images and random noise, where different first images are denoised images predicted by using a denoising module in a diffusion model at different historical steps; and fusing the plurality of first images and the random noise, to obtain a denoised image at a current step.

In this application, when a denoised image is predicted, the denoised images predicted at different historical steps are fused, so that the sampling error can be reduced at the same computational cost, thereby reducing the number of sampling steps required.

In a possible implementation, the method further includes: determining a first weight corresponding to each historical step, where each first weight is related to a value obtained by mapping the historical step using a target mapping method, and the target mapping method is used to determine a magnitude of randomness at each step; and fusing the plurality of first images and the random noise includes: fusing the plurality of first images and the random noise based on a plurality of first weights.
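
To give a feel for how per-step weights over several historical steps can arise, the following Python sketch computes classical (deterministic) Adams-Bashforth weights by integrating Lagrange basis polynomials over the next step interval. This is a simpler stand-in chosen for illustration, not the weighting of this application: the first weights described here additionally depend on the value obtained by the target mapping method, and that dependence is not reproduced. The function name and arguments are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def adams_bashforth_weights(hist_times, t_start, t_end):
    """Classical Adams-Bashforth weights for arbitrary history times.

    Weight j is the integral over [t_start, t_end] of the Lagrange basis
    polynomial that equals 1 at hist_times[j] and 0 at the other times.
    """
    hist_times = np.asarray(hist_times, dtype=float)
    weights = []
    for j, t_j in enumerate(hist_times):
        others = np.delete(hist_times, j)
        # l_j(t) = prod_{m != j} (t - t_m) / (t_j - t_m), low-to-high coefficients.
        coeffs = P.polyfromroots(others) / np.prod(t_j - others)
        antiderivative = P.polyint(coeffs)
        weights.append(P.polyval(t_end, antiderivative)
                       - P.polyval(t_start, antiderivative))
    return np.array(weights)


# Two equally spaced historical steps at t = 0 and t = 1, stepping from 1 to 2.
w = adams_bashforth_weights([0.0, 1.0], 1.0, 2.0)
```

In this deterministic setting the weights always sum to the step length `t_end - t_start`, and more historical steps yield a higher-order approximation for the same number of denoiser evaluations, which is the general motivation for fusing predictions from several historical steps.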
In this embodiment of this application, a variance control function τ(t) may be applied to better control randomness in the sampling process. In comparison with a deterministic sampling method, randomness is introduced to improve sampling quality, and in comparison with an existing random sampling method, the randomness in the sampling process is controllable.

In a possible implementation, each first weight is specifically obtained by performing a stochastic Adams method on the value obtained by mapping the historical step using the target mapping method.

Using the stochastic Adams method overcomes the low efficiency and slow convergence of the numerical schemes used in existing technologies.

In a possible implementation, the method further includes: determining a second weight corresponding to the random noise, where the second weight is related to the value obtained by mapping the historical step using the target mapping method, and the target mapping method is used to determine the magnitude of randomness at each step; and fusing