CN-116385821-B - Diffusion model-based graph-text related multi-distribution sampling method and device
Abstract
The invention provides a diffusion model-based image-text related multi-distribution sampling method and device. The method comprises: obtaining data to be processed, wherein the data to be processed is one of image data, text data and image-text data; determining the noise type and/or noise parameter value to be input according to a preset sampling type; and inputting the data to be processed together with the noise type and/or noise parameter value to be input into a pre-trained image-text related multi-distribution sampling model to obtain a sampling result. The image-text related multi-distribution sampling model is obtained by training a pre-built neural network on sample data pairs and the noise added to them, where each sample data pair consists of an image data sample and a text data sample. By considering a plurality of image-text distributions simultaneously, adding noise according to the preset sampling type, and using a single image-text related multi-distribution sampling model, the invention achieves multiple functions with high universality and is suitable for multi-distribution sampling of multi-modal data.
Inventors
- ZHU JUN
- BAO FAN
- SU HANG
Assignees
- Tsinghua University (清华大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20230306
Claims (9)
- 1. A diffusion model-based image-text related multi-distribution sampling method, characterized by comprising the following steps: acquiring data to be processed, wherein the data to be processed is one of image data, text data and image-text data; determining the noise type and/or the noise parameter value to be input according to a preset sampling type; and inputting the data to be processed, the noise type to be input and/or the noise parameter value into a pre-trained image-text related multi-distribution sampling model to obtain a sampling result; wherein the image-text related multi-distribution sampling model is obtained by training a pre-constructed neural network on a sample data pair and the noise added to the sample data pair, the sample data pair being a data pair formed by an image data sample and a text data sample; and wherein the image-text related multi-distribution sampling model is trained through the following steps: S1, acquiring the sample data pair; S2, acquiring target image noise and target text noise based on a standard Gaussian distribution, and acquiring an image noise parameter value and a text noise parameter value; S3, computing a linear combination of the image data sample and the target image noise, weighted according to the image noise parameter value, to obtain a noisy image sample, and computing a linear combination of the text data sample and the target text noise, weighted according to the text noise parameter value, to obtain a noisy text sample; S4, inputting the image noise parameter value, the text noise parameter value, the noisy image sample and the noisy text sample into the image-text related multi-distribution sampling model to obtain an image noise prediction result and a text noise prediction result; S5, computing the squared two-norm of the difference between the image noise prediction result and the target image noise and the squared two-norm of the difference between the text noise prediction result and the target text noise, and training the parameters of the image-text related multi-distribution sampling model with the sum of the squared two-norms as the training objective; and S6, repeating steps S1-S5 until a preset number of training iterations is reached, and taking the parameters from the last training iteration as the final model parameters to obtain the trained image-text related multi-distribution sampling model.
- 2. The diffusion model-based image-text related multi-distribution sampling method according to claim 1, wherein determining the noise type and/or the noise parameter value to be input according to the preset sampling type specifically comprises: if the preset sampling type is image sampling, determining that the noise type to be input is text noise, designating the text noise to be input as standard Gaussian noise, and determining that the text noise parameter value to be input is the maximum value of the preset range.
- 3. The diffusion model-based image-text related multi-distribution sampling method according to claim 1, wherein determining the noise type and/or the noise parameter value to be input according to the preset sampling type specifically comprises: if the preset sampling type is text sampling, determining that the noise type to be input is image noise, designating the image noise to be input as standard Gaussian noise, and determining that the image noise parameter value to be input is the maximum value of the preset range.
- 4. The diffusion model-based image-text related multi-distribution sampling method according to claim 1, wherein determining the noise type and/or the noise parameter value to be input according to the preset sampling type specifically comprises: if the preset sampling type is joint image-text sampling, determining that the image noise parameter value to be input and the text noise parameter value to be input are the same preset value.
- 5. The diffusion model-based image-text related multi-distribution sampling method according to claim 1, wherein determining the noise type and/or the noise parameter value to be input according to the preset sampling type specifically comprises: if the preset sampling type is image-to-text sampling, determining that the noise type to be input is image noise, designating the image noise to be input as a preset image, and determining that the image noise parameter value to be input is 0.
- 6. The diffusion model-based image-text related multi-distribution sampling method according to claim 1, wherein determining the noise type and/or the noise parameter value to be input according to the preset sampling type specifically comprises: if the preset sampling type is text-to-image sampling, determining that the noise type to be input is text noise, designating the text noise to be input as a preset text, and determining that the text noise parameter value to be input is 0.
- 7. A diffusion model-based image-text related multi-distribution sampling device, characterized by comprising: a data acquisition unit for acquiring data to be processed, wherein the data to be processed is one of image data, text data and image-text data; a noise determining unit for determining the noise type and/or the noise parameter value to be input according to a preset sampling type; and a sampling unit for inputting the data to be processed, the noise type to be input and/or the noise parameter value into a pre-trained image-text related multi-distribution sampling model to obtain a sampling result; wherein the image-text related multi-distribution sampling model is obtained by training a pre-constructed neural network on a sample data pair and the noise added to the sample data pair, the sample data pair being a data pair formed by an image data sample and a text data sample; and wherein the training method comprises the following steps: S1, acquiring a sample data pair; S2, acquiring target image noise and target text noise based on a standard Gaussian distribution, and acquiring an image noise parameter value and a text noise parameter value; S3, computing a linear combination of the image data sample and the target image noise, weighted according to the image noise parameter value, to obtain a noisy image sample, and computing a linear combination of the text data sample and the target text noise, weighted according to the text noise parameter value, to obtain a noisy text sample; S4, inputting the image noise parameter value, the text noise parameter value, the noisy image sample and the noisy text sample into the image-text related multi-distribution sampling model to obtain an image noise prediction result and a text noise prediction result; S5, computing the squared two-norm of the difference between the image noise prediction result and the target image noise and the squared two-norm of the difference between the text noise prediction result and the target text noise, and training the parameters of the image-text related multi-distribution sampling model with the sum of the squared two-norms as the training objective; and S6, repeating steps S1-S5 until a preset number of training iterations is reached, and taking the parameters from the last training iteration as the final model parameters to obtain the trained image-text related multi-distribution sampling model.
- 8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the diffusion model-based image-text related multi-distribution sampling method according to any one of claims 1 to 6.
- 9. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the diffusion model-based image-text related multi-distribution sampling method according to any one of claims 1 to 6.
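Claims 2-6 enumerate how the preset sampling type fixes the noise inputs before sampling. The dispatch logic they describe can be sketched as follows; the function name, the string labels for the sampling types, and the parameter range maximum `T_MAX` are illustrative assumptions, not terms from the patent:

```python
T_MAX = 1.0  # assumed maximum of the preset noise-parameter range

def noise_config(sampling_type):
    """Map a preset sampling type to the fixed noise inputs it implies.

    "gaussian" means standard Gaussian noise is supplied for that modality;
    "data" means the conditioning image/text itself is supplied in its place.
    Modalities not listed are left free for the sampler to produce.
    """
    if sampling_type == "image":          # claim 2: sample images only
        return {"text_noise": "gaussian", "text_param": T_MAX}
    if sampling_type == "text":           # claim 3: sample text only
        return {"image_noise": "gaussian", "image_param": T_MAX}
    if sampling_type == "joint":          # claim 4: both parameters share one preset value
        return {"image_param": T_MAX, "text_param": T_MAX}
    if sampling_type == "image_to_text":  # claim 5: condition on a given image
        return {"image_noise": "data", "image_param": 0.0}
    if sampling_type == "text_to_image":  # claim 6: condition on a given text
        return {"text_noise": "data", "text_param": 0.0}
    raise ValueError(f"unknown sampling type: {sampling_type!r}")
```

Setting a modality's noise parameter to 0 and substituting the conditioning data for its noise input is what turns the one shared model into a conditional sampler for that direction.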
Description
Diffusion model-based image-text related multi-distribution sampling method and device

Technical Field

The invention relates to the technical field of machine learning, and in particular to an image-text related multi-distribution sampling method and device based on a diffusion model.

Background

A diffusion model is a deep generative model that produces data similar to the data it was trained on. It works by progressively destroying training data through repeated addition of Gaussian noise, and then learning to recover the data by reversing this noising process. After training, data can be generated simply by passing randomly sampled noise through the learned denoising process; that is, the model is determined by the denoising model over noisy data. The image-text related multi-distribution modeling problem refers to modeling a family of potential distributions of image-text data, where the family comprises the marginal distribution of image data, the marginal distribution of text data, the joint distribution of image-text data, the conditional distribution of images given text, and the conditional distribution of text given images. For this problem, existing diffusion models are designed to model one specific distribution, such as a single conditional distribution over image-text data. During training, such a diffusion model adds noise to the image in the image-text pair and then predicts that noise, taking the noisy image, the original text and the image noise magnitude as inputs.
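The forward noising process described above can be sketched in a few lines. This is an illustrative DDPM-style corruption, not the patent's own formulation; the function name and the choice of noise-level parameterization are assumptions:

```python
import numpy as np

# Illustrative DDPM-style forward process: blend Gaussian noise into the data.
# A denoising network would be trained to invert this corruption.
def forward_noise(x0, alpha_bar, rng):
    """Return x_t = sqrt(alpha_bar)*x0 + sqrt(1-alpha_bar)*eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(1000)              # stand-in for a data sample
x_light, _ = forward_noise(x0, 0.99, rng)   # little noise: stays close to x0
x_heavy, _ = forward_noise(x0, 0.01, rng)   # heavy noise: nearly pure Gaussian
```

At one extreme of the noise schedule the sample is almost the original data; at the other it is indistinguishable from standard Gaussian noise, which is why sampling can start from pure noise.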
A diffusion model trained in this way considers only a single distribution; that is, existing diffusion models cannot support multi-distribution modeling of multi-modal (image-text) data, for example one model simultaneously supporting the marginal distribution of image data, the marginal distribution of text data, the joint distribution of image-text data, the conditional distribution of images given text, the conditional distribution of text given images, and so on. As a result, an existing diffusion model serves only a single function when used for sampling. In summary, existing sampling methods suffer from single functionality and low universality.

Disclosure of Invention

The invention provides a diffusion model-based image-text related multi-distribution sampling method and device, which overcome the defects of single functionality and low universality in the prior art, achieve multifunctionality and high universality, and are suitable for multi-distribution sampling of arbitrary multi-modal data. The invention provides a diffusion model-based image-text related multi-distribution sampling method, which comprises the following steps: acquiring data to be processed, wherein the data to be processed is one of image data, text data and image-text data; determining the noise type and/or the noise parameter value to be input according to a preset sampling type; and inputting the data to be processed, the noise type to be input and/or the noise parameter value into a pre-trained image-text related multi-distribution sampling model to obtain a sampling result; wherein the image-text related multi-distribution sampling model is obtained by training a pre-constructed neural network on a sample data pair and the noise added to the sample data pair, the sample data pair being a data pair formed by an image data sample and a text data sample.
According to the diffusion model-based image-text related multi-distribution sampling method provided by the invention, the image-text related multi-distribution sampling model is obtained by training a pre-constructed neural network on a sample data pair and the noise added to the sample data pair, and the training specifically comprises the following steps: S1, acquiring a sample data pair, wherein the sample data pair comprises an image data sample and a text data sample; S2, acquiring target image noise and target text noise based on a standard Gaussian distribution, and acquiring an image noise parameter value and a text noise parameter value; S3, computing a linear combination of the image data sample and the target image noise, weighted according to the image noise parameter value, to obtain a noisy image sample, and computing a linear combination of the text data sample and the target text noise, weighted according to the text noise parameter value, to obtain a noisy text sample; S4, inputting the image noise parameter value, the text noise parameter value, the noisy image sample and the noisy text sample into the image-text related multi-distribution sampling model to obtain an image noise prediction result and a text noise prediction result; S5, computing the squared two-norm of the difference between the image noise prediction result and the target image noise and the squared two-norm of the difference between the text noise prediction result and the target text noise, and training the parameters of the image-text related multi-distribution sampling model with the sum of the squared two-norms as the training objective.
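Steps S1-S5 amount to a joint denoising objective over both modalities. A minimal numpy sketch follows, under assumed toy embedding sizes and with a linear map standing in for the pre-constructed neural network; the variance-preserving weighting of the linear combination in S3 is also an assumption, since the patent leaves the exact coefficients open:

```python
import numpy as np

D_IMG, D_TXT = 8, 4  # toy embedding sizes (illustrative, not from the patent)
rng = np.random.default_rng(0)

# Linear stand-in for the pre-constructed neural network: it maps the noisy
# image, the noisy text and both noise parameter values to predicted noises.
W = 0.1 * rng.standard_normal((D_IMG + D_TXT + 2, D_IMG + D_TXT))

def predict_noise(x_img, x_txt, t_img, t_txt):
    out = np.concatenate([x_img, x_txt, [t_img, t_txt]]) @ W
    return out[:D_IMG], out[D_IMG:]

def training_loss(img, txt, rng):
    # S2: target noises from a standard Gaussian, plus noise parameter values
    eps_img = rng.standard_normal(D_IMG)
    eps_txt = rng.standard_normal(D_TXT)
    t_img, t_txt = rng.uniform(), rng.uniform()
    # S3: noisy samples as linear combinations of the data and the target noise
    # (a variance-preserving weighting is assumed here)
    x_img = np.sqrt(1 - t_img) * img + np.sqrt(t_img) * eps_img
    x_txt = np.sqrt(1 - t_txt) * txt + np.sqrt(t_txt) * eps_txt
    # S4: predict both noises with the one shared model
    pred_img, pred_txt = predict_noise(x_img, x_txt, t_img, t_txt)
    # S5: sum of the squared two-norms of the two prediction errors
    return np.sum((pred_img - eps_img) ** 2) + np.sum((pred_txt - eps_txt) ** 2)
```

An optimizer would minimize `training_loss` over `W` (step S5), and repeating this for a preset number of iterations gives step S6. Because both noise parameters are sampled independently over their full range, the one network learns every marginal, joint and conditional case at once.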