CN-121982375-A - Generalized zero-shot learning method based on a diffusion model

Abstract

The invention discloses a generalized zero-shot learning method based on a diffusion model. The method comprises: obtaining image data and semantic description information of seen classes; training an improved frequency-adaptive diffusion network on the obtained image data and semantic description information; inputting semantic descriptions of unseen classes into the trained diffusion network and generating high-quality synthetic samples of the corresponding classes with a filtering generation algorithm; training a classification multi-layer perceptron network on the synthetic samples and the seen-class training data; and inputting real samples into the classification multi-layer perceptron network to classify data from both seen and unseen classes. Targeting the frequency-domain characteristics of diffusion models in the generalized zero-shot learning task, the invention redesigns the generation algorithm and the network architecture of the diffusion model, which improves the diffusion model's performance on zero-shot learning and, in turn, its generalization to unseen data in real scenes.

Inventors

QIN JIE
WANG JIANCHAO
ZHOU PENG

Assignees

Nanjing University of Aeronautics and Astronautics (南京航空航天大学)

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-30

Claims (4)

1. A generalized zero-shot learning method based on a diffusion model, characterized by comprising the following steps: step S1, obtaining image data and semantic description information of seen classes to construct a training sample data set, and training an improved frequency-adaptive diffusion network with the training sample data set; step S2, inputting semantic description information of unseen-class image data into the trained frequency-adaptive diffusion network, and generating high-quality synthetic samples of the corresponding unseen classes with a filtering generation algorithm; step S3, training a classification multi-layer perceptron network using the class information of the seen-class image data and of the unseen-class image data; and step S4, classifying image data of the seen classes and the unseen classes with the trained classification multi-layer perceptron network.
2. The method according to claim 1, wherein step S1 specifically comprises: S101, obtaining image data and semantic description information of the seen classes; S102, extracting features from the seen-class image data with a pre-trained ResNet-101 network; S103, training the improved frequency-adaptive diffusion network with the extracted image features and the semantic description information, with noise set as the prediction target of the frequency-adaptive diffusion network and a mean squared error loss function used for training.
3. The method according to claim 1, wherein step S3 specifically comprises: S301, constructing a data set containing real samples and synthetic samples, wherein the unseen-class image data consists of synthetic samples and the seen-class image data is a mixture of real samples and synthetic samples; S302, training the classification multi-layer perceptron network on the data set, with a cross-entropy loss function used in the training process.
4. The method according to claim 1, wherein the improved frequency-adaptive diffusion network uses a U-Net network as its backbone and uses frequency-adaptive skip connection modules in place of identity skip connection modules.
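
As an illustration of the training objective named in claim 2 (noise as the prediction target, mean squared error as the loss), the following is a minimal NumPy sketch and not the patented implementation. The linear noise schedule, the function names, the toy array sizes, and the `denoiser` callable are all assumptions introduced here; the claims specify only the prediction target and the loss function.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, noise, alpha_bar):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bar[t][:, None]  # (batch, 1), broadcasts over the feature axis
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def noise_prediction_loss(denoiser, x0, semantics, alpha_bar, rng):
    """Mean squared error between the sampled noise and the predicted noise."""
    t = rng.integers(0, len(alpha_bar), size=x0.shape[0])  # one timestep per sample
    noise = rng.standard_normal(x0.shape)
    x_t = add_noise(x0, t, noise, alpha_bar)
    predicted = denoiser(x_t, t, semantics)  # hypothetical conditional network
    return float(np.mean((predicted - noise) ** 2))

# Toy usage with placeholder sizes; a real denoiser would be a trained network.
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
features = rng.standard_normal((256, 64))   # stand-in for extracted image features
semantics = rng.standard_normal((256, 16))  # stand-in for class semantic vectors
zero_denoiser = lambda x_t, t, s: np.zeros_like(x_t)  # untrained baseline
baseline_loss = noise_prediction_loss(zero_denoiser, features, semantics, alpha_bar, rng)
print(f"baseline loss: {baseline_loss:.3f}")
```

A denoiser that always predicts zero scores a loss close to the variance of the injected noise, which gives a simple baseline that a trained network should beat.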

Description

Generalized zero-shot learning method based on a diffusion model

Technical Field

The invention relates to the technical field of generalized zero-shot learning, and in particular to a generalized zero-shot learning method based on a diffusion model.

Background

Generalized Zero-Shot Learning (GZSL) is a special classification task: only seen-class samples are used in the training phase, while in the testing phase the model must accurately classify samples from both the seen classes and the unseen classes it has never encountered. The core value of GZSL lies in simulating the cognitive ability required in real-world open scenes. In real applications, the number of categories often grows explosively (e.g., newly discovered species, new diseases, emerging merchandise, unknown military targets), and traditional supervised learning models cope poorly because their training data has limited coverage. By training only on seen-class data, GZSL can generalize to recognizing unseen classes, which to some extent resolves the problem that training data can never cover all classes.

GZSL methods fall mainly into two types: generative and discriminative. A generative method trains a generative model on seen-class samples, uses it to generate unseen-class samples, and then combines these with the real seen-class samples to train a classification network. Diffusion models have recently shown excellent performance in image generation, and their ability to generate high-quality, realistic images has attracted great attention.
Nevertheless, diffusion models still lag behind GAN-based or VAE-based generative zero-shot learning methods on the GZSL task, which further limits their application to zero-shot learning. An effective generalized zero-shot learning method based on a diffusion model is therefore needed.

Disclosure of Invention

The invention provides a generalized zero-shot learning method based on a diffusion model, which enables the diffusion model to generate high-quality unseen-class samples and helps the classification multi-layer perceptron network accurately recognize both unseen-class and seen-class samples.

An embodiment of the invention provides a generalized zero-shot learning method based on a diffusion model, comprising the following steps:

Step S1, obtaining image data and semantic description information of seen classes to construct a training sample data set, and training an improved frequency-adaptive diffusion network with the training sample data set;

Step S2, inputting semantic description information of unseen-class image data into the trained frequency-adaptive diffusion network, and generating high-quality synthetic samples of the corresponding unseen classes with a filtering generation algorithm;

Step S3, training a classification multi-layer perceptron network using the class information of the seen-class image data and of the unseen-class image data;

Step S4, classifying image data of the seen classes and the unseen classes with the trained classification multi-layer perceptron network.
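
The last two steps can be sketched on toy data as follows. This is a hedged illustration only: the one-hidden-layer network, its sizes, the learning rate, the Gaussian toy features, and all function names are assumptions made here, and the synthetic unseen-class features are drawn from a Gaussian as a stand-in for samples that a trained diffusion network would generate. The cross-entropy loss is used as the standard training loss for such a classifier, and mixing synthetic samples into the seen classes is omitted for brevity.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def init_mlp(in_dim, hidden_dim, num_classes, rng):
    """One-hidden-layer perceptron; layer sizes are illustrative assumptions."""
    return {
        "W1": rng.standard_normal((in_dim, hidden_dim)) * np.sqrt(2.0 / in_dim),
        "b1": np.zeros(hidden_dim),
        "W2": rng.standard_normal((hidden_dim, num_classes)) * np.sqrt(1.0 / hidden_dim),
        "b2": np.zeros(num_classes),
    }

def forward(params, x):
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU
    return hidden, hidden @ params["W2"] + params["b2"]

def train_step(params, x, y, lr):
    """One full-batch gradient step on the cross-entropy loss; returns the loss."""
    hidden, logits = forward(params, x)
    probs = softmax(logits)
    rows = np.arange(len(y))
    loss = float(-np.mean(np.log(probs[rows, y] + 1e-12)))
    grad_logits = probs.copy()
    grad_logits[rows, y] -= 1.0
    grad_logits /= len(y)
    grad_hidden = grad_logits @ params["W2"].T
    grad_hidden[hidden <= 0.0] = 0.0  # ReLU gradient
    params["W2"] -= lr * (hidden.T @ grad_logits)
    params["b2"] -= lr * grad_logits.sum(axis=0)
    params["W1"] -= lr * (x.T @ grad_hidden)
    params["b1"] -= lr * grad_hidden.sum(axis=0)
    return loss

# Toy features: classes 0 and 1 play the "seen" role, class 2 the "unseen" role.
rng = np.random.default_rng(0)
dim, per_class = 8, 30
means = 3.0 * np.eye(3, dim)

def sample(label):
    return means[label] + 0.3 * rng.standard_normal((per_class, dim))

real_seen_x = np.concatenate([sample(0), sample(1)])
real_seen_y = np.repeat([0, 1], per_class)
synth_unseen_x = sample(2)  # stand-in for diffusion-generated features
synth_unseen_y = np.full(per_class, 2)

# Step S3: train on real seen-class and synthetic unseen-class samples together.
x_train = np.concatenate([real_seen_x, synth_unseen_x])
y_train = np.concatenate([real_seen_y, synth_unseen_y])
params = init_mlp(dim, 16, 3, rng)
losses = [train_step(params, x_train, y_train, lr=0.1) for _ in range(500)]

# Step S4: predict over the joint label space of seen and unseen classes.
predictions = forward(params, x_train)[1].argmax(axis=1)
accuracy = float(np.mean(predictions == y_train))
print(f"final loss {losses[-1]:.4f}, training accuracy {accuracy:.2f}")
```

The point of the sketch is the data flow: because the classifier is trained on a single label space that covers both groups of classes, one forward pass at test time can assign a sample to either a seen or an unseen class.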
Optionally, in one embodiment of the invention, step S1 specifically includes: S101, obtaining image data and semantic description information of the seen classes; S102, extracting features from the seen-class image data with a pre-trained ResNet-101 network; S103, training the improved frequency-adaptive diffusion network with the extracted image features and the semantic description information, with noise set as the prediction target of the frequency-adaptive diffusion network and a mean squared error loss function used for training.

Optionally, in one embodiment of the invention, step S3 specifically includes: S301, constructing a data set containing real samples and synthetic samples, wherein the unseen-class image data consists of synthetic samples and the seen-class image data is a mixture of real samples and synthetic samples; S302, training the classification multi-layer perceptron network on the data set, with a cross-entropy loss function used in the training process.

Optionally, in one embodiment of the invention, the improved frequency-adaptive diffusion network uses a U-Net network as its backbone and uses frequency-adaptive skip connection modules in place of identity skip connection modules.

According to the generalized zero sample learnin