CN-120564250-B - Diffusion model-based self-adaptive fundus image classification method and system during continuous test


Abstract

The application relates to a diffusion-model-based method and system for continual test-time adaptive fundus image classification, in the technical field of medical image classification. The method comprises the following steps: first, forward noising is performed on a fundus image through a diffusion model to obtain a noise-added image; then, reverse denoising is performed on the noise-added image through the diffusion model, and several kinds of gradient guidance are introduced during the reverse denoising process, the gradient guidance comprising content-preservation guidance, consistency guidance, and style-alignment guidance; the gradient guidance data are fused into the denoised image to reconstruct an image; finally, a classifier predicts on the final reconstructed image. By introducing a diffusion model to optimize the fundus image, the application achieves structure preservation, style alignment, and classification-stability optimization for continuous unlabeled test images, so that they gradually approach the source-domain distribution and image-level domain alignment is achieved. The source model parameters need not be modified, which ensures system stability and deployability.

Inventors

LIU MINGSI
XU YANWU
LI XIANG
FANG HUIHUI
DUAN LIXIN
WANG JINGHAO

Assignees

Guangdong Laboratory of Artificial Intelligence and Digital Economy (Guangzhou) (人工智能与数字经济广东省实验室(广州))

Dates

Publication Date: 20260512
Application Date: 20250522

Claims (6)

1. A diffusion-model-based continual test-time adaptive fundus image classification method, characterized by comprising the following steps:
acquiring a fundus image to be classified;
performing time-stepped forward diffusion processing on the fundus image to be classified through a preset diffusion model to obtain a noise-added image, and analyzing each time step of the forward diffusion process, wherein the diffusion model is an unconditional diffusion model;
in the reverse diffusion process, for each time step of the forward diffusion process, processing the noise-added image through the diffusion model to obtain an intermediate image of the current time step;
performing guidance identification according to the intermediate image, and performing gradient guidance analysis based on the intermediate image according to the guidance identification result to obtain a denoised intermediate image and gradient guidance data, wherein the gradient guidance data comprise content-preservation gradient information, consistency gradient information, and style guidance gradient information;
based on the gradient guidance data, performing gradient fusion in combination with the denoised intermediate image, and iterating over each time step to complete the reverse diffusion process and obtain a final target reconstructed image;
inputting the target reconstructed image into a frozen classifier for prediction to obtain a fundus image classification result;
wherein performing guidance identification according to the intermediate image, and performing gradient guidance analysis based on the intermediate image according to the guidance identification result to obtain the denoised intermediate image and the gradient guidance data, comprises: performing intermediate state estimation based on the intermediate image, and performing image distance analysis in combination with a solid-color image corresponding to the fundus image to be classified to determine image stability, thereby obtaining a guidance application result; and, when the guidance application result is that guidance is to be applied: calculating a content-preservation gradient based on the denoised intermediate image to obtain the content-preservation gradient information; performing augmentation based on the denoised intermediate image to obtain an augmented image, and performing gradient consistency prediction based on the denoised intermediate image and the augmented image to obtain the consistency gradient information; and analyzing, through a preset multi-modal prediction model, the similarity between the denoised intermediate image and a semantic text of the multi-modal prediction model, and guiding the denoised intermediate image toward the target style corresponding to the semantic text during reverse diffusion reconstruction, to obtain the style guidance gradient information;
wherein performing augmentation based on the denoised intermediate image to obtain the augmented image, and performing gradient consistency prediction based on the denoised intermediate image and the augmented image to obtain the consistency gradient information, comprises: performing augmentation based on the denoised intermediate image to obtain the augmented image; and, taking the denoised intermediate image and the augmented image as inputs, predicting a probability distribution according to a corresponding formula and calculating a prediction-consistency gradient according to a further formula, the terms of which denote, respectively, an entropy measuring prediction uncertainty, a series of image augmentations, and the single predicted probability distribution of the classifier on the denoised intermediate image;
wherein analyzing the similarity between the denoised intermediate image and the semantic text through the preset multi-modal prediction model, and guiding the denoised intermediate image toward the target style corresponding to the semantic text during reverse diffusion reconstruction to obtain the style guidance gradient information, comprises: inputting the denoised intermediate image into the multi-modal prediction model, introducing the semantic text of the multi-modal prediction model for the denoised intermediate image, calculating the similarity between the denoised intermediate image and the semantic text according to a corresponding formula, and calculating the style guidance gradient information according to a further formula, the terms of which denote, respectively, the text input to the model, the image encoder of the model, the text encoder of the model, and a parameter controlling the style guidance gradient;
wherein calculating the content-preservation gradient based on the denoised intermediate image to obtain the content-preservation gradient information comprises: taking the denoised intermediate image as input and calculating the content-preservation gradient according to a corresponding formula;
wherein performing gradient fusion in combination with the denoised intermediate image based on the gradient guidance data, and iterating over each time step to complete the reverse diffusion process and obtain the final target reconstructed image, comprises: fusing the denoised intermediate image and the gradient guidance data according to a corresponding formula that includes a parameter controlling the gradient.
2. The method according to claim 1, wherein performing time-stepped forward diffusion processing on the fundus image to be classified through the preset diffusion model to obtain the noise-added image comprises: inputting the fundus image to be classified into the diffusion model and performing the forward diffusion process according to a corresponding formula to generate the noise-added image; wherein, in the formula, the added noise follows a standard normal distribution, and the cumulative retention ratio from time step 1 to the current time step is the product of the retention coefficients of all those time steps.
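The forward diffusion step described in claim 2 can be illustrated with a minimal sketch. The formula itself is not reproduced in the claim text, so the sketch assumes the standard DDPM closed form (signal scaled by the square root of the cumulative retention ratio, plus standard-normal noise scaled by the square root of its complement), which matches the claim's verbal description. The linear schedule, the function names `cumulative_retention` and `forward_diffuse`, and the flat-list image representation are illustrative assumptions only.

```python
import math
import random

def cumulative_retention(betas):
    """Return alpha_bar[t] = product of (1 - beta_s) for s = 0..t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta          # per-step retention coefficient
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample a noise-added image x_t from x_0 (assumed DDPM closed form)."""
    a = alpha_bars[t]
    signal_scale = math.sqrt(a)
    noise_scale = math.sqrt(1.0 - a)
    # Each pixel receives independent standard-normal noise.
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in x0]

# Hypothetical linear noise schedule with 100 time steps.
T = 100
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bars = cumulative_retention(betas)

rng = random.Random(0)
x0 = [0.2, 0.5, 0.8, 0.1]              # toy "image" as a flat pixel list
x_t = forward_diffuse(x0, 50, alpha_bars, rng)
```

Because the cumulative retention ratio shrinks monotonically, later time steps keep less of the original image and more noise, which is why the reverse process has to start from a heavily perturbed input.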
3. The method according to claim 1, wherein, in the reverse diffusion process, processing the noise-added image through the diffusion model for each time step of the forward diffusion process to obtain the intermediate image of the current time step comprises: in the reverse diffusion process, analyzing the noise-added image with the diffusion model, taking each time step of the forward diffusion process as a reference, to determine randomness introduction information, wherein the randomness introduction information comprises a randomness introduction intensity; and processing the noise-added image through the diffusion model, taking the randomness introduction information as a reference, to obtain the intermediate image.
4. The method according to claim 3, wherein processing the noise-added image through the diffusion model, taking the randomness introduction information as a reference, to obtain the intermediate image comprises: for each time step, based on the randomness introduction information, generating the intermediate image of the current time step with the diffusion model according to a corresponding formula; wherein the formula involves a neural network model, namely the diffusion model, and the noise standard deviation of the current time step; the noise standard deviation controls whether a portion of extra noise is injected into the currently generated image; the randomness introduction intensity is determined by the noise standard deviation, whose value is scheduled according to the stage of the time step; when the noise standard deviation is zero, the step is a deterministic denoising process and no randomness is introduced; and when the noise standard deviation is greater than zero, the step is a denoising step with random perturbation, in which a noise perturbation is added on top of the model prediction.
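The role of the noise standard deviation in claim 4 can be sketched as follows. The claim's formula is not reproduced in the text, so this sketch assumes a DDIM-style update, which has the same property the claim describes: a standard deviation of zero gives a deterministic step, and a positive value adds a random perturbation to the model prediction. The function name `reverse_step` and its arguments are hypothetical, and `eps_pred` stands in for the output of the diffusion network.

```python
import math
import random

def reverse_step(x_t, eps_pred, a_t, a_prev, sigma, rng):
    """One DDIM-style reverse step (an assumed form, not the claim's formula).

    a_t and a_prev are the cumulative retention ratios at the current and
    the previous (less noisy) time step; sigma is the noise standard deviation.
    """
    # Estimate the clean image from the current noisy image.
    x0_hat = [(x - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
              for x, e in zip(x_t, eps_pred)]
    # Deterministic part: re-noise the estimate to the previous noise level.
    dir_coef = math.sqrt(max(1.0 - a_prev - sigma ** 2, 0.0))
    mean = [math.sqrt(a_prev) * x0 + dir_coef * e
            for x0, e in zip(x0_hat, eps_pred)]
    if sigma == 0.0:
        return mean                      # deterministic denoising, no randomness
    # Denoising step with random perturbation on top of the model prediction.
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

x_t = [0.3, -0.1]
eps_pred = [0.5, -0.5]                   # would come from the diffusion network
det_a = reverse_step(x_t, eps_pred, 0.5, 0.8, 0.0, random.Random(1))
det_b = reverse_step(x_t, eps_pred, 0.5, 0.8, 0.0, random.Random(2))
sto_a = reverse_step(x_t, eps_pred, 0.5, 0.8, 0.1, random.Random(1))
sto_b = reverse_step(x_t, eps_pred, 0.5, 0.8, 0.1, random.Random(2))
```

With `sigma` set to zero the two calls return identical results regardless of the random seed, while a positive `sigma` makes the output depend on the seed. A schedule over time steps, as the claim describes, would simply pass a different `sigma` at each step.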
5. The method according to claim 1, wherein performing intermediate state estimation based on the intermediate image, and performing image distance analysis that combines the solid-color image corresponding to the fundus image to be classified with the denoised intermediate image obtained by the intermediate state estimation to determine image stability and obtain the guidance application result, comprises: performing intermediate state estimation according to a corresponding formula to obtain the denoised intermediate image; analyzing, according to a corresponding formula, first distance information between the denoised intermediate image and the solid black image and solid white image among the solid-color images, and analyzing, according to a further formula, the pixel differences between the denoised intermediate image and the fundus image to be classified to obtain second distance information; and comparing the first distance information with the second distance information, and determining, when the comparison condition is met, that the guidance application result is to apply guidance; wherein one of the solid-color images in the formula is the solid black image, and when the first distance information is greater than or equal to the second distance information, it is determined that the denoised intermediate image has formed a stable structure, and guidance is applied.
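The stability check in claim 5 can be illustrated with a small sketch. The distance formulas are not reproduced in the claim text, so the sketch makes three assumptions that should be read as placeholders: pixel values lie in [0, 1], distance is measured as mean squared error, and the first distance is the smaller of the distances to the solid black and solid white images. The names `mse` and `should_apply_guidance` are hypothetical.

```python
def mse(a, b):
    """Mean squared pixel difference between two equally sized images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def should_apply_guidance(x0_hat, x_input):
    """Return True when the denoised estimate looks structurally stable.

    x0_hat  : denoised intermediate image (flat list, values in [0, 1])
    x_input : fundus image to be classified (same shape)
    """
    black = [0.0] * len(x0_hat)
    white = [1.0] * len(x0_hat)
    # First distance: how far the estimate is from a featureless image
    # (assumption: take the nearer of the two solid-color references).
    first = min(mse(x0_hat, black), mse(x0_hat, white))
    # Second distance: pixel difference to the original test image.
    second = mse(x0_hat, x_input)
    # Guidance is applied once the estimate is at least as far from a
    # solid-color image as it is from the input image.
    return first >= second

x_input = [0.2, 0.6, 0.4, 0.8]
early_estimate = [0.01, 0.02, 0.0, 0.01]   # nearly solid black, no structure yet
late_estimate = [0.21, 0.58, 0.41, 0.79]   # close to the input image
gate_early = should_apply_guidance(early_estimate, x_input)
gate_late = should_apply_guidance(late_estimate, x_input)
```

Under these assumptions, an early estimate that is still close to a featureless image fails the check, so gradients computed from it are not used, while a later estimate that resembles the input passes.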
6. A diffusion-model-based continual test-time adaptive fundus image classification system, comprising:
an image acquisition module, configured to acquire a fundus image to be classified;
a forward diffusion module, configured to perform time-stepped forward diffusion processing on the fundus image to be classified through a preset diffusion model to obtain a noise-added image, and to analyze each time step of the forward diffusion process, wherein the diffusion model is an unconditional diffusion model;
a reverse diffusion module, configured to process, in the reverse diffusion process and for each time step of the forward diffusion process, the noise-added image through the diffusion model to obtain an intermediate image of the current time step;
a gradient guidance module, configured to perform guidance identification according to the intermediate image, and to perform gradient guidance analysis based on the intermediate image according to the guidance identification result to obtain a denoised intermediate image and gradient guidance data, wherein the gradient guidance data comprise content-preservation gradient information, consistency gradient information, and style guidance gradient information;
a fusion reconstruction module, configured to perform gradient fusion in combination with the denoised intermediate image based on the gradient guidance data, and to iterate over each time step to complete the reverse diffusion process and obtain a final target reconstructed image;
a prediction module, configured to input the target reconstructed image into a frozen classifier for prediction to obtain a fundus image classification result;
wherein performing guidance identification according to the intermediate image, and performing gradient guidance analysis based on the intermediate image according to the guidance identification result to obtain the denoised intermediate image and the gradient guidance data, comprises: performing intermediate state estimation based on the intermediate image, and performing image distance analysis in combination with a solid-color image corresponding to the fundus image to be classified to determine image stability, thereby obtaining a guidance application result; and, when the guidance application result is that guidance is to be applied: calculating a content-preservation gradient based on the denoised intermediate image to obtain the content-preservation gradient information; performing augmentation based on the denoised intermediate image to obtain an augmented image, and performing gradient consistency prediction based on the denoised intermediate image and the augmented image to obtain the consistency gradient information; and analyzing, through a preset multi-modal prediction model, the similarity between the denoised intermediate image and a semantic text of the multi-modal prediction model, and guiding the denoised intermediate image toward the target style corresponding to the semantic text during reverse diffusion reconstruction, to obtain the style guidance gradient information;
wherein performing augmentation based on the denoised intermediate image to obtain the augmented image, and performing gradient consistency prediction based on the denoised intermediate image and the augmented image to obtain the consistency gradient information, comprises: performing augmentation based on the denoised intermediate image to obtain the augmented image; and, taking the denoised intermediate image and the augmented image as inputs, predicting a probability distribution according to a corresponding formula and calculating a prediction-consistency gradient according to a further formula, the terms of which denote, respectively, an entropy measuring prediction uncertainty, a series of image augmentations, and the single predicted probability distribution of the classifier on the denoised intermediate image;
wherein analyzing the similarity between the denoised intermediate image and the semantic text through the preset multi-modal prediction model, and guiding the denoised intermediate image toward the target style corresponding to the semantic text during reverse diffusion reconstruction to obtain the style guidance gradient information, comprises: inputting the denoised intermediate image into the multi-modal prediction model, introducing the semantic text of the multi-modal prediction model for the denoised intermediate image, calculating the similarity between the denoised intermediate image and the semantic text according to a corresponding formula, and calculating the style guidance gradient information according to a further formula, the terms of which denote, respectively, the text input to the model, the image encoder of the model, the text encoder of the model, and a parameter controlling the style guidance gradient;
wherein calculating the content-preservation gradient based on the denoised intermediate image to obtain the content-preservation gradient information comprises: taking the denoised intermediate image as input and calculating the content-preservation gradient according to a corresponding formula;
wherein performing gradient fusion in combination with the denoised intermediate image based on the gradient guidance data, and iterating over each time step to complete the reverse diffusion process and obtain the final target reconstructed image, comprises: fusing the denoised intermediate image and the gradient guidance data according to a corresponding formula that includes a parameter controlling the gradient.
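The consistency guidance recited in claims 1 and 6 relies on an entropy computed from the classifier's predictions on the denoised intermediate image and its augmented views. The exact formulas are not reproduced in the claim text, so the sketch below assumes one common form of this objective: average the predicted distributions over all views and take the entropy of that average. The classifier and augmentations (`toy_classifier`, `flip`, `dim`) are hypothetical stand-ins, and a real system would obtain the gradient of this quantity with respect to the image by automatic differentiation, which is omitted here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def marginal_entropy(image, classifier, augmentations):
    """Entropy of the prediction averaged over the image and its augmented views."""
    views = [image] + [aug(image) for aug in augmentations]
    preds = [classifier(v) for v in views]
    n_classes = len(preds[0])
    mean_pred = [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]
    return entropy(mean_pred)

def toy_classifier(image):
    """Hypothetical two-class classifier: compares left-half and right-half brightness."""
    half = len(image) // 2
    return softmax([4.0 * sum(image[:half]), 4.0 * sum(image[half:])])

def flip(image):
    """Hypothetical augmentation: horizontal flip of a flat pixel row."""
    return image[::-1]

def dim(image):
    """Hypothetical augmentation: reduce brightness by 20 percent."""
    return [0.8 * p for p in image]

image = [1.0, 1.0, 0.0, 0.0]
h_dim = marginal_entropy(image, toy_classifier, [dim])    # views agree
h_flip = marginal_entropy(image, toy_classifier, [flip])  # views disagree
```

In this toy setup the brightness change leaves the prediction almost unchanged, so the entropy is low, whereas the flip reverses the prediction and drives the entropy toward its maximum. Minimizing such an entropy therefore pushes the reconstructed image toward predictions that stay consistent across augmentations.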

Description

Diffusion model-based self-adaptive fundus image classification method and system during continuous test

Technical Field

The application relates to the technical field of medical image classification, and in particular to a diffusion-model-based method and system for continual test-time adaptive fundus image classification.

Background

In the field of medical image classification, with the rapid development of deep learning, image recognition methods based on convolutional neural networks (CNNs) are widely applied to computer-aided disease diagnosis, and show high accuracy in fundus image analysis tasks such as glaucoma classification and diabetic retinopathy (DR) classification. However, current deep learning models are commonly sensitive to distribution change (domain shift): model performance drops markedly when the training data and test data follow different distributions (for example, different imaging devices or different source hospitals), which limits the practical clinical deployment and adoption of such models.

To address these problems, the related art uses test-time adaptation (TTA). TTA lets the model adjust itself using unlabeled data at the test stage, without access to labeled target-domain data, thereby improving its generalization capability. Representative TTA methods include TENT, based on entropy minimization; EATA, based on pseudo-label self-training; and CoTTA, based on model regularization. These methods update model parameters at the test stage using an auxiliary loss function, thereby transferring the model from the source domain to the target domain.
However, in the continual test-time adaptation (CTA) scenario, where the model must adapt in sequence to multiple test data streams from different domains, conventional TTA methods are prone to two problems. First, because they rely on updating model parameters, they easily cause catastrophic forgetting, in which the model forgets knowledge of old domains while adapting to a new one; the model adapts poorly to changes in domain distribution and learning tends to be unstable. Second, excessive reliance on pseudo-labels or entropy regularization produces erroneous supervision when predictions are unstable, which further degrades model performance; such methods are ill-suited to medical image processing and cannot guarantee that medical structure information is retained. In addition, current diffusion-model-based TTA research has not addressed fundus image classification in the medical field; in particular, designs that incorporate medical characteristics in continual adaptation, structure preservation, and semantic consistency are lacking. The prior art therefore fails to apply diffusion models to adaptive fundus image classification, and suffers from poor adaptive capability and model instability.

Disclosure of Invention

To improve the stability and generalization capability of a model under different image-quality conditions at the test stage, with a focus on iterative optimization of medical images, the application provides a diffusion-model-based method and system for continual test-time adaptive fundus image classification.
On the one hand, the application applies a diffusion model to the processing of medical images such as fundus images. On the other hand, the technical solution uses the diffusion denoising process to guide test images into alignment with the source-domain distribution, and combines mechanisms such as anatomical-structure preservation, style guidance, and consistency constraints. This effectively improves the stability and robustness of the model on multi-source data in real clinical environments, ensures system stability and deployability without modifying the source model parameters, and addresses the poor adaptive capability and model instability of the prior art.

In a first aspect, the application provides a diffusion-model-based continual test-time adaptive fundus image classification method, comprising: acquiring a fundus image to be classified; performing time-stepped forward diffusion processing on the fundus image to be classified through a preset diffusion model to obtain a noise-added image, and analyzing each time step of the forward diffusion process, wherein the diffusion model is an unconditional diffusion model; in the reverse diffusion process, for each time step in the