CN-121999069-A - Method for generating layered double-flow diffusion model fundus image marked with optic disc atrophy arc
Abstract
The application discloses a method for generating fundus images annotated with optic disc atrophy arcs using a hierarchical dual-stream diffusion model. A dual-stream generation framework is designed that decouples the diffusion process into macroscopic semantic generation and fine-grained detail refinement, resolving the tension in latent diffusion models between generating fine details and representing high-level perceptual features. A corresponding two-stage generation strategy strengthens the guidance from semantics to details and markedly improves the realism and quality of the generated images. A fundus perceptual loss is also designed: a foundation model trained on a large corpus of fundus data serves as the image feature extractor, which avoids the domain shift incurred when the perceptual loss is computed with a feature extraction network pre-trained on natural images, and improves the encoder's ability to characterize fundus image features.
Inventors
- LI HUIQI
- YANG TIANXIAO
- YANG CHENGZHU
- ZHANG WEIHANG
- LIU HANRUO
- LU SHUAI
Assignees
- Beijing Institute of Technology (北京理工大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260119
Claims (10)
- 1. A method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc, characterized by comprising the following steps:
  Step 1, initializing a fundus image dataset and constructing a training set and a test set;
  Step 2, constructing a hierarchical vector-quantized generative adversarial network as the variational autoencoder, specifically:
  Step 2.1, constructing a hierarchical encoder comprising a bottom-layer encoder and a top-layer encoder;
  Step 2.1.1, the bottom-layer encoder encodes an input image x to obtain a bottom-layer latent image;
  Step 2.1.2, the top-layer encoder encodes the bottom-layer latent image to obtain a top-layer latent image;
  Step 2.2, constructing a hierarchical decoder comprising a top-layer decoder and a bottom-layer decoder;
  Step 2.2.1, the top-layer decoder decodes the discrete top-layer latent features to obtain a bottom-layer latent image;
  Step 2.2.2, the bottom-layer decoder decodes the fused bottom-layer latent image to obtain a pixel-space reconstructed image;
  wherein the computation of the discrete top-layer latent features comprises:
  Step 2.3, in the top-layer vector quantization module, first applying a convolution layer to the top-layer latent image as a pre-quantization transform, adjusting the distribution of the latent representation before quantization to make it better suited to discrete quantization, then replacing each latent vector of the top-layer latent image with its nearest vector in a quantization codebook of size 2048, completing top-layer vector quantization and obtaining the top-layer vector-quantized latent image;
  wherein the computation of the bottom-layer latent image comprises:
  Step 2.4, decoding the top-layer vector-quantized latent image with the top-layer decoder and concatenating the result with the bottom-layer latent image along the channel dimension to obtain a fused bottom-layer latent image; then applying a convolution layer as a pre-quantization transform, adjusting the distribution before quantization to make the latent representation better suited to discrete quantization, and replacing each latent vector with its nearest vector in a quantization codebook of size 2048, completing bottom-layer vector quantization and obtaining the bottom-layer vector-quantized latent image;
  Step 2.5, upsampling the top-layer vector-quantized latent image with a transposed-convolution upsampling module UP, concatenating the upsampled top-layer vector-quantized latent image with the bottom-layer vector-quantized latent image along the channel dimension to obtain a bottom-layer latent image, applying a convolution layer to this bottom-layer latent image as a post-quantization transform that adjusts the quantized representation to better match the input expected by the decoder, and finally feeding the result into the bottom-layer decoder for decoding;
  Step 2.6, constructing the loss function for training the hierarchical vector-quantized generative adversarial network, consisting of a vector quantization loss, a pixel-level loss, a patch-based adversarial loss, and a perceptual loss based on fundus domain adaptation;
  Step 3, constructing the hierarchical dual-stream latent diffusion model, which is divided into a top-layer latent diffusion model and a bottom-layer latent diffusion model, two mutually independent subnetworks, specifically:
  Step 3.1, the input of the top-layer latent denoising network is formed by concatenating a fundus image and the corresponding optic disc atrophy arc mask along the channel dimension, mapping it with the top-layer encoder of the hierarchical encoder of step 2.1 and the top-layer vector quantization module of step 2.3 to obtain discrete top-layer latent features, and then adding noise to form a noisy feature map; the input of the bottom-layer latent denoising network is formed by concatenating a fundus image and the corresponding optic disc atrophy arc mask along the channel dimension, mapping it with the bottom-layer encoder of the hierarchical encoder of step 2.1 and the bottom-layer vector quantization module of step 2.4 to obtain the bottom-layer vector-quantized latent image, and then adding noise to form a noisy feature map;
  Step 3.2, constructing the loss functions of the top-layer and bottom-layer latent diffusion models;
  Step 4, training the hierarchical dual-stream latent diffusion model:
  Step 4.1, training the hierarchical vector-quantized generative adversarial network;
  Step 4.2, training the top-layer latent diffusion model: inputting the fundus images and optic disc atrophy arc annotations of the training set into the encoder of the hierarchical vector-quantized generative adversarial network trained in step 4.1 to obtain a top-layer latent fundus feature map and a top-layer latent optic disc atrophy arc annotation feature map; taking the top-layer latent annotation feature map as the condition, the top-layer latent fundus feature map is input to the top-layer latent denoising network as the starting point z_0; according to the denoising diffusion probabilistic model, a Markov chain denoising process of length T is defined with a noise coefficient schedule ᾱ_t, and the top-layer noisy latent feature map at time t is expressed as z_t = √(ᾱ_t)·z_0 + √(1 − ᾱ_t)·ε, where ε is random noise and t denotes the time label of the denoising step; the noisy feature map z_t is input to the top-layer latent denoising network of step 3.1, forward propagation generates a denoised reconstructed latent feature map, the top-layer latent diffusion loss is computed from the noisy and the denoised top-layer latent feature maps, and back-propagation optimizes the parameters until a stopping condition is reached, after which the trained top-layer latent diffusion model is saved;
  Step 4.3, training the bottom-layer latent diffusion model: inputting the fundus images and optic disc atrophy arc annotations of the training set into the encoder of the hierarchical vector-quantized generative adversarial network trained in step 4.1 to obtain a bottom-layer latent fundus feature map and a bottom-layer latent optic disc atrophy arc annotation feature map; the bottom-layer latent annotation feature map is used as the condition and, after channel-dimension concatenation, the bottom-layer latent fundus feature map is input to the bottom-layer latent denoising network as the starting point z_0; according to the denoising diffusion probabilistic model, a Markov chain denoising process of length T is defined with a noise coefficient schedule ᾱ_t, and the bottom-layer noisy latent feature map at time t is expressed as z_t = √(ᾱ_t)·z_0 + √(1 − ᾱ_t)·ε, where ε is random noise, t denotes the time label of the denoising step, and z_0 is the initial noise-free feature map; the noisy feature map z_t is input to the bottom-layer latent denoising network of step 3.1, forward propagation generates a denoised reconstructed latent feature map, the bottom-layer latent diffusion loss is computed from the noisy and the denoised bottom-layer latent feature maps, and back-propagation optimizes the parameters until a stopping condition is reached, after which the trained bottom-layer latent diffusion model is saved;
  Step 5, generating fundus images with the trained hierarchical dual-stream latent diffusion model, comprising:
  Step 5.1, first-stage latent image generation: the optic disc atrophy arc annotations of the test set are used respectively as the condition inputs of the top-layer latent diffusion model trained in step 4.2 and of the bottom-layer latent diffusion model trained in step 4.3, random noise is used as the denoising starting point of the hierarchical dual-stream latent diffusion model, and a top-layer latent feature map and a bottom-layer latent feature map are generated;
  Step 5.2, second-stage latent image generation: the top-layer decoder trained in step 4.1 decodes the generated top-layer latent feature map, the result is concatenated along the channel dimension with the bottom-layer latent feature map generated in the first stage, the bottom-layer latent optic disc atrophy arc annotation feature map is taken as the condition, and the vector-quantized concatenated feature map is input into the bottom-layer latent diffusion model for second-stage generation; the generated top-layer latent feature map is then upsampled with the transposed-convolution module UP and concatenated along the channel dimension with the second-stage bottom-layer latent feature map to obtain the latent feature map of the generated fundus image, which is decoded by the bottom-layer decoder trained in step 4.1 to obtain the generated fundus image in pixel space.
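As one concrete reading of the nearest-codebook replacement in steps 2.3 and 2.4, the quantization step can be sketched in NumPy. This is an illustrative sketch only: the function name `vector_quantize` and the toy codebook are assumptions, and the claim specifies a learned codebook of 2048 entries rather than the two used here.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Replace each spatial latent vector with its nearest codebook entry.

    z        : (H, W, D) latent image
    codebook : (K, D) learned codebook (the claim uses K = 2048)
    Returns the quantized latent image and the chosen codebook indices.
    """
    flat = z.reshape(-1, z.shape[-1])                      # (H*W, D)
    # Squared Euclidean distance from every position to every codebook vector.
    d = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)                                 # nearest entry per position
    zq = codebook[idx].reshape(z.shape)
    return zq, idx.reshape(z.shape[:2])
```

In training, the straight-through estimator is typically used so gradients flow through the non-differentiable lookup; that detail is omitted here.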
- 2. The method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc according to claim 1, wherein in step 1 the training set and test set are constructed as follows: color fundus images with optic disc atrophy arc annotations are collected, and the center coordinates of the optic disc atrophy arc region are found from the annotation; if the image width L is smaller than or equal to 360, the original fundus image is cropped to a 720 × 720 local fundus image centered on these coordinates, and if L is larger than 360, it is cropped to a 2L × 2L local fundus image centered on these coordinates; the cropped image is resized to 128 × 128 pixels by nearest-neighbor interpolation, completing the initialization of the dataset; part of the images are randomly assigned to the training set and the remaining images to the test set.
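The preprocessing rule of claim 2 can be sketched as follows. The helper names `crop_size` and `nearest_resize` are illustrative, and this minimal nearest-neighbor resize uses integer index mapping rather than any particular library's interpolation convention.

```python
import numpy as np

def crop_size(width_L):
    """Crop side length per claim 2: 720 when L <= 360, otherwise 2L."""
    return 720 if width_L <= 360 else 2 * width_L

def nearest_resize(img, out_h=128, out_w=128):
    """Nearest-neighbour resize of an (H, W, C) array, as in claim 2."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]
```

For example, a 256 × 256 crop maps output position (1, 1) back to source pixel (2, 2).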
- 3. The method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc according to claim 1, wherein step 2.1.1 specifically comprises: the initial convolution layer has a kernel size of 3 × 3 and a stride of 1, and extracts a feature map of dimension 128 × 128 × 64 from the 128 × 128 × 3 input image x; downsampling is performed by the downsampling module of the bottom-layer encoder, which comprises a residual network and a downsampling layer; the residual network comprises two serially connected residual modules, each consisting of two serially connected groups of a normalization layer and a convolution layer with kernel size 3 × 3 and stride 1; the downsampling layer has kernel size 3 × 3 and stride 2; the residual modules expand the channel number of the feature map from 128 × 128 × 64 to 128 × 128 × 128, and the downsampling layer transforms the spatial dimension of the feature map from 128 × 128 × 128 to 64 × 64 × 128; feature integration is then performed by the middle-layer module of the bottom-layer encoder, which comprises two residual modules and a self-attention module, takes the output of the downsampling module as input, and does not change the feature size; its residual modules consist of two serially connected groups of a normalization layer and a convolution layer with kernel size 3 × 3 and stride 1; finally, the output-layer module of the bottom-layer encoder produces the feature output with kernel size 3 × 3 and stride 1, outputting the 64 × 64 × 128 image feature map as a 64 × 64 × 3 bottom-layer latent image.
- 4. The method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc according to claim 1, wherein step 2.1.2 specifically comprises: the initial convolution layer has a kernel size of 3 × 3 and a stride of 1, and extracts a feature map of dimension 64 × 64 × 128 from the 64 × 64 × 3 bottom-layer latent image; the residual modules of the top-layer encoder expand the channel number of the feature map from 64 × 64 × 128 to 64 × 64 × 256, and the downsampling layer transforms the spatial dimension of the feature map from 64 × 64 × 256 to 32 × 32 × 256; feature integration is then performed by the middle-layer module of the top-layer encoder, which comprises two residual modules and a self-attention module, takes the output of the downsampling module as input, and does not change the feature size; its residual modules consist of two serially connected groups of a normalization layer and a convolution layer with kernel size 3 × 3 and stride 1; finally, the output-layer module of the top-layer encoder produces the feature output with kernel size 3 × 3 and stride 1, outputting the 32 × 32 × 256 image feature map as a 32 × 32 × 3 top-layer latent image.
- 5. The method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc according to claim 1, wherein step 2.2.1 specifically comprises: the initial convolution layer has a kernel size of 3 × 3 and a stride of 1, and obtains a top-layer latent feature map of size 32 × 32 × 128 from the 32 × 32 × 3 discrete top-layer latent features; feature integration is performed by the middle-layer module of the top-layer decoder, which comprises two residual modules and a self-attention module, takes the output of the initial convolution layer as input, and does not change the feature size; the residual modules consist of two serially connected groups of a normalization layer and a convolution layer with kernel size 3 × 3 and stride 1, and the attention module consists of normalization layers and multiple convolution layers with kernel size 1 × 1 and stride 1; one upsampling is performed by the upsampling module of the top-layer decoder, which comprises an upsampling layer and a residual network; the upsampling layer has kernel size 3 × 3 and stride 2 and transforms the spatial dimension of the feature map from 32 × 32 × 128 to 64 × 64 × 128; the residual network comprises three serially connected residual modules, which compress the channel number of the feature map from 64 × 64 × 128 to 64 × 64 × 64; the output-layer module of the top-layer decoder produces the feature output with kernel size 3 × 3 and stride 1, decoding the 64 × 64 × 64 image feature map into a 64 × 64 × 3 bottom-layer latent image.
- 6. The method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc according to claim 1, wherein step 2.2.2 specifically comprises: the initial convolution layer has a kernel size of 3 × 3 and a stride of 1, and obtains a bottom-layer latent feature map of size 64 × 64 × 128 from the 64 × 64 × 3 bottom-layer latent image; feature integration is performed by the middle-layer module of the bottom-layer decoder, which comprises two residual modules and a self-attention module, takes the output of the initial convolution layer as input, and does not change the feature size; the residual modules consist of two serially connected groups of a normalization layer and a convolution layer with kernel size 3 × 3 and stride 1, and the attention module consists of normalization layers and multiple convolution layers with kernel size 1 × 1 and stride 1; one upsampling is performed by the upsampling module of the bottom-layer decoder, which comprises an upsampling layer and a residual network; the upsampling layer has kernel size 3 × 3 and stride 2 and transforms the spatial dimension of the feature map from 64 × 64 × 128 to 128 × 128 × 128.
- 7. The method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc according to claim 1, wherein in the loss function for training the hierarchical vector-quantized generative adversarial network of step 2.6: the vector quantization loss consists of the sum of the top-layer and bottom-layer vector quantization losses of the hierarchical vector-quantized encoder, where the top-layer vector quantization loss is the mean square error between the top-layer latent vectors before and after top-layer quantization, and the bottom-layer vector quantization loss is the mean square error between the bottom-layer latent vectors before and after bottom-layer quantization; the pixel-level loss is the mean absolute error between the pixel values of the input image x of the encoder and the reconstructed image output by the decoder; the patch-based adversarial loss is the binary cross-entropy loss between the input image x and the reconstructed image at the patch level; the perceptual loss based on fundus domain adaptation is obtained by extracting features of the input fundus image and the reconstructed fundus image with the image feature extractor of a pre-trained ophthalmic foundation model and computing a learned perceptual fundus image patch similarity between the image features: L_per = Σ_l (1 / (H_l · W_l)) Σ_{h,w} ‖ w_l ⊙ (f_{hw}^l − f̂_{hw}^l) ‖²₂, where f_{hw}^l and f̂_{hw}^l respectively denote the features extracted by the l-th layer of the pre-trained network at position (h, w) of the input image and the reconstructed image, H_l and W_l denote the spatial height and width of the feature map, w_l denotes a learnable weight vector, ⊙ denotes element-wise multiplication, and ‖·‖₂ denotes the L2 norm.
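The fundus-domain perceptual term of claim 7 follows the familiar learned-perceptual-similarity (LPIPS-style) shape: per layer, a channel-weighted squared feature difference averaged over spatial positions. A minimal sketch, assuming the feature maps come from some fundus-pretrained extractor (not reproduced here) and with the hypothetical name `fundus_perceptual_loss`:

```python
import numpy as np

def fundus_perceptual_loss(feats_x, feats_xhat, weights):
    """LPIPS-style perceptual distance described in claim 7.

    feats_x, feats_xhat : lists of (H_l, W_l, C_l) feature maps, one per
        layer l, extracted from the input and reconstructed images.
    weights : list of (C_l,) learnable channel weight vectors w_l.
    """
    loss = 0.0
    for fx, fy, w in zip(feats_x, feats_xhat, weights):
        h, wd = fx.shape[:2]
        diff = w * (fx - fy)                          # element-wise channel weighting
        loss += (diff ** 2).sum(-1).sum() / (h * wd)  # spatial mean of squared L2 norms
    return loss
```

With a single 2 × 2 × 1 layer, unit features versus zeros, and weight 2.0, each position contributes (2·1)² = 4, so the spatial mean is 4.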
- 8. The method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc according to claim 1, wherein the specific processing procedure of the top-layer latent denoising network comprises: first, a convolution with kernel size 3 × 3 and stride 1 obtains a feature of size 32 × 32 × 160 from the 32 × 32 × 6 noisy feature map; three downsamplings are performed by the denoising downsampling module, which comprises a denoising downsampling layer and two groups of self-attention residual networks, transforming the spatial dimension of the feature map from 32 × 32 × 160 to 4 × 4 × 640; the denoising downsampling layer has kernel size 3 × 3 and stride 2; each self-attention residual network consists of a denoising residual module, a skip connection, and a self-attention module, where the skip connection is realized by a convolution layer with kernel size 1 × 1 and the denoising self-attention module consists of a normalization layer and multiple convolution layers with kernel size 1 × 1 and stride 1; feature integration is then performed by the middle-layer module, which consists of two denoising residual modules and a denoising self-attention module, takes the output of the downsampling module as input, and keeps the output feature size identical to the input; three upsamplings are performed by the denoising upsampling module, which consists of a denoising upsampling layer and two groups of denoising self-attention residual networks, transforming the spatial dimension of the feature map from 4 × 4 × 640 to 32 × 32 × 160; finally, a convolution layer with kernel size 3 × 3 and stride 1 outputs the decoded image of size 32 × 32 × 3.
- 9. The method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc according to claim 1, wherein the specific processing procedure of the bottom-layer latent denoising network comprises: first, a convolution with kernel size 3 × 3 and stride 1 obtains a feature of size 64 × 64 × 160 from the 64 × 64 × 6 noisy feature map; three downsamplings are performed by the denoising downsampling module, which comprises a denoising downsampling layer and two groups of self-attention residual networks, transforming the spatial dimension of the feature map from 64 × 64 × 160 to 8 × 8 × 640; the denoising downsampling layer has kernel size 3 × 3 and stride 2; each self-attention residual network consists of a denoising residual module, a skip connection, and a self-attention module, where the skip connection is realized by a convolution layer with kernel size 1 × 1 and the denoising self-attention module consists of a normalization layer and multiple convolution layers with kernel size 1 × 1 and stride 1; feature integration is then performed by the middle-layer module, which consists of two denoising residual modules and a denoising self-attention module, takes the output of the downsampling module as input, and keeps the output feature size identical to the input; three upsamplings are performed by the denoising upsampling module, which consists of a denoising upsampling layer and two groups of denoising self-attention residual networks, transforming the spatial dimension of the feature map from 8 × 8 × 640 to 64 × 64 × 160; finally, a convolution layer with kernel size 3 × 3 and stride 1 outputs the decoded image of size 64 × 64 × 3.
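The spatial sizes in claims 8 and 9 can be checked with the standard convolution output-size formulas. The claims give only kernel size 3 and stride 2, so padding 1 (and output padding 1 for the transposed convolution) are assumptions made here to reproduce the exact halving/doubling:

```python
def conv_out(n, k=3, s=2, p=1):
    """Output size of a strided convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def deconv_out(n, k=3, s=2, p=1, op=1):
    """Output size of a transposed convolution: (n - 1)s - 2p + k + op."""
    return (n - 1) * s - 2 * p + k + op

# Three stride-2 downsamplings take the top stream 32 -> 16 -> 8 -> 4
# (claim 8) and the bottom stream 64 -> 32 -> 16 -> 8 (claim 9).
n = 32
for _ in range(3):
    n = conv_out(n)
```

After the loop, `n` is 4, matching the 4 × 4 bottleneck of claim 8; starting from 64 instead yields the 8 × 8 bottleneck of claim 9, and three `deconv_out` steps invert either path.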
- 10. The method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc according to claim 1, wherein the top-layer latent diffusion model and the bottom-layer latent diffusion model have the same loss function, with the specific calculation formula: L = E_{z_0, ε∼N(0,I), t} [ ‖ ε_t − ε_θ(z_t, t) ‖²₂ ], where t denotes the time label of the denoising step, z_t denotes the noisy feature map at time t, ε_t denotes the actual noise at time t, ε_θ(z_t, t) denotes the noise at time t predicted by the denoising network with parameters θ (the top-layer or bottom-layer latent diffusion model), N(0, I) denotes the standard normal distribution with mean 0 and variance 1, ‖·‖₂ denotes the L2 norm, and E denotes the mathematical expectation.
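The forward noising step of steps 4.2/4.3 and the noise-prediction objective of claim 10 can be sketched together. This is a minimal NumPy illustration of the standard DDPM formulas the claims describe; the function names `q_sample` and `diffusion_loss` are hypothetical:

```python
import numpy as np

def q_sample(z0, alpha_bar_t, eps):
    """Forward noising of steps 4.2/4.3: z_t = sqrt(abar_t) z_0 + sqrt(1 - abar_t) eps."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

def diffusion_loss(eps, eps_pred):
    """Noise-prediction objective of claim 10: mean of || eps - eps_theta ||^2."""
    return ((eps - eps_pred) ** 2).mean()
```

At ᾱ_t = 1 the sample is the clean latent z_0, and at ᾱ_t = 0 it is pure noise ε, which is why the annotation-conditioned sampling of step 5 can start from random noise.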
Description
Method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc

Technical Field

The invention belongs to the technical field of medical artificial intelligence image processing, and in particular relates to a method for generating fundus images annotated with an optic disc atrophy arc using a hierarchical dual-stream diffusion model.

Background

Myopia is a highly prevalent refractive error worldwide, and the risk of blindness due to pathological myopia is significantly increased in patients with high myopia. The area of the optic disc atrophy arc (parapapillary atrophy, PPA) is generally related to the degree of myopia, with larger atrophy arc areas indicating more severe myopia. Accurate and efficient detection of optic disc atrophy arc changes in retinal fundus images is an important basis for myopia diagnosis and prevention. If the difficulties of annotating the optic disc atrophy arc and of patient data privacy can be overcome by generating high-quality fundus image data with optic disc atrophy arc annotations, changes of the optic disc atrophy arc can be analyzed with deep-learning-based automatic myopia detection and diagnosis technology, assisting doctors in providing personalized myopia diagnosis and treatment advice, effectively preventing and slowing the progression of myopia and improving patients' quality of life. Among existing fundus image generation methods, generative adversarial networks (GANs) can learn and model complex data distributions without supervision but suffer from unstable training and mode collapse, whereas diffusion models offer better generation quality and mode coverage. The latent diffusion model (LDM) introduces a variational autoencoder (VAE) to construct a perceptually equivalent latent space, markedly reducing computational requirements while preserving the key semantic features of the synthesized image.
Existing latent diffusion generation models still have obvious shortcomings. When the variational autoencoder compresses (downsamples) the input image, excessive downsampling inevitably loses fine details and harms generation quality, while insufficient downsampling limits the model's ability to effectively characterize high-level perceptual features. Moreover, the variational autoencoder itself tends to compress high-frequency information, potentially producing sub-optimal sampled outputs. In the task of generating retinal fundus images, precisely preserving complex vascular structures and fine pathological features is essential, and existing latent diffusion architectures struggle to meet this key requirement under these conflicting constraints. In addition, variational autoencoder models optimize their feature representation capability with a perceptual loss, but existing perceptual losses extract image features with extractors pre-trained on natural images, and the unavoidable domain gap with fundus images degrades the quality of the encoder's feature representation. At present, there is no generation paradigm that produces fundus images from optic disc atrophy arc annotations; a technique that directly takes optic disc atrophy arc annotations as input to generate fundus images would help advance deep-learning-based segmentation and detection of the optic disc atrophy arc, and would be of great significance for the clinical translation of automatic myopia detection and diagnosis technology.
Therefore, how to optimize the fundus feature extraction capability of the variational autoencoder and improve the quality of fundus images with optic disc atrophy arc annotations generated by a latent diffusion model has become a problem to be solved.

Disclosure of Invention

In view of the above, the present invention aims to provide a method for generating fundus images annotated with an optic disc atrophy arc using a hierarchical dual-stream diffusion model. A method for generating a hierarchical dual-stream diffusion model fundus image annotated with an optic disc atrophy arc comprises the following steps: Step 1, initializing a fundus image dataset and constructing a training set and a test set; Step 2, constructing a hierarchical vector-quantized generative adversarial network as the variational autoencoder, specifically: Step 2.1, constructing a hierarchical encoder comprising a bottom-layer encoder and a top-layer encoder; Step 2.1.1, the bottom-layer encoder encodes an input image x to obtain a bottom-layer latent image; Step 2.1.2, the top-layer encoder encodes the bottom-layer latent image to obtain a top-layer latent image; Step 2.2, constructing a hierarchical decoder comprising a top-layer decoder and a bottom-layer decoder; Step 2.2.1 T