CN-121095098-B - FFA image generation system, method, medium and device based on double-domain constraint Mamba diffusion model


Abstract

The application provides an FFA image generation system, method, medium and device based on a double-domain constrained Mamba diffusion model, which integrates spatial-domain and frequency-domain constraints to generate FFA images from CFP images across modalities. A learnable wavelet feature extractor extracts the frequency components that are key to blood vessels from fundus CFP images, strengthening the structural guidance for rendering vessel edges in the generated FFA images. In the spatial domain, a double-pyramid feature extractor fuses the local detail and global semantics of fundus lesion areas, improving the consistency of lesion representation in the generated FFA images. A dual-domain conditional Mamba module serves as the denoising core, coordinating spatial and frequency information and improving image sharpness and accuracy. Validation on the Hajeb, MPOS and clinical data sets yields PSNR values of 30.11 and 29.40. The method generates high-quality FFA images that preserve vascular detail and offers a new approach to retinal disease screening.
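The abstract reports image quality as PSNR (peak signal-to-noise ratio). For reference, the following minimal pure-Python sketch shows how this metric is conventionally computed between a real and a generated image. The 8-bit peak value of 255 is an assumption, because the text does not state the intensity range used for the reported figures.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images.

    Images are lists of rows of pixel values; max_value=255.0 assumes 8-bit data.
    """
    if len(reference) != len(generated) or any(
        len(r) != len(g) for r, g in zip(reference, generated)
    ):
        raise ValueError("images must have the same shape")
    squared_errors = [
        (r - g) ** 2
        for row_r, row_g in zip(reference, generated)
        for r, g in zip(row_r, row_g)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher is better. Each halving of the mean squared error adds about 3 dB, so a gap of roughly 0.7 dB between two scores corresponds to a noticeably different error level.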

Inventors

QIAN TIANWEI
LIU QING
ZHU HONGQING
ZHAO LINGYI
DONG JIADI
XU XUN

Assignees

上海市第一人民医院 (Shanghai First People's Hospital)

Dates

Publication Date: 20260508
Application Date: 20250812

Claims (11)

1. An FFA image generation system based on a two-domain constrained Mamba diffusion model, comprising a VAE encoder, a double-pyramid spatial domain feature extractor and a learnable wavelet domain extractor, located in a forward diffusion stage, and a dual-domain conditionally constrained Mamba module and a VAE decoder, located in a denoising stage, wherein: in the forward diffusion stage, a CFP image is input into the pre-trained VAE encoder and encoded into a latent representation, into which Gaussian noise is gradually injected; the noise-injected latent representation is input into the double-pyramid spatial domain feature extractor and the learnable wavelet domain extractor, which respectively extract spatial domain features representing spatial information of a lesion area and frequency domain features representing frequency-domain contours of a vascular structure; the learnable wavelet domain extractor uses a set of learnable Haar wavelet filters as the initialization of a learnable discrete wavelet transform, the filters corresponding to low-pass and high-pass filtering operations and being parameterized as variables through which gradients can be backpropagated, so that both filtering operations can be adjusted adaptively during training to match the frequency-domain distribution of the input image, and the Haar wavelet filters perform convolution in four directions on the input CFP image to obtain low-frequency and high-frequency sub-bands; and, in the denoising stage, an input sequence composed of the initial noise, the spatial domain features and the frequency domain features is input into the dual-condition-guided Mamba module and denoised under the guidance of joint spatial- and frequency-domain modeling, and the denoised latent representation is decoded and reconstructed by the VAE decoder to generate an FFA image.
2. The FFA image generation system based on the two-domain constrained Mamba diffusion model of claim 1, wherein the learnable wavelet domain extractor combines a learnable discrete wavelet transform with a sub-band energy balance mechanism and a Mamba sequence modeling module.
3. The FFA image generation system based on the two-domain constrained Mamba diffusion model of claim 2, wherein the learnable wavelet domain extractor, based on a gradient-preserving mechanism, computes first-order spatial gradient differences in the horizontal and vertical directions between the reconstructed high-frequency sub-bands and the original high-frequency sub-bands to construct a gradient-preserving loss, and reconstructs the information of all sub-bands layer by layer through a learnable inverse wavelet transform module, restoring it to the resolution of the original image and thereby performing a reversible transform from the frequency domain back to the spatial domain.
4. The FFA image generation system based on the two-domain constrained Mamba diffusion model of claim 1, wherein the double-pyramid spatial domain feature extractor comprises a convolution pyramid branch and a pooling pyramid branch; in the convolution pyramid branch, multi-scale convolution kernels capture semantic features under different receptive fields, channel compression is then applied, the feature map is divided into a patch sequence, positional encodings are added, and the patch sequence is fed into a Mamba sequence modeling module for sequence modeling; in the pooling pyramid branch, multi-scale adaptive pooling aggregates context information at multiple scales, and channel compression and a Mamba sequence modeling module are combined to integrate global semantic information; and the local modeling of the convolution pyramid branch and the global context-aware modeling of the pooling pyramid branch are fused to obtain the conditional prior extracted by the double-pyramid spatial domain feature extractor.
5. The FFA image generation system based on the two-domain constrained Mamba diffusion model of claim 4, wherein the Mamba sequence modeling module scans the image using a row-column scanning mechanism.
6. The FFA image generation system of claim 1, wherein the input of the two-domain constrained Mamba diffusion model comprises modeling an input noise image as  and fusing and embedding two types of condition features, respectively, to obtain a patch sequence, wherein the first type of condition features are the frequency domain features extracted by the learnable wavelet domain extractor, and the second type of condition features are the spatial domain features extracted by the double-pyramid spatial domain feature extractor.
7. The FFA image generation system based on the two-domain constrained Mamba diffusion model of claim 6, wherein the patch sequence is fed into a dual-condition attention module that models contextual relationships in the spatial domain and the frequency domain separately, constructing queries, keys and values independently for each domain; the resulting context features are input into a Mamba sequence modeling module, whose outputs, produced by bidirectional state scanning and successive activation operations, undergo residual connection and normalization; and the image structure is restored through an Unpatchify operation to obtain the latent representation from which the FFA image is finally generated.
8. The FFA image generation system based on the two-domain constrained Mamba diffusion model of claim 1, wherein the double-pyramid spatial domain feature extractor is trained with an InfoNCE loss function; the learnable wavelet domain extractor is trained with a total loss function composed of a frequency-domain contrastive loss, a sub-band energy conservation loss and a total sub-band gradient loss; and, during training of the two-domain constrained Mamba diffusion model, the pre-trained parameters of the double-pyramid spatial domain feature extractor and the learnable wavelet domain extractor are loaded and their structures are frozen, wherein the loss function consists of the spatial-domain InfoNCE loss, the frequency-domain InfoNCE loss and the total loss of the learnable wavelet domain extractor.
9. An FFA image generation method based on the two-domain constrained Mamba diffusion model, applied to the FFA image generation system based on the two-domain constrained Mamba diffusion model according to any one of claims 1 to 8, wherein the system comprises a VAE encoder, a double-pyramid spatial domain feature extractor and a learnable wavelet domain extractor in a forward diffusion stage, and a dual-domain conditionally constrained Mamba module and a VAE decoder in a denoising stage, and the method comprises: in the forward diffusion stage, encoding a CFP image into a latent representation and gradually injecting Gaussian noise; inputting the noise-injected latent representation into the double-pyramid spatial domain feature extractor and the learnable wavelet domain extractor, respectively, to extract spatial domain features representing spatial information of a lesion area and frequency domain features representing frequency-domain contours of a vascular structure, wherein the learnable wavelet domain extractor uses a set of learnable Haar wavelet filters as the initialization of a learnable discrete wavelet transform, the filters corresponding to low-pass and high-pass filtering operations and being parameterized as variables through which gradients can be backpropagated, so that the low-pass and high-pass filtering operations can be adjusted adaptively during training to match the frequency-domain distribution of the input image; and, in the denoising stage, denoising an input sequence formed by the initial noise, the spatial domain features and the frequency domain features under the guidance of joint spatial- and frequency-domain modeling, and decoding and reconstructing the denoised latent representation with the VAE decoder to generate an FFA image.
10. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the FFA image generation method based on the two-domain constrained Mamba diffusion model of claim 9.
11. A computer apparatus comprising a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the FFA image generation method based on the two-domain constrained Mamba diffusion model of claim 9.
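Claims 1 to 3 describe a learnable discrete wavelet transform initialized with Haar filters and a matching inverse transform. Claim 8 adds a sub-band energy conservation loss. The following pure-Python sketch illustrates these ideas under several assumptions:

- The "four directions" are read as the four 2x2 kernels (LL, LH, HL, HH) formed from one low-pass and one high-pass tap pair.
- The inverse is tied to the forward taps.
- The energy loss is taken to be a relative absolute difference.

The names LearnableHaarDWT, energy and subband_energy_loss are hypothetical. Updating the taps by backpropagation is left to a training framework and is not shown.

```python
import math


class LearnableHaarDWT:
    """Single-level 2-D DWT whose 1-D filter taps are treated as trainable parameters.

    The taps start at the orthonormal Haar values (hypothetical class, for
    illustration). A training framework would update self.low and self.high
    by backpropagation; that optimization step is not shown.
    """

    def __init__(self):
        s = 1.0 / math.sqrt(2.0)
        self.low = [s, s]    # low-pass taps (Haar initialization)
        self.high = [s, -s]  # high-pass taps (Haar initialization)

    def kernels(self):
        # Outer products of (vertical taps, horizontal taps) give four 2x2 kernels:
        # one approximation band (LL) and three detail bands (LH, HL, HH).
        # Naming conventions for LH/HL differ between libraries.
        pairs = {"LL": (self.low, self.low), "LH": (self.low, self.high),
                 "HL": (self.high, self.low), "HH": (self.high, self.high)}
        return {name: [[v[i] * h[j] for j in range(2)] for i in range(2)]
                for name, (v, h) in pairs.items()}

    def forward(self, image):
        """Apply each 2x2 kernel at stride 2 (deep-learning style, no flip)."""
        rows, cols = len(image), len(image[0])
        if rows % 2 or cols % 2:
            raise ValueError("image height and width must be even")
        return {
            name: [[sum(k[i][j] * image[r + i][c + j]
                        for i in range(2) for j in range(2))
                    for c in range(0, cols, 2)]
                   for r in range(0, rows, 2)]
            for name, k in self.kernels().items()
        }

    def inverse(self, bands):
        """Adjoint of forward(); an exact inverse only while the kernels stay orthonormal."""
        half_rows, half_cols = len(bands["LL"]), len(bands["LL"][0])
        image = [[0.0] * (2 * half_cols) for _ in range(2 * half_rows)]
        for name, k in self.kernels().items():
            for r in range(half_rows):
                for c in range(half_cols):
                    for i in range(2):
                        for j in range(2):
                            image[2 * r + i][2 * c + j] += k[i][j] * bands[name][r][c]
        return image


def energy(values):
    """Sum of squared entries of a 2-D list."""
    return sum(v * v for row in values for v in row)


def subband_energy_loss(image, bands, eps=1e-12):
    """Relative mismatch between input energy and total sub-band energy (assumed form)."""
    total = sum(energy(b) for b in bands.values())
    return abs(energy(image) - total) / (energy(image) + eps)
```

With the Haar initialization the four kernels are orthonormal. The inverse therefore reproduces the input, and the energy loss is approximately zero. Once training moves the taps away from orthonormality, the energy term becomes non-zero, which is one plausible reason to keep it as a regularizer.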
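Claim 3 builds a gradient-preserving loss from first-order horizontal and vertical differences between reconstructed and original high-frequency sub-bands. The following minimal sketch assumes an L1 (mean absolute) penalty averaged over paired sub-bands; the claim does not specify the norm or the averaging. The function names are hypothetical.

```python
def spatial_gradients(band):
    """First-order differences along each row (horizontal) and each column (vertical)."""
    horizontal = [[row[c + 1] - row[c] for c in range(len(row) - 1)] for row in band]
    vertical = [[band[r + 1][c] - band[r][c] for c in range(len(band[0]))]
                for r in range(len(band) - 1)]
    return horizontal, vertical


def mean_abs_diff(a, b):
    """Mean absolute difference of two equally shaped 2-D lists (0.0 if empty)."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def gradient_preserving_loss(reconstructed_bands, original_bands):
    """Assumed L1 gap between the gradients of paired high-frequency sub-bands."""
    total = 0.0
    for rec, orig in zip(reconstructed_bands, original_bands):
        rec_h, rec_v = spatial_gradients(rec)
        orig_h, orig_v = spatial_gradients(orig)
        total += mean_abs_diff(rec_h, orig_h) + mean_abs_diff(rec_v, orig_v)
    return total / len(original_bands)
```

Only differences are compared, so a constant offset between the two sub-bands costs nothing. A change in edge strength, such as a blurred vessel boundary, is penalized.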
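The pooling pyramid branch of claim 4 aggregates context with multi-scale adaptive pooling. The following sketch shows that step on a single-channel map in pure Python. Several details are assumptions for illustration: the scales (1, 2, 3, 6), nearest-neighbour upsampling back to the input size, and all function names. The channel compression and Mamba stages are omitted.

```python
def adaptive_avg_pool2d(fmap, out_h, out_w):
    """Average-pool a 2-D list into out_h x out_w bins.

    Bin edges are floor(i*H/out) .. ceil((i+1)*H/out), the same rule PyTorch
    uses for adaptive average pooling, so neighbouring bins may overlap.
    """
    in_h, in_w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(out_h):
        r0, r1 = (i * in_h) // out_h, ((i + 1) * in_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0, c1 = (j * in_w) // out_w, ((j + 1) * in_w + out_w - 1) // out_w
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize so pooled maps can be stacked with the input."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def pooling_pyramid(fmap, scales=(1, 2, 3, 6)):
    """Pool at several scales (assumed) and resize each result to the input size."""
    h, w = len(fmap), len(fmap[0])
    return [upsample_nearest(adaptive_avg_pool2d(fmap, s, s), h, w) for s in scales]
```

Coarse scales carry image-wide context, such as overall fundus brightness, while finer scales keep regional context. Stacking the resized maps lets later layers mix both.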
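Claim 5 states that the Mamba sequence modeling module scans images with a row-column mechanism, and claim 7 mentions bidirectional state scanning. One plausible reading is sketched below with hypothetical helper names:

- flatten a 2-D map in row-major and in column-major order;
- add the reversed sequences for the backward direction;
- invert the flattening afterwards.

How the patented module actually orders or merges the directions is not stated.

```python
def row_scan(fmap):
    """Row-major flattening: left to right, top to bottom."""
    return [v for row in fmap for v in row]


def column_scan(fmap):
    """Column-major flattening: top to bottom, left to right."""
    return [fmap[r][c] for c in range(len(fmap[0])) for r in range(len(fmap))]


def unscan(sequence, height, width, by_columns=False):
    """Invert row_scan (or column_scan when by_columns=True) back to a grid."""
    grid = [[None] * width for _ in range(height)]
    for idx, v in enumerate(sequence):
        if by_columns:
            r, c = idx % height, idx // height
        else:
            r, c = idx // width, idx % width
        grid[r][c] = v
    return grid


def scan_sequences(fmap):
    """Four 1-D sequences a bidirectional sequence model could consume (assumed):
    row order, reversed row order, column order, reversed column order."""
    rows, cols = row_scan(fmap), column_scan(fmap)
    return [rows, rows[::-1], cols, cols[::-1]]
```

Scanning by both rows and columns gives a causal sequence model a path along horizontal and vertical vessel segments alike. The reversed sequences let every position see context from both sides.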
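Claims 6 and 7 turn the noisy latent and the two condition feature maps into a patch sequence, then restore the grid with an Unpatchify operation. The sketch assumes non-overlapping square patches in row-major order. It also assumes a weighted element-wise sum as the fusion rule; the source does not give the fusion formula, so fuse_conditions and its weights are hypothetical.

```python
def patchify(fmap, p):
    """Split an H x W grid into non-overlapping p x p patches in row-major order,
    flattening each patch into one token vector."""
    h, w = len(fmap), len(fmap[0])
    if h % p or w % p:
        raise ValueError("height and width must be divisible by the patch size")
    return [[fmap[r + i][c + j] for i in range(p) for j in range(p)]
            for r in range(0, h, p) for c in range(0, w, p)]


def unpatchify(tokens, h, w, p):
    """Inverse of patchify: place each token vector back into its p x p block."""
    grid = [[0.0] * w for _ in range(h)]
    per_row = w // p
    for t, vec in enumerate(tokens):
        r, c = (t // per_row) * p, (t % per_row) * p
        for k, v in enumerate(vec):
            grid[r + k // p][c + k % p] = v
    return grid


def fuse_conditions(noise_tokens, freq_tokens, spatial_tokens, w_freq=1.0, w_spatial=1.0):
    """Hypothetical fusion: weighted element-wise sum of aligned token vectors."""
    return [[n + w_freq * f + w_spatial * s for n, f, s in zip(nv, fv, sv)]
            for nv, fv, sv in zip(noise_tokens, freq_tokens, spatial_tokens)]
```

Because patchify and unpatchify are exact inverses, any change to the final grid comes from the sequence model between them, not from the reshaping itself.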
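Claim 7 describes a dual-condition attention module that constructs queries, keys and values independently for the spatial and frequency domains. The following minimal sketch of that idea rests on three assumptions:

- the patch tokens supply the queries;
- each condition's tokens supply its own keys and values;
- the two context outputs are added residually.

Learned projection matrices are omitted (identity), and dual_condition_attention is a hypothetical name.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def dual_condition_attention(tokens, spatial_cond, freq_cond):
    """Hypothetical block: tokens attend separately to spatial-domain and
    frequency-domain condition tokens; both contexts are added residually."""
    spatial_ctx = attention(tokens, spatial_cond, spatial_cond)
    freq_ctx = attention(tokens, freq_cond, freq_cond)
    return [[t + s + f for t, s, f in zip(tv, sv, fv)]
            for tv, sv, fv in zip(tokens, spatial_ctx, freq_ctx)]
```

Keeping the two attention calls separate means neither domain's keys compete with the other's in a single softmax. This is one way to read "independently construct queries, keys and values".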
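Claim 8 trains the double-pyramid extractor with an InfoNCE loss and combines spatial-domain and frequency-domain InfoNCE terms in the overall objective. The standard batch form of InfoNCE with cosine similarity is sketched below. The way positive pairs are formed, the temperature of 0.07 and the weighting between loss terms are assumptions, not taken from the source. Input vectors must be non-zero.

```python
import math


def info_nce(anchors, positives, temperature=0.07):
    """Batch InfoNCE: anchors[i] should match positives[i]; the other positives
    in the batch act as negatives. Similarity is cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum_exp - logits[i]  # negative log-softmax of the matching pair
    return loss / len(anchors)
```

When matching pairs are already aligned the loss approaches zero. When every candidate is equally similar, it equals the logarithm of the batch size.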
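Claim 9 encodes the CFP image into a latent representation, injects Gaussian noise step by step, and denoises it in the reverse stage. The sketch below assumes a standard DDPM formulation, none of which the claims specify:

- a linear beta schedule;
- closed-form forward noising;
- a noise-prediction parameterization.

In the described system the noise estimate would come from the dual-domain conditioned Mamba module. Here the true noise is passed in instead, to check the algebra.

```python
import math
import random


def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """DDPM-style linear noise schedule (an assumed choice)."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bar, out = 1.0, []
    for b in betas:
        alpha_bar *= 1.0 - b
        out.append(alpha_bar)
    return out


def add_noise(z0, t, alpha_bars, rng):
    """Closed-form forward diffusion: z_t = sqrt(a_t) * z0 + sqrt(1 - a_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, eps)]
    return zt, eps


def predict_z0(zt, eps_hat, t, alpha_bars):
    """Clean-latent estimate from a noise prediction eps_hat (a denoiser's output)."""
    a = alpha_bars[t]
    return [(z - math.sqrt(1.0 - a) * e) / math.sqrt(a) for z, e in zip(zt, eps_hat)]
```

Because z_t can be sampled directly at any step t, training can draw random timesteps without simulating the whole chain. The quality of the generated FFA latent then depends entirely on how well the conditioned denoiser predicts the injected noise.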

Description

FFA image generation system, method, medium and device based on double-domain constraint Mamba diffusion model

Technical Field

The application relates to the technical field of medical image processing, and in particular to an FFA image generation system, method, medium and device based on a double-domain constrained Mamba diffusion model.

Background

In ophthalmic clinical diagnosis and treatment, retinal imaging plays a vital and irreplaceable role. It has important clinical value in the early screening, disease assessment and treatment monitoring of retina-related diseases. Currently, color fundus photography (Color Fundus Photography, CFP) and fundus fluorescein angiography (Fundus Fluorescein Angiography, FFA) are the two most common and widely used retinal imaging modalities. Their imaging principles and clinical functions are clearly complementary, and together they form an important basis of current retinal image diagnosis. CFP acquires color fundus images under visible-light illumination and requires no contrast agent. It is non-invasive, fast, economical and simple to operate, and it clearly shows the structural features of the fundus, such as the optic disc, the distribution of retinal vessels and the morphology of the macular region. CFP is mainly used for preliminary screening of fundus diseases and for observing structural lesions, and it is widely applicable in clinical scenarios such as hypertensive retinopathy, myopic lesions and optic nerve diseases.
However, CFP provides only static anatomical information and cannot reflect vascular permeability, microcirculatory function or hemodynamic changes. This limits its diagnostic and evaluative capability for complex fundus lesions involving dynamic vascular changes or functional abnormalities (e.g., diabetic retinopathy and age-related macular degeneration). In contrast, FFA is a functional imaging modality based on angiography. Sodium fluorescein is injected intravenously as a contrast agent, and fundus fluorescence signals are collected under excitation light of a specific wavelength, recording the distribution and flow of fluorescein through the retinal vasculature in real time. FFA directly reflects changes in vascular permeability and accurately identifies pathological features such as microaneurysms, neovascularization, capillary non-perfusion areas and vascular leakage. This gives it irreplaceable clinical value in the diagnosis, staging, prognostic evaluation and treatment assessment of retinal vascular diseases, especially diabetic retinopathy, retinal vein occlusion and choroidal neovascularization. However, FFA is an invasive imaging modality that relies on injecting an exogenous contrast agent. This may cause adverse reactions including nausea, vomiting and allergic reactions, and in severe cases even anaphylactic shock or respiratory failure, so FFA carries a certain clinical risk. Therefore, obtaining vascular function information equivalent to FFA in a non-invasive, low-risk manner, without using sodium fluorescein, is one of the key open problems in current research.
In recent years, with the continuous development of deep learning, cross-modal medical image generation has shown broad application prospects in computer-aided diagnosis, image enhancement and related fields. In particular, it offers an entirely new technical path toward the non-invasive acquisition of FFA images. Such methods build a cross-modal generation model that takes existing image data, such as CFP, as input and generates images highly consistent with the target modality (e.g., FFA) in structural and functional characteristics, thereby enabling retinal vascular imaging without injecting a contrast agent. This non-invasive approach not only reduces examination risk and cost but also provides important support for large-scale screening and telemedicine, and it has become an important direction of current ophthalmic artificial intelligence research. Researchers have proposed various deep-learning-based CFP-to-FFA image generation methods, which fall mainly into three types. The first type designs a dedicated image feature encoder-decoder network to extract and reconstruct features from CFP images. It attempts to establish an explicit mapping between the CFP and FFA modalities and thereby generate output that is structurally similar to real FFA images. The second type learns the complex nonlinear mapping between CFP and FFA by means of a generative adversarial network (Generative Adversarial Net