CN-121810685-B - Glaucoma grading system based on multi-mode conditional state space fusion
Abstract
The invention relates to the technical field of glaucoma grading and discloses a glaucoma grading system based on multi-modal conditional state space fusion. The system acquires CFP images and OCT images and constructs a glaucoma grading network model comprising a circular structure perception attention module, a layer prior enhanced axial attention module, and a conditional state space fusion module. The circular structure perception attention module strengthens the representation of circular key lesion regions in the CFP image; the layer prior enhanced axial attention module improves the utilization of key lesion information in the OCT image; and the conditional state space fusion module dynamically modulates CFP spatial features with OCT global information to realize cross-modal feature interaction. Glaucoma grading is performed with the trained network model. The invention can effectively fuse multi-modal information, strengthen the extraction of glaucoma-related pathological features, and improve glaucoma grading performance.
Inventors
- ZHU WEIFANG
- SUN YITING
Assignees
- Soochow University (苏州大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-03-09
Claims (10)
- 1. A glaucoma grading system based on multi-modal conditional state space fusion, comprising: a data acquisition module for acquiring CFP images and OCT images; a glaucoma grading network model construction module for constructing a glaucoma grading network model based on multi-modal conditional state space fusion, the model comprising a circular structure perception attention module, a layer prior enhanced axial attention module, and a conditional state space fusion module, wherein the circular structure perception attention module enhances the representation of circular key lesion regions in the CFP image through multi-radius soft circular convolution kernels, and the layer prior enhanced axial attention module strengthens the capture of long-range dependencies in the vertical direction and the utilization of key lesion information in the OCT image; a model training module for training the glaucoma grading network model based on multi-modal conditional state space fusion; and a glaucoma grading module for inputting the CFP image and OCT image to be graded into the trained glaucoma grading network model based on multi-modal conditional state space fusion to obtain a glaucoma grading result.
- 2. The glaucoma grading system based on multi-modal conditional state space fusion according to claim 1, wherein the circular structure perception attention module enhances the representation of circular key lesion regions in CFP images through multi-radius soft circular convolution kernels, comprising: applying a multi-radius soft circular spatial attention mechanism to the channel-enhanced feature map, capturing multi-scale information by constructing three parallel circular attention branches to obtain spatial attention weights, thereby enhancing the representation of key circular regions such as the optic disc and optic cup in CFP images; and multiplying the spatial attention weights element-wise with the channel-enhanced feature map to obtain the output feature map of the circular structure perception attention module.
- 3. The glaucoma grading system based on multi-modal conditional state space fusion of claim 2, wherein applying the multi-radius soft circular spatial attention mechanism to the channel-enhanced feature map and capturing multi-scale information through three parallel circular attention branches to obtain the spatial attention weights comprises: performing max pooling and average pooling on the channel-enhanced feature map along the channel dimension to obtain two spatial feature maps, and concatenating the two spatial feature maps along the channel dimension to obtain a two-channel feature map; constructing a plurality of parallel soft circular attention branches, wherein each branch captures circular structures at a different scale with a soft circular convolution kernel, and the learnable parameters of each branch comprise a radius and a boundary smoothing coefficient, the radius being the radius parameter of the soft circular mask and controlling the coverage of the circular attention, and the boundary smoothing coefficient controlling the decay rate of the circular boundary from the center outward; for each branch, taking the center of the convolution kernel as the origin, computing the distance from each position to the center and generating the soft circular mask; convolving the two-channel feature map with the effective weights of each branch to obtain the circular attention weight of that branch; and concatenating the circular attention weights of all branches along the channel dimension, fusing them by convolution, and generating the spatial attention weights through an activation function.
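The soft circular mask of claims 2 and 3 can be sketched in a few lines of NumPy. This is an illustrative assumption, not the patented implementation: the sigmoid-shaped boundary decay, the kernel size 7, the radii 1/2/3, and the name `soft_circular_mask` are all choices made here; the patent only specifies a learnable radius and boundary smoothing coefficient per branch.

```python
import numpy as np

def soft_circular_mask(kernel_size, radius, smooth):
    """Soft circular mask for one attention branch.

    The distance from the kernel centre is computed for every position;
    weights stay near 1 inside `radius` and decay smoothly across the
    boundary at a rate set by the smoothing coefficient `smooth`.
    The sigmoid decay profile is an assumption for illustration.
    """
    c = (kernel_size - 1) / 2.0                    # kernel centre as origin
    ys, xs = np.mgrid[0:kernel_size, 0:kernel_size]
    dist = np.sqrt((ys - c) ** 2 + (xs - c) ** 2)  # distance to centre
    return 1.0 / (1.0 + np.exp((dist - radius) / smooth))

# Three parallel branches capture circular structures at different scales.
masks = [soft_circular_mask(7, r, smooth=0.5) for r in (1.0, 2.0, 3.0)]
```

A larger radius yields a mask covering a wider circular region, which is how the three parallel branches attend to optic-cup-sized versus optic-disc-sized structures.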
- 4. The glaucoma grading system based on multi-modal conditional state space fusion of claim 1, wherein the layer prior enhanced axial attention module enhances the utilization of key lesion information in OCT images by strengthening the capture of long-range dependencies in the vertical direction, comprising: the layer prior enhanced axial attention module comprises a vertical branch and a horizontal branch, and its input is denoted X; one-dimensional axial attention is computed on X along the vertical direction to obtain the output features of the vertical branch; and one-dimensional axial attention is then applied to the vertical-branch output features along the horizontal direction to obtain the output feature map of the layer prior enhanced axial attention module.
- 5. The glaucoma grading system based on multi-modal conditional state space fusion according to claim 4, wherein computing one-dimensional axial attention on the input X along the vertical direction to obtain the output features of the vertical branch comprises: performing dimension permutation and reshaping on X, and denoting the resulting sequence feature map as X_v; mapping X_v into query, key, and value vectors with a multi-head attention mechanism, and computing content similarity scores from the query and key vectors; performing relative position encoding with a direct index offset table scheme to obtain the relative position offsets; convolving X with a vertical-strip convolution kernel, enhancing the convolution result nonlinearly with an activation function, and reshaping it into sequence format to obtain the layer prior features P_i, where P_i is the layer prior feature of the i-th attention head, i = 1, …, h, and h is the number of attention heads; applying a normalization operation along the sequence dimension to obtain the normalized layer prior sequence P̂_i, and expanding the normalized layer prior sequence P̂_i of the i-th head into a bias tensor B_i matching the dimensions of the content similarity scores; adding the relative position offsets and the bias tensor to the content similarity scores to obtain the biased attention weights in the vertical direction; multiplying the biased attention weights in the vertical direction by the value vectors to obtain the output of each attention head; and concatenating the outputs of all attention heads along the channel dimension, fusing them, and reshaping the fused result to obtain the output features of the vertical branch.
- 6. The glaucoma grading system based on multi-modal conditional state space fusion according to claim 5, wherein the biased attention weight in the vertical direction is computed as: A_i = Softmax(S_i + λ·R_i + B_i), where A_i is the vertical biased attention weight of the i-th attention head, Softmax(·) is the softmax function, S_i is the content similarity score of the i-th attention head, R_i is the relative position offset of the i-th attention head, λ is an adjustable weighting coefficient, and B_i is the bias tensor of the i-th attention head.
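The biased attention weight of claim 6 — content similarity scores plus λ-weighted relative position offsets plus the layer-prior bias tensor, passed through a softmax — can be checked numerically. The toy sequence length and the random inputs below are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilised softmax
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention_weight(S, R, B, lam):
    """A = Softmax(S + lam * R + B): content similarity scores S,
    relative position offsets R (weighted by lam), layer-prior bias B."""
    return softmax(S + lam * R + B, axis=-1)

rng = np.random.default_rng(0)
L = 6                            # toy vertical sequence length (assumption)
S = rng.normal(size=(L, L))      # content similarity scores
R = rng.normal(size=(L, L))      # relative position offsets
B = rng.normal(size=(L, L))      # bias tensor expanded from the layer prior
A = biased_attention_weight(S, R, B, lam=0.5)
```

Whatever the bias terms contribute, each row of the result remains a valid attention distribution (non-negative, summing to 1), which is the point of applying the biases before rather than after the softmax.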
- 7. The glaucoma grading system based on multi-modal conditional state space fusion according to claim 1, wherein the conditional state space fusion module models long-range dependencies of the spatial sequence through a selective scanning mechanism and, combined with a mechanism in which OCT global information dynamically modulates CFP spatial features, realizes fine-grained, adaptive cross-modal feature interaction, comprising: generating modulation features from the output feature map of the layer prior enhanced axial attention module through a multi-layer perceptron and a broadcasting mechanism; generating a state transition matrix and a time step from the output feature map of the layer prior enhanced axial attention module through two parallel, parameter-independent multi-layer perceptrons; adopting a selective scanning mechanism to perform one-dimensional state space modeling on the modulation features along the row and column directions according to the state transition matrix and the time step, realizing dynamic cross-modal feature interaction through a state space conditioning mechanism to obtain row scanning features and column scanning features; and fusing the row scanning features and the column scanning features to obtain the multi-modal fusion features.
- 8. The glaucoma grading system based on multi-modal conditional state space fusion according to claim 7, wherein generating the modulation features from the output feature map of the layer prior enhanced axial attention module through a multi-layer perceptron and a broadcasting mechanism specifically comprises: generating FiLM parameters from the output feature map of the layer prior enhanced axial attention module through a multi-layer perceptron, reshaping the FiLM parameters into scaling parameters and offset parameters, and modulating the output feature map of the circular structure perception attention module through a broadcasting mechanism to obtain the modulation features: F_mod = γ ⊙ F_cfp + β, where F_mod is the modulation feature, ⊙ denotes element-wise multiplication, F_cfp is the output feature map of the circular structure perception attention module, γ is the scaling parameter, and β is the offset parameter.
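The FiLM-style modulation of claim 8 (per-channel scale-and-shift broadcast over the spatial dimensions) reduces to a one-liner. The (C, H, W) tensor layout and the constant toy inputs are assumptions for illustration:

```python
import numpy as np

def film_modulate(F_cfp, gamma, beta):
    """FiLM-style modulation: F_mod = gamma * F_cfp + beta, with the
    per-channel gamma/beta broadcast over the H x W spatial positions."""
    C = F_cfp.shape[0]
    return gamma.reshape(C, 1, 1) * F_cfp + beta.reshape(C, 1, 1)

C, H, W = 4, 8, 8
F_cfp = np.ones((C, H, W))            # stand-in for the CFP feature map
gamma = np.arange(C, dtype=float)     # scaling parameters (from OCT branch)
beta = np.full(C, 0.5)                # offset parameters (from OCT branch)
F_mod = film_modulate(F_cfp, gamma, beta)
```

Because γ and β are produced from the OCT features, every channel of the CFP map is rescaled and shifted under OCT control, which is the "conditioning" half of the conditional state space fusion.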
- 9. The glaucoma grading system based on multi-modal conditional state space fusion of claim 7, wherein: the state transition matrix is computed by applying an MLP transformation to the output feature map of the layer prior enhanced axial attention module to generate a state increment adjustment value, reshaping it and denoting the reshaped value as A_adj; introducing a learnable initial bias parameter, reshaping it and denoting the reshaped bias as A_bias; and adding A_adj and A_bias and passing the sum through an exponential function to obtain the state transition matrix; the time step is computed by applying an MLP transformation to the output feature map of the layer prior enhanced axial attention module to generate a step increment adjustment value, wherein the internal parameters of this MLP are not shared with those of the MLP generating the state increment adjustment value, and applying an activation function to the step increment adjustment value and multiplying the result by a step scaling coefficient to obtain the time step.
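The parameter generation of claim 9 can be sketched as follows. The choice of softplus as the activation and the flat tensor shapes are assumptions made here; the claim only states "an activation function" and a step scaling coefficient, and the MLPs that produce the adjustment values are omitted:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def make_scan_params(a_adj, a_bias, d_adj, step_scale):
    """Selective-scan parameters in the style of claim 9.

    State transition matrix: exp(state increment adjustment + learnable
    bias). Time step: an activation of the step increment adjustment
    times a step scaling coefficient (softplus is an assumed choice).
    """
    A = np.exp(a_adj + a_bias)            # positive by construction
    delta = step_scale * softplus(d_adj)  # positive: a valid time step
    return A, delta

A, delta = make_scan_params(np.zeros(3), np.zeros(3), np.zeros(3), 2.0)
```

Both outputs are guaranteed positive, which is what the exponential and the activation buy: a well-defined transition matrix and time step regardless of what the MLPs emit.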
- 10. The glaucoma grading system based on multi-modal conditional state space fusion of claim 7, wherein, when performing one-dimensional state space modeling on the modulation features along the row and column directions, the modulation features are scanned by rows to obtain the row scanning result, specifically: reshaping the modulation features row-wise into sequence format to obtain the row sequences X_row; and applying layer normalization to each sequence in the row sequences and then performing selective scanning, the row scanning result being: Y_row = SSM(LN(X_row); A, Δ), where Y_row denotes the row scanning result, LN denotes layer normalization, A denotes the state transition matrix, Δ denotes the time step, and SSM denotes the selective scan; the state update equation of the selective scan is: h_t = Ā_t ⊙ h_{t-1} + Δ_t·B·x_t, y_t = C·h_t, where y_t denotes the row scanning result at time t, h_t denotes the state vector at time t, B and C are learnable parameters, x_t denotes the input at time t, and Ā_t, the t-th element of the discretized transition, is computed as: Ā_t = exp(Δ_t ⊙ A), where exp denotes the natural exponential function and ⊙ denotes the element-wise product.
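The selective-scan recurrence described in claim 10 can be sketched as a plain loop. The diagonal (element-wise) state matrix, the scalar input per step, and the negative A used to give a decaying example are simplifying assumptions, not the patented parameterisation:

```python
import numpy as np

def selective_scan(x, A, delta, B, C):
    """Minimal 1-D selective scan over a sequence x of length T.

    Per step: A_bar_t = exp(delta_t * A)                (element-wise)
              h_t     = A_bar_t * h_{t-1} + delta_t * B * x_t
              y_t     = C . h_t
    A diagonal state matrix and a scalar input per step are
    simplifying assumptions for illustration.
    """
    N = A.shape[0]                  # state dimension
    h = np.zeros(N)
    y = np.zeros(x.shape[0])
    for t in range(x.shape[0]):
        A_bar = np.exp(delta[t] * A)         # discretised transition
        h = A_bar * h + delta[t] * B * x[t]  # state update
        y[t] = C @ h                         # read-out
    return y

# Impulse response of the scan: a negative A keeps the state stable,
# so the response decays geometrically by a factor exp(delta * A).
y = selective_scan(x=np.array([1.0, 0.0, 0.0]),
                   A=-np.ones(2), delta=np.ones(3),
                   B=np.ones(2), C=np.ones(2))
```

The time step Δ controls how quickly the state forgets: a large Δ_t shrinks exp(Δ_t·A) toward zero (for negative A) and weights the current input more, which is the per-position selectivity the claim relies on.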
Description
Glaucoma grading system based on multi-mode conditional state space fusion
Technical Field
The invention relates to the technical field of glaucoma grading, in particular to a glaucoma grading system based on multi-modal conditional state space fusion.
Background
Glaucoma is a chronic neurodegenerative disease and one of the major causes of irreversible but preventable blindness worldwide. Color fundus photography (CFP) and optical coherence tomography (OCT) are imaging techniques widely used in clinical examinations of glaucoma. CFP is a basic means of preliminary glaucoma screening: it noninvasively captures high-resolution fundus images and helps doctors observe macroscopic structural changes such as the optic disc cup-to-disc ratio. OCT not only quantifies retinal nerve fiber layer (RNFL) thickness but also accurately captures microstructural changes in the macular area (the region where ganglion cell bodies are most dense), which has been demonstrated to be an earlier, sensitive biomarker of glaucoma. Combining CFP and OCT images for clinical glaucoma grading covers both the morphological characteristics of the optic disc and early lesion signals of the macular region, making this a key technical combination for early glaucoma screening, disease grading, and progression monitoring. Clinicians determine the stage and progression of glaucoma by observing changes in the cup-to-disc ratio of the optic disc in CFP, and changes in RNFL thickness and macular structure in OCT. However, clinical glaucoma grading still depends on subjective interpretation by doctors and suffers from error, uncertainty, and inefficient fusion of multi-modal data.
In recent years, deep learning (DL) based methods have been widely used in medical image processing and analysis, and convolutional neural networks (CNNs), Transformers, and the selective state space (SSM) modules of the Mamba architecture have shown great application potential across medical image processing and analysis tasks. With the development of these technologies, the accuracy and efficiency of medical image processing and analysis have improved markedly: CNNs excel at capturing local features, Transformers can model long-range dependencies, and Mamba SSM modules can more efficiently select key information when processing high-dimensional medical images through a selective scanning mechanism, balancing computational cost against representational capability. Their complementary advantages in the precision and speed of tasks such as image processing, classification, and segmentation are driving medical image analysis toward automation and intelligence. In the prior art, CNN-based models have achieved automatic segmentation of the optic disc region in glaucoma CFP images, and Transformer-based methods can extract features from three-dimensional OCT data to assist in evaluating optic nerve damage. In addition, some deep learning models perform glaucoma screening or grading with single-modality data (CFP only or OCT only), and some perform grading based on both CFP and OCT modalities. Despite significant advances in the medical field, the use of deep learning in automatic glaucoma grading still has the following notable drawbacks: 1. Multi-modal fusion is insufficient.
On the one hand, most existing methods are based on single-modality medical images, such as CFP or OCT alone, and cannot fully exploit the complementary information of multi-modal medical images. On the other hand, although some existing methods attempt to fuse multi-modal information such as CFP and OCT, most adopt shallow fusion strategies such as simple feature concatenation or weighted averaging, which cannot fully account for the semantic association and spatial correspondence between modalities, nor effectively model the conditional dependency between the spatial feature map of CFP and the global semantic features of OCT. In addition, existing fusion methods lack the capability to model dynamic inter-modal interaction, and fusion weights are difficult to adapt to the input samples, so the fusion effect is unsatisfactory, the synergistic advantage of multi-modal data is not fully exploited, and the model's comprehensive understanding of the complex pathological characteristics of glaucoma is limited. 2. There are deficiencies in the extraction of glaucoma-associated pathological features.