
CN-121982766-A - Ophthalmic disease image recognition method based on neural network model

CN121982766A

Abstract

The invention relates to the field of ophthalmic disease image recognition and discloses an ophthalmic disease image recognition method based on a neural network model. The method comprises: collecting fundus photographic images and OCT images and performing size standardization; outputting a high signal-to-noise-ratio preprocessed image through a convolutional attention module and a multi-scale noise suppression neural network; constructing a dual-attention fusion framework and dynamically fusing features through a cross-attention mechanism; optimizing small-sample recognition of rare diseases with a meta-learning training strategy; dynamically adjusting classification thresholds through Bayesian neural network probability modeling and outputting a preliminary recognition result; and completing semantic mapping against an ophthalmic clinical diagnosis term library to generate a structured report containing lesion coordinates, lesion classification, and differential diagnosis basis.

Inventors

  • QI JIA
  • CHEN ZHANG
  • ZHANG YUE
  • YANG MIN
  • LIU SIWEI

Assignees

  • Shiyan Taihe Hospital (Affiliated Hospital of Hubei University of Medicine)

Dates

Publication Date
2026-05-05
Application Date
2026-01-30

Claims (10)

  1. An ophthalmic disease image recognition method based on a neural network model, the method comprising: S1, acquiring fundus photographic images and OCT images, and performing size standardization on them to obtain standard-specification ophthalmic images; S2, inputting the standard ophthalmic images into a multi-scale noise suppression neural network, which filters illumination artifacts and vascular texture interference through a cascade of a convolutional attention module and a wavelet decomposition network and outputs a preprocessed image with a signal-to-noise ratio of at least 40 dB; S3, constructing a dual-attention fusion neural network architecture comprising a global feature extraction branch built on a lightweight Vision Transformer and a local lesion detection branch built on a depthwise separable CNN; S4, inputting the preprocessed image into the dual-attention fusion architecture and extracting global pathology-associated features and local lesion detail features respectively; S5, dynamically fusing the global pathology-associated features and the local lesion detail features through a cross-attention mechanism, and introducing a meta-learning training strategy to optimize the fusion result and obtain a reinforced fusion feature vector; S6, inputting the reinforced fusion feature vector into a Bayesian neural network classification module, adjusting the disease classification threshold in real time through its probability modeling, and outputting a preliminary recognition result; and S7, performing semantic mapping between the preliminary recognition result and an ophthalmic clinical diagnosis term library to generate a structured report containing lesion coordinates, lesion severity grading, and differential diagnosis basis, completing recognition of the ophthalmic disease.
  2. The method for identifying an ophthalmic disease image based on a neural network model according to claim 1, wherein the size standardization adopts a bilinear interpolation algorithm, which determines a pixel value by computing a weighted average of the four pixels adjacent to the target pixel, with the weight formula: w=1/(dx×dy); where w is the weight of the adjacent pixel, dx is the horizontal distance between the target pixel and the adjacent pixel, and dy is the vertical distance; the fundus photographic images and OCT images are uniformly resized to 512×512 pixels, the gray values of the standard-specification ophthalmic image are min-max normalized, and the normalized gray value range is strictly controlled to 0-255.
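The min-max normalization step in this claim can be sketched in a few lines (a minimal pure-Python illustration; the function name and the flat-image fallback are assumptions, not part of the claim):

```python
def min_max_normalize(pixels, lo=0.0, hi=255.0):
    """Min-max normalize gray values into the range [lo, hi] (claim 2)."""
    p_min, p_max = min(pixels), max(pixels)
    if p_max == p_min:                       # flat image: map everything to lo
        return [lo for _ in pixels]
    scale = (hi - lo) / (p_max - p_min)
    return [lo + (p - p_min) * scale for p in pixels]
```

Applied to a 512×512 image, the same formula runs over all 262,144 gray values, after which the range is exactly 0-255 as the claim requires.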
  3. The method for identifying an ophthalmic disease image based on a neural network model according to claim 1, wherein the convolutional attention module of the multi-scale noise suppression neural network comprises a channel attention sub-module and a spatial attention sub-module; the channel attention sub-module performs global average pooling and global max pooling on the input feature map to obtain two 1×C vectors, concatenates them, and outputs channel weights through a 2-layer fully connected network with a Sigmoid activation, where the hidden-layer dimension is C/8 and C is the number of feature-map channels; the spatial attention sub-module performs channel-wise average pooling and max pooling on the feature map to obtain two H×W×1 feature maps, concatenates them, and outputs spatial weights through a 3×3 convolution layer with a single output channel and a Sigmoid activation, where H and W are the feature-map height and width; the wavelet decomposition network adopts a 3-level two-dimensional discrete wavelet transform, yielding a low-frequency approximation component and three high-frequency detail components (horizontal, vertical, and diagonal); the high-frequency detail components are the concentrated carriers of image edges, textures, and noise, containing both lesion detail and high-frequency noise; hard thresholding is applied to suppress the high-frequency noise components, with the noise estimate: σ=median(|x-median(x)|)/0.6745; where σ is the noise standard deviation and x is the pixel value of the high-frequency detail component.
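The σ formula in this claim is the standard median-absolute-deviation (MAD) noise estimator. A minimal sketch of the estimator together with a hard-threshold step (the 3σ threshold level is an illustrative assumption; the claim specifies only the σ formula):

```python
import statistics

def mad_sigma(coeffs):
    """Noise std estimate: sigma = median(|x - median(x)|) / 0.6745 (claim 3)."""
    med = statistics.median(coeffs)
    return statistics.median([abs(v - med) for v in coeffs]) / 0.6745

def hard_threshold(coeffs, k=3.0):
    """Zero out high-frequency coefficients smaller than k*sigma; keep the rest unchanged."""
    t = k * mad_sigma(coeffs)
    return [v if abs(v) >= t else 0.0 for v in coeffs]
```

In the full pipeline, `hard_threshold` would be applied to each of the three high-frequency detail sub-bands of every decomposition level before the inverse wavelet transform.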
  4. The method for identifying an ophthalmic disease image based on a neural network model according to claim 1, wherein the input embedding dimension of the lightweight Vision Transformer is 768; the standard-specification ophthalmic image is divided with a 32×32-pixel patching strategy into 16×16 patches, yielding 257 input tokens after adding 1 class token, and sinusoidal position coding adds spatial position information; the encoder has 12 layers, each comprising a multi-head attention mechanism and a feedforward neural network, with 8 attention heads of output dimension 96 each; the feedforward network comprises 2 linear transformations with a hidden dimension of 3072 and a GELU activation; the depthwise separable CNN comprises 4 feature extraction blocks, each a cascade of a depthwise convolution layer, a batch normalization layer, a ReLU activation, and a pointwise convolution layer; the depthwise convolution uses a 3×3 kernel, stride 1, 'same' padding, and as many output channels as input channels; the pointwise convolution uses a 1×1 kernel, stride 1, and twice as many output channels as input channels.
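The token arithmetic above (512/32 = 16 patches per side, 16×16 = 256 patches, plus one class token = 257 tokens) and the sinusoidal position coding can be sketched as follows; the interleaved sin/cos layout follows the common Transformer convention, since the patent does not specify the exact layout:

```python
import math

def num_tokens(image_size=512, patch_size=32):
    """Patch count plus one class token (claim 4)."""
    per_side = image_size // patch_size          # 16 patches per side
    return per_side * per_side + 1               # 256 patches + 1 class token

def sinusoidal_encoding(n_tokens, dim=768):
    """Fixed sine/cosine position encodings, one row per token."""
    pe = [[0.0] * dim for _ in range(n_tokens)]
    for pos in range(n_tokens):
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            pe[pos][i] = math.sin(angle)
            if i + 1 < dim:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Note the head arithmetic is also internally consistent: 8 heads × 96 dimensions per head = 768, matching the embedding dimension.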
  5. The method for identifying an ophthalmic disease image based on a neural network model of claim 4, wherein the global pathology-associated features are obtained by flattening the 257×768 feature map output by the lightweight Vision Transformer encoder: the class token is removed, the 768-dimensional vectors of the 256 spatial tokens are retained, and after concatenation they are fed into a linear mapping layer that outputs the global features; the local lesion detail features are output through the final global average pooling layer of the depthwise separable CNN, which pools the 64×256 feature map output by the last convolution layer into a 256-dimensional feature vector fed into a linear mapping layer that outputs the local features; the dimensions of the global pathology-associated features and the local lesion detail features are unified to 1024.
  6. The method for identifying an ophthalmic disease image based on a neural network model according to claim 1, wherein the cross-attention mechanism obtains the fusion weight by computing the cosine similarity of the global pathology-associated feature and the local lesion detail feature; the cosine similarity is mapped to the range 0-1 by the normalization formula: sim_norm=(sim+1)/2; where sim_norm is the normalized similarity and sim is the raw cosine similarity; the dynamic fusion uses weighted summation: F=α×A+(1-α)×B; where F is the fused feature vector, A is the global pathology-associated feature vector, B is the local lesion detail feature vector, and α is the global feature weight coefficient computed as α=sim_norm×0.8+0.1, giving α a range of 0.1-0.9; the meta-learning training strategy adopts the model-agnostic meta-learning (MAML) algorithm, with training split into an inner loop and an outer loop: the inner loop updates model parameters with 1 gradient descent step per few-shot task at a learning rate of 0.01, and the outer loop updates the meta-parameters based on the inner-loop losses of 100 few-shot tasks at a learning rate of 0.001; the features of the 23 classes of rare ophthalmic diseases with small samples are transfer-optimized in a 5-shot training mode with 5 training samples per disease class, 500 training iterations, and a cross-entropy loss function.
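The fusion arithmetic in this claim can be illustrated directly (a minimal sketch on plain Python lists; the real features would be 1024-dimensional tensors):

```python
import math

def cosine_similarity(a, b):
    """Raw cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def fuse(global_feat, local_feat):
    """F = alpha*A + (1-alpha)*B with alpha = sim_norm*0.8 + 0.1 (claim 6)."""
    sim = cosine_similarity(global_feat, local_feat)
    sim_norm = (sim + 1) / 2                 # map [-1, 1] -> [0, 1]
    alpha = sim_norm * 0.8 + 0.1             # alpha stays in [0.1, 0.9] by construction
    return [alpha * a + (1 - alpha) * b for a, b in zip(global_feat, local_feat)]
```

The α=sim_norm×0.8+0.1 mapping guarantees neither branch is ever fully silenced: even when the two features disagree completely (sim = -1), the global branch still contributes 10%, and vice versa.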
  7. The method for identifying an ophthalmic disease image based on a neural network model according to claim 1, wherein the probability modeling of the Bayesian neural network classification module adopts a variational inference algorithm that approximates the true posterior distribution p(θ|D) with an approximate posterior q(θ|φ), where θ is the model parameters, φ is the parameters of the approximate posterior, and D is the training data set; the optimization target is the evidence lower bound: L(φ)=-KL(q(θ|φ)||p(θ))+E_q(θ|φ)[log p(D|θ)]; the disease classification threshold is adjusted within 0.5-0.8, with common-disease thresholds set to 0.7-0.8 and rare-disease thresholds set to 0.5-0.6, and the optimal threshold is determined by maximizing the F1 score on a validation set split from the training data; the F1 score is the harmonic mean of the precision P and the recall R: F1=2×P×R/(P+R); where P is the proportion of correctly predicted positive samples among all samples predicted positive, and R is the proportion of correctly predicted positive samples among all true positive samples; the preliminary recognition result comprises a disease category label and a corresponding confidence value, the confidence value being the expectation of the model prediction probability p(y|x,θ) under the approximate posterior q(θ|φ): conf=E_q(θ|φ)[p(y|x,θ)]; where conf is the confidence value, E_q(θ|φ) is the expectation operator over the approximate posterior q(θ|φ), p(y|x,θ) is the predicted probability of disease class y given input x and model parameters θ, x is the preprocessed image, and y is the ophthalmic disease class label; the confidence value is retained to three decimal places.
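The threshold selection logic (maximize validation F1, with F1 = 2PR/(P+R)) can be sketched as follows; the candidate grid and function names are illustrative assumptions, since the claim only specifies the adjustment range:

```python
def f1(probs, labels, threshold):
    """F1 at a threshold: P = TP / predicted-positive, R = TP / actual-positive."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    pred_pos = sum(1 for p in probs if p >= threshold)
    actual_pos = sum(labels)
    if tp == 0:
        return 0.0
    precision, recall = tp / pred_pos, tp / actual_pos
    return 2 * precision * recall / (precision + recall)

def best_threshold(probs, labels, candidates=(0.5, 0.6, 0.7, 0.8)):
    """Pick the threshold within [0.5, 0.8] maximizing validation-set F1 (claim 7)."""
    return max(candidates, key=lambda t: f1(probs, labels, t))
```

Per the claim, the search grid would be restricted to 0.7-0.8 for common diseases and 0.5-0.6 for rare ones.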
  8. The method for identifying an ophthalmic disease image based on a neural network model of claim 7, wherein the training data set comprises 210,000 clinical samples divided into training, validation, and test sets in an 8:1:1 ratio; model training uses the Adam optimizer with a batch size of 32 and a weight decay coefficient of 0.0001, and adopts an early-stopping strategy that halts training when validation accuracy fails to improve for 10 consecutive epochs.
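The early-stopping strategy in this claim reduces to a simple patience counter (a minimal sketch; the `evaluate` callback stands in for one epoch of training plus validation and is an assumption, as the claim specifies only the 10-epoch patience):

```python
def train_with_early_stopping(max_epochs, evaluate, patience=10):
    """Stop when validation accuracy fails to improve for `patience` epochs (claim 8)."""
    best_acc, stalled, epochs_run = -1.0, 0, 0
    for epoch in range(max_epochs):
        epochs_run = epoch + 1
        acc = evaluate(epoch)                # one epoch of training + validation
        if acc > best_acc:
            best_acc, stalled = acc, 0       # improvement: reset the counter
        else:
            stalled += 1
            if stalled >= patience:
                break                        # 10 epochs without improvement
    return best_acc, epochs_run
```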
  9. The method for identifying an ophthalmic disease image based on a neural network model of claim 8, wherein the ophthalmic clinical diagnosis term library is stored in JSON format with four fields (disease number, standard diagnostic term, pathological feature description, and differential diagnosis points) and records standard diagnostic terms for 39 common and 23 rare ophthalmic diseases; the term library supports online updating with an update period of no more than 30 days, where updates fetch an incremental data packet from a medical terminology server over HTTPS, are verified after application by comparison against an ophthalmologist knowledge base, and may include newly added disease terms, corrections to existing term descriptions, and supplementary pathological feature associations; the semantic mapping is realized by an improved edit-distance string matching algorithm that optimizes matching precision by introducing character-type weights and position weights, computes the weighted edit distance between the model output and the diagnostic terms in the library, and selects the term with the minimum weighted edit distance below a preset threshold of 3 as the match, with letter weight 1, digit weight 0.5, and symbol weight 0.3.
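The character-type-weighted edit distance can be sketched as a standard Levenshtein dynamic program whose edit costs come from the claim's weights; the claim's position weights are omitted here for brevity, and the substitution-cost rule (max of the two character weights) is an assumption:

```python
def char_weight(c):
    """Character-type weights from claim 9: letter 1, digit 0.5, symbol 0.3."""
    if c.isalpha():
        return 1.0
    if c.isdigit():
        return 0.5
    return 0.3

def weighted_edit_distance(s, t):
    """Levenshtein DP where each edit costs the weight of the character involved."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + char_weight(s[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + char_weight(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                sub = d[i - 1][j - 1] + max(char_weight(s[i - 1]), char_weight(t[j - 1]))
                d[i][j] = min(d[i - 1][j] + char_weight(s[i - 1]),   # delete from s
                              d[i][j - 1] + char_weight(t[j - 1]),   # insert from t
                              sub)                                    # substitute
    return d[m][n]
```

A library term then matches when its weighted distance to the model output is both minimal over the library and below the preset threshold of 3.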
  10. The method for identifying an ophthalmic disease image based on a neural network model of claim 9, wherein the structured report comprises image acquisition time, preprocessing parameters, model version information, lesion coordinates, lesion severity grading, and differential diagnosis basis; the lesion coordinates are marked in dual coordinates, pixel coordinates and clinical anatomical coordinates, with the conversion from pixel coordinates (x, y) to clinical anatomical coordinates (mm): X=(x-256)×0.015; Y=(256-y)×0.015; where x and y are the pixel abscissa and ordinate, X is the horizontal component of the clinical anatomical coordinate, and Y is the vertical component; the lesion severity is graded as mild, moderate, or severe according to the proportion of the lesion area to the effective area of the fundus image, the effective area being the anatomical region containing the retina and cornea after removing the black image borders, as determined by the tissue segmentation result of the multi-scale noise suppression neural network in S2; specifically, mild corresponds to a proportion below 5%, moderate to 5%-15%, and severe to above 15%; the differential diagnosis basis comprises feature-difference descriptions of similar diseases, with difference weights obtained by computing the Euclidean distance between the feature vectors of the disease to be identified and similar diseases, and the top-ranked differences by weight serving as the core basis.
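The coordinate conversion and severity grading in this claim reduce to a few lines (a minimal sketch; the function names are illustrative):

```python
def pixel_to_anatomical(x, y, center=256, scale=0.015):
    """Convert 512x512 pixel coordinates to clinical anatomical mm (claim 10):
    X = (x - 256) * 0.015, Y = (256 - y) * 0.015, origin at the image center,
    with the Y axis flipped so anatomical 'up' is positive."""
    return (x - center) * scale, (center - y) * scale

def grade_lesion(lesion_area, effective_area):
    """Mild < 5%, moderate 5-15%, severe > 15% of the effective fundus area."""
    ratio = lesion_area / effective_area
    if ratio < 0.05:
        return "mild"
    if ratio <= 0.15:
        return "moderate"
    return "severe"
```

With a 0.015 mm/pixel scale, the 512-pixel image spans about ±3.84 mm around the center in each direction.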

Description

Ophthalmic disease image recognition method based on neural network model

Technical Field

The invention relates to the field of ophthalmic disease image recognition, and in particular to an ophthalmic disease image recognition method based on a neural network model.

Background

Early diagnosis of ophthalmic diseases directly affects treatment prognosis and the protection of patients' vision. Clinically, ophthalmologists currently rely mainly on manual interpretation of fundus photographic images and optical coherence tomography (OCT) images, but this approach faces multiple technical challenges in practice. Manual diagnosis is inefficient and cannot meet the demands of large-scale population screening, and diagnostic results are easily influenced by subjective factors such as physician experience and fatigue, leading to insufficient diagnostic consistency. Meanwhile, noise such as illumination artifacts and vascular texture interference is common in fundus photographic and OCT images; traditional filtering algorithms achieve noise suppression at only a single scale and struggle to separate noise from lesion details accurately, which degrades the accuracy of subsequent feature extraction. In feature extraction, existing deep learning models mostly adopt a single network architecture that either focuses on global feature capture while ignoring local lesion details, or focuses on local features while lacking integration of global pathology-associated information, so feature expression is incomplete. For rare ophthalmic diseases, clinical samples are scarce, and the training paradigm of existing models, which depends on large amounts of labeled data, adapts poorly; generalization in small-sample scenarios is weak and recognition accuracy is generally low.
In addition, the classification threshold of existing models is a fixed value that cannot be dynamically adjusted to disease category characteristics (common versus rare), easily causing missed diagnoses or misdiagnoses for some diseases. The model output is a bare disease category label that lacks mapping to standard clinical diagnostic terms and does not form a structured report containing lesion coordinates, lesion grading, and differential diagnosis basis, so it is difficult to integrate directly into the clinical diagnostic workflow and has limited practicality. To solve these technical problems, the invention provides an ophthalmic disease image recognition method integrating multi-scale noise suppression, dual-attention feature extraction, a meta-learning training strategy, and Bayesian probability classification, realizing full-pipeline optimization from image preprocessing to clinical report generation.

Disclosure of Invention

The invention aims to provide an ophthalmic disease image recognition method based on a neural network model that solves the prior-art problems of failing to meet large-scale population screening demands, low feature extraction accuracy, incomplete feature expression, low recognition accuracy, and the inability to output a structured report.
In order to achieve the above object, the invention provides an ophthalmic disease image recognition method based on a neural network model, comprising the following steps: S1, acquiring fundus photographic images and OCT images, and performing size standardization on them to obtain standard-specification ophthalmic images; S2, inputting the standard ophthalmic images into a multi-scale noise suppression neural network, which filters illumination artifacts and vascular texture interference through a cascade of a convolutional attention module and a wavelet decomposition network and outputs a preprocessed image with a signal-to-noise ratio of at least 40 dB; S3, constructing a dual-attention fusion neural network architecture comprising a global feature extraction branch built on a lightweight Vision Transformer and a local lesion detection branch built on a depthwise separable CNN; S4, inputting the preprocessed image into the dual-attention fusion architecture and extracting global pathology-associated features and local lesion detail features respectively; S5, dynamically fusing the global pathology-associated features and the local lesion detail features through a cross-attention mechanism, and introducing a meta-learning training strategy to optimize the fusion result and obtain a reinforced fusion feature vector; S6, inputting the reinforced fusion feature vector into a Bayesian neural network classification module, adjusting a disease classification threshold