CN-122023811-A - Retinal layer segmentation method, system, medium, product and computer equipment


Abstract

The invention belongs to the technical field of image processing and provides a retinal layer segmentation method, system, medium, product and computer equipment. A preprocessed OCT image is input into a dual-path learnable structure prior attention network. The encoder extracts multi-scale encoded feature maps. The decoder receives the encoded feature maps of the corresponding stages through skip connections, and a dual-path learnable structure prior attention module is embedded in each stage of its upsampling structure. In each decoder stage, the current decoder input is fused with the encoded feature map to obtain a preliminary fusion feature, from which the attention module produces an enhanced feature that serves as the input of the next stage. The last decoder stage outputs an enhanced feature with the same spatial resolution as the original image, and a retinal layer segmentation map is generated from it through class mapping. Finally, a small connected domain removal operation is applied to the segmentation map to obtain the final segmentation result.
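
As a non-limiting illustration, the final post-processing step (removal of small connected domains, i.e. connected components) could be sketched as follows. The function name `remove_small_components`, the parameter `min_pixels`, the use of 4-connectivity, and the choice to relabel removed pixels as a background class with index 0 are all assumptions made for this sketch; the source only states that connected domains with fewer pixels than a preset threshold are deleted.

```python
from collections import deque

import numpy as np


def remove_small_components(seg, min_pixels, background=0):
    """Relabel 4-connected regions smaller than min_pixels as background.

    seg is a 2-D array of class indices. Connectivity and the background
    index are assumptions of this sketch, not taken from the source.
    """
    seg = np.asarray(seg)
    out = seg.copy()
    height, width = seg.shape
    visited = np.zeros(seg.shape, dtype=bool)
    for r in range(height):
        for c in range(width):
            if visited[r, c] or seg[r, c] == background:
                continue
            # Breadth-first search over neighbours sharing the same class index.
            label = seg[r, c]
            queue = deque([(r, c)])
            visited[r, c] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not visited[ny, nx] and seg[ny, nx] == label):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # Delete the region if it falls below the preset threshold.
            if len(component) < min_pixels:
                ys, xs = zip(*component)
                out[list(ys), list(xs)] = background
    return out
```

Calling `remove_small_components(seg, min_pixels=50)` on a class-index map returns a cleaned copy and leaves the input array unchanged. The threshold of 50 is only a placeholder, and in practice a library routine for connected component labelling would likely be faster than this pure-Python loop.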

Inventors

SONG WEIYE
LIU ENYU
Xu Muhao
Yan Bingcan
YANG HAOHUA
ZHOU LIBO
CUI YUAN
QI MIN

Assignees

Shandong University (山东大学)

Dates

Publication Date: 20260512
Application Date: 20260413

Claims (17)

1. A retinal layer segmentation method, comprising the steps of: performing standardized preprocessing on an input retinal OCT image and its corresponding pixel-level annotation label to obtain a preprocessed OCT image; inputting the preprocessed OCT image into a dual-path learnable structure prior attention network built on a U-shaped encoder-decoder architecture, wherein the encoder comprises a multi-stage downsampling structure for extracting multi-scale encoded feature maps, the decoder comprises a multi-stage upsampling structure corresponding to the stages of the encoder, each stage of the upsampling structure receives the encoded feature map of the corresponding stage through a skip connection, and a dual-path learnable structure prior attention module is embedded in each stage of the upsampling structure; in each stage of the upsampling structure of the decoder, fusing the input of the current decoder stage with the encoded feature map of the corresponding stage received through the skip connection to obtain a preliminary fusion feature, and inputting the preliminary fusion feature into the dual-path learnable structure prior attention module of that stage, which outputs an enhanced feature as the input of the next decoder stage; outputting, from the last stage of the decoder, an enhanced feature whose spatial resolution matches that of the input image, and generating a retinal layer segmentation map from it through class mapping; and performing a small connected domain removal operation on the retinal layer segmentation map to obtain a final retinal layer segmentation result.
2. The method of claim 1, wherein the dual-path learnable structure prior attention module takes the encoded feature map of the corresponding stage and the preliminary fusion feature as input, computes a channel-averaged grayscale map of the encoded feature map, generates, based on the channel-averaged grayscale map, a learnable structure prior through a learnable structure branch and a deformable structure prior through a deformable structure branch, and fuses the learnable structure prior and the deformable structure prior to obtain a structure prior map.
3. The method of claim 1, wherein the dual-path learnable structure prior attention module generates a global weight and a local weight based on the structure prior map, computes a weighted average of the global weight and the local weight to obtain a fusion weight, and applies structured modulation to the encoded feature map of the corresponding stage using the fusion weight to obtain a modulated feature.
4. The method of claim 3, wherein the dual-path learnable structure prior attention module applies frequency-domain amplitude modulation to the modulated feature to obtain a frequency-domain optimized feature, fuses the frequency-domain optimized feature with the modulated feature through an adaptive gating mechanism, and outputs the enhanced feature in combination with a residual connection.
5. The method of claim 1, wherein the standardized preprocessing comprises: converting the original OCT image into a single-channel grayscale image and normalizing its gray levels; converting the original annotation label into an RGB-format image and mapping the RGB-format image to a class index label through class encoding; center-cropping the normalized OCT image and the class index label to a fixed size; and, for the training set, synchronously applying to the OCT image and its corresponding label at least one data augmentation operation selected from random horizontal flipping, random-angle rotation, random blurring, Gaussian noise addition, and brightness and contrast adjustment.
6. The method of claim 1, wherein each stage of the downsampling structure of the encoder comprises a contraction block and an attention downsampling module, the contraction block performs convolution, batch normalization and activation on the input of the current stage to output an intermediate feature, and the attention downsampling module generates an attention weight based on the intermediate feature, weights the intermediate feature with it, and then performs downsampling to output an encoded feature map that serves as the input of the next encoder stage.
7. The method of claim 1, wherein each stage of the upsampling structure of the decoder upsamples the input of the current decoder stage through pixel rearrangement and, guided by an edge attention mask, fuses the upsampled result with the encoded feature map of the corresponding stage received through the skip connection to obtain the preliminary fusion feature.
8. The method of claim 1, wherein, when training the dual-path learnable structure prior attention network, a weighted sum of a cross-entropy loss and a Dice loss is used as a mixed loss function, network parameters are updated through backpropagation, and the optimal model is saved when the validation loss is lowest.
9. The method of claim 1, wherein the last-stage encoded feature map output by the encoder is input to a Transformer bottleneck layer for global context modeling to generate a refined feature, which, after bottleneck convolution processing, serves as the initial input of the decoder.
10. The method of claim 1, wherein the small connected domain removal operation comprises deleting, from the retinal layer segmentation map, connected domains whose pixel count is smaller than a preset threshold.
11. A retinal layer segmentation system, comprising: a preprocessing unit configured to perform standardized preprocessing on an input retinal OCT image and its corresponding pixel-level annotation label to obtain a preprocessed OCT image; a network input unit configured to input the preprocessed OCT image into a dual-path learnable structure prior attention network built on a U-shaped encoder-decoder architecture, wherein the encoder comprises a multi-stage downsampling structure for extracting multi-scale encoded feature maps, the decoder comprises a multi-stage upsampling structure corresponding to the stages of the encoder, each stage of the upsampling structure receives the encoded feature map of the corresponding stage through a skip connection, and a dual-path learnable structure prior attention module is embedded in each stage of the upsampling structure; a feature enhancement unit configured to, in each stage of the upsampling structure of the decoder, fuse the input of the current decoder stage with the encoded feature map of the corresponding stage received through the skip connection to obtain a preliminary fusion feature, and input the preliminary fusion feature into the dual-path learnable structure prior attention module of that stage, which outputs an enhanced feature as the input of the next decoder stage; a segmentation generation unit configured to generate a retinal layer segmentation map through class mapping from the enhanced feature output by the last stage of the decoder, whose spatial resolution matches that of the input image; and a post-processing unit configured to perform a small connected domain removal operation on the retinal layer segmentation map to obtain a final retinal layer segmentation result.
12. The retinal layer segmentation system according to claim 11, wherein, in the network input unit, the dual-path learnable structure prior attention module takes the encoded feature map of the corresponding stage and the preliminary fusion feature as input, computes a channel-averaged grayscale map of the encoded feature map, generates, based on the channel-averaged grayscale map, a learnable structure prior through a learnable structure branch and a deformable structure prior through a deformable structure branch, and fuses the learnable structure prior and the deformable structure prior to obtain a structure prior map.
13. The retinal layer segmentation system according to claim 11, wherein, in the feature enhancement unit, the dual-path learnable structure prior attention module generates a global weight and a local weight based on the structure prior map, computes a weighted average of the global weight and the local weight to obtain a fusion weight, and applies structured modulation to the encoded feature map of the corresponding stage using the fusion weight to obtain a modulated feature.
14. The retinal layer segmentation system according to claim 13, wherein, in the feature enhancement unit, the dual-path learnable structure prior attention module applies frequency-domain amplitude modulation to the modulated feature to obtain a frequency-domain optimized feature, fuses the frequency-domain optimized feature with the modulated feature through an adaptive gating mechanism, and outputs the enhanced feature in combination with a residual connection.
15. A computer device, comprising a processor and a computer-readable storage medium, wherein the processor is adapted to execute a computer program, and the computer-readable storage medium stores a computer program which, when executed by the processor, implements the retinal layer segmentation method according to any one of claims 1 to 10.
16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor and to perform the retinal layer segmentation method according to any one of claims 1 to 10.
17. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the retinal layer segmentation method according to any one of claims 1 to 10.
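
As a non-limiting illustration of the mixed loss recited in claim 8 (a weighted sum of a cross-entropy loss and a Dice loss), the computation could be sketched in NumPy as follows. The function names, the equal weights of 0.5, the soft Dice formulation averaged over classes, and the smoothing constant `eps` are assumptions of this sketch; the source does not specify the weights or the exact Dice variant, and an actual training setup would use a deep learning framework with automatic differentiation rather than plain NumPy.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def mixed_loss(logits, target, ce_weight=0.5, dice_weight=0.5, eps=1e-6):
    """Weighted sum of cross-entropy and soft Dice loss.

    logits: float array of shape (C, H, W) with per-class scores.
    target: int array of shape (H, W) with class indices in [0, C).
    The weights and eps are hypothetical defaults chosen for this sketch.
    """
    num_classes = logits.shape[0]
    probs = softmax(logits, axis=0)
    # One-hot encode the class index label to shape (C, H, W).
    one_hot = np.eye(num_classes)[target].transpose(2, 0, 1)
    # Cross entropy: mean negative log-probability of the true class.
    log_probs = np.log(np.clip(probs, eps, 1.0))
    ce = -np.mean(np.sum(one_hot * log_probs, axis=0))
    # Soft Dice loss: one minus the Dice coefficient, averaged over classes.
    intersection = np.sum(probs * one_hot, axis=(1, 2))
    denom = np.sum(probs, axis=(1, 2)) + np.sum(one_hot, axis=(1, 2))
    dice = 1.0 - np.mean((2.0 * intersection + eps) / (denom + eps))
    return ce_weight * ce + dice_weight * dice
```

Under these assumptions, the loss approaches zero when the predicted class scores strongly agree with the label and grows as they diverge, so the checkpoint with the lowest validation value would be the one saved as the optimal model.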

Description

Retinal layer segmentation method, system, medium, product and computer equipment
Technical Field
The invention relates to the technical field of image processing, and in particular to a retinal layer segmentation method, system, medium, product and computer equipment.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The retina is the core tissue of the visual system, and the integrity of its delicate anatomy directly determines visual function. Blinding diseases such as glaucoma, diabetic retinopathy and age-related macular degeneration all cause characteristic abnormalities of retinal structure. For example, glaucoma causes progressive thinning of the retinal nerve fiber layer (NFL), diabetic macular edema (DME) causes abnormal thickening of the macular area, and detachment of the retinal pigment epithelium (RPE) layer is a typical manifestation of many lesions. Optical coherence tomography (OCT) has become the "gold standard" for clinical diagnosis of retinal disease by virtue of being non-invasive, offering micron-level resolution and imaging in real time, and it can clearly present the more than 10 layers of fine retinal anatomy. By measuring layer thickness and analyzing morphology in OCT images, a doctor can judge the type, severity and progression of a lesion, which provides a key basis for early screening, treatment planning and evaluation of treatment efficacy.
In routine clinical OCT examination, a single-eye scan can generate tens to hundreds of B-scan images. Traditionally, ophthalmologists delineate the retinal layer boundaries manually. The workflow is cumbersome and time-consuming, so efficiency is extremely low, and because judging standards differ between doctors and for the same doctor at different times, the results are subjective and poorly repeatable. In addition, primary-care medical institutions lack experienced ophthalmologists and the technical threshold is high, so manual segmentation cannot meet the needs of large-scale screening, emergency diagnosis and primary-care applications. Developing high-precision, automatic retinal layer segmentation technology has therefore become a core research and development direction in the industry. Meanwhile, with the rapid spread of portable OCT equipment, the technical requirements have risen further: such equipment is limited in volume, power consumption and cost and is constrained in hardware resources, so the segmentation algorithm must be lightweight, consume little computing power and run inference quickly while still ensuring high precision, in order to suit portable application scenarios.
Traditional machine learning methods rely on hand-designed fixed features such as gray level and edges, and do not account for the influence of OCT speckle noise and lesion-induced structural distortion on the feature distribution, so they adapt poorly to complex image scenes and their segmentation accuracy is low. Their feature design and parameter tuning are optimized for specific equipment and specific lesion types, so they cannot adapt across scenes and generalize poorly. Core algorithms such as graph cut require the energy-term weights to be defined manually and lack a standardized parameter configuration scheme, so the reliability of the result depends on operator experience and standardized clinical application is difficult to achieve. CNN-based deep learning methods learn features through local convolution operations and do not explicitly model retinal anatomical constraints such as interlayer parallelism, continuity and normal thickness range, so the model cannot recognize anatomically implausible segmentation results, and errors such as layer breaks, misalignment and crossing occur. These models also lack a design that specifically suppresses the speckle noise inherent in OCT images, which makes them sensitive to noise and prone to mistaking noise for layer boundaries, leading to blurred boundaries and higher segmentation error. Meanwhile, a high-precision model has to strengthen its feature extraction capability by increasing network depth and width, which pushes the parameter scale above 50 million (M for short), so the computational complexity is high, the model cannot be deployed in portable equipment, and the light modi