CN-121982043-A - Ultra-wide-angle image lesion and blood vessel segmentation method based on visual-text coding

CN 121982043 A

Abstract

The invention relates to an ultra-wide-angle image lesion and blood vessel segmentation method based on visual-text coding, and belongs to the technical field of medical image processing. The method comprises: obtaining ultra-wide-angle fundus images and the corresponding lesion text data, and constructing a fundus image dataset; preprocessing the images to obtain standardized image data; constructing an ultra-wide-angle image lesion and blood vessel segmentation model based on visual-text coding, the model comprising a ViT global coding branch, a CNN local coding branch, a text coding branch, a cross-modal attention module, a decoding branch and an output end; inputting the images in the dataset into the model for training, and optimizing the model through a joint loss function to obtain a trained model; and preprocessing an image to be segmented and inputting it into the trained model to obtain the ultra-wide-angle image lesion and blood vessel segmentation results. The invention integrates global and local visual features with textual semantic information, and can improve the accuracy of ultra-wide-angle image lesion and blood vessel segmentation.
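The ViT global coding branch described above (and detailed in claim 2) produces twelve feature maps that are fused in sequential groups of three by element-wise addition. The patent discloses no source code; the following is a minimal NumPy sketch of that grouping-and-fusion step, in which all array shapes are illustrative assumptions:

```python
import numpy as np

def group_fuse(encoder_outputs, group_size=3):
    """Fuse Transformer encoder outputs by element-wise addition in
    sequential groups: twelve feature maps -> four fused features
    when group_size is three, as in the ViT global coding branch."""
    assert len(encoder_outputs) % group_size == 0
    fused = []
    for i in range(0, len(encoder_outputs), group_size):
        group = encoder_outputs[i:i + group_size]
        # Element-wise (channel-dimension) addition over the group.
        fused.append(np.sum(group, axis=0))
    return fused

# Twelve dummy feature maps (channels x H x W); sizes are illustrative only.
feats = [np.full((8, 16, 16), k, dtype=np.float32) for k in range(12)]
fused = group_fuse(feats)
print(len(fused))         # 4 fused features
print(fused[0][0, 0, 0])  # 3.0 (= 0 + 1 + 2, the first group summed)
```

The four fused features then serve as the multi-scale inputs that are upsampled and joined with the CNN branch's residual features, as claim 3 describes.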

Inventors

  • ZHANG DAN
  • CHEN TAO
  • SHEN JUNYU
  • LI QINGFENG
  • YI QUANYONG
  • ZHANG JIONG

Assignees

  • Ningbo University of Technology (宁波工程学院)

Dates

Publication Date
2026-05-05
Application Date
2026-02-03

Claims (8)

  1. An ultra-wide-angle image lesion and blood vessel segmentation method based on visual-text coding, characterized by comprising the following steps: S1, acquiring ultra-wide-angle fundus images and the lesion text data corresponding to each image, and constructing a fundus image dataset; S2, preprocessing the ultra-wide-angle fundus images to obtain standardized image data; S3, constructing an ultra-wide-angle image lesion and blood vessel segmentation model based on visual-text coding, the model comprising a ViT global coding branch, a CNN local coding branch, a text coding branch, a cross-modal attention module, a decoding branch and an output end, wherein the ViT global coding branch comprises twelve parallel vision Transformer encoders, the CNN local coding branch comprises a first, a second, a third and a fourth cascaded residual module, the text coding branch adopts a RoBERTa text encoder, and the decoding branch comprises a first, a second, a third and a fourth decoding layer; and inputting the images in the fundus image dataset into the model for training; S4, optimizing the model through a joint loss function during training to obtain a trained ultra-wide-angle image lesion and blood vessel segmentation model based on visual-text coding; S5, preprocessing an image to be segmented and inputting the preprocessed image into the trained model to obtain the ultra-wide-angle image lesion and blood vessel segmentation result.
  2. The method according to claim 1, wherein in the ViT global coding branch the standardized image data is divided into a plurality of image blocks of fixed size, an image block sequence is formed after linear mapping, the sequence is input into the ViT global coding branch and processed by the twelve parallel vision Transformer encoders to obtain twelve feature maps, the twelve feature maps are divided sequentially into groups of three, and each group is fused by element-wise addition along the channel dimension to obtain a first, a second, a third and a fourth fusion feature.
  3. The method according to claim 2, wherein in the CNN local coding branch the standardized image data X is input into the first cascaded residual module to output a first residual feature; the first residual feature is fused at channel level with the first fusion feature after 8× upsampling to obtain a first joint feature; the first joint feature is input into the second cascaded residual module to output a second residual feature; the second residual feature is fused at channel level with the second fusion feature after 4× upsampling to obtain a second joint feature; the second joint feature is input into the third cascaded residual module to output a third residual feature; the third residual feature is fused at channel level with the third fusion feature after 2× upsampling to obtain a third joint feature; the third joint feature is input into the fourth cascaded residual module to output a fourth residual feature; the fourth residual feature is fused at channel level with the fourth fusion feature at its original 1× scale to obtain a fourth joint feature; and the fourth joint feature is passed through a convolution layer to obtain the visual feature space representation.
  4. The method according to claim 3, wherein in the text coding branch the lesion text data corresponding to the standardized image data is input into the text coding branch to obtain a text feature space representation.
  5. The method according to claim 4, wherein the cross-modal attention module compresses the visual feature space representation and the text feature space representation into a visual feature vector and a text feature vector of resolution 1×1, respectively, by global average pooling; with the text feature vector as the query, the visual feature vector as the key and the visual feature space representation as the value, an attention weight map is obtained through the attention mechanism and Softmax, and the attention weight map is multiplied with the value to obtain the attention result of the visual and text features.
  6. The method according to claim 5, wherein in the decoding branch the attention result of the visual and text features is processed by a convolution layer to obtain a first decoding feature; the first decoding feature and the fourth residual feature are fused at channel level, upsampled 2× and input into the first decoding layer to obtain a second decoding feature; the second decoding feature and the third residual feature are fused at channel level, upsampled 2× and input into the second decoding layer to obtain a third decoding feature; the third decoding feature and the second residual feature are fused at channel level, upsampled 2× and input into the third decoding layer to obtain a fourth decoding feature; and the fourth decoding feature and the first residual feature are fused at channel level, upsampled 2× and input into the fourth decoding layer to obtain a fifth decoding feature.
  7. The method according to claim 6, wherein the output end inputs the fifth decoding feature into two separate convolution layers to obtain the lesion segmentation map and the blood vessel segmentation map, respectively.
  8. The method according to claim 7, wherein the joint loss function comprises a cross-modal attention loss function, a Dice loss and a cross entropy loss.
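Claim 5's cross-modal attention module can be sketched in NumPy. The patent does not give exact tensor shapes or the precise attention formula, so the channel-wise query-key product below is an assumption; what follows the claim exactly is the role assignment (text vector as query, visual vector as key, visual feature map as value) and the use of global average pooling and Softmax:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_modal_attention(vis_feat, txt_feat):
    """Cross-modal attention sketch per claim 5.
    vis_feat: (C, H, W) visual feature space representation (the value).
    txt_feat: (C, H, W) text feature representation, assumed here to share
    the channel count C with the visual branch (shapes not disclosed)."""
    q = txt_feat.mean(axis=(1, 2))  # global average pooling -> 1x1 text feature vector (query)
    k = vis_feat.mean(axis=(1, 2))  # global average pooling -> 1x1 visual feature vector (key)
    # Channel-wise attention weight map via the query-key product and Softmax
    # (the exact attention form is an assumption).
    weights = softmax(q * k)
    # Multiply the weight map with the value (the visual feature map).
    return vis_feat * weights[:, None, None]

# Illustrative shapes only.
vis = np.ones((4, 2, 2), dtype=np.float32)
txt = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
out = cross_modal_attention(vis, txt)
```

Because the weights are a Softmax distribution over channels, this reweights the visual channels according to their agreement with the text semantics before the result enters the decoding branch.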

Description

Ultra-wide-angle image lesion and blood vessel segmentation method based on visual-text coding

Technical Field

The invention belongs to the technical field of medical image processing, and particularly relates to an ultra-wide-angle image lesion and blood vessel segmentation method based on visual-text coding.

Background

Currently, in fundus image analysis, diabetic retinopathy screening and other retinal disease diagnosis, image segmentation techniques play an important role in assisting doctors in identifying lesion areas, extracting vascular structures, quantifying lesion features and the like. Conventional fundus images generally cover only the macular region and its periphery; this limited imaging range means that some peripheral lesions cannot be captured effectively. With the development of ultra-wide-angle imaging technology, an ultra-wide-angle image can acquire a much larger fundus field of view in a single acquisition, with a coverage area several times that of conventional imaging, providing more comprehensive structural information and new possibilities for early lesion detection and diagnosis. However, while bringing more structural information, ultra-wide-angle images also significantly increase the complexity of the segmentation task. On the one hand, the images contain a large number of fine capillaries and micro-lesion structures; these areas tend to have complex morphological details, weak gray-level contrast and fuzzy boundaries, so conventional segmentation methods based on convolutional neural networks (CNNs) have difficulty accurately distinguishing the target from the background.
On the other hand, ultra-wide-angle images have a large spatial scale, a wide blood vessel distribution range and strong structural continuity; conventional networks are weak at capturing global dependencies and long-range structural associations, which easily leads to problems such as blood vessel fracture and incomplete lesion identification. In addition, ultra-wide-angle images are often affected by imaging interference such as noise, uneven illumination and edge distortion, which further increases the segmentation difficulty under a complex background.

Disclosure of Invention

The above aim of the invention is achieved by the following technical scheme. The invention provides an ultra-wide-angle image lesion and blood vessel segmentation method based on visual-text coding, comprising the following steps: S1, acquiring ultra-wide-angle fundus images and the lesion text data corresponding to each image, and constructing a fundus image dataset; S2, preprocessing the ultra-wide-angle fundus images to obtain standardized image data; S3, constructing an ultra-wide-angle image lesion and blood vessel segmentation model based on visual-text coding, the model comprising a ViT global coding branch, a CNN local coding branch, a text coding branch, a cross-modal attention module, a decoding branch and an output end; the ViT global coding branch comprises twelve parallel vision Transformer encoders, the CNN local coding branch comprises a first, a second, a third and a fourth cascaded residual module, the text coding branch adopts a RoBERTa text encoder, and the decoding branch comprises a first, a second, a third and a fourth decoding layer; inputting the images in the fundus image dataset into the model and training it; S4, optimizing the model through a joint loss function during training to obtain a trained ultra-wide-angle image lesion and blood vessel segmentation model based on visual-text coding; S5, preprocessing an image to be segmented and inputting the preprocessed image into the trained model to obtain the ultra-wide-angle image lesion and blood vessel segmentation result. Further, the ultra-wide-angle image has a large spatial scale, a wide blood vessel distribution range and strong structural continuity, and a general network has difficulty capturing global dependencies and long-range structural associations, causing blood vessel fracture and incomplete lesion identification; the core advantage of the ViT is that it directly models global pixel associations and compensates for the shortcomings of conventional CNNs in long-range dependency capture, thereby avoiding blood vessel breakage caused by unrecognized long-range structure and ensuring complete capture of lesion areas distributed over a large range.
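Step S4 optimizes the model with a joint loss that, per claim 8, combines a cross-modal attention loss, a Dice loss and a cross entropy loss. The cross-modal attention loss is not specified in detail in this text and is omitted below; the following NumPy sketch shows only the two standard terms, with the weighting between them an assumption:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|), with eps for stability."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred, target, eps=1e-7):
    """Pixel-wise binary cross entropy on probabilities in (0, 1)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

def joint_loss(pred, target, w_dice=1.0, w_ce=1.0):
    """Weighted sum of Dice and cross entropy; the patent's joint loss
    additionally includes a cross-modal attention term not sketched here,
    and the weights w_dice / w_ce are illustrative assumptions."""
    return w_dice * dice_loss(pred, target) + w_ce * bce_loss(pred, target)

# A perfect binary prediction drives both terms toward zero.
target = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)
perfect = joint_loss(target, target)
```

Combining an overlap-based term (Dice) with a pixel-wise term (cross entropy) is a common choice for thin, class-imbalanced structures such as retinal vessels, since Dice counters the foreground/background imbalance while cross entropy stabilizes per-pixel gradients.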