CN-121999229-A - Epicardial adipose tissue fine granularity segmentation method fusing anatomical text semantics

Abstract

The invention relates to an epicardial adipose tissue (EAT) fine-granularity segmentation method fusing anatomical text semantics, belonging to the technical field of medical image processing. Without additional manual annotation, the method automatically converts cardiac CT image annotations into text descriptions rich in anatomical semantics through an anatomy-aware text prompt generator; a global anatomical semantic fusion module integrates the text and image features to construct a global anatomical context, with feature fusion optimized through multi-head self-attention and cross-attention mechanisms; and a channel-level contrastive anatomy-aware enhancement module introduces a multi-modal contrastive loss to improve inter-class differentiation. The invention realizes fine-granularity segmentation of left-ventricular EAT, right-ventricular EAT, and periatrial EAT; the segmentation results are anatomically plausible with excellent detail, can be widely applied to the accurate diagnosis of cardiovascular disease, and exhibit high robustness and generalization capability.

Inventors

  • LIAN SHENG
  • LIU JIAYAO
  • CHAI DAJUN
  • SU QIONG
  • XU HAO
  • CHEN JIANWEN
  • WANG XINYU

Assignees

  • Fuzhou University (福州大学)

Dates

Publication Date
2026-05-08
Application Date
2026-02-11

Claims (10)

  1. An epicardial adipose tissue fine-granularity segmentation method fusing anatomical text semantics, comprising: inputting a cardiac CT image; automatically converting the image annotation into a text description rich in anatomical semantics through an anatomy-aware text prompt generator; fusing the text description and the image features through a global anatomical semantic fusion module to construct a global anatomical context; and optimizing inter-class differentiation through a channel-level contrastive anatomy-aware enhancement module and outputting a fine-granularity segmentation result, wherein the fine-granularity segmentation result comprises left-ventricular epicardial adipose tissue (LV-EAT), right-ventricular epicardial adipose tissue (RV-EAT), periatrial epicardial adipose tissue (PA-EAT), and background.
  2. The epicardial adipose tissue fine-granularity segmentation method fusing anatomical text semantics of claim 1, wherein the anatomy-aware text prompt generator is implemented by: analyzing the input cardiac CT image, identifying key anatomical structures and their relative positional relationships, and generating a textual description aligned with anatomical priors, covering global information including the image modality, acquisition region, anatomical plane, frame position, and the relative positional relationships between different EAT subcategories.
  3. The epicardial adipose tissue fine-granularity segmentation method fusing anatomical text semantics of claim 1, wherein the global anatomical semantic fusion module fuses text descriptions and image features through a multi-level attention mechanism, comprising: adjusting the dimensionality of the text features to match the visual features by applying a 1×1 convolution, a linear transformation, and ReLU activation to project the text features to a preset dimensionality; and enhancing the visual features through reshaping and a multi-head self-attention mechanism, wherein the visual feature enhancement formula is: F'_i = Norm(F_i + SA(F_i + PE)), where F_i denotes the visual features of the i-th layer, F'_i denotes the enhanced visual features of the i-th layer, SA denotes the self-attention operation, PE denotes the position encoding, and Norm denotes normalization.
  4. The epicardial adipose tissue fine-granularity segmentation method of claim 3, wherein the global anatomical semantic fusion module further fuses the image features and the text features by cross-attention, with the fusion formula: F_fuse = F'_i + α · CA(F'_i, T_h), where F_fuse denotes the fused feature, CA denotes the cross-attention operation, α is a learnable parameter, F'_i denotes the enhanced visual features of the i-th layer, and T_h denotes the high-level text features.
  5. The epicardial adipose tissue fine-granularity segmentation method fusing anatomical text semantics of claim 1, wherein the segmentation loss function of the global anatomical semantic fusion module combines a Dice loss L_Dice and a Focal loss L_Focal, given by: L_seg = L_Dice(ŷ, y) + L_Focal(ŷ, y), where ŷ is the output feature (predicted segmentation) and y is the ground-truth label.
  6. The epicardial adipose tissue fine-granularity segmentation method fusing anatomical text semantics of claim 1, wherein the channel-level contrastive anatomy-aware enhancement module extracts channel-level text features through a text encoder shared with the global anatomical semantic fusion module and fuses them with the visual features, wherein a multi-modal contrastive loss is introduced to enhance channel consistency, the contrastive loss formula being: L_con = (1/S) Σ_{i=1}^{S} ℓ(F^v_i, F^t_i), where L_con is the multi-modal contrastive loss, S is the number of anatomical categories, F^v denotes the image features fused with the text features, F^v_i is the fused image feature of the i-th category, F^t denotes the channel-level text features, F^t_i is the channel-level text feature of the i-th category, and ℓ is a contrastive loss function based on InfoNCE.
  7. The epicardial adipose tissue fine-granularity segmentation method fusing anatomical text semantics of claim 6, wherein the channel-level contrastive anatomy-aware enhancement module aligns the feature representations of different modalities by computing the contrastive loss over channel-level text-class positive and negative sample pairs, refining the features of the specific anatomical substructure represented by each channel.
  8. The epicardial adipose tissue fine-granularity segmentation method fusing anatomical text semantics according to claim 1, wherein the method is applicable to cardiac CT images, and the output results support the accurate diagnosis of cardiovascular diseases and the formulation of personalized diagnosis and treatment plans.
  9. A system for implementing the epicardial adipose tissue fine-granularity segmentation method fusing anatomical text semantics of any one of claims 1-8, comprising: a text prompt generation unit for performing anatomy-aware text prompt generation; a fusion unit for performing global anatomical semantic fusion; an enhancement unit for performing channel-level contrastive anatomy-aware enhancement; and an output unit for generating the fine-granularity segmentation result.
  10. An electronic device comprising a processor and a memory, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1-8.
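The combined segmentation objective described in claim 5 can be sketched as follows. This is a minimal pure-Python illustration, not the patented implementation: the equal weighting of the Dice and Focal terms and the focusing parameter gamma = 2 are assumptions, since the claims do not disclose them.

```python
import math

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss between a predicted probability map and a binary mask.
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def focal_loss(pred, target, gamma=2.0, eps=1e-6):
    # Focal loss down-weights easy pixels via the (1 - p_t)^gamma factor.
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)   # clamp for numerical safety
        pt = p if t == 1 else 1.0 - p     # probability assigned to the true class
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(pred)

def segmentation_loss(pred, target):
    # Combined objective; equal weighting of the two terms is an assumption.
    return dice_loss(pred, target) + focal_loss(pred, target)
```

In practice this would be computed per class over the four output channels (LV-EAT, RV-EAT, PA-EAT, background) and averaged.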
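The channel-level multi-modal contrastive loss of claim 6 can be sketched as an InfoNCE objective over matched image/text features: for each anatomical category, the fused image feature is pulled toward its own channel-level text feature and pushed away from the text features of the other categories. This is an illustrative reconstruction; the cosine similarity measure and the temperature tau = 0.07 are assumptions not stated in the claims.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_contrastive(img_feats, txt_feats, tau=0.07):
    # img_feats[i]: fused image feature for anatomical category i
    # txt_feats[i]: channel-level text feature for category i (positive pair)
    # All other categories' text features serve as negatives.
    s = len(img_feats)
    loss = 0.0
    for i in range(s):
        logits = [cosine(img_feats[i], txt_feats[j]) / tau for j in range(s)]
        m = max(logits)  # shift for a numerically stable log-softmax
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += -(logits[i] - log_denom)
    return loss / s
```

A perfectly aligned set of features yields a small loss, while mismatched (shuffled) pairs yield a large one, which is the inter-class separation the module is meant to enforce.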

Description

Epicardial Adipose Tissue Fine-Granularity Segmentation Method Fusing Anatomical Text Semantics

Technical Field

The invention belongs to the technical field of medical image processing, and particularly relates to an epicardial adipose tissue fine-granularity segmentation method fusing anatomical text semantics.

Background

Currently, cardiovascular disease is the leading cause of death worldwide. According to statistics, the number of people suffering from cardiovascular disease in China has reached 330 million, and cardiovascular disease accounts for more than 40% of deaths among urban and rural residents, imposing enormous costs and a heavy burden on the medical system. Epicardial adipose tissue (EAT) is a unique adipose tissue located between the myocardium and the visceral layer of the pericardium that provides mechanical protection, thermogenesis, energy production, and metabolic secretion for the heart. Quantitative EAT parameters are of great significance in modern cardiology research and clinical practice; for example, EAT volume has been shown to be independently associated with cardiovascular events such as atrial fibrillation and myocardial ischemia, and EAT thickness measurement aids risk stratification for coronary artery disease and heart failure. Furthermore, EAT is not evenly distributed around the heart. Studies have shown that the thickness of periatrial EAT (PA-EAT) is an independent factor related to atrial fibrillation burden, the thickness of left-ventricular EAT (LV-EAT) is closely related to coronary atherosclerosis and left-ventricular diastolic dysfunction, and the volume of right-ventricular EAT (RV-EAT) is significantly related to arrhythmogenic right ventricular dysplasia/cardiomyopathy and premature ventricular contractions.
Therefore, dividing EAT into LV-EAT, RV-EAT, and PA-EAT according to its anatomical distribution and performing accurate fine-granularity quantitative analysis has important clinical value for the precise diagnosis and treatment of cardiovascular disease. To date, EAT segmentation technology has progressed from traditional methods to deep learning methods. Early traditional methods such as thresholding and region growing rely on manual intervention, involve complex rules, and offer limited accuracy. Traditional machine-learning methods such as random forests and fuzzy clustering improve segmentation adaptability but still require hand-crafted features and are not sufficiently robust. Deep learning methods have advanced automatic EAT segmentation markedly, mainly by improving classical models such as U-Net or by adopting multi-stage and multi-task learning strategies. With the widespread success of large models in language processing and understanding, vision-language models are also being applied to a variety of visual segmentation tasks. Inspired by the fusion of language models with natural images, some studies have begun to assist medical image analysis with textual information. However, analyzing medical images is more difficult than analyzing natural images: the boundaries between different regions in a medical image tend to be blurred, the gray values near those boundaries differ little, and high-precision segmentation boundaries are difficult to extract. Medical images differ substantially from natural images, so vision-language models built for natural images are not directly applicable to medical image analysis. The text-guided segmentation task aims at pixel-level vision-language alignment between the input image and a given description.
Earlier works acquired image and language features separately and concatenated them to form multimodal features. Current methods can generally be divided into two categories. The first uses the internal structure of the text to help identify target objects; however, this approach does not align the cross-modal space well, and the models are often complex. The other fits the cross-modal relationship between image and language through various attention operations. Plain semantic segmentation can also be viewed as a language-guided task, with category names serving as short, coarse textual descriptions. Unlike the detailed descriptions in text-guided segmentation, the prior knowledge provided by class names is coarse-grained, which poses challenges for multi-modal alignment. Experiments have shown that well-designed medical prompts can improve segmentation results. Based on this finding, we carefully design medical prompts that emphasize the semantic relationships between anatomical structures. To fully exploit the prior knowledge in text, we employ attention mechanisms at different levels to fuse text and visual features. Prompt engineering has evolved dramatically
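The attention-based fusion of text and visual features described above can be sketched as a residual cross-attention step, in which each visual token (query) attends over the text tokens (keys/values) and absorbs anatomical semantics. This is a single-head, pure-Python illustration under assumed toy shapes; the actual module in the claims uses multi-head attention with a learnable blending parameter α, fixed here for demonstration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: one output row per query (visual token),
    # each a weighted mixture of the value rows (text tokens).
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def fuse(visual, text, alpha=0.5):
    # F_fuse = F + alpha * CA(F, T): residual blend of visual features with
    # text-attended features; alpha is learnable in the patented module.
    attended = cross_attention(visual, text, text)
    return [[f + alpha * a for f, a in zip(fv, av)]
            for fv, av in zip(visual, attended)]
```

With alpha = 0 the fusion reduces to the unchanged visual features, which makes the residual formulation easy to sanity-check.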