CN-122023806-A - Medical image segmentation system and method based on SAM2 adapter fine tuning and contrast sensing

CN 122023806 A

Abstract

The invention discloses a medical image segmentation system and method based on SAM2 adapter fine-tuning and contrast perception, belonging to the technical field of medical image processing. The system comprises an encoder module, a parameter-efficient fine-tuning module, a multi-scale feature fusion module, a foreground-background contrast attention module, and a decoder module, connected in sequence, which together perform feature analysis on a medical image. The encoder module consists of a pre-trained SAM2 Hiera backbone with frozen weights, and the parameter-efficient fine-tuning module consists of double-bottleneck adapters inserted into the backbone. The foreground-background contrast attention module is the key of the method: through a parallel two-branch structure it explicitly separates the foreground from the background and applies enhancement and suppression respectively, addressing the problem of blurred target boundaries in medical images. Through the cooperation of these modules, the invention markedly improves segmentation precision and boundary accuracy while fine-tuning only a small number of parameters.

Inventors

  • ZHOU HAO
  • CHEN CHANG
  • ZHANG JIAN
  • TAO TAO
  • WANG TONG
  • LI QIAO

Assignees

  • Anhui University of Technology (安徽工业大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-03

Claims (8)

  1. A medical image segmentation system based on SAM2 adapter fine-tuning and contrast perception, comprising: an encoder module composed of a pre-trained, weight-frozen SAM2 Hiera backbone network; a parameter-efficient fine-tuning module composed of a plurality of adapters inserted into each Hiera block of the encoder module, the adapters adopting a double-bottleneck structure comprising three linear transformation layers; a multi-scale feature fusion module whose input is connected to the multi-layer output of the encoder module, the fusion module comprising parallel local and global branches whose outputs are concatenated along the channel dimension; a foreground-background contrast attention module whose input is connected to the output of the multi-scale feature fusion module, the attention module comprising parallel foreground and background branches whose outputs are fused; and a decoder module adopting a U-shaped network structure, whose input is connected to the output of the foreground-background contrast attention module.
  2. The medical image segmentation system of claim 1, wherein the adapter structure comprises: a first linear layer for reducing the input feature dimension to a first intermediate dimension, followed in sequence by a first normalization layer and a first SiLU activation function; a second linear layer, connected to the first SiLU activation function, for compressing the features to a second intermediate dimension, followed in sequence by a second normalization layer and a second SiLU activation function; and a third linear layer, connected to the second SiLU activation function, for restoring the feature dimension to the original input dimension; wherein Dropout layers are provided after the first and second SiLU activation functions.
  3. The medical image segmentation system of claim 1, wherein the local branch of the multi-scale feature fusion module comprises a 3×3 convolution layer, a normalization layer, a Softmax activation function, and a 1×1 convolution layer connected in sequence, and the global branch comprises an adaptive partitioning unit, a linear self-attention unit, and a feature broadcasting unit.
  4. The medical image segmentation system of claim 1, wherein the foreground-background contrast attention module further comprises an initial 3×3 convolution layer for separating the input features into foreground features and background features and feeding them to the foreground branch and the background branch, respectively, each branch comprising a depthwise-separable convolution layer for generating an attention map.
  5. A medical image segmentation method based on the system of any one of claims 1 to 4, wherein a medical image is input into an image segmentation model and a segmentation mask output by the model is obtained, specifically comprising the steps of: acquiring a medical image to be segmented; and inputting the medical image into the image segmentation model to obtain the segmentation mask it outputs; wherein the processing of the image segmentation model comprises: performing multi-scale feature extraction on the input image through a weight-frozen, pre-trained SAM2 Hiera backbone network; performing domain-adaptive transformation on the extracted features through a plurality of double-bottleneck adapters inserted into the backbone network; fusing features from different levels by performing local convolution in parallel with global self-attention operations; subjecting the fused features to foreground enhancement and background suppression through a parallel two-branch attention mechanism; and upsampling and integrating the contrast-enhanced features to generate the final segmentation mask.
  6. The medical image segmentation method of claim 5, wherein the image segmentation model is trained using a weighted sum of the Dice loss and the binary cross-entropy loss as the loss function.
  7. The medical image segmentation method of claim 5, wherein the parameters of the SAM2 Hiera backbone network are kept unchanged while training the image segmentation model, and only the parameters of the adapters, of the components performing feature fusion and foreground-background contrast enhancement, and of the decoding components performing upsampling are updated.
  8. An electronic device comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the medical image segmentation system based on SAM2 adapter fine-tuning and contrast perception of any one of claims 1 to 4.
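Claim 6 trains the model with a weighted sum of the Dice loss and the binary cross-entropy loss. A minimal NumPy sketch of such a composite loss is given below; the 0.5/0.5 weights and the `segmentation_loss` name are illustrative assumptions, since the patent does not disclose the actual weighting.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # pred: predicted probabilities in [0, 1]; target: binary ground-truth mask
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred, target, eps=1e-7):
    # Binary cross-entropy, clipped for numerical stability
    p = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

def segmentation_loss(pred, target, w_dice=0.5, w_bce=0.5):
    # Weighted sum per claim 6; the weights here are assumed, not from the patent
    return w_dice * dice_loss(pred, target) + w_bce * bce_loss(pred, target)

pred = np.array([[0.9, 0.1], [0.8, 0.2]])
target = np.array([[1.0, 0.0], [1.0, 0.0]])
loss = segmentation_loss(pred, target)
print(round(float(loss), 4))  # → 0.1571
```

Per claim 7, this loss would update only the adapters, the fusion and contrast-attention components, and the decoder, with the backbone gradient flow disabled.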

Description

Medical image segmentation system and method based on SAM2 adapter fine tuning and contrast sensing

Technical Field

The invention belongs to the technical field of medical image processing, and particularly relates to a medical image segmentation system and method based on SAM2 adapter fine-tuning and contrast perception.

Background

Medical image segmentation is a key technology for assisting diagnosis and therapy planning. In recent years, large-scale pre-trained models (such as the Segment Anything Model, SAM) have provided a new paradigm for medical image analysis owing to their powerful general visual feature extraction capabilities. However, directly applying SAM, which is trained on natural images, or its improved version SAM2 to medical images faces significant challenges, such as blurred foreground-background boundaries, imaging-artifact interference, and complex multi-scale structures. In the prior art, parameter-efficient fine-tuning (PEFT) techniques, such as inserting lightweight adapters into the backbone network, are typically employed to adapt to downstream tasks. For example, reference document CN118410853B discloses a SAM-based parameter-efficient fine-tuning method for large vision models that improves performance in tasks such as camouflaged target detection by designing a convolutional side adapter, a multi-scale refinement module, and a feature fusion decoder; another reference document, CN118470305A, discloses a remote sensing image target detection method based on SAM multi-feature fusion. While these approaches demonstrate the effectiveness of PEFT, they are primarily directed at general or domain-specific targets in natural scenes, their core being the enhancement and fusion of general multi-scale features.
The core difficulty of medical image segmentation, especially lesion segmentation, is that the target (foreground) and the surrounding healthy tissue (background) have low contrast in gray level and texture, and the boundary presents a soft-boundary characteristic; the invention therefore designs a medical image segmentation system and method based on SAM2 adapter fine-tuning and contrast perception.

Disclosure of the Invention

The invention aims to solve the above problems in the prior art, and provides a medical image segmentation system and method based on SAM2 adapter fine-tuning and contrast perception. The invention first discloses a medical image segmentation system based on SAM2 adapter fine-tuning and contrast perception, comprising: an encoder module composed of a pre-trained, weight-frozen SAM2 Hiera backbone network; a parameter-efficient fine-tuning module composed of a plurality of adapters inserted into each Hiera block of the encoder module, the adapters adopting a double-bottleneck structure comprising three linear transformation layers; a multi-scale feature fusion module whose input is connected to the multi-layer output of the encoder module, the fusion module comprising parallel local and global branches whose outputs are concatenated along the channel dimension; a foreground-background contrast attention module whose input is connected to the output of the multi-scale feature fusion module, the attention module comprising parallel foreground and background branches whose outputs are fused; and a decoder module adopting a U-shaped network structure, whose input is connected to the output of the foreground-background contrast attention module.
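The foreground-background contrast attention described above can be sketched as follows. This is a single-channel NumPy illustration under assumptions not fixed by the patent: plain 3×3 convolutions stand in for the depthwise-separable layers, sigmoid gates produce the attention maps, and fusion is a simple sum of the enhanced foreground and the suppressed background.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv3x3(x, k):
    # Naive 'same'-padded 3x3 convolution on a single-channel map (illustration only)
    H, W = x.shape
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = (p[i:i + 3, j:j + 3] * k).sum()
    return out

def contrast_attention(feat, k_init, k_fg, k_bg):
    """Sketch of the parallel two-branch contrast attention: an initial 3x3 conv
    prepares the features, the foreground branch enhances the target, the
    background branch suppresses surroundings, and the outputs are fused."""
    base = conv3x3(feat, k_init)            # initial 3x3 convolution
    fg_att = sigmoid(conv3x3(base, k_fg))   # foreground attention map
    bg_att = sigmoid(conv3x3(base, k_bg))   # background attention map
    fg = feat * fg_att                      # enhance the foreground
    bg = feat * (1.0 - bg_att)              # suppress the background
    return fg + bg                          # fuse the two branches

feat = rng.standard_normal((8, 8))
kernels = [rng.standard_normal((3, 3)) * 0.1 for _ in range(3)]
out = contrast_attention(feat, *kernels)
print(out.shape)  # → (8, 8)
```

In the real module the kernels would be learned and the features multi-channel; the sketch only shows how explicit enhancement and suppression sharpen a soft lesion boundary.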
In the above system, the structure of the adapter includes: a first linear layer for reducing the input feature dimension to a first intermediate dimension, followed in sequence by a first normalization layer and a first SiLU activation function; a second linear layer, connected to the first SiLU activation function, for compressing the features to a second intermediate dimension, followed in sequence by a second normalization layer and a second SiLU activation function; and a third linear layer, connected to the second SiLU activation function, for restoring the feature dimension to the original input dimension; Dropout layers are provided after the first and second SiLU activation functions. In the above system, the local branch of the multi-scale feature fusion module comprises a 3×3 convolution layer, a normalization layer, a Softmax activation function, and a 1×1 convolution layer connected in sequence, and the global branch comprises an adaptive partitioning unit, a linear self-attention unit, and a feature broadcasting unit. In the above system, the foreground-background contrast attention module further comprises an initial 3×3 convolution layer for separating the input features into foreground features and background features and feeding them to the foreground branch and the background branch, respectively, each branch comprising a depthwise-separable convolution layer for generating an attention map.
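The double-bottleneck adapter described above (three linear layers with normalization, SiLU, and Dropout) can be sketched in NumPy as below. The dimensions, the residual connection, and the omission of Dropout (inference mode) are illustrative assumptions not specified verbatim in the patent.

```python
import numpy as np

# Hypothetical dimensions for illustration; the patent does not fix exact sizes.
D_IN, D_MID1, D_MID2 = 768, 192, 96  # input dim, first and second bottleneck dims

rng = np.random.default_rng(0)

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    # Normalize over the feature (last) dimension
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class DoubleBottleneckAdapter:
    """Three linear layers: D_IN -> D_MID1 -> D_MID2 -> D_IN, the first two
    each followed by LayerNorm + SiLU (Dropout omitted here for inference)."""
    def __init__(self):
        self.w1 = rng.standard_normal((D_IN, D_MID1)) * 0.02
        self.w2 = rng.standard_normal((D_MID1, D_MID2)) * 0.02
        self.w3 = rng.standard_normal((D_MID2, D_IN)) * 0.02

    def __call__(self, x):
        h = silu(layer_norm(x @ self.w1))  # first reduction
        h = silu(layer_norm(h @ self.w2))  # second compression
        # Restore the original dimension; the residual add is a common adapter
        # convention and an assumption here, not stated in the patent text.
        return x + h @ self.w3

tokens = rng.standard_normal((16, D_IN))  # e.g. 16 tokens from one Hiera block
out = DoubleBottleneckAdapter()(tokens)
print(out.shape)  # → (16, 768)
```

Because only the three small weight matrices are trainable while the Hiera backbone stays frozen, the adapter accounts for a small fraction of the backbone's parameters, which is the point of the parameter-efficient fine-tuning module.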