
CN-122023796-A - Medical image segmentation method based on deep learning

CN122023796A

Abstract

The invention discloses a medical image segmentation method based on deep learning, comprising the following steps. Step S1: input a medical image x and perform preprocessing and data augmentation. Step S2: extract multi-scale feature information from the input image in parallel with a dual-branch encoder. Step S3: design a multi-scale feature gating fusion module; the decoder up-samples the multi-scale gated fusion features output by the encoder together with shallow features of the original image, and adaptively weights and fuses them through a gating mechanism to reconstruct a segmentation probability map. Step S4: design a corresponding hybrid loss function for network training according to whether the segmentation task is binary or multi-class. Step S5: conduct comparison and ablation experiments on three public medical datasets to verify the effectiveness of the segmentation method. The method addresses the shortcomings of existing medical image segmentation techniques in local detail perception, utilization of global context information, and computational efficiency.

Inventors

  • Yuan Xiaoping
  • Yin Zihan
  • Zhang Dingchuan
  • Zhang Liuqing
  • Zhu Changhu
  • Su Chaoyang
  • Tian Wenkai
  • Cen Yuhang

Assignees

  • China University of Mining and Technology (中国矿业大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-22

Claims (10)

  1. A medical image segmentation method based on deep learning, characterized in that the method comprises the following steps: step S1, input a medical image x and perform preprocessing and data augmentation; step S2, extract multi-scale feature information from the input image in parallel with a dual-branch encoder; step S3, design a multi-scale feature gating fusion module, use a decoder to up-sample the multi-scale gated fusion features output by the encoder together with shallow features of the original image, and adaptively weight and fuse them through a gating mechanism to reconstruct a segmentation probability map; step S4, design a corresponding hybrid loss function for network training according to whether the segmentation task is binary or multi-class; and step S5, conduct comparison and ablation experiments on three public medical datasets to verify the effectiveness of the segmentation method.
  2. The medical image segmentation method based on deep learning according to claim 1, wherein the dual-branch encoder in step S2 comprises a CNN branch, a Mamba branch and an SSCM module; the CNN branch comprises a plurality of cascaded CNN feature extraction units for extracting multi-scale local detail features from the input image; each CNN feature extraction unit sequentially comprises a convolution layer with a 3×3 kernel for feature extraction and adjustment of the number of feature-map channels, a batch normalization layer, a LeakyReLU activation layer for introducing nonlinearity to enhance feature expression capability, and a max-pooling layer for downsampling to reduce the spatial resolution of the feature map and enlarge the receptive field; the Mamba branch comprises a plurality of cascaded Multi Scale Mamba modules for extracting multi-scale global context features from the input features, its core being a multi-scale convolution cross-scanning module that extracts global context features of the image using a selective state space model and a cross-scanning mechanism; the SSCM module fuses the feature information output by the two branches to generate a fused feature map.
  3. The medical image segmentation method based on deep learning according to claim 2, wherein the CNN branch is specifically implemented as follows (a minimal sketch of one extraction unit follows the claims): step S211, the first-stage CNN feature extractor adjusts the number of channels of the input three-channel feature map to 96 and reduces the spatial size to 1/4 of the original image through a pooling operation; step S212, the CNN feature extraction unit of each subsequent stage extracts features through a 3×3 convolution kernel; step S213, the number of output channels is set to 2 times the number of output channels of the previous stage; step S214, the feature-map size is halved through a max-pooling operation with a 2×2 kernel and a stride of 2, yielding the CNN-branch output feature maps c_i.
  4. The medical image segmentation method based on deep learning according to claim 2, wherein the Mamba branch is specifically implemented as follows: step S221, in the Mamba branch, features are extracted from the embedded image by four Mamba feature extractors, each comprising a 2-layer Multi Scale Mamba module, and the Multi Scale Mamba module performs layer normalization on the embedded patch features; step S222, the normalized features are input to a first processing branch and a second processing branch in parallel; step S223, the first processing branch performs dimension transformation through a linear layer and extracts spatial features using depthwise separable convolution; step S224, further feature extraction is performed by the SMSS6 module, and the output is layer-normalized again; step S225, the second processing branch directly transforms the layer-normalized features through a linear layer; step S226, the output features of the first and second processing branches are fused through a linear transformation layer, and the fusion result is residually connected with the initial input features of the module to form the final output of the Multi Scale Mamba module; the feature maps output by the Mamba-branch feature extractors are denoted m_i; in the feature extraction flow of the Multi Scale Mamba module, m_i^1 and m_i^2 denote the outputs of the main and auxiliary processing branches respectively, M_i denotes the final output of the ith Multi Scale Mamba module, PM denotes the patch merging operation, Linear denotes a linear layer, LN denotes layer normalization, and DwConv denotes depthwise separable convolution.
  5. The medical image segmentation method based on deep learning according to claim 4, wherein step S224 is specifically implemented as follows (an illustrative cross-scan sketch follows the claims): step S2241, the MSS6 module comprises a Cross Scan module and a Multi Scale Conv module, the Cross Scan module scanning and expanding the input feature map; step S2242, the Multi Scale Conv module extracts local features of the input feature map at different scales using convolution kernels of sizes 3×3, 5×5 and 7×7 respectively and flattens the outputs into one-dimensional sequences, X_k = flatten(Conv_k×k(X)), k ∈ {3, 5, 7}, which the MSS6 module then cross-scans and models; wherein Conv_k×k denotes a convolution operation with a k×k kernel, flatten denotes the flattening operation, X_k denotes the sequence obtained by convolving with a k-sized kernel and flattening, CS denotes the cross-scan operation along four directions, X_i^t denotes the flattened scan of the ith layer in direction t, S6 denotes feature modeling of the input sequence by the S6 module, the outputs of the different paths are combined with learnable weight parameters, and Y_i denotes the output of the i-layer MSS6 module.
  6. The medical image segmentation method based on deep learning according to claim 4, wherein step S223 is specifically implemented as follows (a minimal SSCM sketch follows the claims): step S2231, the corresponding elements of the Mamba-branch and CNN-branch output feature maps are added, t_i = m_i + c_i, achieving preliminary fusion; step S2232, CBAM is used to process the preliminarily fused feature map, introducing channel attention and spatial attention modeling, which can be expressed as: ω_ci = σ(MLP(AvgPool(t_i)) + MLP(MaxPool(t_i))) ⊙ t_i; ω_cti = σ(Conv3×3(Cat_c(AvgPool(ω_ci), MaxPool(ω_ci)))) ⊙ ω_ci; wherein MLP denotes fully connected layers, AvgPool and MaxPool denote average pooling and max pooling respectively, Conv3×3 denotes a convolution layer of size 3×3, Cat_c denotes per-channel concatenation, ω_ci denotes the channel attention module output, and ω_cti denotes the spatial attention module output; step S2233, the processed feature map and the original feature map are concatenated per channel and a group convolution is performed; step S2234, the feature-map elements are compressed into the [0, 1] interval by a Sigmoid activation function to form a weight map, which is multiplied with the corresponding original inputs and the results added to obtain the fused feature map; the SSCM module processing is as follows: ω = σ(CS(GC_7×7(Cat(ω_cti, t_i)))); O_SSCMi = ω_m ⊙ m_i + ω_c ⊙ c_i, with ω_m = ω and ω_c = 1 − ω; wherein ω_cti denotes the preliminary attention feature map output by the module, Cat denotes per-channel concatenation, σ denotes the Sigmoid activation function compressing the attention feature map to [0, 1] to obtain the attention weight maps ω and 1 − ω, GC_7×7 denotes a group convolution of size 7×7, CS denotes channel shuffling, and O_SSCMi denotes the result of the weighted summation of the original input maps m_i and c_i with the corresponding weight maps ω_m and ω_c.
  7. The medical image segmentation method based on deep learning according to claim 1, wherein step S3 is specifically implemented as follows (an illustrative fusion sketch follows the claims): step S31, the decoder performs up-sampling and channel adjustment on the fused features output by the four SSCM modules of the encoder, restoring them to the size of the original input image, each decoding stage fusing features from the corresponding encoder level through skip connections; the original input image passes through an independent 3×3 convolution layer to extract shallow features, the number of channels being changed to 24, and the adjusted feature map of each level is denoted f_i; step S32, the multi-scale feature gating fusion module comprises a lightweight weight-generation network formed by two 1×1 convolution layers; the adjusted feature maps f_i of all levels are concatenated in the channel dimension and compressed, outputting a set of five-channel weight maps ω_raw matching the input spatial dimensions; step S33, ω_raw is scaled by a learnable temperature parameter Temp and normalized along the channel dimension by a Softmax function to generate spatially adaptive attention weight maps ω_i; step S34, a residual fusion mechanism is designed, linearly fusing the dynamic gated weighted-sum result with the arithmetic mean of the inputs, the fusion coefficients being controlled by a learnable scalar parameter λ_1 constrained by a Sigmoid function, with λ_2 = 1 − λ_1; the module outputs a feature map fusing multi-scale context, and can be expressed as: ω_raw = Conv_1(ReLU(BN(Conv_1(Cat_C(f_1, …, f_5))))); ω_1, …, ω_5 = Split_C(Softmax(ω_raw / Temp)); F_GO = Σ_i ω_i ⊙ f_i; F_MeanO = (1/5) Σ_i f_i; F_out = λ_1·F_GO + λ_2·F_MeanO; wherein Cat_C denotes per-channel concatenation, Conv_1 denotes a 1×1 convolution, BN and ReLU denote batch normalization and the ReLU activation function respectively, Temp denotes the temperature parameter controlling the weight distribution, Split_C denotes per-channel splitting, ω_raw and ω_i denote the five-channel weight map and the per-channel split weight maps, F_GO denotes the gated weighted fusion output, F_MeanO denotes the simple arithmetic mean of the multiple inputs, λ_1 and λ_2 denote the learnable scalar parameters controlling the two branch outputs, and F_out denotes the final output of the multi-scale feature gating fusion module.
  8. The medical image segmentation method based on deep learning according to claim 1, wherein step S4 specifically comprises (a minimal loss sketch follows the claims): step S41, for the binary segmentation task, a hybrid loss function combining binary cross entropy and Dice loss is adopted, calculated as: L_BCE = −(1/N) Σ_i [y_i·log(p_i) + (1 − y_i)·log(1 − p_i)]; L_Dice = 1 − (2·Σ_i y_i·p_i + ε) / (Σ_i y_i + Σ_i p_i + ε); L = λ_1·L_BCE + λ_2·L_Dice; wherein y_i is the true label of the ith pixel, p_i is the model-predicted probability that the pixel belongs to the foreground, N denotes the total number of pixels, λ_1 and λ_2 denote the weights of the two loss functions, and ε denotes a smoothing term avoiding a zero denominator; step S42, for multi-class segmentation tasks, a loss function combining the improved weighted cross entropy L_IpCe and the improved Dice loss L_IpDice is used.
  9. The medical image segmentation method based on deep learning according to claim 8, wherein step S42 specifically comprises: step S421, the standard cross entropy loss function can be expressed as L_CE = −(1/N) Σ_i Σ_c y_{i,c}·log(p_{i,c}); wherein y_{i,c} is the one-hot encoded true label, p_{i,c} is the predicted probability, N is the total number of pixels, and C is the number of classes; step S422, a Focal Loss mechanism is introduced to adjust the class weights, the Focal Loss adjustment factor being (1 − p_{i,c})^γ; step S423, each class c is given a different weight ω_c inversely proportional to the class frequency to balance the sample distribution, the improved cross entropy loss function being L_IpCe = −(1/N) Σ_i Σ_c ω_c·(1 − p_{i,c})^γ·y_{i,c}·log(p_{i,c}); the standard Dice loss function is expressed as L_Dice = 1 − (1/C) Σ_c (2·Σ_i y_{i,c}·p_{i,c} + ε) / (Σ_i y_{i,c} + Σ_i p_{i,c} + ε), wherein ε denotes a smoothing term avoiding a zero denominator; the total loss function can be expressed as L = α·L_IpCe + β·L_IpDice, wherein α and β are weight parameters set to 0.4 and 0.6 respectively.
  10. The medical image segmentation method based on deep learning according to claim 1, wherein step S5 is specifically implemented as follows: step S51, three public datasets, ISIC-2017, ISIC-2018 and ACDC, are selected; the ISIC-2017 dataset comprises 2000 training images, 150 validation images and 600 test images; the ISIC-2018 dataset comprises 2594 training images, 100 validation images and 1000 test images; the ACDC dataset comprises MRI scan data of 150 patients, of which the training set comprises 100 annotated cases and the test set comprises 50 unannotated cases, the experiment dividing the 100 annotated cases into 70 training cases, 10 validation cases and 20 test cases; step S52, the intersection-over-union IoU, Dice similarity coefficient DSC, specificity Spec, accuracy ACC and recall Recall are selected as evaluation indexes for the ISIC-2017 and ISIC-2018 datasets (an illustrative metric sketch follows the claims), calculated as: IoU = TP_i / (TP_i + FP_i + FN_i); DSC = 2·TP_i / (2·TP_i + FP_i + FN_i); Spec = TN_i / (TN_i + FP_i); ACC = (TP_i + TN_i) / (TP_i + TN_i + FP_i + FN_i); Recall = TP_i / (TP_i + FN_i); where TP_i denotes the number of pixels correctly predicted as class i, FP_i denotes the number of pixels that do not actually belong to class i but are incorrectly predicted as class i, TN_i denotes the number of pixels that do not belong to class i and are not predicted as class i, and FN_i denotes the number of pixels that actually belong to class i but are not predicted as class i; DSC and HD95 are selected as evaluation indexes for the ACDC data, HD95 being calculated as HD95(S_pred, S_gt) = max(h_95(S_pred, S_gt), h_95(S_gt, S_pred)), where S_pred and S_gt denote the boundary point sets of the predicted segmentation result and of the ground-truth annotation respectively, h_95(A, B) denotes the one-sided 95% quantile of the distances min_{q∈B} ‖p − q‖ over points p ∈ A, and ‖p − q‖ denotes the Euclidean distance between points p and q; step S53, the environment for network training and validation is an NVIDIA GeForce RTX 4070 with 12 GB memory, Ubuntu 22.04 and PyTorch 1.13.1; the images of the three datasets are resized to 256×256, the batch size is set to 16, and the model is trained for 100 epochs using the AdamW optimizer with learning rate lr = 0.001 and weight decay weight_decay = 1e-2; learning-rate scheduling uses the cosine annealing algorithm with cycle length T_max = 100 and minimum learning rate eta_min = 1e-5 (an illustrative training configuration follows the claims); step S54, 7 commonly used medical image segmentation networks are selected for comparison experiments with the method; step S55, ablation experiments are designed to verify the necessity and effectiveness of each module in the method; step S54 specifically comprises: step S541, the indexes of the different segmentation methods on the ISIC-2017 dataset are obtained; the core segmentation indexes IoU and DSC reach 78.06% and 87.67% respectively, superior to the other comparison methods; the overall classification accuracy ACC reaches 95.96%, first among the 8 methods; the specificity Spec and recall Recall are 98.01% and 85.74% respectively, still not inferior to the other 7 methods; step S542, the visual results of the different models on the ISIC-2017 dataset are obtained: on images with hair interference and blurred lesion boundaries the method produces clear, continuous segmentation boundaries with a good degree of fit, whereas, by contrast, the other methods easily misjudge healthy tissue as lesion when edges are blurred, easily recognize interfering objects as targets when foreign matter is present, and
produce erroneous segmentation of healthy skin areas under hair interference; step S543, the indexes of the different segmentation methods on the ISIC-2018 dataset are obtained; on the ISIC-2018 dataset, optimal results are obtained on the three core indexes, with IoU of 79.12%, DSC of 88.34% and ACC of 94.37%, the IoU being 0.83 percentage points higher than the second-best method, while the Recall of 87.53% and Spec of 96.57% also rank among the best, further verifying the effectiveness of the method in segmentation accuracy and region overlap; step S544, the visual results of the different models on the ISIC-2018 dataset are obtained: the segmentation result of the method closely matches the ground truth, whereas the other methods show obvious defects such as mis-segmentation of healthy tissue, fracture of lesion areas and jagged boundaries; step S545, the indexes of the different segmentation methods on ACDC are obtained; on the ACDC dataset, the method is optimal in overall segmentation accuracy and boundary fit, with an average Dice coefficient of 91.21% and an average HD95 distance of 2.7469 mm, an optimal result markedly superior to the classical U-Net series methods; it maintains high segmentation accuracy, surpasses models such as ViT in boundary accuracy, and shows good generalization; step S546, the visual results of the different models on the ACDC dataset are obtained: the segmentation result of the method is highly consistent with the ground-truth annotation, and the generated boundaries are continuous, smooth and consistent with the anatomical structure, effectively avoiding the contour fracture or irregularity common in the other methods; step S55 specifically comprises: step S551, the group with experiment number 1 uses only the CNN branch and the GCM module, the group with experiment number 2 uses the Mamba branch and the GCM module, the group with experiment number 3 uses the CNN and Mamba dual branches, the group with experiment number 4 uses the CNN and Mamba dual branches and the GCM module, the group with experiment number 5 uses the CNN and Mamba dual branches and the SSCM module, and the group with experiment number 6 is the complete model integrating all modules; step S552, the ablation results on the ISIC-2017 dataset show that when only the CNN branch and GCM module are used, IoU is 76.61% and DSC is 86.75%; when only the Mamba branch and GCM module are used, IoU is 77.17% and model performance is limited; when the dual branches are combined without feature fusion, IoU is 77.60% and DSC is 87.39%, superior to either single branch, showing that the dual-branch structure effectively provides complementary features; when the multi-scale gating fusion module GCM is introduced directly on the dual-branch basis without the SSCM module, IoU is 77.33% and performance drops, indicating that unguided simple aggregation may damage feature consistency; when the SSCM module is then introduced for feature fusion, IoU is 77.46% and performance recovers and improves; the complete model integrating all modules reaches an IoU of 78.06% and a DSC of 87.67%, the best on all indexes.
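
To make the CNN branch of claims 2-3 concrete, here is a minimal PyTorch sketch of one feature-extraction unit (3×3 convolution, batch normalization, LeakyReLU, 2×2 max-pooling, channels doubling per stage). The class names, the single-pool reading of the stage-1 quarter-size reduction, and the stage count are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class CNNUnit(nn.Module):
    """One CNN feature-extraction unit: 3x3 conv -> BN -> LeakyReLU -> 2x2 max-pool."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # feature extraction + channel adjustment
            nn.BatchNorm2d(out_ch),                              # batch normalization
            nn.LeakyReLU(inplace=True),                          # nonlinearity
            nn.MaxPool2d(kernel_size=2, stride=2),               # halve spatial resolution
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Stage 1 (step S211): 3 -> 96 channels, spatial size reduced to 1/4; later
# stages double the channels and halve the size (steps S212-S214).
stem = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=3, padding=1),
    nn.BatchNorm2d(96),
    nn.LeakyReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4),  # one plausible reading of the 1/4 reduction
)
stages = nn.ModuleList(CNNUnit(96 * 2**i, 96 * 2**(i + 1)) for i in range(3))
```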
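The four-direction cross-scan of claim 5 can be sketched as follows, under the assumption that the four directions are the row-major and column-major orders plus their reversals (as in VMamba-style scans); the `cross_scan` helper, tensor layout, and example sizes are hypothetical.

```python
import torch
import torch.nn as nn

def cross_scan(x: torch.Tensor) -> torch.Tensor:
    """Flatten a (B, C, H, W) map along four scan directions -> (B, 4, C, H*W)."""
    rows = x.flatten(2)                      # row-major (left-to-right, top-to-bottom)
    cols = x.transpose(2, 3).flatten(2)      # column-major (top-to-bottom, left-to-right)
    return torch.stack([rows, rows.flip(-1), cols, cols.flip(-1)], dim=1)

# Multi Scale Conv step: local features at 3x3 / 5x5 / 7x7, then scan + flatten.
x = torch.randn(1, 64, 16, 16)
convs = nn.ModuleList(nn.Conv2d(64, 64, k, padding=k // 2) for k in (3, 5, 7))
sequences = [cross_scan(conv(x)) for conv in convs]   # each (1, 4, 64, 256)
```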
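A minimal sketch of the SSCM fusion of claim 6, assuming standard CBAM-style channel and spatial attention (with the 3×3 spatial convolution the claim specifies), a 7×7 group convolution, and a conventional channel shuffle; the group count, reduction ratio and class name are assumptions, not the patent's code.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class SSCM(nn.Module):
    def __init__(self, ch: int, groups: int = 4, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for channel attention
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1))
        self.spatial = nn.Conv2d(2, 1, kernel_size=3, padding=1)  # 3x3 per claim 6
        self.gc = nn.Conv2d(2 * ch, ch, kernel_size=7, padding=3, groups=groups)
        self.groups = groups

    def forward(self, m: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        t = m + c                                                 # t_i = m_i + c_i
        ca = torch.sigmoid(self.mlp(t.mean((2, 3), keepdim=True)) +
                           self.mlp(t.amax(2, keepdim=True).amax(3, keepdim=True)))
        t_ca = t * ca                                             # channel attention
        sa = torch.sigmoid(self.spatial(torch.cat(
            [t_ca.mean(1, keepdim=True), t_ca.amax(1, keepdim=True)], dim=1)))
        att = t_ca * sa                                           # CBAM-refined map
        w = torch.sigmoid(channel_shuffle(self.gc(torch.cat([att, t], dim=1)), self.groups))
        return w * m + (1 - w) * c                                # weighted branch fusion

out = SSCM(96)(torch.randn(1, 96, 32, 32), torch.randn(1, 96, 32, 32))
```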
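The multi-scale feature gating fusion module of claim 7 can be sketched as below, assuming the two 1×1 convolutions form a small bottleneck that emits one weight map per input; the hidden width and class name are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    def __init__(self, ch: int = 24, n_inputs: int = 5, hidden: int = 32):
        super().__init__()
        self.weight_net = nn.Sequential(      # lightweight two-layer 1x1-conv network
            nn.Conv2d(n_inputs * ch, hidden, 1), nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True), nn.Conv2d(hidden, n_inputs, 1))
        self.temp = nn.Parameter(torch.ones(1))   # learnable temperature Temp
        self.lam = nn.Parameter(torch.zeros(1))   # lambda_1 = sigmoid(lam)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        w_raw = self.weight_net(torch.cat(feats, dim=1))           # (B, 5, H, W)
        w = F.softmax(w_raw / self.temp.clamp(min=1e-4), dim=1)    # adaptive weights
        stacked = torch.stack(feats, dim=1)                        # (B, 5, C, H, W)
        f_go = (w.unsqueeze(2) * stacked).sum(dim=1)               # gated sum F_GO
        f_mean = stacked.mean(dim=1)                               # average F_MeanO
        lam1 = torch.sigmoid(self.lam)
        return lam1 * f_go + (1 - lam1) * f_mean                   # residual blend

out = GatedFusion()([torch.randn(1, 24, 64, 64) for _ in range(5)])
```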
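The binary hybrid loss of claim 8 (λ1·BCE + λ2·Dice with smoothing term ε) is standard and can be written directly; the default weights below are illustrative, not values fixed by the patent.

```python
import torch
import torch.nn.functional as F

def hybrid_bce_dice(pred_logits, target, lam1=0.5, lam2=0.5, eps=1e-6):
    p = torch.sigmoid(pred_logits).flatten(1)        # foreground probabilities p_i
    y = target.flatten(1).float()                    # ground-truth labels y_i
    bce = F.binary_cross_entropy_with_logits(pred_logits, target.float())
    dice = 1 - (2 * (p * y).sum(1) + eps) / (p.sum(1) + y.sum(1) + eps)
    return lam1 * bce + lam2 * dice.mean()

loss = hybrid_bce_dice(torch.randn(2, 1, 64, 64), torch.randint(0, 2, (2, 1, 64, 64)))
```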
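The confusion-matrix metrics of step S52 follow directly from the TP/FP/TN/FN definitions; here is a NumPy sketch for binary masks (HD95 is omitted since it additionally requires boundary extraction).

```python
import numpy as np

def seg_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> dict:
    tp = np.sum((pred == 1) & (gt == 1))   # correctly predicted foreground
    fp = np.sum((pred == 1) & (gt == 0))   # background predicted as foreground
    tn = np.sum((pred == 0) & (gt == 0))   # correctly predicted background
    fn = np.sum((pred == 0) & (gt == 1))   # missed foreground
    return {
        "IoU":    tp / (tp + fp + fn + eps),
        "DSC":    2 * tp / (2 * tp + fp + fn + eps),
        "Spec":   tn / (tn + fp + eps),
        "ACC":    (tp + tn) / (tp + tn + fp + fn + eps),
        "Recall": tp / (tp + fn + eps),
    }
```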
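The training configuration of step S53 maps one-to-one onto stock PyTorch components; `model` below is a stand-in for the segmentation network, which the patent defines elsewhere.

```python
import torch

model = torch.nn.Conv2d(3, 1, 3, padding=1)  # stand-in for the segmentation network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-5)
```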

Description

Medical image segmentation method based on deep learning

Technical Field

The invention relates to the technical field of medical image processing and computer vision, in particular to a medical image segmentation method based on deep learning.

Background

Medical image segmentation is a key technology for computer-aided diagnosis, surgical planning and efficacy evaluation; its function is to accurately separate a target anatomical structure or lesion area from a medical image. With the development of deep learning, methods based on the convolutional neural network (Convolutional Neural Network, CNN) have become dominant in this field. Early on, fully convolutional networks (Fully Convolutional Networks, FCN) achieved pixel-level segmentation, while the U-Net architecture, by virtue of its encoder-decoder structure and skip connections, can effectively recover part of the spatial details and has played a major role in medical image segmentation. However, U-Net and most of its variants rely primarily on convolution operations, whose local receptive fields are limited and which are inadequate for fully fusing semantic and spatial information. This tends to produce incomplete segmentation or boundary errors when dealing with medical targets that have blurred boundaries, complex structures or varied sizes. In addition, the skip connection usually adopts simple addition or concatenation, which cannot fully realize adaptive and refined fusion of shallow spatial information and deep semantic information. To overcome the limitations of CNNs, researchers introduced Transformer-based models. The Transformer establishes global dependencies through the self-attention mechanism, and its vision variants have exhibited superior performance in multiple tasks. However, since the computational complexity of self-attention grows with the square of the input sequence length, the demand for computational resources when processing high-resolution medical images grows dramatically, limiting its use in practical clinical settings. While subsequent research has brought improvements through shifted windows and other strategies, challenges remain in balancing computational efficiency with global modeling capability. Recently, the Mamba architecture among state space models (State Space Models, SSMs) has received attention for its linear computational complexity and powerful long-sequence modeling capability. Through its selective state space and hardware-aware algorithms, Mamba can capture long-range dependencies while maintaining efficient computation. Prior work has introduced Mamba into vision tasks by designing scanning strategies that convert a two-dimensional image into a sequence. However, these methods mainly focus on replacing the traditional attention mechanism with a state space model for global modeling, and cannot fully exploit the advantages of CNNs in extracting local detail features.
How to design a novel network architecture that can efficiently use a state space model to capture the global context while retaining the CNN's ability to process details such as local textures and edges, and realize efficient fusion and complementation of the two kinds of information, remains an unresolved problem in the prior art and is key to improving the segmentation accuracy and robustness for complex medical images. In summary, the existing medical image segmentation technology has shortcomings in three aspects: local detail perception, utilization of global context information, and computational efficiency. Therefore, a new solution is needed to solve the above problems.

Disclosure of Invention

In view of the above, the present invention aims to provide a medical image segmentation method based on deep learning, which solves the problem that the existing medical image segmentation technology still has shortcomings in local detail perception, utilization of global context information and computational efficiency. To achieve the above purpose, the invention adopts the following technical scheme. A medical image segmentation method based on deep learning comprises the following steps: step S1, input a medical image x and perform preprocessing and data augmentation; step S2, extract multi-scale feature information from the input image in parallel with a dual-branch encoder; step S3, design a multi-scale feature gating fusion module, use a decoder to up-sample the multi-scale gated fusion features output by the encoder together with shallow features of the original image, and adaptively weight and fuse them through a gating mechanism to reconstruct a segmentation probability map; step S4, design a corresponding hybrid loss function for network training according to whether the segmentation task is binary or multi-class;