CN-121999339-A - Medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement
Abstract
The present disclosure relates to a medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement. The method comprises: collecting images and performing feature analysis processing on them; embedding a lightweight dynamic token in the bottleneck layer of the encoder; merging and reconstructing the processed image data; designing a hybrid consistency enhancement model that combines super-pixel-level soft and hard mixing with a consistency enhancement strategy, and processing the reconstructed image data through this model; and designing a novel loss function L_all from binary cross-entropy loss and Dice coefficient loss. By introducing the designed lightweight dynamic token and merging and reconstructing the image, small-size lesion segmentation and background noise can be handled better while the model remains lightweight, improving segmentation accuracy and robustness; in addition, the hybrid consistency enhancement model improves the feature learning capability and convergence of the method under limited labeled data.
Inventors
- KONG WEIWEI
- WANG YUCHEN
Assignees
- Xi'an University of Posts and Telecommunications (西安邮电大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260106
Claims (10)
- 1. A medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement, characterized by comprising the following steps: collecting an image, performing feature analysis processing on the image, embedding a lightweight dynamic token in a bottleneck layer of an encoder, and merging and reconstructing the processed image data; combining super-pixel-level soft and hard mixing with a consistency enhancement strategy, designing a hybrid consistency enhancement model, and processing the reconstructed image data through the model; using binary cross-entropy loss and Dice coefficient loss, designing a new loss function L_all that optimizes L_task and L_cons by weighted combination: L_task = L_BCE(p, y) + L_Dice(p, y); L_cons = L_task(p_1, ŷ_2) + L_task(p_2, ŷ_1); L_all = L_task^(1) + L_task^(2) + λ·L_cons, where L_BCE and L_Dice denote binary cross-entropy loss and Dice coefficient loss respectively, p is the prediction result, y is the true label, p_1 and p_2 are the probability maps of the two prediction branches, ŷ_1 and ŷ_2 are the corresponding thresholded binary masks, L_task denotes task loss, L_cons denotes consistency loss, L_task^(1) and L_task^(2) are the task losses of the two branches respectively, λ is the weight of the consistency loss, and L_all is the total loss.
- 2. The medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement according to claim 1, wherein merging and reconstructing the processed image data comprises: taking the acquired image as input, flattening it into a token sequence, and storing a reconstruction matrix M constructed from the token-position mapping relation; calculating a dependency score for each token; calculating the joint distance between tokens; calculating the local density and high-density distance of each token, and obtaining a density-distance score from them; adding the dependency score and the density-distance score to obtain a comprehensive score, and selecting the top rN tokens as clustering centers according to a sparsity ratio parameter r and the total token number N; organizing the semantic token set Z into a feature tensor X, refining the features by convolution, and remapping the feature tensor X into a two-dimensional feature map Y via the reconstruction matrix M constructed from the token-position mapping relation, as the output of the image.
- 3. The medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement according to claim 2, wherein the refined feature vectors are remapped into the two-dimensional feature map Y through a sparse-dense matrix selection strategy, with the correspondence between tokens and their original spatial positions at its core; the calculation process is: Y = reshape(M(Conv_1×1(X))), where Conv_1×1(X) performs a 1×1 convolution on the feature tensor X, M(·) maps the convolved features onto a two-dimensional spatial grid via the reconstruction matrix M constructed from the token-position mapping relation, and reshape(·) reshapes the mapping result into the two-dimensional feature map Y as the final output.
- 4. The medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement according to claim 2, wherein the dependency score of each token is calculated as: s_i^dep = Norm( (1/H_d) · Σ_{h=1}^{H_d} Σ_{j=1}^{N} M_h[j, i] ), where M_h ∈ R^{N×N} denotes the attention map of the h-th attention head, H_d = H/2 is the number of important attention heads used in the calculation, N is the total number of tokens, Norm(·) denotes a normalization operation, and s_i^dep denotes the dependency score of token i.
- 5. The medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement according to claim 1, wherein the hybrid consistency enhancement model comprises: randomly shuffling the samples within a batch to generate sample pairs (I_A, I_B) and corresponding labels (L_A, L_B); super-pixel Mixup, in which, for each sample in the batch, a mixing coefficient λ_k is sampled from a Beta distribution to achieve soft mixing; and super-pixel CutMix, in which a Bernoulli distribution decides whether each super-pixel is replaced, achieving hard mixing; after the super-pixel Mixup and CutMix are constructed, two sets of enhanced inputs (I_1, L_1) and (I_2, L_2) are randomly generated for the same batch in one training iteration, and each set of inputs randomly uses either Mixup or CutMix.
- 6. The medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement according to claim 5, wherein, assuming the batch size is B, the number of image channels is C, the height and width are H×W, the super-pixel map corresponding to each sample is sp_map, and the super-pixel set is R = {R_1, R_2, …, R_k, …, R_n}; the super-pixel Mixup is constructed as follows: for each super-pixel R_k, a mixing coefficient λ_k is sampled from the Beta distribution, the Mixup mask M_mixup equals λ_k over that super-pixel, and each super-pixel is softly mixed between the original image and the contrast image according to the mask value: I_mixup = M_mixup ⊙ I_A + (1 − M_mixup) ⊙ I_B, L_mixup = M_mixup ⊙ L_A + (1 − M_mixup) ⊙ L_B, where I_A and I_B are the two input images, L_A and L_B are their corresponding labels, M_mixup is the Mixup mask, and I_mixup and L_mixup are the softly mixed image and label, respectively; the super-pixel CutMix is constructed as follows: for each super-pixel R_k, a Bernoulli distribution decides whether it is replaced: M_binary(R_k) = 1 if k ∈ S, and 0 otherwise, where S ⊆ {1, …, n} is the set of selected super-pixels, the selection probability is P(k ∈ S) = β_cutmix, and M_binary is the binarized CutMix mask indicating whether the super-pixel is completely replaced.
- 7. The medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement according to claim 6, wherein the hard mixing process is calculated as: I_cutmix = (1 − M_binary) ⊙ I_A + M_binary ⊙ I_B, L_cutmix = (1 − M_binary) ⊙ L_A + M_binary ⊙ L_B, where I_A and I_B are the two input images, L_A and L_B are their corresponding labels, and I_cutmix and L_cutmix are the hard-mixed image and label, respectively.
- 8. The medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement according to claim 7, wherein, after the super-pixel Mixup and CutMix are constructed, two sets of enhanced inputs (I_1, L_1) and (I_2, L_2) are randomly generated for the same batch in one training iteration, and each set of inputs randomly uses either Mixup or CutMix.
- 9. The medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement according to claim 8, wherein a predicted result p = σ(logits) and a true label y are set, and the task loss L_task is calculated, where logits is the value returned by the preceding steps, σ is the sigmoid activation function, and p is the value of logits after the sigmoid activation; L_BCE and L_Dice denote binary cross-entropy loss and Dice coefficient loss respectively, with the corresponding mathematical expressions: L_BCE = −(1/N) Σ_i [ y_i log p_i + (1 − y_i) log(1 − p_i) ], L_Dice = 1 − (2 Σ_i p_i y_i + ε) / (Σ_i p_i + Σ_i y_i + ε), where ε is a smoothing term that avoids a zero denominator.
- 10. The medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement according to claim 2, wherein, in the fusion process, the weights are normalized by a softmax function, and the calculation process is: z_k = Σ_{i ∈ C_k} softmax(p_i) · x_i, with softmax(p_i) = exp(p_i) / Σ_{j ∈ C_k} exp(p_j), where z_k denotes the semantic token feature fused at the k-th clustering center, C_k denotes the set of tokens belonging to center k, x_i is the original token feature, and p_i is the token importance weight learned through a linear layer.
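Claims 2 and 4 describe selecting clustering centers by adding an attention-based dependency score to a density-distance score and keeping the top rN tokens. The NumPy sketch below illustrates that selection; the min-max form of Norm(·), the Gaussian density kernel, its median-based cutoff distance, and the product form of the density-distance score are assumptions, since the claims do not fix those details:

```python
import numpy as np

def dependency_scores(attn_maps):
    """Attention received by each token (column sums), averaged over the
    first H_d = H // 2 heads, then min-max normalised (assumed Norm(.))."""
    H = attn_maps.shape[0]
    Hd = max(H // 2, 1)
    received = attn_maps[:Hd].sum(axis=1).mean(axis=0)  # shape (N,)
    rng = received.max() - received.min()
    return (received - received.min()) / (rng + 1e-8)

def density_distance_scores(tokens):
    """Local density rho and distance-to-higher-density delta per token,
    in the spirit of density-peak clustering; score = rho * delta."""
    d = np.linalg.norm(tokens[:, None] - tokens[None, :], axis=-1)
    dc = np.median(d) + 1e-8                    # assumed cutoff distance
    rho = np.exp(-(d / dc) ** 2).sum(axis=1)    # local density
    delta = np.zeros(len(tokens))
    for i in range(len(tokens)):
        higher = rho > rho[i]
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    score = rho * delta
    rng = score.max() - score.min()
    return (score - score.min()) / (rng + 1e-8)

def select_centers(tokens, attn_maps, r=0.25):
    """Comprehensive score = dependency + density-distance; keep top r*N."""
    total = dependency_scores(attn_maps) + density_distance_scores(tokens)
    k = max(int(r * len(tokens)), 1)
    return np.argsort(total)[::-1][:k]
```

Tokens that both receive much attention and sit at density peaks are thus preferred as centers, which matches the claim's intent of combining the two scores additively.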
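Claims 5 through 7 describe the soft and hard super-pixel mixing. A minimal NumPy sketch follows, assuming single-channel images, a per-sample integer super-pixel label map `sp_map`, a symmetric Beta(α, α) for the soft mixing coefficients, and the replacement direction I_A → I_B for CutMix; none of these specifics are fixed by the claims:

```python
import numpy as np

def superpixel_mixup(IA, IB, LA, LB, sp_map, alpha=1.0, rng=None):
    """Soft mixing: each super-pixel region R_k gets its own
    lambda_k ~ Beta(alpha, alpha), forming a piecewise-constant mask."""
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros(sp_map.shape, dtype=float)
    for k in np.unique(sp_map):
        mask[sp_map == k] = rng.beta(alpha, alpha)
    # I_mixup = M (.) I_A + (1 - M) (.) I_B, same for the labels
    return mask * IA + (1 - mask) * IB, mask * LA + (1 - mask) * LB

def superpixel_cutmix(IA, IB, LA, LB, sp_map, beta_cutmix=0.5, rng=None):
    """Hard mixing: each super-pixel is wholly replaced by the contrast
    image with probability beta_cutmix (Bernoulli decision per region)."""
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros(sp_map.shape, dtype=float)
    for k in np.unique(sp_map):
        if rng.random() < beta_cutmix:
            mask[sp_map == k] = 1.0   # k selected into S: replace region
    return (1 - mask) * IA + mask * IB, (1 - mask) * LA + mask * LB
```

Because both masks are constant inside each super-pixel, the mixing respects region boundaries instead of cutting lesions with axis-aligned rectangles, which is the point of doing Mixup/CutMix at the super-pixel level.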
Description
Medical image segmentation method with dynamic token processing and super-pixel soft and hard consistency enhancement Technical Field The invention relates to the technical field of network protocols and quantum technologies, in particular to a medical image segmentation method for dynamic token processing and super-pixel soft and hard consistency enhancement. Background With the rapid development of medical imaging technology, medical image segmentation has become a core task of a computer-aided medical diagnosis system, and meanwhile, the medical image has inherent characteristics of complex organ background, low contrast and the like, so that higher requirements on segmentation accuracy are put forth. Although the U-Net variant based on the Transformer has been successful in the field of medical image segmentation, when the variant is applied to a real clinical scene, especially a computing resource limited environment, two challenges are faced, namely, a large model with a complete attention mechanism based on a CNN-Transformer mixed framework can effectively capture long-range dependency relationships, but huge computing overhead and parameter quantity make the variant difficult to deploy in edge equipment. Moreover, many models using only light weight attention have problems of weak feature extraction capability and poor robustness in generalization on different data sets due to limited model capacity. Therefore, how to improve the accuracy and robustness of segmentation while keeping the model lightweight, and especially when dealing with small lesion segmentation and background noise, achieving efficient feature learning and model convergence is a major problem. Accordingly, there is a need to provide a new solution to ameliorate one or more of the problems presented in the above solutions. 
Disclosure of Invention The invention aims to provide a medical image segmentation method with enhanced dynamic token processing and super-pixel soft and hard consistency, which is used for solving the practical problems of poor segmentation precision of small-size focuses, insufficient background noise suppression and difficult deployment of high-performance large models in medical image segmentation tasks. The invention provides a medical image segmentation method for enhancing dynamic token processing and super-pixel soft and hard consistency, which is characterized by comprising the following steps: collecting an image, carrying out characteristic analysis processing on the image, embedding a lightweight dynamic token in a bottleneck layer of an encoder, and merging and reconstructing processed image data; Combining super-pixel level soft and hard mixing and consistency enhancement strategies, designing a mixing consistency enhancement model, and processing reconstructed image data through the model; Using the binary cross entropy loss and the Dice coefficient loss, a new loss function L all,Lall is designed to optimize L task and L cons by weighted combination: , , , L BCE and L Dice represent binary cross entropy loss and Dice coefficient loss, respectively, p is the prediction result, y is the true label, p 1 and p 2 are probability maps of two predicted branches, respectively, AndFor the thresholded binary mask, L task represents task loss, L cons represents consistency loss,The task loss of the two branches is respectively corresponding, lambda is the weight of the consistency loss, and L all is the total task loss. 
Preferably, the combining and reconstructing comprises the steps of: Taking the sampled image as input, flattening the sampled image into a token sequence, and storing a reconstruction matrix M constructed according to a token position mapping relation; Calculating a dependency score for each token; calculating the joint distance between the token; calculating the local density and the high-density distance of each token, and obtaining a density distance score according to the local density and the high-density distance; the dependence score and the density distance score are added to obtain a comprehensive score, and the front rN token is selected as a clustering center according to a sparse ratio parameter r and a total token number N; The semantic token set Z is organized into a feature tensor X, feature refining is carried out by convolution, the feature tensor X is remapped into a two-dimensional feature map Y by a reconstruction matrix M which is constructed according to a token-position mapping relation and is used as an output of an image. Preferably, the refined feature vector uses the corresponding relation between the token and the original space position as a core through a sparse-dense matrix selection strategy, the token is remapped into the two-dimensional feature map Y, and the calculation process is as follows: Y=reshape(M(Conv1×1(X))), Conv 1×1 (X) is used for carrying out convolution operation on the feature tensor X, M (the.+ -.) is used for mapping the convolved features