CN-116805318-B - Medical image segmentation method based on dynamic deformable convolution and sliding window self-adaptive complementary attention mechanism
Abstract
The invention discloses a medical image segmentation method based on a dynamic deformable convolution and a sliding window self-adaptive complementary attention mechanism, which improves the perception of tiny lesions and large deformations in medical images and the ability to discriminate between the segmentation target and the background. The dynamic deformable convolution flexibly changes its weight coefficients and deformation offsets through task-adaptive learning, enhancing the expression of local image features and realizing adaptive extraction of spatial features. The sliding window self-adaptive complementary attention mechanism realizes cross-dimensional global modeling of the medical image through self-attention branches whose weight coefficients are learned adaptively, overcoming the insufficient modeling of cross-dimensional relationships between space and channels in conventional methods, and capturing long-distance cross-dimensional correlation features in the image. A parallel interaction mode combines local and global features at different resolutions to enhance representation learning, preserving the local and global features of the medical image to the maximum extent.
Inventors
- LEI TAO
- SUN RUI
- DU XIAOGANG
- YANG ZIYAO
- XUE MINGYUAN
- MIN ZHONGDAN
Assignees
- Shaanxi University of Science and Technology (陕西科技大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-06-14
Claims (6)
- 1. A medical image segmentation method based on a dynamic deformable convolution and sliding window adaptive complementary attention mechanism, comprising: 1) loading a medical image data set and preprocessing it; 2) constructing and training a CNNs and Transformer fusion network, wherein the CNNs and Transformer fusion network is formed by the parallel interaction of a dual-branch network, a dynamic deformable convolution operates in the CNNs branch, and a sliding window self-adaptive complementary attention mechanism is arranged in the Transformer branch; the dynamic deformable convolution adaptively learns the convolution kernel deformation offsets and the convolution kernel weight coefficients according to the specific medical image segmentation task and data distribution, and end-to-end training is realized through back-propagation of the network, so that a double change of the convolution kernel shape and weights is realized; the shape change of the convolution kernel in the dynamic deformable convolution is based on network learning of the deformation offsets: the network first samples the input feature map $x$ with a square sampling grid $R$ and then performs a weighted summation with the weight matrix $w$, so that each position $p_0$ of the output feature map $y$ is expressed as $$y(p_0)=\sum_{p_n\in R} w(p_n)\cdot x(p_0+p_n);$$ after the deformation offset $\Delta p_n$ is introduced into the sampling of the weight matrix $w$, $y(p_0)$ becomes $$y(p_0)=\sum_{p_n\in R} w(p_n)\cdot x(p_0+p_n+\Delta p_n);$$ through network learning, an offset matrix with the same spatial size as the input feature map is finally obtained, and the channel dimension of this matrix is 2 times that of the input feature map (one offset component per spatial direction); the convolution kernel weight change in the dynamic deformable convolution is determined by introducing weight coefficients: for the learning of the weight coefficients, the feature map output of a conventional convolution is expressed as $$y=\sigma(W*x),$$ where $\sigma$ is the activation function; after the weight coefficients are introduced into the convolution kernel weight matrix, the feature map output of the dynamic deformable convolution is $$y=\sigma\Big(\big(\textstyle\sum_{k=1}^{K}\pi_k W_k\big)*x\Big),$$ where $K$ is the number of weight coefficients and $\pi_k$ is a weight coefficient with learnable parameters; of the four Transformer self-attention branches, two branches capture the channel and spatial correlations respectively, and the other two branches capture the correlation between the channel dimension $C$ and the spatial dimension $H$, and between the channel dimension $C$ and the spatial dimension $W$; after the shifted-window partitioning method is adopted in the sliding window self-adaptive complementary attention mechanism, the calculation process of two consecutive Transformer blocks is: $$\hat{z}^{l}=\text{W-ACAM}(\text{LN}(z^{l-1}))+z^{l-1},\quad z^{l}=\text{LPM}(\text{LN}(\hat{z}^{l}))+\hat{z}^{l},$$ $$\hat{z}^{l+1}=\text{SW-ACAM}(\text{LN}(z^{l}))+z^{l},\quad z^{l+1}=\text{LPM}(\text{LN}(\hat{z}^{l+1}))+\hat{z}^{l+1},$$ where $\hat{z}^{l}$ and $z^{l}$ respectively represent the output features of the sliding window self-adaptive complementary attention and of the compact convolution projection, W-ACAM denotes the window adaptive complementary attention, SW-ACAM denotes the sliding window adaptive complementary attention, and LPM denotes the compact convolution projection; 3) outputting the segmentation predictions of the CNNs branch and the Transformer branch, performing fusion judgment on them, and outputting the final optimized segmentation result.
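As an illustration of the dynamic deformable convolution in claim 1, the following minimal sketch aggregates $K$ candidate kernels with coefficients $\pi_k$ and samples the input at offset positions. All names are illustrative assumptions; nearest-neighbor rounding stands in for the bilinear interpolation a real implementation would use, and the patent does not specify this code.

```python
import numpy as np

def dynamic_deformable_conv2d(x, kernels, pi, offsets):
    """Sketch of y(p0) = sum_n w(p_n) * x(p0 + p_n + dp_n), with the
    dynamic kernel w = sum_k pi_k * W_k aggregated from K candidates.

    x       : (H, W) single-channel input feature map
    kernels : (K, 3, 3) bank of candidate 3x3 kernels W_k
    pi      : (K,) aggregation coefficients pi_k (learnable in training)
    offsets : (H, W, 9, 2) learned (dy, dx) offsets, one pair per
              sampling point of the square 3x3 grid R
    """
    H, W = x.shape
    w = np.tensordot(pi, kernels, axes=1)                 # aggregated kernel
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # grid R
    y = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for n, (dy, dx) in enumerate(grid):
                # deformed sampling location p0 + p_n + dp_n (rounded here)
                oy, ox = offsets[i, j, n]
                si, sj = int(round(i + dy + oy)), int(round(j + dx + ox))
                if 0 <= si < H and 0 <= sj < W:            # zero padding
                    acc += w[dy + 1, dx + 1] * x[si, sj]
            y[i, j] = acc
    return y

# toy usage: zero offsets and pi = [1, 0] reduce to a plain 3x3 mean filter
x = np.arange(25, dtype=float).reshape(5, 5)
kernels = np.stack([np.full((3, 3), 1 / 9.0), np.zeros((3, 3))])
pi = np.array([1.0, 0.0])
offsets = np.zeros((5, 5, 9, 2))
y = dynamic_deformable_conv2d(x, kernels, pi, offsets)
print(y.shape)  # (5, 5)
```

With zero offsets and a one-hot $\pi$, the output center equals an ordinary 3×3 convolution response, which makes the deformable and dynamic parts easy to check in isolation.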
- 2. The medical image segmentation method based on the dynamic deformable convolution and sliding window self-adaptive complementary attention mechanism according to claim 1, wherein the preprocessing comprises: uniformly resizing the original images of the data set to 224×224; performing a random scaling operation on the images of the original data set with a scaling ratio between 0.9 and 1.5; and performing random vertical flipping, horizontal flipping, 90° rotation or 270° rotation on the images of the original data set with a probability of 0.5.
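The preprocessing steps of claim 2 can be sketched as follows. The function names and the nearest-neighbor resize are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(img):
    """Claim-2 style augmentation sketch:
    1. resize to 224x224
    2. random scale in [0.9, 1.5], then resize back to 224x224
    3. with probability 0.5, apply one of: vertical flip, horizontal
       flip, 90-degree rotation, 270-degree rotation
    """
    def resize(a, h, w):  # crude nearest-neighbor resize
        ri = (np.arange(h) * a.shape[0] / h).astype(int)
        ci = (np.arange(w) * a.shape[1] / w).astype(int)
        return a[ri][:, ci]

    img = resize(img, 224, 224)
    s = rng.uniform(0.9, 1.5)
    img = resize(resize(img, int(224 * s), int(224 * s)), 224, 224)
    if rng.random() < 0.5:
        op = rng.integers(4)
        if op == 0:
            img = np.flipud(img)
        elif op == 1:
            img = np.fliplr(img)
        else:
            img = np.rot90(img, k=1 if op == 2 else 3)
    return img

out = preprocess(rng.random((512, 512)))
print(out.shape)  # (224, 224)
```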
- 3. The medical image segmentation method based on the dynamic deformable convolution and sliding window self-adaptive complementary attention mechanism according to claim 1, wherein the sliding window self-adaptive complementary attention mechanism adopts a sliding-window calculation mode to compute self-attention within a local window, and a compact convolution projection is also set; the method comprises first reducing the local spatial size of the medical image through the sliding-window operation, then compressing the channel dimension of the medical image through the compact convolution projection, and finally computing the self-attention.
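The two reduction steps of claim 3, window partitioning followed by channel compression, can be sketched as below. A plain matrix projection stands in for the patent's compact convolution projection, and all names are illustrative:

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows,
    reducing the spatial extent over which self-attention is computed."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def compact_projection(windows, proj):
    """Compress the channel dimension with a learned matrix (an
    illustrative stand-in for the compact convolution projection)."""
    return windows @ proj  # (num_windows, ws*ws, C_reduced)

x = np.random.default_rng(0).random((8, 8, 16))
wins = window_partition(x, ws=4)   # 4 windows of 4x4 tokens, 16 channels
small = compact_projection(wins, np.random.default_rng(1).random((16, 4)))
print(wins.shape, small.shape)     # (4, 16, 16) (4, 16, 4)
```

Self-attention is then computed per window on the compressed tokens, which is what keeps the attention cost local rather than quadratic in the full image size.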
- 4. The medical image segmentation method based on the dynamic deformable convolution and sliding window adaptive complementary attention mechanism according to claim 1, wherein the attention calculation process of each Transformer self-attention branch is: $$\text{Attention}(Q,K,V)=\text{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d}}+B\right)V,$$ where $B$ is the relative position bias; $Q$, $K$ and $V$ are respectively the query matrix, the key matrix and the value matrix; $d$ represents the dimension of the queries and keys; $N$ represents the number of blocks; after calculation through the four parallel Transformer self-attention branches $V_{1}$, $V_{2}$, $V_{3}$ and $V_{4}$, the final feature fusion output is $$\text{Output}=w_{1}V_{1}+w_{2}V_{2}+w_{3}V_{3}+w_{4}V_{4},$$ where $w_{1}$, $w_{2}$, $w_{3}$ and $w_{4}$ are learnable parameters used to adaptively control the importance of each attention branch to spatial and channel information in a particular segmentation task.
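The per-branch attention and the weighted fusion of the four branches can be sketched as follows; the random inputs and fixed fusion weights are illustrative stand-ins (in the network, the branch inputs come from the four cross-dimension views and the weights are learned):

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, B):
    """Attention(Q, K, V) = Softmax(Q K^T / sqrt(d) + B) V,
    with B the relative position bias of claim 4."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d) + B) @ V

rng = np.random.default_rng(0)
N, d = 16, 8                    # tokens per window, head dimension
B = rng.random((N, N))          # relative position bias (toy values)
branches = [attention(rng.random((N, d)), rng.random((N, d)),
                      rng.random((N, d)), B) for _ in range(4)]
w = np.array([0.4, 0.3, 0.2, 0.1])  # learnable fusion weights w1..w4
fused = sum(wi * vi for wi, vi in zip(w, branches))
print(fused.shape)  # (16, 8)
```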
- 5. The medical image segmentation method based on the dynamic deformable convolution and sliding window adaptive complementary attention mechanism according to claim 1, wherein three loss functions are set in the CNNs and Transformer fusion network, namely the overall loss $L$, the CNNs-branch loss $L_{C}$ and the Transformer-branch loss $L_{T}$, where $L_{mse}$ represents the mean-square-error loss, $L_{dice}$ and $L_{ce}$ represent the Dice loss and the cross-entropy loss, and $P$, $P_{C}$, $P_{T}$ and $G$ respectively represent the prediction map finally output by the method for the input image $X$, the prediction map output by the CNNs branch, the prediction map output by the Transformer branch, and the label map.
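The three loss components named in claim 5 can be sketched as below for a binary prediction map; the soft Dice and binary cross-entropy forms shown are standard formulations, assumed here since the claim does not spell them out:

```python
import numpy as np

def mse_loss(p, g):
    """Mean-squared-error loss L_mse between prediction P and label G."""
    return float(np.mean((p - g) ** 2))

def dice_loss(p, g, eps=1e-6):
    """Soft Dice loss: 1 - 2|P.G| / (|P| + |G|)."""
    inter = np.sum(p * g)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(g) + eps))

def ce_loss(p, g, eps=1e-7):
    """Binary cross-entropy loss on probabilities in (0, 1)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-g * np.log(p) - (1 - g) * np.log(1 - p)))

g = np.array([[0.0, 1.0], [1.0, 0.0]])   # label map G
p = np.array([[0.1, 0.9], [0.8, 0.2]])   # prediction map P
print(mse_loss(p, g), dice_loss(p, g), ce_loss(p, g))
```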
- 6. The medical image segmentation method based on a dynamic deformable convolution and sliding window adaptive complementary attention mechanism according to claim 5, wherein the final loss function of the CNNs and Transformer fusion network is expressed as: $$L_{final}=L+\lambda\,(L_{C}+L_{T}),$$ where $\lambda$ is a Gaussian-like rising curve of the current training round $t$, and $T$ represents the total number of training rounds.
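The "Gaussian-like rising curve" of claim 6 can be illustrated with the common Gaussian ramp-up schedule $\lambda(t)=e^{-5(1-t/T)^{2}}$; this particular formula is an assumption chosen because it rises from near 0 to exactly 1 over the $T$ training rounds, matching the claim's description:

```python
import math

def ramp_up(t, T):
    """Gaussian-like rising curve lambda(t) = exp(-5 * (1 - t/T)^2).
    Rises monotonically from exp(-5) at t = 0 to 1.0 at t = T, so the
    branch losses are weighted more heavily late in training."""
    return math.exp(-5.0 * (1.0 - t / T) ** 2)

T = 100  # total training rounds
print(round(ramp_up(0, T), 4), round(ramp_up(T, T), 4))  # 0.0067 1.0
```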
Description
Medical image segmentation method based on dynamic deformable convolution and sliding window self-adaptive complementary attention mechanism

Technical Field

The invention belongs to the technical field of image processing and the field of pattern recognition, and particularly relates to a medical image segmentation method based on a dynamic deformable convolution and sliding window self-adaptive complementary attention mechanism.

Background

Medical image segmentation plays a very important role in the field of medical image processing and is one of the core technologies of computer-aided diagnosis and therapy systems. Traditional methods rely on experienced doctors to manually label and segment large amounts of medical image data, which is time-consuming and labor-intensive and easily influenced by subjective factors. In recent years, with the rapid development of artificial intelligence and computer technology, researchers have developed many new automatic segmentation algorithms for medical images on the basis of a large number of experiments. Existing medical image segmentation methods are mainly based on deep learning and can be roughly divided into CNNs-based and Transformer-based networks. Deep-learning-based algorithms are capable of learning high-dimensional feature information of medical images through a multi-layer network structure. Among the various deep learning networks applied to medical image segmentation, convolutional neural networks (Convolutional Neural Networks, CNNs) perform extremely well. CNNs can effectively learn discriminative features and extract prior knowledge from large-scale medical data sets, and have thus become an important component of modern intelligent medical image analysis systems. In 2015, Ronneberger et al., inspired by the FCN network, designed U-Net, the first end-to-end network for medical image segmentation, in the ISBI Cell Tracking Challenge.
The U-Net network is a symmetrical encoding-decoding structure, and this unique design can fully utilize the local detail information of the medical image, thereby reducing the dependence of the network on the training data set. Therefore, the U-Net network can still obtain a good medical image segmentation effect even with a smaller data set. Alom et al. designed R2U-Net by combining U-Net, ResNet and recurrent convolutional neural networks (RCNN), which achieved good performance on multiple medical image segmentation data sets of blood vessels, retina, etc. After Gu et al. introduced dynamic convolution into U-Net, CA-Net was proposed; experiments on a medical data set prove that CA-Net can not only improve the segmentation precision of medical images but also reduce the training time of the model. Based on U-Net, Yang et al. borrowed the ideas of residual connection and deformable convolution, added a residual deformable convolution to U-Net, and proposed DCU-Net. DCU-Net exhibits a more advanced segmentation effect than U-Net on the DRIVE medical data set. Lei et al. designed SGU-Net on the basis of U-Net; its ultra-light convolution module and additional adversarial shape constraint can remarkably improve the segmentation precision of abdominal medical images through self-supervised training. Although CNNs have made great progress in network architecture, the major factors in their success are their invariance in handling different scales and their inductive bias toward local modeling. This fixed receptive field, while improving the computational efficiency of CNNs, limits their ability to capture relationships between distant pixels in medical images, leaving them without long-range modeling capability.
In 2017, Vaswani et al. proposed the first Transformer network; its unique design gives Transformers the ability to accept inputs of indefinite length, build long-range dependency modeling, and capture global information. The success of the Transformer is mainly due to the self-attention mechanism (self-attention mechanism, SA), which is able to capture long-range dependencies. As the Transformer obtained excellent performance in NLP tasks, ViT applied the Transformer to the image processing field for the first time, capturing the global context information of an input image through multiple cascaded Transformer layers, and achieved great success in image classification tasks. Chen et al. then proposed TransUNet, whose advent opened a completely new situation for using the Transformer in the field of medical image segmentation. Since TransUNet directly uses the Transformer model from NLP for image segmentation, its input image block size is fixed and its calculation amount is large. Valanarasu et al. addressed the shortcomings of TransUNet and proposed MedT in combination with a gating mechanism