CN-121982308-A - CNN and subtractive vision Mamba hybrid neural network medical image segmentation method integrating golden section proportion
Abstract
The invention relates to the intersection of medical image segmentation technology and computer-aided diagnosis, and in particular to a CNN and subtractive vision Mamba hybrid neural network medical image segmentation method integrating golden section proportion. The method addresses two problems in the field of medical image segmentation: the traditional CNN is limited when processing long-sequence information due to the inherent limitation of the convolution kernel receptive field, and the Vision Transformer incurs high computational cost on lightweight downstream tasks due to the quadratic computational complexity of its attention mechanism. The invention strengthens the model's memory of small targets by separating the image background, and designs a prompt-guided mechanism that continuously feeds the original image into the network as a prompt during training, preventing the deep modules from forgetting the local details of the original image. Expressive power is improved through various efficient convolution modules, achieving a balance between computational efficiency and segmentation accuracy.
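The background-separation and prompt-guided idea described in the abstract can be sketched in a few lines; all names and the stage logic below are illustrative stand-ins (a mean-based background estimate), not the patented design:

```python
import numpy as np

def prompt_guided_stage(features, prompt):
    """One hypothetical encoder stage: estimate and subtract a background
    component from the feature map, then re-inject the frozen prompt-model
    features so deeper layers keep the original image's local detail.
    The mean-based background estimate is a toy stand-in."""
    background = features.mean(axis=(0, 1), keepdims=True)
    target = features - background          # "subtractive" background separation
    return target + prompt                  # prompt re-injection at every stage

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 8, 4))
prompt = rng.standard_normal((8, 8, 4))
out = prompt_guided_stage(feats, prompt)
print(out.shape)
```

Because the prompt is added after the subtraction, the prompt signal survives unchanged into the stage output, which is the forgetting-prevention effect the abstract describes.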
Inventors
- Gao Peiting
- Zhang Ruiping
- Feng Jinkun
- Zhao Jingxuan
Assignees
- Taiyuan University of Technology (太原理工大学)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-01-20
Claims (5)
- 1. A CNN and subtractive vision Mamba hybrid neural network medical image segmentation method integrating golden section proportion, characterized by comprising the following steps: S1, preprocessing a sample medical image, namely correcting the different resolutions of the sample medical images to the same set resolution; S2, constructing the ESVM-Net neural network, which comprises a prompt model, an encoder network, a decoder network, a bottleneck layer and a skip-connection processing module; the encoder network comprises a multi-stage encoder, each stage of which uses a subtractive vision Mamba model that learns the background characteristics of an image through a two-way design and then submits the extracted target features to a multi-kernel convolution block for processing; the decoder network comprises multi-stage decoders, each stage of which adopts, in the skip connection, a local feature extraction module built from a multi-kernel convolution block together with SA and CA attention mechanisms, and further adopts a channel attention mechanism to fuse information of the encoder and the decoder; S3, transmitting the processed image into the prompt model and the encoder network, wherein each encoder stage except the last performs feature extraction on the preprocessed medical image and transmits the extracted feature map both to the next encoder stage and to the skip-connection processing module, while the last encoder stage inputs its extracted feature map into the first decoder stage; S4, except for the first decoder stage, each decoder stage receives the output of the skip-connection processing module and the output of the previous decoder stage, performs reconstruction on them to obtain a feature map of the same size as that of the previous decoder stage, then enlarges the feature map with deconvolution, and finally the feature map is restored to the original image size through the multi-stage decoder; S5, training the whole network with a combined loss function of BCEDiceLoss and MCELoss; S6, after training in steps S1-S5 is finished, inputting a real-time image into the model to obtain the final medical segmentation image.
- 2. The method for segmenting a medical image by a CNN and subtractive vision Mamba hybrid neural network integrating golden section proportion according to claim 1, characterized in that in the ESVM-Net neural network, the prompt model adopts a pre-trained DenseNet, all parameters of which are frozen during training; the output of the prompt model is simultaneously connected to each of the five encoder stages; the output end of the prompt model is divided into two paths, one of which passes through the bottleneck layer and is then added element-wise to the other; the five encoder stages are connected in sequence through two paths, the first encoder stage receives the preprocessed medical image, each encoder stage is connected to the decoders through the skip-connection processing module, the fifth encoder stage is connected to the first decoder stage, each decoder stage is further connected to the next decoder stage through the skip-connection processing module, and the received image is processed by the local feature extraction module and the SA and CA attention mechanisms and up-sampled by bilinear interpolation to obtain the feature map.
- 3. The method for segmenting a medical image of a CNN and subtractive vision Mamba hybrid neural network according to claim 2, wherein in steps S3-S4 the feature map extraction specifically comprises: S31, passing the preprocessed medical image through the prompt model and the first encoder stage, wherein the first encoder stage generates feature data C1_target and C2_background; the target features are input into the skip-connection processing module, the target and background features are input into the next encoder stage, and the output of the prompt model is added, yielding D1_target and D1_background; S32, repeating the operation of step S31 on D1_target and D1_background in the second encoder stage to obtain D2_target and D2_background; S33, repeating the operation of step S31 on D2_target and D2_background in the third encoder stage to obtain D3_target and D3_background; S34, repeating the operation of step S31 on D3_target and D3_background in the fourth encoder stage to obtain D4_target and D4_background; S35, repeating the operation of step S31 on D4_target and D4_background in the fifth encoder stage to obtain D5_target and D5_background; S36, inputting D1_target, D2_target, D3_target and D4_target into the skip-connection processing module to obtain feature maps A1, A2, A3 and A4 respectively; S41, inputting D5_target into the first decoder stage to obtain feature map B1; S42, inputting feature map B1 into the second decoder stage and adding the feature map extracted by the skip-connection processing module to obtain feature map C1, with the formula C1 = Decoder(B1) + A4; S43, repeating operation S42 in the remaining decoder stages until the image is restored to the resolution of the original image, the resulting feature map being denoted C4; S44, processing feature map C4 with a Sigmoid function to obtain the final feature map.
- 4. A method of medical image segmentation by a CNN and subtractive vision Mamba hybrid neural network integrating golden section proportion according to any one of claims 1-3, wherein the segmentation network is trained in S5 using the following loss function: L = λ1·L_BCEDice + λ2·L_MCE; wherein L is the loss of the segmentation network, and λ1, λ2 are set coefficients; n is the total number of samples, y represents the ground-truth label, and ŷ represents the output of the segmentation network for the labelled image.
- 5. A method for segmenting a medical image of a CNN and subtractive vision Mamba hybrid neural network according to any one of claims 1-3, wherein the encoder uses a subtractive vision Mamba block SVMB comprising a visual state space block VSSB, a point-wise convolution PWC, and a local feature extraction module LFEM; the VSSB and the PWC receive the feature data of the previous image as two parallel inputs; the VSSB output is split into two paths, one of which is subtracted element-wise from the PWC output and input to the LFEM, while the other is added element-wise to the LFEM output; the other output path of the LFEM is added element-wise to the output of the prompt model to form the SVMB output. The VSSB comprises sequentially connected LN, Linear, and LN receiving the output of the previous stage, wherein one branch passes through SiLU, SS2D and LN along the data flow, the other branch passes through Scale, and the outputs of the two branches are combined by Hadamard product and fed into another Linear. The LFEM sequentially comprises PWC, BN/ReLU, a multi-kernel residual convolution block MRCB, BN and PWC, wherein the first PWC receives the external input; the MRCB comprises three groups of parallel depth-wise separable convolutions DWC, each followed by BN/ReLU; each of the three parallel branches adds its own output element-wise to the BN/ReLU output, the three results are summed element-wise and input to the BN and the second PWC, and finally the output of the second PWC is added element-wise to the input originally received by the LFEM to obtain the LFEM's final output. The skip-connection processing module adopts an efficient feature fusion attention block EFFA, comprising two parallel paths of depth-wise separable convolution DWC and BN/ReLU, whose outputs are added element-wise and input to an efficient channel attention ECA, while the external input and the ECA output are combined by Hadamard product to form the output. The efficient channel attention ECA comprises a max pooling layer, an average pooling layer, PWC/BN and a Sigmoid function, wherein the external input passes through the max pooling layer and the average pooling layer respectively, the two results are combined, and the result is then output sequentially through the PWC/BN and the Sigmoid function.
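The element-wise subtract/add wiring of the SVMB in claim 5 can be sketched with toy stand-ins; tanh replaces the VSSB, a channel mat-mul replaces the 1×1 PWC, and ReLU replaces the LFEM, so only the data flow (not any real block) follows the claim:

```python
import numpy as np

def svmb_sketch(x, prompt, w_pwc):
    """Minimal sketch of the SVMB wiring: VSSB and PWC run in parallel on
    the same input, their element-wise difference feeds the LFEM stand-in,
    and the VSSB path, LFEM path and prompt features are fused element-wise.
    tanh / matmul / ReLU are placeholders, not the patented blocks."""
    v = np.tanh(x)                      # VSSB branch (stand-in)
    p = x @ w_pwc                       # PWC branch: 1x1 conv as channel mixing
    lfem_out = np.maximum(v - p, 0.0)   # element-wise subtraction feeds LFEM
    return v + lfem_out + prompt        # element-wise fusion with the prompt

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 4))
prompt = rng.standard_normal((8, 8, 4))
w = rng.standard_normal((4, 4))
y = svmb_sketch(x, prompt, w)
print(y.shape)
```

The subtraction is what makes the block "subtractive": the PWC path acts as a reference the state-space path is compared against, so only their difference is refined further.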
Description
CNN and subtractive vision Mamba hybrid neural network medical image segmentation method integrating golden section proportion
Technical Field
The invention relates to the intersection of medical image segmentation technology and computer-aided diagnosis, and particularly relates to a CNN and subtractive vision Mamba hybrid neural network medical image segmentation method which accurately and efficiently segments the lesion area of a medical image through deep learning so as to facilitate subsequent clinical medical diagnosis.
Background
Medical image segmentation techniques play an important role in clinical medical decisions. Convolutional Neural Networks (CNNs) and Vision Transformer-based networks have been the two mainstream approaches to this problem over the last decade. For convolutional neural networks, long-range dependencies are difficult to model even with large convolution kernels, owing to the local nature of convolution itself. For Transformer-based methods, although long-range dependencies between pixels can be established by the attention mechanism, its quadratic complexity leads to large computation and memory consumption. To address these problems, many researchers have devised state-of-the-art (SOTA) models, such as UNet, TransUNet and Swin UNETR, that combine the advantages of both Transformer and CNN. Although these models show better numerical performance, the attention mechanism in the Transformer still requires building a sequence × sequence matrix to handle long-range dependencies in the image. Although the feature map can be reduced in size by convolution, the model's complexity still grows rapidly with image size when applied to large medical images, and repeated application of convolution can also cause the network to forget global dependencies.
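The quadratic-versus-linear argument above can be made concrete with a quick count; the 64×64 patch grid and 16-dimensional state used here are assumed illustrative values:

```python
def attention_matrix_elems(h, w):
    """Full self-attention over an h*w token sequence builds an n x n
    score matrix, so memory grows quadratically with image size."""
    n = h * w
    return n * n

def ssm_state_elems(h, w, d_state=16):
    """A state-space scan keeps only an n x d_state recurrence instead,
    which is linear in the sequence length."""
    n = h * w
    return n * d_state

print(attention_matrix_elems(64, 64))  # 16777216 score entries for a 64x64 grid
print(ssm_state_elems(64, 64))         # 65536 state entries for the same grid
```

Doubling the image side quadruples the sequence length, multiplying the attention matrix by 16 but the SSM state by only 4, which is why attention becomes prohibitive on large medical images.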
To overcome the above difficulties, State Space Models (SSMs) were introduced as a module for building long-sequence neural networks, from which Mamba evolved. Considering its linear computational complexity and parallel training structure, Zhu et al. first designed a Mamba-based Vision Mamba model for image recognition. Without the attention mechanism, the model can perform image inference at 1248×1248 resolution, saves 86.8% of GPU memory, and shows great potential in many fields. Subsequent studies found that uniform attention to the global receptive field may introduce redundant information unrelated to the target features when 2-D images are processed by the 2D-selective-scan (SS2D) module of the SSM. In addition, building on the method of Rao et al., the redundant information introduced by the receptive field was reduced by introducing a higher-order 2D-selective-scan (SS2D), showing a better effect. However, such models tend to lose detail of the original image in low-scale feature maps as network depth increases.
Disclosure of Invention
The invention provides a CNN and subtractive vision Mamba hybrid neural network medical image segmentation method integrating golden section proportion, which aims to solve the technical problems that, in the field of medical image segmentation, the traditional CNN has limited capability when processing long-sequence information due to the inherent limitation of the convolution kernel receptive field, and the Vision Transformer (ViT) has excessive computational cost for lightweight downstream tasks due to the quadratic computational complexity of the attention mechanism.
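The linear-complexity claim for SSMs rests on a simple recurrence; a minimal 1-D sketch with constant scalar parameters (unlike Mamba's input-dependent ones) shows the single O(n) pass:

```python
import numpy as np

def linear_ssm_scan(u, a=0.9, b=1.0, c=1.0):
    """Toy 1-D state-space model: x_t = a*x_{t-1} + b*u_t, y_t = c*x_t.
    One pass over the sequence gives O(n) cost, versus O(n^2) for full
    self-attention. Mamba's SS2D makes a, b, c input-dependent and scans
    2-D feature maps along several directions."""
    x = 0.0
    ys = []
    for u_t in u:
        x = a * x + b * u_t   # state carries a decaying summary of the past
        ys.append(c * x)
    return np.array(ys)

y = linear_ssm_scan(np.ones(4))
print(y)  # [1.0, 1.9, 2.71, 3.439]
```

Because the state x summarizes the entire prefix, each output depends on all earlier inputs without ever materializing a pairwise score matrix.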
The invention adopts the following technical scheme. The CNN and subtractive vision Mamba hybrid neural network medical image segmentation method integrating golden section proportion comprises the following steps: S1, preprocessing a sample medical image, namely correcting the different resolutions of the sample medical images to the same set resolution; S2, constructing the ESVM-Net neural network, which comprises a prompt model, an encoder network, a decoder network, a bottleneck layer and a skip-connection processing module; the encoder network comprises a multi-stage encoder, each stage of which uses a subtractive vision Mamba model that learns the background characteristics of an image through a two-way design and then submits the extracted target features to a multi-kernel convolution block for processing; the decoder network comprises multi-stage decoders, each stage of which adopts, in the skip connection, a local feature extraction module built from a multi-kernel convolution block together with SA and CA attention mechanisms, and further adopts a channel attention mechanism to fuse information of the encoder and the decoder; S3, transmitting the processed image into the prompt model and the encoder network, wherein each encoder stage except the last performs feature extraction on the preprocessed medical image, and respectively transmits the e
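The multi-stage encoder/decoder flow of steps S3-S4 can be sketched end-to-end; every block here is a placeholder (average pooling for an encoder stage, nearest-neighbour doubling for deconvolution), and only the skip-addition data flow C_i = Decoder(B_i) + A_j follows the method:

```python
import numpy as np

def down(x):
    """Placeholder encoder stage: 2x2 average pooling halves spatial size."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    """Placeholder decoder stage: nearest-neighbour doubling stands in for
    deconvolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(64.0).reshape(8, 8)
skips = []
for _ in range(3):            # encoder stages: extract and pass skips down
    skips.append(x)
    x = down(x)
for skip in reversed(skips):  # decoder stages: C_i = Decoder(B_i) + A_j
    x = up(x) + skip
print(x.shape)
```

The loop shape mirrors the claims: each decoder stage doubles resolution and adds the matching skip feature map until the original image size is restored.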