CN-116152268-B - Multi-scale intestinal polyp segmentation method integrating attention mechanisms

Abstract

The invention discloses a multi-scale intestinal polyp segmentation method integrating an attention mechanism. The method comprises: (1) constructing a multi-scale effective semantic fusion module that extracts richer and more effective multi-scale semantic information; and (2) constructing a new encoding-decoding deep network segmentation model that improves polyp segmentation accuracy. By extracting sufficient context information and global information under different receptive fields, and by filtering out, as far as possible, features that are useless for the segmentation task, the method overcomes the limited and redundant semantic information of the traditional encoding-decoding structure and achieves excellent segmentation and generalization performance on two-dimensional enteroscopy images whose polyp regions vary in shape and size.

Inventors

LI MIN
Shan Fangmei
WANG MENGWEN

Assignees

Nanjing University of Science and Technology (南京理工大学)

Dates

Publication Date: 20260512
Application Date: 20211116

Claims (2)

1. A multi-scale intestinal polyp segmentation method integrating attention mechanisms, characterized by comprising the following steps:
First step: an enteroscopy image from the training set is input into the deep network segmentation model; the input image passes through an encoding path consisting of four identical encoding blocks to obtain a high-level encoded feature map, which is then processed by two consecutive ordinary convolution layers to obtain the input feature map of the decoding path;
Second step: the input feature map of the decoding path passes through a decoding path consisting of four identical decoding blocks to obtain a decoded feature map, which is then processed by a single ordinary convolution layer to obtain the predicted segmentation label map output by the model, with the same resolution as the input image of the model;
Third step: model loss is calculated by a loss function from the ground-truth segmentation label map of the training set and the predicted segmentation label map output by the model, and the network is then trained iteratively by an optimization algorithm to continuously reduce the model loss until the deep network segmentation model achieves the optimal segmentation effect;
In the first step, the enteroscopy image from the training set is input into the deep network segmentation model; the input image passes through the encoding path consisting of four identical encoding blocks to obtain the high-level encoded feature map, which is then processed by two consecutive ordinary convolution layers to obtain the input feature map of the decoding path. Each encoding block consists of two consecutive ordinary convolution layers and a max pooling layer, and the output feature map of the second ordinary convolution layer of each encoding block is retained for the subsequent skip connection operation;
In the second step, the input feature map of the decoding path passes through the decoding path consisting of four identical decoding blocks to obtain the decoded feature map, which is then processed by a single ordinary convolution layer to obtain the predicted segmentation label map output by the model, with the same resolution as the input image of the model. Each decoding block consists, in sequence, of a transposed convolution layer, an ordinary convolution layer, a multi-scale effective semantic fusion module, and an ordinary convolution layer. The input feature map of the first ordinary convolution layer is the output feature map of the preceding transposed convolution layer concatenated in the channel dimension with the feature map retained from the corresponding encoding block for the skip connection;
The operation steps of the multi-scale effective semantic fusion module are as follows:
(1) The input feature map of the multi-scale effective semantic fusion module is processed by four parallel average pooling layers with different pooling ratios to obtain a four-level pyramid feature map;
(2) Each level of the pyramid feature map is processed by an ordinary convolution layer that reduces its number of channels to a fraction of the number of channels of the module input feature map; each level is then restored to the resolution of the module input feature map by bilinear-interpolation upsampling, and the levels are concatenated in the channel dimension to obtain a feature map $U \in \mathbb{R}^{H \times W \times C}$, where $H$, $W$ and $C$ are respectively the height, width and number of channels of the feature map;
(3) The feature map $U$ is further subjected to feature redirection processing. It first passes through a global average pooling layer, whose pooling formula is expressed as:
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$
where $z$ denotes the feature map obtained after global average pooling, $c$ denotes the channel index, $u_c$ denotes the $c$-th channel of $U$, and $H$ and $W$ denote respectively the height and width of the feature map on the $c$-th channel. The feature map $z$ is then compressed by the first fully connected layer to $C/r$ channels, where $r$ denotes the scaling parameter, and after activation by the ReLU activation function it enters the second fully connected layer, which restores the number of channels to $C$. Finally, a sigmoid activation function limits the values of the resulting feature map to the range between 0 and 1, and these values serve as coefficients that are multiplied onto the original input feature map of the module, expressed as:
$$O = S \otimes X$$
where $X$ denotes that original input feature map, $S$ denotes the coefficients obtained after sigmoid activation, $\otimes$ denotes channel-by-channel multiplication, and $O$ denotes the feature map obtained after feature redirection processing;
(4) The feature map $O$ and the original input feature map of the module are concatenated in the channel dimension, and the same feature redirection processing is carried out again to finally obtain the output feature map of the multi-scale effective semantic fusion module, which serves as the input of the subsequent ordinary convolution layer.
2. The multi-scale intestinal polyp segmentation method integrating attention mechanisms according to claim 1, wherein in the third step, model loss is calculated by a loss function from the ground-truth segmentation label map of the training set and the predicted segmentation label map output by the model, and the network is then trained iteratively by an optimization algorithm to continuously reduce the model loss until the deep network segmentation model achieves the optimal segmentation effect, wherein the loss function is the binary cross-entropy loss function commonly used for image segmentation problems.
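For readers who want a concrete picture of the feature redirection processing in step (3) of claim 1 and of the loss named in claim 2, the following NumPy sketch may help. It is an illustration under stated assumptions, not the patented implementation: the (H, W, C) array layout, the random placeholder weights, the sizes, the function names `feature_redirection` and `binary_cross_entropy`, and the averaging of the loss over all pixels are all choices made here and do not come from the patent text.

```python
import numpy as np


def global_average_pool(feature_map):
    """Average each channel of an (H, W, C) feature map over height and width."""
    return feature_map.mean(axis=(0, 1))  # shape (C,)


def feature_redirection(feature_map, w1, b1, w2, b2):
    """Pool -> fully connected -> ReLU -> fully connected -> sigmoid -> multiply.

    w1 has shape (C, C // r) and w2 has shape (C // r, C), where r is the
    scaling parameter. The weights are placeholders, not trained values.
    """
    z = global_average_pool(feature_map)       # (C,) channel descriptor
    hidden = np.maximum(z @ w1 + b1, 0.0)      # (C // r,) after ReLU
    logits = hidden @ w2 + b2                  # (C,) channel count restored
    s = 1.0 / (1.0 + np.exp(-logits))          # sigmoid, values between 0 and 1
    return feature_map * s, s                  # coefficients broadcast over H, W


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over all pixels (assumed normalization)."""
    p = np.clip(y_pred, eps, 1.0 - eps)        # avoid log(0)
    loss = y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)
    return float(-np.mean(loss))


# Illustrative sizes only; the claims do not fix these values.
rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
w1 = rng.standard_normal((C, C // r)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C // r, C)) * 0.1
b2 = np.zeros(C)

out, coeffs = feature_redirection(x, w1, b1, w2, b2)
print(out.shape, coeffs.shape)
```

Each output channel is the corresponding input channel scaled by a single coefficient, so the spatial layout is unchanged and only the relative weight of the channels differs. In a trained network the two fully connected layers would be learned together with the rest of the model.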

Description

Multi-scale intestinal polyp segmentation method integrating attention mechanisms

Technical Field

The invention relates to the technical field of image segmentation and deep learning, and in particular to a new deep network construction method suitable for polyp segmentation in two-dimensional enteroscopy images.

Background Art

In the medical field, experts rely on their expertise and extensive clinical experience to detect and manually segment polyp regions in enteroscopy images. However, polyps vary widely in appearance and their boundaries with the background region are blurred, so manual segmentation increases the workload of medical workers, and individual differences between patients and the high subjectivity of doctors can also cause erroneous segmentation. Combining deep learning with medical imaging technology to segment polyp regions accurately can reduce the workload of medical workers, speed up polyp segmentation, and improve its accuracy.

In recent years, as the computing power of graphics processing units (GPUs) has increased, convolutional neural networks (CNNs) have been widely used for visual recognition and have achieved significant results in pixel-level image segmentation. The U-Net deep network segmentation model, based on the encoding-decoding structure, uses an encoding path to extract the image semantic information required by the segmentation task and a decoding path to gradually restore resolution, finally producing a predicted segmentation label map with the same resolution as the input image. It has advanced the field of medical image segmentation.
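The way an encoding path reduces resolution and a decoding path restores it can be traced with a few lines of arithmetic. The sketch below is only illustrative and assumes four stages, a factor of 2 per stage, and a 256 x 256 input; the function name `trace_resolutions` is hypothetical and none of these values are taken from the text above.

```python
def trace_resolutions(height, width, num_blocks=4, factor=2):
    """Return the spatial size after each encoding stage and each decoding stage."""
    encoder = []
    h, w = height, width
    for _ in range(num_blocks):
        h, w = h // factor, w // factor  # downsampling shrinks each side
        encoder.append((h, w))
    decoder = []
    for _ in range(num_blocks):
        h, w = h * factor, w * factor    # upsampling enlarges each side
        decoder.append((h, w))
    return encoder, decoder


encoder_sizes, decoder_sizes = trace_resolutions(256, 256)
print(encoder_sizes)  # [(128, 128), (64, 64), (32, 32), (16, 16)]
print(decoder_sizes)  # [(32, 32), (64, 64), (128, 128), (256, 256)]
```

Under these assumptions the final decoding stage returns to the input resolution only when the input height and width are divisible by 16; other sizes would need padding or cropping.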
To reduce the large semantic gap between encoding-path and decoding-path features, researchers proposed the MultiResUNet deep network segmentation model, which gradually narrows this gap by introducing a convolutional chain with residual connections into the skip connections.

To allow the features extracted by the building blocks to be reused recurrently, researchers proposed the bidirectional O-shaped network (BiO-Net) deep segmentation model, which builds bidirectional skip connections between the encoding and decoding paths to form an O-shaped inference path, achieving feature reuse and better feature refinement.

To reduce the interference of the many useless, redundant features in the network with the segmentation task, researchers proposed the Attention U-Net deep network segmentation model. By introducing a novel attention gate model for medical image segmentation into the decoding path, the network learns automatically to focus on target regions of different sizes and shapes, highlighting the features that are useful for the segmentation task.

To deepen the network effectively while alleviating the vanishing gradient problem, researchers proposed the R2U-Net deep network segmentation model. By introducing residual connections and recurrent convolution into the building blocks of the encoding and decoding paths, the network obtains more expressive features and improves segmentation performance without increasing the number of parameters.
To explore the importance of encoding-path features at different depths, researchers proposed the UNet++ deep network segmentation model. By introducing short and long connections, its sub-networks share the same feature extractor and features at different levels are restored by different decoding paths, so that the network can automatically learn the importance of features at different depths.

Existing deep network segmentation models based on the encoding-decoding structure usually extract multi-scale semantic information from an image by repeated convolution and downsampling. However, because the receptive field is limited in size, the extracted context information is often very limited and insufficient for reliable decisions in the segmentation task, which often leads to problems such as incomplete segmentation masks or inaccurate object boundaries. Meanwhile, the convolution operation indiscriminately fuses the spatial and channel information of all feature maps within the local receptive field, so the network contains a large number of features that are invalid for the segmentation task. These features waste computing resources and also interfere with segmentation performance.

Disclosure of Invention

The invention discloses a multi-scale intestinal polyp segmentation method integrating an attention mechanism. The method constructs a new encoding-decoding deep learning image segmentation network framework that can extract enough context information and global information under different sensitivity fi