
CN-121982303-A - Medical image segmentation method and system based on multi-scale global context awareness and boundary guidance fusion


Abstract

The invention belongs to the technical field of medicine and discloses a medical image segmentation method based on multi-scale global context awareness and boundary-guided fusion, comprising the steps of: acquiring a colonoscopy polyp data set and preprocessing it by image normalization, data augmentation, and size unification; performing hierarchical feature extraction on the input image with a Swin Transformer backbone network, progressively obtaining multi-scale semantic information through a four-stage encoding structure to model rich context; and inputting the fused features into a decoder, recovering spatial resolution through stage-by-stage upsampling and residual connections, and outputting a high-quality polyp segmentation result. The invention achieves accurate segmentation of both small and large polyps while suppressing background interference, reduces false-detection and miss rates, and enhances the robustness and generalization ability of the model, giving it broad prospects for clinical application.
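The preprocessing pipeline summarized above (normalization to a fixed value range and resizing to a uniform input size) can be sketched as follows. This is a minimal NumPy illustration under our own assumptions: the patent does not specify the normalization scheme, resizing method, or target size, so min-max scaling, nearest-neighbour resizing, and the toy dimensions here are all hypothetical stand-ins.

```python
import numpy as np

def normalize(img: np.ndarray) -> np.ndarray:
    """Scale pixel intensities to [0, 1] (hypothetical min-max scheme)."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize to a uniform size (a stand-in for the
    patent's unspecified resizing method)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

# Toy 4x6 "image" normalized and unified to a 2x3 grid.
frame = np.arange(24, dtype=np.float64).reshape(4, 6)
small = resize_nearest(normalize(frame), 2, 3)
print(small.shape)  # (2, 3)
```

Data augmentation (random flips, rotations, color jitter) would be applied at this stage as well; it is omitted here for brevity.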

Inventors

  • WU CONG
  • HONG WENJUN
  • YANG TAO
  • TANG YUNCHUAN
  • ZHOU RAN
  • GAN HAITAO
  • HUANG ZHONGWEI
  • YANG ZHI
  • SHI MING
  • LIU CHUNTING
  • ZHENG YIFAN

Assignees

  • 湖北工业大学 (Hubei University of Technology)

Dates

Publication Date
2026-05-05
Application Date
2026-01-07

Claims (10)

  1. A medical image segmentation method based on multi-scale global context awareness and boundary-guided fusion, characterized by comprising the following steps: obtaining colonoscopy polyp image data and performing normalization, data augmentation, and size unification on the images; performing multi-scale feature extraction on the input image with a hierarchical Transformer encoding network to obtain encoded features containing different semantic information; and inputting the encoded features into a decoding structure, recovering spatial resolution through stage-by-stage upsampling and residual connections, and outputting polyp segmentation results.
  2. The method according to claim 1, wherein during multi-scale feature extraction, the features at each scale respectively enter a global context awareness module and a boundary-guided fusion module, semantic and spatial features of different scales are fused by a feature aggregation module, and the fused features are provided to the decoding structure.
  3. The method of claim 1, wherein during decoding, detail compensation and feature reconstruction are performed on the upsampled features by a recurrent residual convolution module, to enhance the texture representation of the polyp region and improve the final segmentation accuracy.
  4. The method of claim 1, wherein a pyramid global attention module is used during feature fusion, and feature weights are adjusted through a parallel structure of channel attention branches and spatial attention branches, so as to improve feature discrimination under low contrast and complex backgrounds.
  5. A global context awareness module, comprising a multi-scale dilated convolution branch, a spatial attention branch, and a channel attention branch; the multi-scale dilated convolution branch processes the input features through three convolution paths with dilation rates of 1, 3, and 5 respectively to obtain three groups of multi-scale features; the three groups of features are concatenated along the channel dimension, spatial attention weights are generated by a one-dimensional convolution, and the multi-scale features are weighted accordingly; the concatenated features obtain channel attention weights through global average pooling and a two-layer fully connected structure, and channel weighting is applied to the multi-scale features; and the spatially weighted features and the channel-weighted features are fused through a cross-attention structure to output global context features.
  6. The module of claim 5, wherein the cross-attention structure processes the spatially weighted feature and the channel-weighted feature by a one-dimensional convolution and a three-dimensional convolution respectively, generates final spatial and channel weights after processing by an activation function, and fuses the weighted features to obtain the output feature.
  7. A boundary-guided fusion module, comprising a reverse attention branch, a predicted-boundary branch, and a high-frequency edge branch; the reverse attention branch activates and inverts the prediction map to generate reverse features, which are fused with the encoded features to suppress background interference; the predicted-boundary branch applies an edge operator to the prediction map to obtain predicted boundary features, which are fused with the encoded features to enhance boundary representation; the high-frequency edge branch fuses high-frequency edge features of the input image with the encoded features; and the outputs of the three branches are concatenated and convolved to generate the fused features.
  8. The module of claim 7, wherein after concatenation the fused features are convolved, normalized, and activated, to enhance both the structural features of the polyp region and the boundary detail features.
  9. A medical image segmentation system, characterized by comprising a data preprocessing unit, an encoding unit, a feature fusion unit, and a decoding unit; the data preprocessing unit performs normalization, augmentation, and size unification on the polyp images; the encoding unit extracts multi-scale semantic features from the input image; the feature fusion unit comprises a global context awareness module, a boundary-guided fusion module, and a pyramid global attention module, and performs multi-scale fusion on the encoded features; and the decoding unit upsamples and reconstructs the fused features to output the medical image segmentation result.
  10. The system of claim 9, wherein the decoding unit performs feedback-based reconstruction of the multi-level fused features via a recurrent residual convolution structure, such that the recovered spatial details remain continuous and structurally intact.
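As a rough illustration of the reverse attention and predicted-boundary branches of the boundary-guided fusion module (claim 7), the following NumPy sketch shows the core operations. The claim only names an "activation", an "inversion", and an "edge operator"; the sigmoid activation, the 1 − p inversion, and the Sobel operator used here are plausible but assumed choices, and the fusion is reduced to an element-wise product for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention(pred_logits, enc_feat):
    """Activate the prediction map, invert it, and weight the encoder
    features so foreground (polyp) regions are suppressed and the model
    attends to ambiguous background (reverse attention branch)."""
    rev = 1.0 - sigmoid(pred_logits)   # reverse feature
    return enc_feat * rev              # simplified fusion with encoded features

def sobel_boundary(pred):
    """Edge-operator processing of the prediction map (predicted-boundary
    branch); Sobel is an assumed choice of edge operator."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = pred.shape
    out = np.zeros_like(pred)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = pred[i - 1:i + 2, j - 1:j + 2]
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

pred = np.zeros((5, 5)); pred[1:4, 1:4] = 8.0  # confident polyp blob (logits)
feat = np.ones((5, 5))                         # dummy encoder feature map
ra = reverse_attention(pred, feat)             # large outside blob, ~0 inside
edges = sobel_boundary(sigmoid(pred))          # responds only at blob border
```

The high-frequency edge branch of claim 7 would analogously filter the raw input image rather than the prediction map, before the three branch outputs are concatenated and convolved.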

Description

Medical image segmentation method and system based on multi-scale global context awareness and boundary-guided fusion

Technical Field
The invention belongs to, but is not limited to, the technical field of medicine, and particularly relates to a medical image segmentation method and system based on multi-scale global context awareness and boundary-guided fusion.

Background
Colorectal cancer (CRC) is a common and highly dangerous malignancy whose incidence ranks among the top three worldwide. Numerous clinical and epidemiological studies have shown that most CRC cases originate from precancerous lesions such as adenomatous polyps. If such lesions are found through systematic screening at an early stage and excised in time, the incidence and mortality of CRC can be markedly reduced. Currently, colonoscopy is the clinically most important means of colorectal cancer screening and prevention. However, polyps in colonoscopy images differ significantly in size, color, texture, and other characteristics, and the images often suffer from uneven illumination and insufficient contrast, so the boundaries between polyps and mucosa are frequently blurred, increasing the difficulty of manual identification and segmentation. These factors easily lead to missed or false detection of polyps, thereby affecting the accuracy and efficiency of screening. With the development of deep learning, medical image segmentation methods based on neural networks have made remarkable progress. The U-Net structure achieves a good segmentation effect through its encoder-decoder framework and skip connections, but it is deficient in modeling global context and long-range dependencies, easily causing detail loss and poor global consistency.
To compensate for the limitation of convolutional neural networks in global modeling, the Transformer architecture has been introduced into medical image segmentation tasks; it captures multi-scale global information through a self-attention mechanism, significantly improving the model's contextual understanding. Subsequently, structures fusing CNNs and Transformers have balanced global modeling with local localization, but segmentation accuracy still degrades under complex backgrounds or blurred boundaries. Furthermore, edge features are critical to the accuracy of medical image segmentation. Although existing methods improve boundary recognition using saliency guidance, reverse attention, or boundary enhancement mechanisms, they remain imperfect in multi-scale feature fusion and edge-uncertainty modeling, and false edges, over-segmentation, or omissions readily occur. Therefore, how to fully exploit edge detail information while maintaining global context modeling capability, so as to achieve high-precision segmentation of target regions in complex medical images, remains a technical problem to be solved in the art.

Disclosure of Invention
In view of the problems in the prior art, the invention provides a medical image segmentation method based on multi-scale global context awareness and boundary-guided fusion.
The invention is realized as a medical image segmentation method based on multi-scale global context awareness and boundary-guided fusion, comprising the following steps: S1, acquiring a colonoscopy polyp data set and preprocessing it, the preprocessing comprising image normalization, data augmentation, and size unification; S2, performing hierarchical feature extraction on the input image with a Swin Transformer backbone network, progressively acquiring multi-scale semantic information through a four-stage encoding structure to model rich context; and S3, inputting the fused features into a decoder, recovering spatial resolution through stage-by-stage upsampling and residual connections, and outputting a high-quality polyp segmentation result. Further, S2 specifically includes: the output features of each layer are respectively input into a global context awareness module (GCAM) and a boundary-guided fusion module (BGFM), the global context awareness module being used to enhance the collaborative expression of global and local semantics and to capture long-range dependencies; and a pyramid global attention module (PGAM) is introduced, which, through dynamic fusion of spatial and channel attention features, realizes adaptive feature balancing and information filtering and improves the model's feature discrimination in complex scenes. Further, the global context awareness module comprises three parts: a multi-scale dilated convolution branch, a spatial attention branch, and a channel attention branch. The global context
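The multi-scale dilated (atrous) convolution branch of the global context awareness module can be sketched in one dimension to show how the three parallel paths enlarge the receptive field without extra parameters. The dilation rates 1, 3, and 5 follow the module description; the toy signal and kernel are illustrative assumptions, and "convolution" is used in the deep-learning (cross-correlation) sense.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """'Same'-padded 1-D dilated cross-correlation. With a 3-tap kernel,
    the receptive field grows to 2 * rate + 1 samples."""
    k = len(kernel)
    span = rate * (k - 1)        # total reach of the dilated kernel
    pad = span // 2
    xp = np.pad(x, pad)
    out = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        taps = xp[i : i + span + 1 : rate]   # samples spaced `rate` apart
        out[i] = float(np.dot(taps, kernel))
    return out

signal = np.arange(10, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])  # toy difference kernel (assumed values)
# Three parallel paths with dilation rates 1, 3, 5, as in the GCAM branch,
# then stacking along a "channel" axis to mimic channel-wise concatenation.
branches = [dilated_conv1d(signal, kernel, r) for r in (1, 3, 5)]
stacked = np.stack(branches)   # shape (3, 10)
```

On the linear ramp, the rate-3 path produces differences three times larger than the rate-1 path at the same cost, which is the point of mixing dilation rates: each path sees the same input at a different effective scale before the attention branches reweight the concatenated result.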