CN-122023383-A - Polyp segmentation model based on location awareness feature refinement
Abstract
The invention relates to medical image processing, and in particular to a polyp segmentation model based on location-aware feature refinement. The model comprises an encoder and a decoder. The encoder adopts a lightweight Transformer backbone network to extract multi-scale semantic features. The decoder adopts a symmetrical structure to gradually restore spatial resolution, and a plurality of location-aware feature refinement (PFR) modules are embedded in a shallow layer or in multiple layers; deep semantics guide the attention distribution of the shallow semantic features so as to compensate and reconstruct local details. The decoder further comprises a convolution layer, a context feature module (CFM), a cross-scale information module (CIM) and a semantic attention module (SAM). A multi-scale supervision strategy applies supervision to the outputs of multiple layers simultaneously, improving the detection and segmentation accuracy for tiny polyps and boundary pixels.
Inventors
- HAN LONGFEI
- Xiong Bingning
Assignees
- 北京工商大学
Dates
- Publication Date
- 20260512
- Application Date
- 20260304
Claims (10)
- 1. A polyp segmentation model based on location-aware feature refinement, characterized by adopting an encoder-decoder structure; the encoder adopts a lightweight Transformer backbone network and is used for extracting multi-scale semantic features; the decoder adopts a symmetrical structure to gradually restore the spatial resolution, a plurality of location-aware feature refinement modules PFR are embedded in a shallow layer or in multiple layers, and the attention distribution of the shallow semantic features is guided by deep semantics so as to compensate and reconstruct local details; the decoder further comprises a convolution layer, a context feature module CFM, a cross-scale information module CIM and a semantic attention module SAM, and a multi-scale supervision strategy is adopted to apply supervision to the multi-layer outputs simultaneously, so as to improve the detection and segmentation accuracy for micro polyps and boundary pixels.
- 2. The polyp segmentation model based on location-aware feature refinement of claim 1, wherein the encoder performs feature extraction on the input endoscopic image $I$ to obtain multi-scale semantic features $E_1$, $E_2$, $E_3$, $E_4$.
- 3. The polyp segmentation model based on location-aware feature refinement of claim 2, wherein, after the encoder performs feature extraction on the input endoscopic image $I$ to obtain the multi-scale semantic features $E_1$, $E_2$, $E_3$, $E_4$, the model further: extracts the edge prior from the shallow semantic feature $E_1$, and performs global context modeling on the deep semantic feature $E_4$ to obtain the position prior; and preprocesses all multi-scale semantic features, including scale alignment, channel projection and standardization.
- 4. The polyp segmentation model based on location-aware feature refinement of claim 1, wherein the location-aware feature refinement module PFR comprises a location-aware attention module PSA and a feature reconstruction module FRB; the location-aware attention module PSA guides the attention distribution of the shallow semantic features through deep semantics, suppressing background noise and enhancing polyp region features; and the feature reconstruction module FRB structurally reconstructs the local details that are still lost after attention enhancement, further compensating the edge information of micro polyps.
- 5. The polyp segmentation model based on location-aware feature refinement of claim 4, wherein the location-aware attention module PSA comprises a background suppression block and a position-sensitive attention block; the background suppression block constructs the position prior: global average pooling $\mathrm{Pool}_c$ along the channel dimension is applied to the deep semantic feature $E_4$, a 1×1 convolution $\mathrm{Conv}_{1\times 1}$ generates a low-dimensional semantic map $S_d$, and the input endoscopic image $I$ is multiplied element by element with the low-dimensional semantic map $S_d$ to obtain the background suppression feature $B$: $S_d = \mathrm{Conv}_{1\times 1}(\mathrm{Pool}_c(E_4))$; $B = I \odot S_d$; wherein $\odot$ denotes element-by-element multiplication; the working process of the position-sensitive attention block comprises: S11, block coding: the background suppression feature $B$ and the shallow semantic feature $E_1$ are each block-coded to obtain the block sequences $P_B$ and $P_{E_1}$; S12, similarity and attention score calculation: the block sequences are projected into a query vector $Q$, a key vector $K$ and a value vector $V$ by the learnable linear projection matrices $W_Q$, $W_K$ and $W_V$ respectively, the query-key similarity matrix is calculated, and scaling and Softmax normalization yield the position attention weight matrix $A$: $A = \mathrm{Softmax}(QK^{\top}/\sqrt{d})$; wherein $QK^{\top}$ is the query-key similarity matrix and $d$ is the projection dimension; S13, shallow semantic feature enhancement: the block sequence $P_{E_1}$ of the shallow semantic feature $E_1$ is weighted and reconstructed using the position attention weight matrix $A$ to obtain an enhanced shallow block sequence, which is mapped back to the spatial domain through block reorganization to obtain the location-aware enhanced feature, so as to suppress background noise and enhance polyp region features.
- 6. The polyp segmentation model based on location-aware feature refinement of claim 4, wherein the feature reconstruction module FRB comprises a boundary block and a feature refinement block; the boundary block partitions the current decoder prediction map $Y$ into blocks (patches) to extract the response matrix $R$; the working process of the feature refinement block comprises: S21, response rearrangement/replacement: the high-response columns/rows in each block are identified, and the information at high-response positions is dispersed/copied to low-response positions through a rearrangement/exchange operation to form a compensation matrix $C$; wherein $R'$ is the rearranged response matrix, an index function identifies the high-response columns/rows in each block from the response matrix $R$ by means of an argmax operation that finds the position index with the largest response within each block, a reorder function produces $R'$ from $R$ and these indices, $R'_j$ is the $j$-th column/row of the rearranged response matrix $R'$, each $R'_j$ is assigned a normalized weight, and $\sum_j$ denotes summation over $j$; S22, fusion and mapping: the compensation matrix $C$ and the location-aware enhanced feature are projected channel-wise and fused by element-by-element multiplication and addition, and a reconstructed feature $F_{rec}$ is then generated through a plurality of depthwise separable convolutions DWConv and residual connections; wherein $\oplus$ denotes element-by-element addition; and S23, scale restoration: the reconstructed feature $F_{rec}$ is scale-aligned with the feature of the current decoder scale and fused during decoding, and a prediction map restored to the original image size is finally generated through scale restoration.
- 7. The polyp segmentation model based on location-aware feature refinement of claim 4, wherein the location-aware feature refinement module PFR can be represented as a series structure of normalization, the location-aware attention module PSA, a prior-guided feed-forward network PGFFN and residual connections; wherein $X$ is the input feature, $L$ is the edge prior or position prior information, and LN denotes the normalization operation; the prior-guided feed-forward network PGFFN adopts block convolution and a gating mechanism to enhance its nonlinear expression capability; wherein $z$ is the input feature of the prior-guided feed-forward network PGFFN, GELU is the Gaussian error linear unit activation function, the gate is a Sigmoid function, and $W_1$ and $W_2$ are learnable linear transformation weight matrices.
- 8. The polyp segmentation model based on location-aware feature refinement of claim 1, wherein, in the decoding stage, the prediction map of each layer is skip-connected with the corresponding shallow semantic features by channel concatenation or learnable weighted addition, and is finally mapped by a 1×1 convolution to a pixel-level segmentation probability map, improving sensitivity to micro polyps and boundary pixels; and a multi-level supervision strategy is adopted to apply supervision to the multi-level outputs of the decoder simultaneously.
- 9. The polyp segmentation model based on location-aware feature refinement of claim 1, wherein the model parameters are initialized with uniform or Xavier initialization, and every convolution layer is followed by a normalization layer and a nonlinear activation layer.
- 10. The polyp segmentation model based on location-aware feature refinement of claim 9, wherein the loss function $L$ used in model training is a weighted combination of a pixel-level weighted binary cross-entropy loss $L_{wBCE}$ and a weighted IoU loss $L_{wIoU}$, together with an optional term $L_{reg}$, to balance class imbalance and boundary constraints; wherein the quantities entering the loss are: the true label of the $i$-th pixel; the prediction probability of the $i$-th pixel; the pixel weight of the $i$-th pixel, set according to the boundary distance or the distribution of hard samples; $N$, the total number of pixels participating in the calculation; the weighted intersection, i.e. the weighted sum over the region where both the ground truth and the prediction are positive; the weighted union, i.e. the weighted sum over the region where the ground truth or the prediction is positive; $L_{reg}$, an optional regularization or boundary-aware penalty; and three weighting hyperparameters, one per loss term; data augmentation and model training: a data augmentation strategy including random rotation, flipping, color jitter and scale transformation is adopted; the model is trained with the AdamW optimizer at an initial learning rate of $1\times10^{-4}$, combined with multi-step or cosine-annealing learning-rate scheduling; training is performed jointly on multiple public endoscopy datasets, and cross-validation is used to improve generalization.
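The formulas of claims 5 and 10 did not survive in this text, so the following is only a minimal NumPy sketch of the two computations as the claims describe them in words: scaled Softmax attention over block sequences (S12 and S13), and a loss combining weighted BCE with weighted IoU. Several choices are assumptions rather than the patent's stated formulas: queries are taken from $P_B$ while keys and values are taken from $P_{E_1}$, both sequences have the same number of blocks, the BCE term is averaged over $N$, the union is the soft union $y + p - yp$, and the names `position_attention`, `weighted_bce`, `weighted_iou`, `total_loss` and the `lam_*` weights are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable Softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(p_b, p_e1, w_q, w_k, w_v):
    """Scaled Softmax attention over block sequences.

    Assumption (not stated in the source): Q is projected from P_B,
    while K and V are projected from P_E1.
    """
    q = p_b @ w_q                      # (n_blocks, d)
    k = p_e1 @ w_k                     # (n_blocks, d)
    v = p_e1 @ w_v                     # (n_blocks, d)
    d = q.shape[-1]                    # projection dimension
    a = softmax(q @ k.T / np.sqrt(d))  # position attention weights A
    return a, a @ v                    # A and the enhanced block sequence


def weighted_bce(y, p, w, eps=1e-7):
    """Pixel-weighted binary cross entropy, averaged over the N pixels."""
    p = np.clip(p, eps, 1.0 - eps)
    bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float((w * bce).sum() / y.size)


def weighted_iou(y, p, w, eps=1e-7):
    """One minus weighted intersection over weighted (soft) union."""
    inter = (w * y * p).sum()
    union = (w * (y + p - y * p)).sum()
    return float(1.0 - (inter + eps) / (union + eps))


def total_loss(y, p, w, lam_bce=1.0, lam_iou=1.0, lam_reg=0.0, reg=0.0):
    """Weighted sum of the loss terms; lam_* are hypothetical hyperparameters."""
    return (lam_bce * weighted_bce(y, p, w)
            + lam_iou * weighted_iou(y, p, w)
            + lam_reg * reg)
```

In this sketch the pixel weights `w` are passed in by the caller; the claim only says they follow the boundary distance or the hard-sample distribution, so how to compute them is left open here.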
Description
Polyp segmentation model based on location awareness feature refinement
Technical Field
The invention relates to medical image processing, and in particular to a polyp segmentation model based on location-aware feature refinement.
Background
Accurate segmentation of colorectal polyps is a core problem in endoscope-assisted diagnosis and is important for early detection and treatment. Unlike natural images, endoscopic images often have low contrast and complex backgrounds, and the pixel characteristics of polyps resemble those of the surrounding tissue. As a result, polyp boundaries are blurred and polyp sizes vary widely. In particular, the spatial resolution of tiny polyps is easily degraded by conventional convolution and pooling operations, so that detail information is lost, causing missed and false detections. In recent years, various deep-learning methods have been proposed to address these challenges, mainly cross-layer feature fusion methods, attention mechanism methods and boundary-aware methods (such as the Transformer-based Polyp-PVT, the reverse-attention-based PraNet and the boundary-enhancement-focused CaraNet). These methods have made remarkable progress in overall segmentation performance, but each has limitations: feature fusion takes multi-scale information into account but may mask the details of tiny objects; an attention mechanism can highlight salient regions but may ignore tiny structures; and boundary enhancement struggles to ensure segmentation accuracy when boundary pixels are very few. Therefore, how to effectively recover and strengthen shallow fine-grained information, especially edge detail compensation for micro polyps, while maintaining semantic understanding remains a key issue in current research.
Disclosure of Invention
(I) Technical problem to be solved
In view of the defects of the prior art, the invention provides a polyp segmentation model based on location-aware feature refinement, which effectively overcomes the difficulty of accurately segmenting micro polyps in the prior art.
(II) Technical solution
To achieve the above purpose, the invention adopts the following technical solution: a polyp segmentation model based on location-aware feature refinement adopts an encoder-decoder structure; the encoder adopts a lightweight Transformer backbone network and is used for extracting multi-scale semantic features; the decoder adopts a symmetrical structure to gradually restore the spatial resolution, a plurality of location-aware feature refinement modules PFR are embedded in a shallow layer or in multiple layers, and the attention distribution of the shallow semantic features is guided by deep semantics so as to compensate and reconstruct local details; the decoder further comprises a convolution layer, a context feature module CFM, a cross-scale information module CIM and a semantic attention module SAM, and a multi-scale supervision strategy is adopted to apply supervision to the multi-layer outputs simultaneously, so as to improve the detection and segmentation accuracy for micro polyps and boundary pixels.
Preferably, the encoder performs feature extraction on the input endoscopic image $I$ to obtain multi-scale semantic features $E_1$, $E_2$, $E_3$, $E_4$.
Preferably, after the encoder performs feature extraction on the input endoscopic image $I$ to obtain the multi-scale semantic features $E_1$, $E_2$, $E_3$, $E_4$, the model further: extracts the edge prior from the shallow semantic feature $E_1$, and performs global context modeling on the deep semantic feature $E_4$ to obtain the position prior; and preprocesses all multi-scale semantic features, including scale alignment, channel projection and standardization.
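The preprocessing of the multi-scale features (scale alignment, channel projection and standardization) is only named in the text, not specified. A minimal NumPy sketch of one plausible reading follows. Everything concrete in it is an assumption: features are `(C, H, W)` arrays, scale alignment is nearest-neighbour upsampling by an integer factor, channel projection is a 1×1 convolution written as a matrix product, standardization is per-channel zero mean and unit variance, and the function names are hypothetical.

```python
import numpy as np


def align_scale(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def project_channels(feat, weight):
    """1x1 convolution as a matrix product; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, feat)


def standardize(feat, eps=1e-5):
    """Per-channel standardization to zero mean and (about) unit variance."""
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True)
    return (feat - mean) / (std + eps)


def preprocess(features, weights, target_size):
    """Bring every feature map to the same spatial size and channel count."""
    out = []
    for feat, weight in zip(features, weights):
        factor = target_size // feat.shape[1]  # assumes square maps, integer ratio
        aligned = align_scale(feat, factor)
        out.append(standardize(project_channels(aligned, weight)))
    return out
```

A trained model would learn the projection weights and would likely use bilinear interpolation and a normalization layer; the sketch only shows how the three named steps compose.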
Preferably, the location-aware feature refinement module PFR comprises a location-aware attention module PSA and a feature reconstruction module FRB; the location-aware attention module PSA guides the attention distribution of the shallow semantic features through deep semantics, suppressing background noise and enhancing polyp region features; and the feature reconstruction module FRB structurally reconstructs the local details that are still lost after attention enhancement, further compensating the edge information of micro polyps. Preferably, the location-aware attention module PSA comprises a background suppression block and a position-sensitive attention block; the background suppression block constructs the position prior: global average pooling $\mathrm{Pool}_c$ along the channel dimension is applied to the deep semantic feature $E_4$, a 1×1 convolution $\mathrm{Conv}_{1\times 1}$ generates a low-dimensional semantic map $S_d$, and the input endoscopic image $I$ is multiplied element by element with the low-dimensional semantic map $S_d$ to obtain the background suppression feature $B$: $S_d = \mathrm{Conv}_{1\times 1}(\mathrm{Pool}_c(E_4))$; $B = I \odot S_d$; wherein $\odot$ denotes element-by-element mul