CN-121983291-A - VRWKV-based medical image segmentation model, method and equipment


Abstract

The invention relates to the technical field of computer-aided diagnosis and treatment, and in particular to a VRWKV-based medical image segmentation model, method and equipment. The model adopts a U-shaped encoder-bottleneck-decoder architecture. A frequency-aware wavelet attention module is arranged on the skip connection path between the encoder and the decoder; it decomposes the feature representation in the frequency domain by wavelet transform and adaptively modulates it with the VRWKV attention mechanism, so that high-frequency details and the low-frequency global structure are better aligned. A multi-scale channel fusion module is arranged between the decoder and the segmentation head; it uses the VRWKV channel-mixing mechanism to fuse the output features of all decoder layers across scales. The invention achieves optimal performance on a variety of target segmentation tasks and is more robust in complex scenes such as blurred boundaries, low contrast and noise interference.

Inventors

ZHOU ZHENHUAN
LI YINING
LI TAO

Assignees

Nankai University (南开大学)

Dates

Publication Date: 20260505
Application Date: 20260403

Claims (9)

1. A VRWKV-based medical image segmentation model, comprising a U-shaped encoder-bottleneck-decoder architecture, wherein a plurality of VRWKV blocks are arranged in the encoder, a plurality of patch-expanding blocks and VRWKV blocks are arranged in the decoder, and corresponding layers of the encoder and the decoder are linked by skip connections; characterized in that a frequency-aware wavelet attention module is arranged on the skip connection path, the frequency-aware wavelet attention module decomposing the feature representation in the frequency domain by wavelet transform and adaptively modulating it with the VRWKV attention mechanism, so that high-frequency details and the low-frequency global structure are better aligned; a multi-scale channel fusion module is arranged between the decoder and the segmentation head, the multi-scale channel fusion module using the VRWKV channel-mixing mechanism to fuse the output features of all decoder layers across scales; and the multi-scale channel fusion module works as follows: the decoding hierarchy features computed by each decoder layer are scale-aligned and then input into the VRWKV channel-mixing module for interaction along the channel dimension, and the interaction results are concatenated to obtain an enhancement feature; the scale-aligned decoding hierarchy features are concatenated along the channel dimension to obtain a base feature; and the base feature and the enhancement feature are added to obtain the channel fusion feature.
2. The VRWKV-based medical image segmentation model according to claim 1, wherein the frequency-aware wavelet attention module works as follows: performing a wavelet transform on the input feature to decompose it into a low-frequency component, a vertical high-frequency component, a horizontal high-frequency component and a diagonal high-frequency component; inputting the low-frequency component, the vertical high-frequency component, the horizontal high-frequency component and the diagonal high-frequency component into a VRWKV spatial-mixing module for frequency attention modulation; applying the inverse wavelet transform to the output features of the spatial-mixing module to generate a fused feature map; and adding the fused feature map to the input feature to obtain the frequency-aware enhanced feature.
3. The VRWKV-based medical image segmentation model according to claim 2, wherein, in the spatial-mixing module, the low-frequency component, the vertical high-frequency component, the horizontal high-frequency component and the diagonal high-frequency component each serve as receptance vectors, and the low-frequency component serves as both the key vector and the value vector.
4. The VRWKV-based medical image segmentation model according to claim 1, wherein, in the channel-mixing module, the scale-aligned decoding hierarchy feature of the smallest spatial resolution serves as the key vector, and the scale-aligned decoding hierarchy features of all scales serve as receptance vectors.
5. The VRWKV-based medical image segmentation model according to claim 1, wherein the decoding hierarchy features are scale-aligned by upsampling.
6. The VRWKV-based medical image segmentation model according to claim 1, wherein the encoder is a pre-trained VRWKV encoder.
7. The VRWKV-based medical image segmentation model according to claim 1, wherein the segmentation head includes patch expanding, upsampling, 3 x 3 convolution blocks, and 1 x 1 convolution layers.
8. A VRWKV-based medical image segmentation method, using the VRWKV-based medical image segmentation model according to any one of claims 1 to 7, comprising the steps of: S1, acquiring a medical image; S2, inputting the medical image into the multi-layer encoder to obtain a plurality of encoding hierarchy features; the encoding hierarchy features, after passing through the frequency-aware wavelet attention module, are input into the corresponding layers of the decoder; the encoding hierarchy feature output by the last layer of the encoder is input into the decoder after feature abstraction by the bottleneck block; the decoder outputs a plurality of decoding hierarchy features; S3, inputting the plurality of decoding hierarchy features into the multi-scale channel fusion module for multi-scale channel interaction to obtain the channel fusion feature; S4, inputting the channel fusion feature into the segmentation head to obtain a segmentation result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the VRWKV-based medical image segmentation method according to claim 8 when executing the computer program.
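
The frequency-aware wavelet attention workflow recited in claims 2 and 3 (wavelet decomposition, modulation, inverse transform, residual addition) can be illustrated with a minimal NumPy sketch. Several points are assumptions made only for illustration and are not stated in the claims: the wavelet is a single-level Haar wavelet, the input is one (H, W) feature channel with even H and W, and the hypothetical `modulate` function is a heavily simplified stand-in for the VRWKV spatial-mixing module (a sigmoid receptance gate applied to a softmax-weighted global average of the value, with no learned weights, token shift or positional decay). Which high-frequency band is called "vertical" or "horizontal" also varies by convention.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform of an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # detail from differences between columns
    hl = (a + b - c - d) / 2.0  # detail from differences between rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def modulate(receptance, key, value):
    """Hypothetical stand-in for VRWKV spatial mixing (assumption, see lead-in)."""
    weights = np.exp(key - key.max())
    weights /= weights.sum()                   # softmax over all positions
    context = (weights * value).sum()          # key-weighted global average of value
    gate = 1.0 / (1.0 + np.exp(-receptance))   # sigmoid receptance gate
    return gate * context

def fawa(x):
    """Decompose, modulate each sub-band, reconstruct, add the residual."""
    ll, lh, hl, hh = haar_dwt2(x)
    # Every sub-band acts as receptance; the low-frequency band is key and value.
    bands = [modulate(band, ll, ll) for band in (ll, lh, hl, hh)]
    fused = haar_idwt2(*bands)
    return x + fused

feature = np.arange(16.0).reshape(4, 4)
enhanced = fawa(feature)
```

Because the Haar pair above reconstructs its input exactly, any change in `enhanced` relative to `feature` comes only from the modulation step and the residual addition.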

Description

VRWKV-based medical image segmentation model, method and equipment

Technical Field

The invention relates to the technical field of computer-aided diagnosis and treatment, and in particular to a VRWKV-based medical image segmentation model, a VRWKV-based medical image segmentation method and VRWKV-based medical image segmentation equipment.

Background

In the field of computer-aided diagnosis and treatment, medical image segmentation is a fundamental task whose performance directly affects the accuracy of clinical diagnosis and the efficiency of treatment planning. Current mainstream medical image segmentation techniques are mainly based on convolutional neural networks (CNNs), Vision Transformers (ViTs), Mamba models and hybrid architectures. However, these techniques have significant limitations:

CNN-based methods, represented by U-Net and its variants, have strong local feature extraction capability but are limited by their receptive field. They struggle to model global context and long-range dependencies, which easily leads to blurred boundaries and lost detail when segmenting complex anatomical structures and lesion areas.

ViT-based methods ease the problem of modeling global dependencies through the self-attention mechanism, but the quadratic computational complexity of self-attention causes huge computation and memory costs when processing high-resolution medical images, which severely restricts deployment in clinical scenarios.

Mamba-based methods use a state space model to model long-range dependencies with linear computational complexity, which improves efficiency, but usually at the cost of segmentation accuracy, so they struggle to reach an ideal balance between efficiency and accuracy.
VRWKV is an emerging efficient modeling architecture with linear computational complexity and strong long-range dependency modeling capability, and it has seen initial application in medical image segmentation. For example, "Med-URWKV: Pure RWKV With ImageNet Pre-training For Medical Image Segmentation" discloses that a pure RWKV model can be used for medical image segmentation. However, the prior art has two core defects. First, it adopts a hybrid CNN-VRWKV architecture, so the weights of a large-scale pre-trained VRWKV encoder cannot be fully reused. Second, it lacks an effective mechanism for fusing frequency-domain information with multi-scale features, so the VRWKV model falls short in jointly modeling local details and global structure and in preserving spatial continuity, and it struggles to meet the high demands of medical image segmentation on boundary accuracy and structural integrity.

Disclosure of Invention

The present invention aims to solve the above problems. To this end, the invention provides a VRWKV-based medical image segmentation model, method and equipment that reuse a pre-trained VRWKV encoder and integrate two core modules: a frequency-aware wavelet attention module (FAWA module) and a multi-scale channel fusion module (MSCF module). The FAWA module improves the precision of fine feature detail by introducing frequency-domain information, and the MSCF module strengthens feature aggregation. The invention achieves optimal performance on a variety of target segmentation tasks and is more robust in complex scenes such as blurred boundaries, low contrast and noise interference.
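
As a rough illustration of the MSCF module, the data flow recited in claims 1, 4 and 5 (scale alignment by upsampling, channel mixing, concatenation, residual addition) might be sketched in NumPy as follows. The sketch rests on assumptions that the source does not state: features are (C, H, W) arrays with square maps whose sizes divide the largest one, upsampling is nearest-neighbour, and the hypothetical `channel_mix` function follows the commonly used RWKV channel-mixing form (sigmoid receptance gate times a projected squared-ReLU key) without token shift or normalization. The random weight matrices are placeholders, not trained parameters.

```python
import numpy as np

def upsample_to(feat, size):
    """Nearest-neighbour upsampling of a (C, H, W) array to (C, size, size)."""
    factor = size // feat.shape[1]
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def channel_mix(r_in, k_in, w_r, w_k, w_v):
    """RWKV-style channel mixing on (N, C) tokens (simplified assumption)."""
    r = 1.0 / (1.0 + np.exp(-(r_in @ w_r)))  # receptance gate
    k = np.maximum(k_in @ w_k, 0.0) ** 2     # squared-ReLU key
    return r * (k @ w_v)

def to_tokens(feat):
    """Flatten (C, H, W) into (H*W, C) tokens."""
    return feat.reshape(feat.shape[0], -1).T

def mscf(decoder_feats, weights):
    """Fuse decoder features of several scales into one channel fusion feature."""
    size = max(f.shape[1] for f in decoder_feats)
    aligned = [upsample_to(f, size) for f in decoder_feats]
    # Key comes from the feature whose original spatial resolution is smallest.
    key_idx = int(np.argmin([f.shape[1] for f in decoder_feats]))
    key_tokens = to_tokens(aligned[key_idx])
    enhanced = []
    for feat, (w_r, w_k, w_v) in zip(aligned, weights):
        out = channel_mix(to_tokens(feat), key_tokens, w_r, w_k, w_v)
        enhanced.append(out.T.reshape(feat.shape))
    base = np.concatenate(aligned, axis=0)          # base feature
    enhancement = np.concatenate(enhanced, axis=0)  # enhancement feature
    return base + enhancement

rng = np.random.default_rng(0)
channels, sizes, hidden = (8, 4, 2), (4, 8, 16), 16
feats = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
weights = [(0.1 * rng.standard_normal((c, c)),          # receptance projection
            0.1 * rng.standard_normal((channels[0], hidden)),  # key projection
            0.1 * rng.standard_normal((hidden, c)))     # value projection
           for c in channels]
fused = mscf(feats, weights)  # shape (8 + 4 + 2, 16, 16)
```

Giving each scale its own projection matrices lets the decoder layers keep different channel counts while the enhancement feature still matches the base feature in shape, which the final addition requires.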
The invention provides a VRWKV-based medical image segmentation model that adopts the following technical scheme: the model comprises a U-shaped encoder-bottleneck-decoder architecture, wherein a plurality of VRWKV blocks are arranged in the encoder, a plurality of patch-expanding blocks and VRWKV blocks are arranged in the decoder, and corresponding layers of the encoder and the decoder are linked by skip connections; a frequency-aware wavelet attention module is arranged on the skip connection path, the frequency-aware wavelet attention module decomposing the feature representation in the frequency domain by wavelet transform and adaptively modulating it with the VRWKV attention mechanism, so that high-frequency details and the low-frequency global structure are better aligned; a multi-scale channel fusion module is arranged between the decoder and the segmentation head, the multi-scale channel fusion module using the VRWKV channel-mixing mechanism to fuse the output features of all decoder layers across scales; the working process of the multi-sc