CN-121616926-B - Multi-mode medical image fusion method and system
Abstract
A multi-modal medical image fusion method and system, relating to the technical field of multi-modal medical image fusion. The method converts the two-path single-channel medical images into three-channel feature maps through a modality adaptation module, extracts multi-scale features through two parallel pre-trained ConvNeXt modules, obtains an initial fused feature set by element-level addition, applies multi-directional shifting and CUDA-accelerated (compute unified device architecture) WKV attention computation to the initial fused features through a VRWKV spatial mixing module to strengthen global spatial dependence and local detail association, performs channel feature calibration and dimension conversion through an improved channel attention decoder built from dual-path pooling, a shared-weight MLP and residual fusion, and finally outputs a single-channel fused image through transposed-convolution upsampling, feature fusion and Sigmoid normalization.
Inventors
- WANG LIU
- ZHOU YANG
- Li Menjia
- CAI HONGYU
- MA YUE
- WANG TIANQI
- WANG YIFAN
Assignees
- Changchun University (长春大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260130
Claims (8)
- 1. A multi-modal medical image fusion method, the method comprising the steps of: S1, acquiring a single-channel MRI image and a single-channel SPE image; converting the single-channel SPE image into three-channel intermediate features through a modality adaptation module; after the three-channel intermediate features are subjected to feature enhancement, inputting them and the single-channel MRI image respectively into a dual-encoder feature extraction module; S2, the dual-encoder feature extraction module comprises 4 feature extraction stages, wherein each extraction stage uses 2 ConvNeXt modules to perform resolution downsampling and channel-number promotion on the two input images, so as to obtain feature tensors of the MRI image and of the SPE image at each of the 4 scales, where an index value denotes the scale and ranges over the 4 stages; each ConvNeXt module passes, in order from input to output, through a 2D depthwise separable convolution module, a layer normalization module, a first 2D convolution module, a gated exponential linear unit, a second 2D convolution module, a layer scaling module and a stochastic depth module; S3, performing element-level addition of the MRI and SPE feature tensors at each scale to obtain an initial fused feature set; S4, decoding the initial fused feature set to obtain a decoding result; the decoding comprises 4 decoding stages, each decoding stage comprising 1 improved channel attention decoder; the outputs of the 1st, 2nd and 3rd feature extraction stages are connected by residuals to the outputs of the improved channel attention decoders in the 1st, 2nd and 3rd decoding stages; the 1st, 2nd and 3rd decoding stages each further comprise 1 VRWKV spatial mixing module; the improved channel attention decoder passes, in order from input to output, through a transposed convolution module, a first normalization module of the decoding process, a first ReLU activation function of the decoding process, a first convolution module of the decoding process, a second normalization module of the decoding process, a second ReLU activation function of the decoding process and a channel attention mechanism; the output of the channel attention mechanism is fused with its input through a residual connection to obtain the result of the improved channel attention decoder; the channel attention mechanism splits its input into two paths, the first path undergoing maximum pooling and the second path undergoing average pooling, after which both pooled results are fed into a shared two-layer MLP, and the output of the two-layer MLP is processed by a second convolution module of the decoding process and a Sigmoid activation function to obtain the output of the channel attention mechanism; S5, converting the decoding result into a single-channel fused feature map through a convolution, normalizing the pixel values of the single-channel fused feature map into the [0, 1] interval through a Sigmoid activation function to obtain the multi-modal medical fusion image, and then converting the fusion image from a YUV image to an RGB image.
- 2. The multi-modal medical image fusion method as defined in claim 1, wherein the modality adaptation module includes one convolution module; the feature enhancement is computed as F_enh = ReLU_2(Norm_2(Conv_2(ReLU_1(Norm_1(Conv_1(X)))))), wherein X denotes the image data to be feature-enhanced, Conv_1 denotes the first convolution module in the feature enhancement process, Norm_1 denotes the first normalization layer in the feature enhancement process, ReLU_1 denotes the first ReLU activation function in the feature enhancement process, Conv_2 denotes the second convolution module in the feature enhancement process, Norm_2 denotes the second normalization layer in the feature enhancement process, and ReLU_2 denotes the second ReLU activation function in the feature enhancement process.
- 3. The multi-modal medical image fusion method as defined in claim 2, wherein the input of the VRWKV spatial mixing module is subjected to a query shift and concatenation to obtain a shifted tensor; the shifted tensor is processed by a spatial query, a spatial key and a spatial value to obtain feature matrices R, K and V of corresponding dimensions; after R is processed by a Sigmoid activation function, the attention gating of the spatial query is obtained; K and V are passed through a CUDA-accelerated WKV operator to compute the spatial attention weights and obtain the attention-weighted feature matrix; the attention gating and the attention-weighted feature matrix are multiplied element-wise and normalized to obtain the spatially enhanced global feature, where an index value denotes the serial number of the VRWKV spatial mixing module; after residual fusion with the input of the VRWKV spatial mixing module, the output of the VRWKV spatial mixing module is obtained.
- 4. The multi-modal medical image fusion method as defined in claim 3, wherein the query shift specifically means that the input of the VRWKV spatial mixing module is divided into 4 groups along the channel dimension, and 1-pixel shift operations in the left, right, up and down directions are performed on the groups respectively.
- 5. The multi-modal medical image fusion method of claim 4, wherein the attention-weighted feature matrix is calculated as wkv = Bi-WKV(K, V), with the per-token form wkv_t = ( Σ_{i=1, i≠t}^{T} e^{-(|t−i|−1)/T · w + k_i} · v_i + e^{u + k_t} · v_t ) / ( Σ_{i=1, i≠t}^{T} e^{-(|t−i|−1)/T · w + k_i} + e^{u + k_t} ), wherein Bi-WKV denotes a linear-complexity attention function, B denotes the batch size, T denotes the length of the feature sequence, C denotes the number of feature channels, w denotes the spatial decay parameter, t and i denote scalar token indices, and u denotes the initial bias parameter.
- 6. The method of claim 5, wherein the decoding process of the 2nd and 3rd decoding stages is identical to that of the 1st decoding stage; in the 4th decoding stage, the output of the 4th improved channel attention decoder is taken as the decoding result.
- 7. The method according to claim 6, wherein in step S5, the RGB image is the output of the multi-modal medical image fusion.
- 8. A multi-modal medical image fusion system, wherein the system is configured to implement the method of any one of claims 1-7, the system comprising a data acquisition module, a modality adaptation module, a feature enhancement module, a dual-encoder feature extraction module, a feature fusion initialization module, a decoding module, and an output module; the data acquisition module is used for acquiring single-channel MRI images and SPE images; the modality adaptation module is used for converting the single-channel SPE image into three-channel intermediate features; the feature enhancement module is used for performing feature enhancement on the three-channel intermediate features; after the three-channel intermediate features are subjected to feature enhancement, they and the single-channel MRI image are respectively input into the dual-encoder feature extraction module; the dual-encoder feature extraction module is used for performing resolution downsampling and channel-number promotion on the two input images to obtain feature tensors of the MRI image and of the SPE image at each of 4 scales; the feature fusion initialization module is used for performing element-level addition of the MRI and SPE feature tensors to obtain an initial fused feature set; the decoding module is used for decoding the initial fused feature set to obtain a decoding result; and the output module is used for converting the decoding result into the output of the multi-modal medical image fusion.
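The claims leave the convolution sizes and channel widths of the modality adaptation and feature enhancement steps unspecified (the symbols were lost from the text). The PyTorch sketch below is therefore only an illustrative reading of claims 1 and 2: the 1×1 adaptation kernel, the 3×3 enhancement kernels, the hidden width and the use of BatchNorm are assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class ModalityAdaptation(nn.Module):
    """Maps a single-channel SPE image to three-channel intermediate features (claim 1, step S1)."""
    def __init__(self, out_channels: int = 3):
        super().__init__()
        # Kernel size 1 is an assumption; the claim only states "one convolution module".
        self.conv = nn.Conv2d(1, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)

class FeatureEnhancement(nn.Module):
    """conv -> norm -> ReLU -> conv -> norm -> ReLU, following the module order enumerated in claim 2."""
    def __init__(self, channels: int = 3, hidden: int = 3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),   # first convolution module
            nn.BatchNorm2d(hidden),                                  # first normalization layer
            nn.ReLU(inplace=True),                                   # first ReLU
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),   # second convolution module
            nn.BatchNorm2d(channels),                                # second normalization layer
            nn.ReLU(inplace=True),                                   # second ReLU
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Usage: adapt a single-channel SPE slice, then enhance it before the dual-encoder stage.
spe = torch.randn(1, 1, 224, 224)
enhanced = FeatureEnhancement()(ModalityAdaptation()(spe))   # shape (1, 3, 224, 224)
```

The composed chain F_enh = ReLU_2(Norm_2(Conv_2(ReLU_1(Norm_1(Conv_1(X)))))) mirrors the formula reconstructed in claim 2.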
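Claims 3-5 describe the VRWKV spatial mixing module: a query shift that moves four channel groups by one pixel in four directions, followed by a CUDA-accelerated bidirectional WKV attention. The patent's exact WKV formula is not visible in the extracted text; the reference below follows the standard Bi-WKV operator of Vision-RWKV, whose parameters (spatial decay w, bonus u, sequence length T) match those named in claim 5, so it should be read as an assumed stand-in. The O(T²) loop is a readability reference for what the CUDA kernel computes, not a production implementation.

```python
import torch

def q_shift(x: torch.Tensor) -> torch.Tensor:
    """Claim 4: split channels into 4 groups, shift each by 1 pixel (left/right/up/down), zero-filled.
    x: (B, C, H, W) with C divisible by 4."""
    B, C, H, W = x.shape
    out = torch.zeros_like(x)
    g = C // 4
    out[:, 0*g:1*g, :, :W-1] = x[:, 0*g:1*g, :, 1:]     # shift left
    out[:, 1*g:2*g, :, 1:]   = x[:, 1*g:2*g, :, :W-1]   # shift right
    out[:, 2*g:3*g, :H-1, :] = x[:, 2*g:3*g, 1:, :]     # shift up
    out[:, 3*g:4*g, 1:, :]   = x[:, 3*g:4*g, :H-1, :]   # shift down
    return out

def bi_wkv_naive(k: torch.Tensor, v: torch.Tensor, w: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
    """O(T^2) reference of a bidirectional WKV operator (assumed Vision-RWKV form).
    k, v: (T, C); w: (C,) spatial decay; u: (C,) bonus applied to the current token."""
    T, C = k.shape
    out = torch.empty_like(v)
    idx = torch.arange(T, dtype=k.dtype, device=k.device)
    for t in range(T):
        # distance-based decay -(|t-i|-1)/T for all tokens i != t
        decay = -(torch.abs(idx - t) - 1).clamp(min=0) / T      # (T,)
        logits = decay[:, None] * w[None, :] + k                # (T, C)
        logits[t] = u + k[t]                                    # current token uses the bonus u
        weights = torch.softmax(logits, dim=0)                  # exp/sum ratio of the claim-5 formula
        out[t] = (weights * v).sum(dim=0)
    return out

# Usage on a toy sequence of 16 tokens with 8 channels.
k, v = torch.randn(16, 8), torch.randn(16, 8)
wkv = bi_wkv_naive(k, v, w=torch.rand(8), u=torch.zeros(8))     # shape (16, 8)
```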
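Claim 1 also enumerates the improved channel attention decoder: a transposed convolution, two conv-norm-ReLU steps, then a channel attention that pools along two paths, shares a two-layer MLP, applies a convolution and a Sigmoid gate, and is fused with its input by a residual connection. A minimal sketch follows; the reduction ratio, the 2× transposed-convolution stride, the summation of the two pooled descriptors and the scale-then-add residual are assumptions made where the claim is silent.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Dual-path pooling + shared two-layer MLP + conv + Sigmoid + residual fusion (claim 1)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(                                   # shared-weight two-layer MLP
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)    # "second convolution module" of the decoder
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        max_desc = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))  # max-pooled path
        avg_desc = self.mlp(x.mean(dim=(2, 3), keepdim=True))         # average-pooled path
        att = self.gate(self.conv(max_desc + avg_desc))                # (B, C, 1, 1) channel weights
        return x + att * x                                             # residual fusion with the CA input

class ImprovedCADecoder(nn.Module):
    """Transposed conv -> norm -> ReLU -> conv -> norm -> ReLU -> channel attention."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2),  # 2x upsampling (assumed stride)
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )
        self.ca = ChannelAttention(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ca(self.up(x))

# Usage: one decoding stage doubling resolution and halving channels.
y = ImprovedCADecoder(128, 64)(torch.randn(1, 128, 16, 16))   # shape (1, 64, 32, 32)
```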
Description
Multi-mode medical image fusion method and system
Technical Field
The invention relates to the technical field of multi-modal medical image fusion, in particular to a multi-modal medical image fusion method and system.
Background
Multi-modal medical image fusion is a key technology in the field of medical image analysis. It aims to systematically integrate the complementary information captured by imaging technologies of different modalities, covering not only fine characterization of anatomical structure but also dynamic feedback on physiological function, thereby providing a more comprehensive and accurate decision basis for clinical diagnosis and supporting key links such as early disease screening, treatment scheme optimization and dynamic disease-course monitoring. The core value of the technology lies in breaking through the inherent limitations of single-modality imaging and expanding the diagnostic dimension through information complementarity. In recent years, deep learning has shown great potential in the field of image fusion, but conventional methods have significant defects in key links such as long-distance spatial dependence modeling, multi-directional structural feature extraction, accurate alignment of cross-modal multi-scale features, and enhancement of weak-lesion feature responses; they struggle to balance global feature characterization against computational efficiency, and in particular fail to achieve ideal results in preserving fine anatomical structures and enhancing key diagnostic information in medical images. With the development of artificial intelligence, convolutional neural networks have made significant progress in the field of image processing. However, existing approaches to fine-structure preservation and key-information enhancement in medical images commonly face the following problems:
(1) The receptive field of traditional convolution operations is limited, making it difficult to capture long-distance spatial correlations in medical images; Transformer models can model global dependence but have high computational complexity, so it is difficult to balance global feature characterization with computational efficiency, and the detail integrity and processing timeliness of medical images cannot both be satisfied.
(2) Anatomical structures and lesion information in medical images have obvious directionality; a single-direction feature extraction mechanism easily misses multi-dimensional spatial distribution details, struggles to comprehensively restore the true morphological features of tissues, and thus affects the accuracy of subsequent diagnosis.
(3) Information loss, feature redundancy or fusion imbalance easily occurs when fusing feature information of different modalities and different scales; the alignment precision between high-dimensional global features and low-dimensional local features is insufficient, and intensity and contrast differences between modalities easily cause feature distribution deviation, so the complementary value of multi-modal information is difficult to exploit fully.
(4) The response sensitivity of the traditional channel attention mechanism to weak lesion features in medical images is insufficient; key diagnostic regions cannot be focused on accurately, the enhancement of core structural information in the fusion result is not obvious, and the requirement of clinical diagnosis for fine features is difficult to meet. Therefore, a fusion method with efficient heterogeneous feature extraction, multi-dimensional feature enhancement and accurate modal fusion capability needs to be provided.
Disclosure of Invention
Aiming at the problems of insufficient feature extraction, low fusion precision and poor detail preservation in the existing multi-modal medical image fusion technology, the invention aims to provide a multi-modal medical image fusion method and system. It addresses the insufficient feature extraction, insufficient long-distance dependence modeling and low fusion precision of traditional methods, takes both the anatomical-structure details and the physiological-function information of medical images into account, and improves the diagnostic value of the fused image. The method comprises the following steps: S1, acquiring a single-channel MRI image and an SPE image; converting the single-channel SPE image into three-channel intermediate features through a modality adaptation module; after the three-channel intermediate features are subjected to feature enhancement, inputting them and the single-channel MRI image respectively into a dual-encoder feature extraction module; S2, the dual-encoder feature extraction module comprises 4 feature extraction stages, wherein each extraction stage uses 2 ConvNeXt modules to perform resolution downsampling and channel-number promotion on the two input images.
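Step S2 builds each extraction stage from ConvNeXt modules whose internal order is listed in claim 1 (depthwise separable convolution, layer normalization, two pointwise convolutions around an activation, layer scaling, stochastic depth). A minimal sketch of one such block follows; the 7×7 depthwise kernel, the 4× expansion ratio and the reading of the claim's "gated exponential linear unit" as a GELU-style activation are assumptions, and the stochastic depth is simplified to a batch-level drop.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Depthwise conv -> LayerNorm -> pointwise conv -> activation -> pointwise conv
    -> layer scaling -> stochastic depth, following the module order in claim 1 / step S2."""
    def __init__(self, dim: int, layer_scale_init: float = 1e-6, drop_prob: float = 0.0):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise (7x7 assumed)
        self.norm = nn.LayerNorm(dim)                        # applied in channels-last layout
        self.pwconv1 = nn.Linear(dim, 4 * dim)               # first pointwise (1x1) convolution as Linear
        self.act = nn.GELU()                                 # assumed reading of the claim's activation
        self.pwconv2 = nn.Linear(4 * dim, dim)               # second pointwise convolution
        self.gamma = nn.Parameter(layer_scale_init * torch.ones(dim))  # layer scaling
        self.drop_prob = drop_prob                           # stochastic-depth rate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.dwconv(x).permute(0, 2, 3, 1)               # (B, H, W, C)
        y = self.pwconv2(self.act(self.pwconv1(self.norm(y))))
        y = (self.gamma * y).permute(0, 3, 1, 2)             # back to (B, C, H, W)
        if self.training and self.drop_prob > 0 and torch.rand(()) < self.drop_prob:
            return x                                         # simplified stochastic depth: drop the branch
        return x + y                                         # residual connection

# Usage: one block at the first encoder scale.
feat = ConvNeXtBlock(96)(torch.randn(1, 96, 56, 56))         # shape (1, 96, 56, 56)
```

In the dual-encoder layout of step S2, two such stacks run in parallel, one over the MRI branch and one over the enhanced SPE branch, with downsampling between stages producing the 4 feature scales.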