CN-121982219-A - Surgical navigation three-dimensional reconstruction method based on resolution adjustment and cross-modal fusion

CN121982219A

Abstract

The invention relates to the field of virtual surgery, and in particular to a surgical navigation three-dimensional reconstruction method and system based on resolution adjustment and cross-modal fusion. The method constructs a surgical navigation three-dimensional reconstruction model based on resolution adjustment and cross-modal fusion and then trains the model end to end; the model comprises a convolution attention module, a resolution adjustment module, a cross-modal spatial contrast learning module and a vector fusion decoding module. The convolution attention module generates comprehensive weight maps, which guide resolution adjustment of the original feature maps to improve processing efficiency and reduce computational load; the cross-modal spatial contrast learning module performs semantic alignment; and the vector fusion decoding module completes cross-modal data fusion, reducing fusion errors and improving the accuracy of cross-modal fusion.

Inventors

  • ZHANG XIAORUI
  • LU TAO
  • WANG PENGPAI
  • WANG TING
  • LV QINGZHUO
  • JIANG ZHENGHAO

Assignees

  • Nanjing Tech University (南京工业大学)

Dates

Publication Date
2026-05-05
Application Date
2026-02-10

Claims (10)

  1. A surgical navigation three-dimensional reconstruction method based on resolution adjustment and cross-modal fusion, comprising the following steps: constructing a surgical navigation three-dimensional reconstruction model based on resolution adjustment and cross-modal fusion; performing end-to-end training on the model; acquiring CT and MRI three-dimensional images of a patient; and processing the patient's CT and MRI three-dimensional image data with the model to generate a voxel map; wherein the three-dimensional reconstruction model comprises: a convolution attention module, which generates a CT comprehensive weight map and an MRI comprehensive weight map based on the importance of each region of the original slice images; a resolution adjustment module, which adjusts the resolution of the CT original feature map and the MRI original feature map under the guidance of the comprehensive weight maps; a cross-modal spatial contrast learning module, which performs cross-modal learning on the resolution-adjusted CT and MRI feature maps to generate semantically aligned feature vectors; and a vector fusion decoding module, which fuses and decodes the semantically aligned feature vectors to generate a cross-modal fused feature map.
  2. The surgical navigation three-dimensional reconstruction method according to claim 1, wherein the convolution attention module comprises: an original feature map generation sub-module, which takes the preprocessed CT image and the preprocessed MRI image as input and generates the CT original feature map and the MRI original feature map through one 3×3 convolution layer; a channel attention sub-module, whose core is a two-layer fully connected network, which takes the CT and MRI original feature maps as input and outputs the CT and MRI channel attention weight maps together with the CT and MRI channel-weighted feature maps; a spatial attention sub-module, whose core is a 5×5 convolution, which takes the CT and MRI channel-weighted feature maps as input and outputs the CT and MRI spatial attention weight maps; and a comprehensive weight map generation sub-module, which multiplies the channel attention weight maps and the spatial attention weight maps pixel by pixel to generate the CT comprehensive weight map and the MRI comprehensive weight map.
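As a rough illustration of claim 2's weighting scheme, the sketch below computes a comprehensive weight map in pure Python. The patent does not publish its symbols or layer weights, so the toy two-layer fully connected network (`w1`, `w2`), the sigmoid gating, and the channel-average stand-in for the 5×5 spatial convolution are all assumptions; only the overall structure (channel attention × spatial attention per pixel) follows the claim.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap, w1, w2):
    # fmap: list of C channels, each an HxW grid of floats.
    # Global average pooling per channel, then a toy two-layer fully
    # connected network (weights w1, w2 are assumptions) with a ReLU
    # hidden layer and a sigmoid gate, giving one weight per channel.
    C = len(fmap)
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    hidden = [max(0.0, sum(w1[h][c] * pooled[c] for c in range(C)))
              for h in range(len(w1))]
    return [sigmoid(sum(w2[c][h] * hidden[h] for h in range(len(hidden))))
            for c in range(C)]

def spatial_attention(fmap):
    # Stand-in for the claimed 5x5 convolution: average across channels
    # at each pixel, squashed to (0, 1). A real module would learn this.
    H, W, C = len(fmap[0]), len(fmap[0][0]), len(fmap)
    return [[sigmoid(sum(fmap[c][i][j] for c in range(C)) / C)
             for j in range(W)] for i in range(H)]

def comprehensive_weight_map(fmap, w1, w2):
    # Claim 2: multiply channel and spatial attention at each pixel position.
    ca = channel_attention(fmap, w1, w2)
    sa = spatial_attention(fmap)
    return [[[ca[c] * sa[i][j] for j in range(len(sa[0]))]
             for i in range(len(sa))] for c in range(len(fmap))]
```

The same routine would be run once on the CT branch and once on the MRI branch to produce the two comprehensive weight maps.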
  3. The surgical navigation three-dimensional reconstruction method according to claim 2, wherein the resolution adjustment module comprises: an average saliency calculation sub-module, which divides each channel's two-dimensional weight map of the CT comprehensive weight map and the MRI comprehensive weight map into non-overlapping local regions of 16×16 pixels and calculates the average saliency weight of each region as
W̄_{c,k} = (1 / N_{c,k}) · Σ_{(i,j)∈R_{c,k}} W_c(i,j),
where c indexes the channel, k indexes the region R_{c,k} within channel c, W̄_{c,k} is the average saliency weight of the k-th region of channel c, N_{c,k} is the number of pixels the region contains, the sum traverses every pixel of the region and accumulates the corresponding weights, and W_c(i,j) is the weight of the comprehensive weight map at abscissa index i and ordinate index j in channel c; a target resolution calculation sub-module, which computes from the average saliency weight the target resolution r_{c,k} of each local region of the CT original feature map and the MRI original feature map, where r_max and r_min are a preset resolution maximum and a preset resolution minimum that bound the resolution level, and T_high and T_low are a preset high weight threshold and a preset low weight threshold used to judge the importance of different regions; and a downsampling and interpolation sub-module, which, when the target resolution of a local region of the CT original feature map or the MRI original feature map is less than 1, downsamples the local region to its target resolution, performs lightweight feature extraction after the resolution adjustment, restores the region to its original 16×16 block size by cubic convolution interpolation, and then re-splices all local regions at their original spatial positions to obtain the resolution-adjusted CT feature map and the resolution-adjusted MRI feature map.
  4. The surgical navigation three-dimensional reconstruction method according to claim 3, wherein the preset resolution maximum r_max is 1, the preset resolution minimum r_min is 0.5, the preset high weight threshold T_high satisfies 0.7 < T_high < 0.8, and the preset low weight threshold T_low satisfies 0.2 < T_low < 0.3.
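The two sub-modules above can be sketched in a few lines of Python. The block averaging follows claim 3 directly; the patent does not reproduce its target-resolution formula, so `target_resolution` below is only one plausible mapping consistent with claims 3-4 (clamped to [r_min, r_max], linear between the two importance thresholds), with threshold values picked from the middle of the claimed ranges.

```python
def average_saliency(weight_map, block=16):
    # Claim 3: split one channel's 2D weight map into non-overlapping
    # block x block regions and average the weights inside each region.
    H, W = len(weight_map), len(weight_map[0])
    out = []
    for bi in range(0, H, block):
        row = []
        for bj in range(0, W, block):
            pixels = [weight_map[i][j]
                      for i in range(bi, min(bi + block, H))
                      for j in range(bj, min(bj + block, W))]
            row.append(sum(pixels) / len(pixels))
        out.append(row)
    return out

def target_resolution(avg_w, r_max=1.0, r_min=0.5, t_high=0.75, t_low=0.25):
    # Assumed mapping (the patent's exact formula is not reproduced):
    # full resolution above T_high, minimum resolution below T_low,
    # linear interpolation in between.
    if avg_w >= t_high:
        return r_max
    if avg_w <= t_low:
        return r_min
    return r_min + (r_max - r_min) * (avg_w - t_low) / (t_high - t_low)
```

For example, a uniformly salient 32×32 weight map yields a 2×2 grid of block averages, and a block with average weight 0.8 keeps full resolution while one at 0.1 is downsampled to half.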
  5. The surgical navigation three-dimensional reconstruction method according to claim 1, wherein the cross-modal spatial contrast learning module consists of a 3-layer fully connected network, each layer comprising a batch normalization layer (BatchNorm) and a ReLU activation function.
  6. The surgical navigation three-dimensional reconstruction method according to claim 5, wherein the cross-modal spatial contrast learning module is pre-trained before the overall training of the system and its parameters are then frozen, the pre-training process being as follows: training data preparation: acquiring group-aligned CT and MRI feature maps, dividing trunk regions and bifurcation regions by manual labeling, and extracting a feature representation of each region to obtain the CT trunk region feature vector, the MRI trunk region feature vector, the CT bifurcation region feature vector and the MRI bifurcation region feature vector; adding a trace of Gaussian noise to the training CT and MRI feature maps, dividing trunk and bifurcation regions, and extracting a feature representation of each region to obtain the noisy CT trunk region feature vector, the noisy MRI trunk region feature vector, the noisy CT bifurcation region feature vector and the noisy MRI bifurcation region feature vector; stage-one single-modal contrastive learning training: inputting into the cross-modal spatial contrast learning module the CT trunk region feature vector, the noisy CT trunk region feature vector, the CT bifurcation region feature vector and the noisy CT bifurcation region feature vector, and optimizing with a triplet loss function, with the CT trunk region feature vector as the anchor, the noisy CT trunk region feature vector as the positive sample, and the CT bifurcation region feature vector and the noisy CT bifurcation region feature vector as negative samples, so that the model can distinguish trunk and bifurcation regions within a single modality; stage-two single-modal contrastive learning training: inputting into the cross-modal spatial contrast learning module the MRI trunk region feature vector, the noisy MRI trunk region feature vector, the MRI bifurcation region feature vector and the noisy MRI bifurcation region feature vector, and optimizing with a triplet loss function, with the MRI trunk region feature vector as the anchor, the noisy MRI trunk region feature vector as the positive sample, and the MRI bifurcation region feature vector and the noisy MRI bifurcation region feature vector as negative samples, so that the model can distinguish trunk and bifurcation regions within a single modality; and cross-modal contrastive learning training: inputting into the cross-modal spatial contrast learning module the CT trunk region feature vector, the MRI trunk region feature vector, the CT bifurcation region feature vector and the MRI bifurcation region feature vector, and optimizing with a triplet loss function, with the CT trunk region feature vector as the anchor, the MRI trunk region feature vector as the positive sample, and the CT bifurcation region feature vector and the MRI bifurcation region feature vector as negative samples, so that the model can distinguish trunk and bifurcation regions across modalities.
  7. The surgical navigation three-dimensional reconstruction method according to claim 6, wherein the triplet loss function used for the single-modal and cross-modal contrastive learning training is
L_tri = (1/N) · Σ_{i=1}^{N} max( d(a_i, p_i) − d(a_i, n_i) + α, 0 ),
where L_tri is the triplet loss, the summation accumulates the penalty term of each triplet into the total loss, N is the total number of triplets taking part in the calculation, i is the index of a triplet sample, max(·, 0) selects the larger of its two arguments, d(·, ·) is the Euclidean distance between features, a_i is the embedded feature vector of the anchor, p_i is the embedded feature vector of the positive sample, n_i is the embedded feature vector of the negative sample, and α is the margin hyperparameter.
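The triplet loss recited in claim 7 is the standard form, and can be written directly; the margin value below is a placeholder, not taken from the patent.

```python
import math

def euclidean(u, v):
    # Euclidean distance between two embedded feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(triplets, margin=0.2):
    # Claim 7: average over N triplets of
    # max(d(anchor, positive) - d(anchor, negative) + margin, 0).
    total = 0.0
    for anchor, positive, negative in triplets:
        total += max(euclidean(anchor, positive)
                     - euclidean(anchor, negative) + margin, 0.0)
    return total / len(triplets)
```

In the cross-modal stage, for instance, the anchor would be a CT trunk embedding, the positive the matching MRI trunk embedding, and the negatives the bifurcation embeddings.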
  8. The surgical navigation three-dimensional reconstruction method according to claim 1, wherein the model is trained end-to-end with the following loss function:
L_total = λ1·L_fuse + λ2·L_rec + λ3·L_reg + λ4·L_att,
where L_total is the total loss, λ1, λ2, λ3 and λ4 are parameters that balance the contributions of the respective parts, L_fuse is the fusion loss, L_rec is the reconstruction loss, L_reg is the regularization loss, and L_att is the attention loss.
  9. The surgical navigation three-dimensional reconstruction method according to claim 8, wherein the fusion loss L_fuse is
L_fuse = ‖F − F'_CT‖² + ‖F − F'_MRI‖²,
where F is the fused feature map, ‖·‖² denotes the square of the Euclidean distance between two feature maps, and F'_CT and F'_MRI are the resolution-adjusted CT feature map and the resolution-adjusted MRI feature map, respectively; the reconstruction loss L_rec is
L_rec = ‖V − V_gt‖²,
where V is the high-resolution voxel map and V_gt is the true three-dimensional model; the regularization loss L_reg is defined over the weighting coefficient of the CT trunk region, the weighting coefficient of the MRI trunk region, the weighting coefficient of the CT bifurcation region and the weighting coefficient of the MRI bifurcation region; and the attention loss L_att is
L_att = BCE(W_2D, M),
where BCE is the binary cross-entropy loss, W_2D is the two-dimensional spatial importance map, called the two-dimensional comprehensive weight map, obtained by global average pooling of the three-dimensional comprehensive weight map along the channel dimension, and M is a manually annotated binary mask map.
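The loss combination of claims 8-9 can be sketched as follows. The lambda values are placeholders, not taken from the patent, and the flattened-list BCE stands in for the map-versus-mask comparison of the attention loss.

```python
import math

def bce(pred, mask, eps=1e-7):
    # Binary cross entropy between the pooled 2D comprehensive weight
    # map (flattened to a list of probabilities) and a manually
    # annotated binary mask, as in claim 9's attention loss.
    n = len(pred)
    return -sum(m * math.log(max(p, eps)) + (1 - m) * math.log(max(1 - p, eps))
                for p, m in zip(pred, mask)) / n

def total_loss(l_fuse, l_rec, l_reg, l_att, lambdas=(1.0, 1.0, 0.1, 0.1)):
    # Claim 8: a weighted sum of the four loss terms; the default
    # lambda weights here are illustrative assumptions.
    l1, l2, l3, l4 = lambdas
    return l1 * l_fuse + l2 * l_rec + l3 * l_reg + l4 * l_att
```

With unit weights, for example, term losses of 1, 2, 3 and 4 combine to a total loss of 10.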
  10. A surgical navigation three-dimensional reconstruction system based on resolution adjustment and cross-modal fusion for implementing the surgical navigation three-dimensional reconstruction method according to any one of claims 1 to 9, the system comprising: a data acquisition module, which acquires original CT and MRI three-dimensional images of the same patient and slices them along the same direction to obtain several groups of registered CT images and corresponding MRI images; a data preprocessing module, which selects a group consisting of an initial CT image and the corresponding initial MRI image, preprocesses them, and outputs the preprocessed CT image and the preprocessed MRI image; a convolution attention module, which estimates the importance of each region of the images, takes the preprocessed CT and MRI images as input, and outputs the CT and MRI original feature maps together with the CT comprehensive weight map and the MRI comprehensive weight map; a resolution adjustment module, which, guided by the comprehensive weight maps, performs resolution processing on the CT and MRI original feature maps to obtain the resolution-adjusted CT feature map and the resolution-adjusted MRI feature map; a region segmentation module, which divides the resolution-adjusted CT and MRI feature maps into trunk regions and bifurcation regions and then converts them into the CT trunk region feature vector, the MRI trunk region feature vector, the CT bifurcation region feature vector and the MRI bifurcation region feature vector; a cross-modal spatial contrast learning module, which performs cross-modal semantic alignment on the four feature vectors to obtain the corresponding CT trunk region embedding vector, MRI trunk region embedding vector, CT bifurcation region embedding vector and MRI bifurcation region embedding vector; a vector fusion decoding module, which fuses the CT and MRI trunk region embedding vectors and the CT and MRI bifurcation region embedding vectors into a trunk vector and a bifurcation vector, splices them, and inputs the result into a lightweight decoder to obtain the final feature fusion map; a feature map stacking module, which stacks the fusion feature maps in processing order to obtain a three-dimensional fusion feature map; and a voxel map generation module, which inputs the three-dimensional fusion feature map into a three-dimensional decoder to generate the voxel map.
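The module chain of claim 10 amounts to a per-slice pipeline followed by stacking and 3D decoding. The hypothetical orchestrator below makes that data flow explicit; every module is a caller-supplied callable, since the patent does not publish an implementation, and the names in the `modules` dictionary are assumptions.

```python
def reconstruct(ct_slices, mri_slices, modules):
    # Hypothetical orchestration of the claim-10 pipeline. Each entry of
    # `modules` is a stand-in for the corresponding system module.
    fused = []
    for ct, mri in zip(ct_slices, mri_slices):
        ct_p, mri_p = modules["preprocess"](ct, mri)       # data preprocessing
        feats, weights = modules["attention"](ct_p, mri_p) # convolution attention
        adjusted = modules["resolution"](feats, weights)   # resolution adjustment
        regions = modules["segment"](adjusted)             # region segmentation
        embedded = modules["align"](regions)               # cross-modal alignment
        fused.append(modules["fuse_decode"](embedded))     # vector fusion decoding
    volume = modules["stack"](fused)          # stack fusion maps in order
    return modules["voxel_decode"](volume)    # 3D decoder -> voxel map
```

With trivial stub callables substituted for each module, the function simply threads a pair of slice sequences through the nine stages in the claimed order.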

Description

Surgical navigation three-dimensional reconstruction method based on resolution adjustment and cross-modal fusion

Technical Field

The invention relates to the field of virtual surgery, and in particular to a surgical navigation three-dimensional reconstruction method based on resolution adjustment and cross-modal fusion.

Background

Coronary artery bypass grafting (CABG) is a cardiovascular surgery that resolves coronary artery stenosis or blockage by creating new blood flow channels. Surgical navigation is an extremely important aid to CABG surgery: building real-time three-dimensional reconstructed images before surgery helps surgeons accurately locate and plan the surgical path, optimize the surgical strategy and improve operative accuracy, and three-dimensional reconstruction accuracy and multimodal data fusion are important guarantees that surgeons can operate more precisely. How to construct a high-accuracy, high-quality three-dimensional reconstruction model is therefore a focus of current surgical navigation research. In the construction of three-dimensional reconstruction models, introducing high-resolution images significantly improves the detail and accuracy of the model. This improvement, however, comes with challenges: high-resolution images often contain a large amount of detailed information, leading to insufficient computing resources and slow processing. Despite the gain in accuracy, bottlenecks remain in computational complexity and real-time processing capability.
In the construction of three-dimensional reconstruction models, since CABG surgery requires accurate positioning and manipulation within the chest cavity, not only must detailed soft-tissue information be known, but the influence of bone structure on the surgical path and the operating space must also be considered; the limitation of a single-modality data source may leave the system unable to provide a complete anatomical view. J. Zhang et al. proposed a multimodal data source fusion framework (MDFF) to intelligently fuse multiple data types. Although the framework theoretically has strong fusion capability, the heterogeneity and inconsistency among different modal data sources pose challenges to multimodal data fusion, leading to inaccurate and misleading fusion results. Kong et al., in Multimodal Medical Image Fusion and 3D Reconstruction Using Convolutional Neural Networks, propose a CNN-based multimodal medical image fusion and three-dimensional reconstruction method: CT and MRI image features are extracted by an encoder, feature fusion is performed with convolution layers, and three-dimensional reconstruction is finally achieved through a 3D decoder. However, the method does not consider the differing importance of regions inside the image during feature extraction and fusion, and processes the whole image at uniform resolution with uniform computational resources. In summary, current surgical navigation three-dimensional reconstruction methods still have shortcomings, and a method that can efficiently process high-resolution images and fuse data across modalities needs to be researched to achieve the goal of constructing a high-precision, high-quality three-dimensional reconstruction model.
Disclosure of Invention

The invention provides a method for improving the processing efficiency of high-resolution images and the accuracy of cross-modal data fusion in the three-dimensional reconstruction process of surgical navigation. In a first aspect, a surgical navigation three-dimensional reconstruction method based on resolution adjustment and cross-modal fusion is provided, comprising the following steps: constructing a surgical navigation three-dimensional reconstruction model based on resolution adjustment and cross-modal fusion; performing end-to-end training on the model; acquiring CT and MRI three-dimensional images of a patient; and processing the patient's CT and MRI three-dimensional image data with the model to generate a voxel map; wherein the three-dimensional reconstruction model comprises: a convolution attention module, which generates a CT comprehensive weight map and an MRI comprehensive weight map based on the importance of each region of the original slice images; a resolution adjustment module, which adjusts the resolution of the CT original feature map and the MRI original feature map under the guidance of the comprehensive weight maps; a cross-modal spatial contrast learning module, which performs cross-modal learning on the resolution-adjusted CT and MRI feature maps to generate semantically aligned feature vectors; and a vector fusion decoding module, which fuses and decodes the semantically aligned feature vectors to generate a cross-modal fused feature map.