CN-122023983-A - Multi-mode image fusion method, device and storage medium based on graph neural network
Abstract
The invention discloses a multi-modal image fusion method, device and storage medium based on a graph neural network, belonging to the technical field of medical image processing and artificial intelligence. The method comprises: acquiring multi-modal images; performing deep feature extraction on the multi-modal images to obtain a high-dimensional feature map for each modality; dividing each modality's high-dimensional feature map into a plurality of graph nodes, calculating the similarity between each graph node and its neighboring nodes, and constructing an adjacency matrix of the multi-modal graph structure from the similarities; capturing the global association and long-distance dependency relations of the node features in the graph nodes with the adjacency matrix as graph-topology guidance; performing global aggregation with an attention-pooling graph readout function to generate a global feature; and inputting the global feature into a reconstruction network to generate the final fused image. The method and device address the technical problems that prior-art approaches have limited modeling capability for irregular data and struggle to effectively capture the complex spatial associations and semantic dependencies between multi-modal images.
Inventors
- ZHANG LI
- ZHANG JIAN
- NAN YAHUI
Assignees
- Lüliang University (吕梁学院)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-19
Claims (9)
- 1. A multi-modal image fusion method based on a graph neural network, characterized by comprising the following steps: acquiring multi-modal images from at least two different medical imaging devices; performing deep feature extraction on the multi-modal images to obtain a high-dimensional feature map for each modality; dividing the high-dimensional feature map of each modality into a plurality of graph nodes, calculating the similarity between each graph node and its neighboring nodes, and constructing an adjacency matrix of the multi-modal graph structure from the similarities; capturing the global association and long-distance dependency relations of the node features in the graph nodes with the adjacency matrix as graph-topology guidance, and performing global aggregation with an attention-pooling graph readout function to generate a global feature; and inputting the global feature into a reconstruction network to generate the final fused image through deconvolution and upsampling operations (an illustrative pipeline sketch follows the claims).
- 2. The multi-modal image fusion method based on the graph neural network according to claim 1, wherein, before the deep feature extraction is performed on the multi-modal images, the method comprises: denoising the multi-modal images with a non-local means filtering algorithm; normalizing the intensity of each modality image to the [0,1] interval; and performing spatial registration with a method combining affine transformation and elastic registration (see the preprocessing sketch after the claims).
- 3. The multi-modal image fusion method based on the graph neural network according to claim 1, wherein performing deep feature extraction on the multi-modal images to obtain the high-dimensional feature map of each modality comprises: performing deep feature extraction on the preprocessed images with a feature extraction network based on the Mamba model, wherein the feature extraction network models the preprocessed image through a discretized state space equation expressed as: $h_t = \bar{A} h_{t-1} + \bar{B} x_t$, $y_t = C h_t + D x_t$; wherein $h_t$ is the state vector at time $t$, $h_{t-1}$ is the state vector at time $t-1$, $x_t$ is the preprocessed image, $y_t$ is the high-dimensional feature map, and $\bar{A}$, $\bar{B}$, $C$ and $D$ are respectively the discretized state matrix, the input matrix, the output matrix and the direct-connection matrix (see the state-space sketch after the claims).
- 4. The graph neural network-based multi-modal image fusion method of claim 1, wherein calculating the similarity between each graph node and its neighboring nodes comprises: $s_{ij} = \lambda_1 \, \mathrm{sim}(f_i, f_j) + \lambda_2 \exp\left(-\|p_i - p_j\|^2 / (2\sigma^2)\right)$; wherein $s_{ij}$ is the similarity between node $i$ and its adjacent node $j$, $f_i$ is the feature of node $i$, $f_j$ is the feature of node $j$, $p_i$ is the spatial position of node $i$, $p_j$ is the spatial position of node $j$, $\lambda_1$ is the first balancing weight, $\lambda_2$ is the second balancing weight, and $\sigma$ is a spatial scale parameter (see the similarity sketch after the claims).
- 5. The graph neural network-based multi-modal image fusion method of claim 1, wherein capturing the global association relations of the node features in the graph nodes comprises: calculating the attention coefficient between the center node and its neighbor nodes, expressed as $\alpha_{ij} = \mathrm{softmax}_j\big(\mathrm{LeakyReLU}(a^{\top} [W h_i \,\|\, W h_j])\big)$, where $a$ is a learnable attention vector; and adaptively aggregating the neighborhood features based on the attention coefficients, so that preliminary interaction and fusion of the different modality features in the graph space are achieved while the local structure information is preserved, expressed as $h_i' = \sigma\big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} W h_j\big)$; wherein $h_i'$ denotes the graph-attention feature output for node $i$ after processing, $\sigma$ denotes a non-linear activation function, $i$ denotes the index of the target node currently acting as the aggregation center, $j$ denotes a neighbor node of the target node, $\mathcal{N}(i)$ denotes the first-order neighborhood node set of node $i$, i.e. all neighbor nodes directly connected to node $i$, $W$ denotes a learnable linear transformation matrix, and $h_j$ denotes the input feature vector of neighbor node $j$; capturing the long-distance dependency of the node features comprises: dynamically regulating the state transitions through the discretized state equation, so that the global context information among the graph-attention features is captured and time-sequence-enhanced features are generated, expressed as $z_i = \mathrm{SSM}(h_i')$; wherein $z_i$ is the time-sequence-enhanced feature of node $i$ and $\mathrm{SSM}(\cdot)$ is the discretized state equation; and performing global aggregation with the attention-pooling graph readout function to generate the unified global feature comprises: $g = \sum_{i=1}^{N} w_i z_i$ with $w_i = \mathrm{softmax}_i\big(q^{\top} \tanh(W_p z_i)\big)$; wherein $g$ denotes the finally generated global feature, $N$ denotes the total number of nodes in the graph, $w_i$ denotes the attention-pooling weight of node $i$, $q$ denotes a learnable query vector, $W_p$ denotes a learnable transformation matrix, and $\tanh$ denotes the hyperbolic tangent activation function (see the readout sketch after the claims).
- 6. The multi-modal image fusion method based on a graph neural network of claim 1, wherein inputting the global feature into the reconstruction network and generating the final fused image through deconvolution and upsampling operations comprises: projecting and reshaping the global feature into an initial feature map through a fully connected layer of the reconstruction network; inputting the initial feature map into a decoder of the reconstruction network, and performing bilinear-interpolation upsampling on the initial feature map to enlarge its size by a factor of 2, obtaining a 2x-upsampled feature map; screening the shallow feature maps of the encoder through a structured attention gate module of the reconstruction network to obtain weighted shallow feature maps, the attention gate module being computed as: $\alpha = \sigma\big(\psi(\mathrm{ReLU}(W_x * x_l + W_g * g))\big)$; $\hat{x}_l = \alpha \odot x_l$; wherein $\alpha$ is the attention weight, $x_l$ is the shallow feature map of the $l$-th layer of the encoder, $g$ is the corresponding gating signal of the decoder, $W_x$ and $W_g$ are $1 \times 1$ convolution kernels, $\psi$ is the output transform convolution, $\sigma$ is the Sigmoid activation function, $\odot$ denotes element-wise multiplication, and $\hat{x}_l$ is the obtained weighted shallow feature map; channel-concatenating the 2x-upsampled feature map with the weighted shallow feature map of the corresponding encoder layer in the reconstruction network to obtain a concatenated feature map; passing the concatenated feature map sequentially through two consecutive $3 \times 3$ convolution layers, batch normalization and a ReLU activation function to obtain a high-resolution feature map; and inputting the high-resolution feature map into a reconstruction layer of the reconstruction network, mapping the feature dimension to the number of output image channels through a convolution operation to generate a residual image, introducing a residual connection mechanism, and superimposing the residual image on the bicubically upsampled multi-modal image to generate the final fused image, expressed as: $I_F = R(F_{hr}) + \mathrm{Up}(I_m)$; wherein $I_F$ is the final fused image, $R(\cdot)$ is the reconstruction layer, which contains a $1 \times 1$ convolution kernel for channel dimension reduction, $F_{hr}$ is the high-resolution feature map, $I_m$ is the multi-modal image, and $\mathrm{Up}(\cdot)$ denotes upsampling the multi-modal image to the same spatial resolution as the high-resolution feature map (see the attention-gate sketch after the claims).
- 7. A multi-modal image fusion apparatus based on a graph neural network, comprising: an image acquisition module for acquiring multi-modal images from at least two different medical imaging devices; a feature extraction module for performing deep feature extraction on the multi-modal images to obtain the high-dimensional feature map of each modality; a matrix construction module for dividing the high-dimensional feature map of each modality into a plurality of graph nodes, calculating the similarity between each graph node and its neighboring nodes, and constructing the adjacency matrix of the multi-modal graph structure from the similarities; a feature aggregation module for capturing the global association and long-distance dependency relations of the node features in the graph nodes with the adjacency matrix as graph-topology guidance, and performing global aggregation with the attention-pooling graph readout function to generate the global feature; and a feature reconstruction module for inputting the global feature into the reconstruction network and generating the final fused image through deconvolution and upsampling operations.
- 8. An electronic terminal comprising a processor and a memory coupled to the processor, wherein the memory stores a computer program together with real-time transmission data, calculation data and setting parameters, and the computer program, when executed by the processor, performs the steps of the method according to any one of claims 1 to 6.
- 9. A computer-readable storage medium having stored thereon a computer program together with real-time transmission data, calculation data and setting parameters, wherein the computer program, when executed by a processor, realizes the steps of the method according to any one of claims 1 to 6.
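The sketches below illustrate, step by step, how the claimed method could be realized; they are editorial reading aids, not the patent's disclosed implementation. First, a minimal composition of the five steps of claim 1, assuming hypothetical module objects (`extractor`, `graph_builder`, `aggregator`, `reconstructor`) standing in for the networks of claims 3 to 6; all names are illustrative.

```python
# Illustrative composition of the five claimed steps; the module objects are
# hypothetical stand-ins, not the patent's actual networks.
import torch

def fuse(mri: torch.Tensor, pet: torch.Tensor,
         extractor, graph_builder, aggregator, reconstructor) -> torch.Tensor:
    # Step 2: per-modality deep feature extraction (claim 3).
    f_mri, f_pet = extractor(mri), extractor(pet)
    # Step 3: split feature maps into graph nodes and build the adjacency
    # matrix from inter-node similarity (claim 4).
    nodes, adjacency = graph_builder(f_mri, f_pet)
    # Step 4: adjacency-guided aggregation and attention-pooling readout
    # to a single global feature (claim 5).
    global_feature = aggregator(nodes, adjacency)
    # Step 5: reconstruction network with deconvolution/upsampling (claim 6).
    return reconstructor(global_feature, mri, pet)
```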
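For the preprocessing of claim 2, a minimal sketch using scikit-image's non-local means denoiser followed by min-max normalization to [0,1]; the affine-plus-elastic registration step depends on an external toolkit and is omitted, and the filter strength `h=1.15*sigma` is an assumed choice.

```python
# Preprocessing sketch for claim 2: non-local means denoising, then [0,1]
# intensity normalization; spatial registration is omitted. Assumes a
# single-channel (grayscale) image.
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

def preprocess(image: np.ndarray) -> np.ndarray:
    sigma = np.mean(estimate_sigma(image))             # noise level estimate
    denoised = denoise_nl_means(image, h=1.15 * sigma, fast_mode=True)
    lo, hi = denoised.min(), denoised.max()
    return (denoised - lo) / (hi - lo + 1e-8)          # normalize to [0, 1]
```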
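The discretized state-space equation of claim 3 can be read as a simple recurrence over a flattened pixel sequence. The sketch below implements exactly that recurrence with fixed matrices; Mamba's input-dependent (selective) parameterization is deliberately not reproduced.

```python
# Plain recurrence for the discretized state-space equation of claim 3:
#   h_t = A_bar @ h_{t-1} + B_bar @ x_t,   y_t = C @ h_t + D @ x_t
import torch

def ssm_scan(x: torch.Tensor, A_bar: torch.Tensor, B_bar: torch.Tensor,
             C: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    # x: (T, d_in); A_bar: (d_state, d_state); B_bar: (d_state, d_in);
    # C: (d_out, d_state); D: (d_out, d_in). Returns y: (T, d_out).
    h = torch.zeros(A_bar.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A_bar @ h + B_bar @ x[t]   # state update
        ys.append(C @ h + D @ x[t])    # output with direct connection
    return torch.stack(ys)
```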
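Claim 4 balances a feature term against a spatial term. In the sketch below the feature term is taken to be cosine similarity and the spatial term a Gaussian kernel on node positions; both functional forms are assumptions, since the claim only names the roles of the two weights and the scale parameter.

```python
# Claim-4-style node similarity: weighted feature term plus spatial term,
#   s_ij = lam1 * sim(f_i, f_j) + lam2 * exp(-||p_i - p_j||^2 / (2 sigma^2))
import torch
import torch.nn.functional as F

def node_similarity(f_i: torch.Tensor, f_j: torch.Tensor,
                    p_i: torch.Tensor, p_j: torch.Tensor,
                    lam1: float = 0.5, lam2: float = 0.5,
                    sigma: float = 1.0) -> torch.Tensor:
    feat = F.cosine_similarity(f_i, f_j, dim=0)                    # feature term
    spatial = torch.exp(-((p_i - p_j) ** 2).sum() / (2 * sigma ** 2))
    return lam1 * feat + lam2 * spatial
```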
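The attention-pooling readout of claim 5 scores each time-enhanced node feature against a learnable query, softmax-normalizes the scores, and sums. A minimal PyTorch sketch, with the feature dimension chosen arbitrarily by the caller:

```python
# Attention-pooling graph readout of claim 5:
#   w_i = softmax_i(q^T tanh(W_p z_i)),   g = sum_i w_i z_i
import torch
import torch.nn as nn

class AttentionReadout(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W_p = nn.Linear(dim, dim, bias=False)  # learnable transform W_p
        self.q = nn.Parameter(torch.randn(dim))     # learnable query vector q

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (N, dim) node features after timing enhancement.
        scores = torch.tanh(self.W_p(z)) @ self.q   # (N,)
        w = torch.softmax(scores, dim=0)            # attention pooling weights
        return (w.unsqueeze(-1) * z).sum(dim=0)     # global feature g
```

For example, `AttentionReadout(64)(torch.randn(10, 64))` pools ten 64-dimensional node features into one global feature vector.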
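Claim 6's structured attention gate matches the common additive attention-gate design: encoder features and the decoder gating signal are projected by 1x1 convolutions, summed, and squashed into a spatial weight map. A sketch under that reading, where the inner ReLU is an assumption:

```python
# Structured attention gate of claim 6:
#   alpha = sigmoid(psi(relu(W_x * x_l + W_g * g))),  x_hat = alpha (*) x_l
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, ch_x: int, ch_g: int, ch_mid: int):
        super().__init__()
        self.W_x = nn.Conv2d(ch_x, ch_mid, kernel_size=1)  # encoder branch
        self.W_g = nn.Conv2d(ch_g, ch_mid, kernel_size=1)  # gating branch
        self.psi = nn.Conv2d(ch_mid, 1, kernel_size=1)     # output transform

    def forward(self, x_l: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # x_l and g are assumed to share spatial dimensions.
        alpha = torch.sigmoid(self.psi(torch.relu(self.W_x(x_l) + self.W_g(g))))
        return alpha * x_l  # weighted shallow feature map
```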
Description
Multi-mode image fusion method, device and storage medium based on graph neural network

Technical Field

The invention relates to the technical field of medical image processing and artificial intelligence, and in particular to a multi-modal image fusion method, device and storage medium based on a graph neural network.

Background

Multi-modal medical image fusion is an important technique in medical image analysis, aiming to effectively integrate image information from different imaging devices so as to provide a more comprehensive and accurate basis for diagnosis. In the diagnosis of Alzheimer's disease, for example, structural MRI provides detailed brain anatomical information, while functional PET reflects brain metabolic activity. The images of the two modalities are clearly complementary, and their effective fusion is of great significance for the early diagnosis of disease. Traditional image fusion methods such as wavelet transformation and Laplacian pyramid decomposition achieve a degree of information complementation, but still suffer from detail loss, alignment errors and unsatisfactory fusion results. In recent years, fusion methods based on deep learning have emerged, with convolutional neural networks (CNNs) widely applied to image fusion tasks. However, CNN methods have limited modeling capability for irregular data and find it difficult to effectively capture the complex spatial correlations and semantic dependencies between multi-modal images.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a multi-modal image fusion method, device and storage medium based on a graph neural network, which solve the technical problems that the prior art has limited modeling capability for irregular data and struggles to effectively capture the complex spatial associations and semantic dependencies between multi-modal images. To this end, the invention adopts the following technical scheme. In a first aspect, the invention provides a multi-modal image fusion method based on a graph neural network, comprising: acquiring multi-modal images from at least two different medical imaging devices; performing deep feature extraction on the multi-modal images to obtain a high-dimensional feature map for each modality; dividing the high-dimensional feature map of each modality into a plurality of graph nodes, calculating the similarity between each graph node and its neighboring nodes, and constructing an adjacency matrix of the multi-modal graph structure from the similarities; capturing the global association and long-distance dependency relations of the node features in the graph nodes with the adjacency matrix as graph-topology guidance, and performing global aggregation with an attention-pooling graph readout function to generate a global feature; and inputting the global feature into a reconstruction network to generate the final fused image through deconvolution and upsampling operations. Further, before the deep feature extraction is performed on the multi-modal images, the method comprises: denoising the multi-modal images with a non-local means filtering algorithm; normalizing the intensity of each modality image to the [0,1] interval; and performing spatial registration with a method combining affine transformation and elastic registration.
Further, performing deep feature extraction on the multi-modal images to obtain the high-dimensional feature map of each modality comprises: performing deep feature extraction on the preprocessed images with a feature extraction network based on the Mamba model, wherein the feature extraction network models the preprocessed image through a discretized state space equation expressed as: $h_t = \bar{A} h_{t-1} + \bar{B} x_t$, $y_t = C h_t + D x_t$; wherein $h_t$ is the state vector at time $t$, $h_{t-1}$ is the state vector at time $t-1$, $x_t$ is the preprocessed image, $y_t$ is the high-dimensional feature map, and $\bar{A}$, $\bar{B}$, $C$ and $D$ are respectively the discretized state matrix, the input matrix, the output matrix and the direct-connection matrix. Further, calculating the similarity between each graph node and its neighboring nodes comprises: $s_{ij} = \lambda_1 \, \mathrm{sim}(f_i, f_j) + \lambda_2 \exp\left(-\|p_i - p_j\|^2 / (2\sigma^2)\right)$; wherein $s_{ij}$ is the similarity between node $i$ and its adjacent node $j$, $f_i$ is the feature of node $i$, $f_j$ is the feature of node $j$, $p_i$ is the spatial position of node $i$, $p_j$ is the spatial position of node $j$, $\lambda_1$ is the first balancing weight, $\lambda_2$ is the second balancing weight, and $\sigma$ is a spatial scale parameter. Further, capturing the global association relations of the node features in the graph nodes comprises: calculating the attention coefficient between the center node and its neighbor nodes, expressed as $\alpha_{ij} = \mathrm{softmax}_j\big(\mathrm{LeakyReLU}(a^{\top} [W h_i \,\|\, W h_j])\big)$; based on the attention coefficients, the neighborhood features are adaptively aggregated, achieving preliminary interaction and fusion of the different modality features in the graph space while preserving the local structure information.