CN-121600329-B - Intelligent brain tumor classification method and system based on multi-modal magnetic resonance
Abstract
The application relates to the technical field of brain tumor classification and detection, in particular to an intelligent brain tumor classification method and system based on multi-modal magnetic resonance. The method comprises: sampling magnetic resonance images of the brain of a target object to obtain an upsampled image and a downsampled image; inputting the downsampled image into a visual encoder to obtain global visual tokens; dividing the upsampled image into a plurality of sub-images and inputting the sub-images into the visual encoder to obtain visual tokens; removing non-spatial patch tokens from the visual tokens to obtain the anatomical detail features of each sub-image; splicing the anatomical detail features of the sub-images to obtain spliced features; fusing the spliced features with the global visual tokens to obtain fused tokens; splicing the fused tokens of the magnetic resonance images of each sequence to obtain fused features; projecting the fused features to obtain first projection features; and inputting the first projection features into a language model to obtain the brain tumor classification label of the target object. The method enables accurate classification of brain tumors.
Inventors
- Gong Xuan
- Qu Jingqiong
- Wang Yidie
- Kuang Shuwen
- Chen Zhou
- Liu Chao
Assignees
- Xiangya Hospital of Central South University (中南大学湘雅医院)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-28
Claims (9)
- 1. An intelligent brain tumor classification method based on multi-modal magnetic resonance, characterized by comprising the following steps: S1, sampling a magnetic resonance image of the brain of a target object to obtain an upsampled image and a downsampled image, wherein the magnetic resonance image comprises magnetic resonance images of a plurality of sequences; S2, inputting the downsampled image into a visual encoder to obtain global visual tokens, dividing the upsampled image into a plurality of sub-images, and inputting the sub-images into the visual encoder to obtain visual tokens; S3, retaining each spatial patch token of the last layer among the visual tokens output by the visual encoder to obtain the anatomical detail features of each sub-image, and splicing the anatomical detail features of the sub-images to obtain spliced features, wherein a spatial patch token is the vector representation formed by the embedding vector of a region into which the sub-image is divided; S4, injecting the spliced features into the global visual tokens through an LR-HR cross-attention module of a visual-language multi-modal model to obtain fused tokens carrying high-resolution information, and splicing the fused tokens of the magnetic resonance images of the sequences to obtain fused features; and S5, projecting the fused features with a double-layer MLP projector to obtain first projection features, and inputting the first projection features into a language model to obtain the brain tumor classification label of the target object.
- 2. The method according to claim 1, wherein step S5 comprises: projecting the fused features with the double-layer MLP projector to obtain the first projection features, and presetting a text instruction for classification prediction; and inputting the first projection features and the text instruction into the language model to obtain the brain tumor classification label of the target object.
- 3. The method of claim 1, wherein the language model, the anatomical detail-aware encoder, and the double-layer MLP projector are components of a brain-oriented visual-language multi-modal model, and the method further comprises a training process of the visual-language multi-modal model, the training process comprising: acquiring a training set, wherein the training set comprises two-dimensional magnetic resonance images with text descriptions and multi-sequence magnetic resonance images with radiology reports; and training the double-layer MLP projector and the LR-HR cross-resolution attention module of the anatomical detail-aware encoder by gradient descent based on the two-dimensional magnetic resonance images and the multi-sequence magnetic resonance images to obtain the trained double-layer MLP projector and the trained LR-HR cross-resolution attention module.
- 4. The method of claim 3, wherein the training set further comprises three-dimensional magnetic resonance images labeled with classification labels, the language model, the visual encoder, and the double-layer MLP projector are components of the brain-oriented visual-language multi-modal model, and the method further comprises: extracting magnetic resonance images from the training set; when the magnetic resonance image extracted from the training set is a two-dimensional magnetic resonance image, inputting the two-dimensional magnetic resonance image into the visual encoder, extracting a first visual embedded feature of the two-dimensional magnetic resonance image, projecting the first visual embedded feature through the trained double-layer MLP projector to obtain a second projection feature, generating a predicted text description of the two-dimensional magnetic resonance image with the language model based on the second projection feature, and calculating a first model loss based on the predicted text description and the text description of the two-dimensional magnetic resonance image; when the magnetic resonance image extracted from the training set is a three-dimensional magnetic resonance image or a multi-sequence magnetic resonance image, inputting the extracted magnetic resonance image into the visual encoder, extracting a second visual embedded feature of the extracted magnetic resonance image, projecting the second visual embedded feature through the trained double-layer MLP projector to obtain a third projection feature, generating a prediction result of the extracted magnetic resonance image with the language model based on the third projection feature, and calculating a second model loss based on the prediction result and the label of the extracted magnetic resonance image; training the trained double-layer MLP projector, the LR-HR cross-resolution attention module, and the language model in the visual-language multi-modal model based on the first model loss or the second model loss to obtain a trained double-layer MLP projector, a trained LR-HR cross-resolution attention module, and a trained language model; and performing secondary training on the trained double-layer MLP projector, the trained LR-HR cross-resolution attention module, and the trained language model with the training set to obtain the trained visual-language multi-modal model.
- 5. The method according to claim 4, wherein the method further comprises: acquiring target model parameters of the visual-language multi-modal model at a plurality of training rounds, and constructing an intermediate visual-language multi-modal model from each set of target model parameters, wherein the network architectures of the intermediate visual-language multi-modal models are identical; for each three-dimensional magnetic resonance image in the training set, performing classification prediction with each intermediate visual-language multi-modal model to obtain a plurality of predicted categories, and taking the most frequent of the predicted categories as the final predicted category of the three-dimensional magnetic resonance image; determining, among the predicted categories, the number of categories inconsistent with the actual category of the three-dimensional magnetic resonance image, and calculating the confidence of the three-dimensional magnetic resonance image based on that number and the total number of predicted categories; constructing a triplet from the three-dimensional magnetic resonance image, its confidence, and its final predicted category; constructing a reliability data set from the triplets of the three-dimensional magnetic resonance images; and optimizing the parameters of the visual-language multi-modal model based on the reliability data set, and reconstructing the output format of the visual-language multi-modal model so that it outputs the confidence of the brain tumor classification label.
- 6. The method of claim 1, wherein the magnetic resonance images C_m of the plurality of sequences have the expression: C_m = { I_{T1}^{v_0}, I_{T1c}^{v_0}, I_{s_3}^{v_3}, I_{s_4}^{v_4}, I_{s_5}^{v_5} }; wherein I_{T1}^{v_0} is the magnetic resonance image of scan plane v_0 in the T1-weighted sequence, I_{T1c}^{v_0} is the magnetic resonance image of scan plane v_0 in the T1c-weighted sequence, I_{s_3}^{v_3} is the magnetic resonance image of scan plane v_3 in the s_3 sequence, I_{s_4}^{v_4} is the magnetic resonance image of scan plane v_4 in the s_4 sequence, I_{s_5}^{v_5} is the magnetic resonance image of scan plane v_5 in the s_5 sequence, and scan plane v_0 is any scan plane of the T1-weighted sequence; when the sequences comprise both the T2-weighted sequence and the FLAIR sequence, the s_3 sequence is the T2-weighted sequence, the s_4 sequence is the FLAIR sequence, scan plane v_3 is any scan plane of the T2-weighted sequence, and scan plane v_4 is any scan plane of the FLAIR sequence; when the sequences comprise a target sequence, the target sequence being the T2-weighted sequence or the FLAIR sequence, the s_3 sequence is the target sequence, scan plane v_3 is any scan plane of the target sequence, the s_4 sequence is any sequence, and scan plane v_4 is any of its scan planes; and in the absence of both the T2-weighted sequence and the FLAIR sequence, the s_3 sequence and the s_4 sequence are each any of the available sequences.
- 7. The method of claim 6, wherein a plurality of the sequences form a sequence combination, the fused features comprise fused features corresponding to a plurality of sequence combinations, and step S5 comprises: projecting the fused features corresponding to the plurality of sequence combinations with the double-layer MLP projector to obtain a first projection feature for each sequence combination; inputting each first projection feature into the language model to obtain an initial brain tumor classification label for each sequence combination and the confidence of each initial brain tumor classification label; taking the most frequent of the initial brain tumor classification labels as the brain tumor classification label y of the target object, and calculating the confidence of the brain tumor classification label; the confidence c of the brain tumor classification label y is calculated as: c = (1/N_1) Σ_{m=1}^{M} δ(y_m, y) · c_m; wherein N_1 is the number of initial brain tumor classification labels consistent with the brain tumor classification label, M is the total number of initial brain tumor classification labels, c_m is the confidence of the m-th initial brain tumor classification label y_m, and δ(·, ·) is the Kronecker function.
- 8. The method of claim 7, wherein the method further comprises: outputting a current detection report of the target object, wherein the current detection report comprises the brain tumor classification label and the confidence of the brain tumor classification label.
- 9. An intelligent brain tumor classification system based on multi-modal magnetic resonance, the system comprising: a sampling subsystem for sampling a magnetic resonance image of a target object to obtain an upsampled image and a downsampled image; a coding subsystem for inputting the downsampled image into a visual encoder to obtain global visual tokens, dividing the upsampled image into a plurality of sub-images, and inputting the sub-images into the visual encoder to obtain visual tokens; a splicing subsystem for retaining each spatial patch token of the last layer among the visual tokens output by the visual encoder to obtain the anatomical detail features of each sub-image and splicing the anatomical detail features of the sub-images to obtain spliced features, wherein a spatial patch token is the vector representation formed by the embedding vector of a region into which the sub-image is divided; a fusion subsystem for injecting the spliced features into the global visual tokens through an LR-HR cross-attention module of a visual-language multi-modal model to obtain fused tokens carrying high-resolution information, and splicing the fused tokens of the magnetic resonance images of the sequences to obtain fused features; and a classification subsystem for projecting the fused features with a double-layer MLP projector to obtain first projection features, and inputting the first projection features into a language model to obtain the brain tumor classification label of the target object.
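The majority-vote aggregation and confidence calculation of claims 5 and 7 can be sketched as follows. This is an illustrative reading, not the patent's implementation; the function and variable names are our own, and the confidence is taken as the mean confidence of the initial predictions that agree with the winning label (the Kronecker delta selects the agreeing labels).

```python
from collections import Counter

def aggregate_predictions(labels, confidences):
    """Majority vote over per-sequence-combination predictions.

    labels      -- initial classification labels, one per sequence combination
    confidences -- confidence of each initial label (same order)
    Returns the most frequent label and its aggregated confidence.
    """
    # Final label: the most frequent initial label
    final_label, _ = Counter(labels).most_common(1)[0]
    # N1: confidences of the initial labels consistent with the final label
    agreeing = [c for y, c in zip(labels, confidences) if y == final_label]
    # Aggregated confidence: mean confidence of the agreeing predictions
    return final_label, sum(agreeing) / len(agreeing)
```

For example, predictions of "glioma" (0.9), "glioma" (0.8), and "meningioma" (0.7) yield the label "glioma" with confidence 0.85.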
Description
Intelligent brain tumor classification method and system based on multi-modal magnetic resonance

Technical Field
The application relates to the technical field of brain tumor classification and detection, in particular to an intelligent brain tumor classification method and system based on multi-modal magnetic resonance.

Background
Brain tumors are clusters of cells that grow abnormally in brain tissue or its surrounding structures. They can be classified as benign or malignant, and as primary or secondary according to the site of origin. Different types of brain tumors differ greatly in biological behavior, growth rate, invasiveness, and response to treatment; only with a clear classification can the most appropriate treatment regimen be formulated. Clinical brain tumor classification depends mainly on manual interpretation by a neuroimaging physician, which is time-consuming and error-prone. This problem is further aggravated by the global shortage of neuroradiology specialists, since acquiring professional ability in this field requires a great amount of systematic training. There is therefore a need for an intelligent classification prediction system for brain tumors. Existing computer-based brain tumor classification mainly relies on data enhancement and primary feature extraction, multi-scale feature extraction and enhancement, and feature fusion, but cannot capture finer anatomical detail features while maintaining computational efficiency. For example, patent application CN 120107703 A performs classification prediction through data enhancement and primary feature extraction, multi-scale feature extraction and enhancement, and feature fusion, and suffers from low classification accuracy.
Disclosure of Invention
In view of the above, it is necessary to provide an intelligent brain tumor classification method and system based on multi-modal magnetic resonance, so as to achieve accurate classification of brain tumors. An intelligent brain tumor classification method based on multi-modal magnetic resonance comprises: S1, sampling a magnetic resonance image of the brain of a target object to obtain an upsampled image and a downsampled image, wherein the magnetic resonance image comprises magnetic resonance images of a plurality of sequences; S2, inputting the downsampled image into a visual encoder to obtain global visual tokens, dividing the upsampled image into a plurality of sub-images, and inputting the sub-images into the visual encoder to obtain visual tokens; S3, eliminating non-spatial patch tokens from the visual tokens to obtain the anatomical detail features of each sub-image, and splicing the anatomical detail features of the sub-images to obtain spliced features; S4, fusing the spliced features with the global visual tokens to obtain fused tokens, and splicing the fused tokens of the magnetic resonance images of the sequences to obtain fused features; and S5, projecting the fused features with a double-layer MLP projector to obtain first projection features, and inputting the first projection features into a language model to obtain the brain tumor classification label of the target object. The beneficial effect of the method is that finer anatomical detail features can be captured while computational efficiency is maintained, so that the final brain tumor classification label is more accurate.
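The fusion and projection steps (S4 and S5) can be illustrated with a minimal single-head cross-attention sketch in NumPy. All dimensions, weight initializations, and the residual connection are illustrative assumptions, not details fixed by the patent: the low-resolution global tokens act as queries attending over the spliced high-resolution detail tokens, and a two-layer MLP then projects the fused tokens toward the language-model embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, Wq, Wk, Wv):
    """Single-head LR-HR cross-attention: global (low-resolution) tokens
    attend to spliced high-resolution detail tokens."""
    Q, K, V = queries @ Wq, keys_values @ Wk, keys_values @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return attn @ V

dim, llm_dim = 64, 128  # illustrative token and projector widths
Wq, Wk, Wv = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
W1 = rng.standard_normal((dim, llm_dim)) / np.sqrt(dim)
W2 = rng.standard_normal((llm_dim, llm_dim)) / np.sqrt(llm_dim)

global_tokens = rng.standard_normal((16, dim))  # from the downsampled image (S2)
hr_detail = rng.standard_normal((64, dim))      # spliced sub-image patch tokens (S3)

# S4: inject high-resolution detail into the global tokens (residual assumed)
fused = global_tokens + cross_attention(global_tokens, hr_detail, Wq, Wk, Wv)
# S5: double-layer MLP projector toward the language-model space
projected = np.maximum(fused @ W1, 0) @ W2
print(projected.shape)  # (16, 128)
```

The design point of this step is that attention keeps the token count of the cheap low-resolution pass (16 here) while each of those tokens absorbs information from all 64 high-resolution patch tokens, which is how finer anatomical detail can be captured without encoding the full-resolution image end to end.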
In one embodiment, step S5 includes: projecting the fused features with a double-layer MLP projector to obtain the first projection features, and presetting a text instruction for classification prediction; and inputting the first projection features and the text instruction into the language model to obtain the brain tumor classification label of the target object. In the application, the first projection features and the text instruction are input into the language model simultaneously, so that the task executed by the language model is the classification prediction task rather than another task, and the brain tumor classification label of the target object can be obtained. In one embodiment, the language model, the visual encoder, and the double-layer MLP projector are components of a brain-oriented visual-language multi-modal model, and the method further comprises a training process of the visual-language multi-modal model, the training process comprising: acquiring a training set, wherein the training set comprises two-dimensional magnetic resonance images with text descriptions and multi-sequence magnetic resonance images with radiology reports; training the double-layer MLP projector and the anatomical detail-aware encoder in the visual-language multi-m