US-12626402-B2 - Precise slice-level localization of intracranial hemorrhage on head CTS with networks trained on scan-level labels
Abstract
A weakly supervised intracranial hemorrhage (ICH) detection workflow includes training a deep learning (DL) model including a coupled convolutional neural network and recurrent neural network on a large dataset of CT scans with expert-labeled slices indicating presence or absence of ICH. Transfer learning (TL) is used to further train the DL model using a second large dataset of CT scans with only scan labels extracted from radiology reports using natural language processing (NLP). The DL model weights each slice of the scan against the final ICH diagnosis using an attention-based bi-directional long-short term memory network, where the attention weights represent slice-level ICH predictions. Model-generated heatmaps highlight significant regions of the CT scans that lead to the provided ICH predictions.
Inventors
- Yunan Wu
- Donald Robinson Cantrell
- Todd Parrish
- Aggelos Katsaggelos
- Virginia Boyce Hill
- Michael Alexander Iorga
- Michael Anthony Drakopoulos
- Amit Sanjay Adate
- Shamal Shashi Lalvani
- Andrew Mark Naidech
Assignees
- NORTHWESTERN UNIVERSITY
Dates
- Publication Date: 2026-05-12
- Application Date: 2023-05-11
Claims (20)
- 1. A system for slice-level localization of intracranial hemorrhage (ICH) on head computed tomography (CT) scans, the system comprising: at least one device including a hardware computing processor; and a non-transitory memory having stored thereon computing instructions, executable by the hardware computing processor, to perform operations of a method for slice-level localization of intracranial hemorrhage (ICH) on head computed tomography (CT) scans, the method comprising: configuring a deep learning (DL) model including a convolutional neural network (CNN) and a recurrent neural network (RNN); initializing the CNN with first trained data for extracting important features from a CT scan for detecting an ICH; initializing the RNN with second trained data for detecting an ICH in a CT scan; inputting a CT scan into the CNN; performing convolutional neural network processing of the CT scan by the CNN for extracting important features from the CT scan for detecting an ICH; outputting slices of the CT scan that was processed, including extracted important features, from the CNN to pipelines of the RNN, wherein each pipeline comprises serially concatenated groupings of processing modules, wherein each grouping of processing modules comprises a fully-connected module, a leaky rectified linear unit, and a dropout module; processing each slice of the slices of the CT scan that was processed, in parallel, by a corresponding pipeline of the pipelines of the RNN; processing, by a bi-directional long-short term memory module, outputs from each pipeline, wherein attention weights of the bi-directional long-short term memory module are configured to predict slice-level labels of each slice of the slices of the CT scan that was processed; processing, by an attention module, outputs, of the slice-level labels of each slice of the slices of the CT scan that was processed, received from the bi-directional long-short term memory module, wherein the bi-directional long-short term memory module is configured to determine an ICH diagnosis of the CT scan; and outputting the ICH diagnosis.
- 2. The system of claim 1, wherein initializing the CNN with first trained data comprises training the CNN on a first dataset of CT scans including slice-level labels indicating presence status of ICH.
- 3. The system of claim 1, wherein initializing the RNN with second trained data comprises freezing the trained weight values of the CNN and training the RNN on a second dataset of CT scans including scan-level labels indicating presence status of ICH.
- 4. The system of claim 1, wherein the method further comprises fine-tuning the CNN and RNN by performing end-to-end DL model training with a third dataset of CT scans including scan-level labels indicating presence status of ICH.
- 5. The system of claim 1, wherein the method further comprises generating heatmaps showing important features for diagnosing ICH in CT slices based on gradient information flowing back to the last convolutional layer of the CNN.
- 6. The system of claim 1, wherein the CNN includes an EfficientNet-B2 network.
- 7. The system of claim 1, wherein the method further comprises preprocessing the CT scan prior to inputting the CT scan into the CNN, the preprocessing including applying a plurality of different window functions to the CT scan, concatenating results of applying the different window functions to the CT scan in a third dimension orthogonal to the image axes of the CT scan, and inputting the concatenated windowed CT scans into the CNN.
- 8. The system of claim 1, wherein outputting the processed CT scan including the extracted important features from the CNN to the RNN comprises reducing the feature size of the processed CT scan to a vector by a global average pooling (GAP) layer and outputting the vector to the RNN as extracted features for each slice.
- 9. The system of claim 1, wherein the method further comprises outputting probabilities of ICH by a last fully-connected layer of the CNN followed by a sigmoid activation function.
- 10. A non-transitory computer readable medium having stored thereon computer-readable instructions executable by a hardware computing processor to perform operations of a method for slice-level localization of intracranial hemorrhage (ICH) on head computed tomography (CT) scans, the method comprising: configuring a deep learning (DL) model including a convolutional neural network (CNN), a plurality of network pipelines in parallel that receive respective data from the CNN, each network pipeline of the plurality of network pipelines including three serially coupled groupings of a fully connected layer, a LeakyReLU activation function, and a dropout layer, a bi-directional long-short term memory module configured to receive data from the plurality of network pipelines, and an attention module that receives data from the bi-directional long-short term memory module; initializing the CNN with first trained data for extracting important features from a CT scan for detecting an ICH; initializing the plurality of network pipelines, bi-directional long-short term memory module, and attention module with second trained data for detecting an ICH in a CT scan; inputting a CT scan into the CNN; performing convolutional neural network processing of the CT scan by the CNN for extracting important features from the CT scan for detecting an ICH; outputting slices of the CT scan that was processed, including extracted important features, from the CNN to the plurality of network pipelines; processing each slice of the slices of the CT scan that was processed, in parallel, by a corresponding network pipeline of the plurality of network pipelines; processing, by the bi-directional long-short term memory module, outputs from each network pipeline of the plurality of network pipelines, wherein attention weights of the bi-directional long-short term memory module are configured to predict slice-level labels of each slice of the slices of the CT scan that was processed; processing, by the attention module, outputs of the slice-level labels of each slice of the slices of the CT scan that was processed received from the bi-directional long-short term memory module, wherein the bi-directional long-short term memory module is configured to determine an ICH diagnosis of the CT scan; and outputting the ICH diagnosis.
- 11. The medium of claim 10, wherein initializing the CNN with first trained data comprises training the CNN on a first dataset of CT scans including slice-level labels indicating presence status of ICH.
- 12. The medium of claim 10, wherein initializing the plurality of network pipelines, bi-directional long-short term memory module, and attention module with second trained data comprises freezing the trained weight values of the CNN and training the plurality of network pipelines, bi-directional long-short term memory module, and attention module on a second dataset of CT scans including scan-level labels indicating presence status of ICH.
- 13. The medium of claim 10, wherein the method further comprises fine-tuning the CNN and the plurality of network pipelines, bi-directional long-short term memory module, and attention module by performing end-to-end DL model training with a third dataset of CT scans including scan-level labels indicating presence status of ICH.
- 14. The medium of claim 10, wherein the method further comprises generating heatmaps showing important features for diagnosing ICH in CT slices based on gradient information flowing back to the last convolutional layer of the CNN.
- 15. The medium of claim 10, wherein the CNN includes an EfficientNet-B2 network.
- 16. The medium of claim 10, wherein the method further comprises preprocessing the CT scan prior to inputting the CT scan into the CNN, the preprocessing including applying a plurality of different window functions to the CT scan, concatenating results of applying the different window functions to the CT scan in a third dimension orthogonal to the image axes of the CT scan, and inputting the concatenated windowed CT scans into the CNN.
- 17. The medium of claim 10, wherein outputting the processed CT scan including the extracted important features from the CNN to the plurality of network pipelines comprises reducing the feature size of the processed CT scan to a vector by a global average pooling (GAP) layer and outputting the vector to the plurality of network pipelines as extracted features for each slice.
- 18. The medium of claim 10, wherein the method further comprises outputting probabilities of ICH by a last fully-connected layer of the CNN followed by a sigmoid activation function.
- 19. A method for slice-level localization of intracranial hemorrhage (ICH) on head computed tomography (CT) scans, the method comprising: initializing an EfficientNet-B2 network with first trained data for extracting important features from a CT scan for detecting an ICH; initializing, with second trained data for detecting an ICH in a CT scan, a plurality of network pipelines disposed in parallel, a bi-directional long-short term memory module coupled to outputs of the plurality of network pipelines, and an attention module coupled to outputs of the bi-directional long-short term memory module, wherein each network pipeline of the plurality of network pipelines comprises serially concatenated groupings of processing modules, wherein each grouping of processing modules comprises a fully-connected module, a leaky rectified linear unit, and a dropout module; preprocessing an input CT scan, the preprocessing including applying a plurality of different window functions to the CT scan, concatenating results of applying the different window functions to the CT scan in a third dimension orthogonal to the image axes of the CT scan, and inputting the concatenated windowed CT scans into the EfficientNet-B2 network; performing convolutional neural network processing of the CT scan by the EfficientNet-B2 network for extracting important features from the CT scan for detecting an ICH; outputting slices of the CT scan that was processed, including the extracted important features from the EfficientNet-B2 network, to the plurality of network pipelines; processing each slice of the slices of the CT scan that was processed, in parallel, by a corresponding network pipeline of the plurality of network pipelines; processing, by the bi-directional long-short term memory module, outputs from each network pipeline of the plurality of network pipelines, wherein attention weights of the bi-directional long-short term memory module are configured to predict slice-level labels of each slice of the slices of the CT scan that was processed; processing, by an attention module, outputs, of the slice-level labels of each slice of the slices of the CT scan that was processed, received from the bi-directional long-short term memory module, wherein the bi-directional long-short term memory module is configured to determine an ICH diagnosis of the CT scan; generating heatmaps showing important features for diagnosing ICH in CT slices based on gradient information flowing back to the last convolutional layer of the EfficientNet-B2 network; and outputting the ICH diagnosis.
- 20. The method of claim 19, further comprising: fine-tuning the EfficientNet-B2 network and the plurality of network pipelines, bi-directional long-short term memory module, and attention module by performing training with a dataset of CT scans including scan-level labels indicating presence status of ICH.
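For illustration only, the windowing preprocessing recited in claims 7, 16, and 19 might be sketched as below. This is a minimal sketch, not the claimed implementation: the claims do not name specific window functions, so the three (center, width) presets in `WINDOWS`, the clip-and-rescale form of `apply_window`, and the helper name `stack_windows` are all assumptions. The channel axis of the result plays the role of the third dimension orthogonal to the image axes.

```python
from typing import List, Sequence, Tuple

# Hypothetical (center, width) pairs in Hounsfield units. The claims do not
# specify which windows are used; these common presets are assumptions.
WINDOWS: Tuple[Tuple[float, float], ...] = (
    (40.0, 80.0),     # brain window (assumed)
    (80.0, 200.0),    # subdural window (assumed)
    (600.0, 2800.0),  # bone window (assumed)
)


def apply_window(hu: float, center: float, width: float) -> float:
    """Clip a Hounsfield value to the window and rescale it to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    clipped = min(max(hu, low), high)
    return (clipped - low) / (high - low)


def stack_windows(
    slice_hu: Sequence[Sequence[float]],
    windows: Sequence[Tuple[float, float]] = WINDOWS,
) -> List[List[List[float]]]:
    """Return an H x W x C image with one windowed copy per channel."""
    return [
        [[apply_window(hu, c, w) for (c, w) in windows] for hu in row]
        for row in slice_hu
    ]


# A toy 2 x 2 slice: air, water, soft tissue, dense bone.
slice_hu = [[-1000.0, 0.0], [40.0, 2000.0]]
stacked = stack_windows(slice_hu)
```

Each pixel of `stacked` holds one value per window, so a single slice becomes a multi-channel image that a CNN expecting channel-stacked input could consume.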
Description
CROSS REFERENCE TO RELATED APPLICATIONS
The present application claims the benefit of priority under 35 U.S.C. § 119 from U.S. Provisional Patent Application Ser. No. 63/341,917, entitled “Identification of Intracranial Hemorrhage and its Subtypes on Head CT Scans Using Transfer Learning and Weakly-Supervised Networks,” filed on May 13, 2022, all of which is incorporated herein by reference in its entirety for all purposes.
TECHNICAL FIELD
The present disclosure generally relates to computed tomography, and more specifically relates to identification of intracranial hemorrhage and its subtypes on head computed tomography scans using transfer learning and weakly-supervised networks.
BACKGROUND
Acute intracranial hemorrhage (ICH) is a life-threatening medical emergency. ICH is devastating, accounting for ten percent (10%) to fifteen percent (15%) of all stroke cases with a high risk of mortality and disability. Radiologists typically read and interpret slice-level images produced by computed tomography (CT) head scans of a patient in order to diagnose ICH in the patient. Medical treatment of ICH may be delayed until the head scans are read and interpreted.
Deep learning (DL) is a machine learning technology that utilizes a neural network, such as a convolutional neural network (CNN), to identify and classify patterns in newly presented data, such as images, based on prior data by which the neural network has been trained.
The description provided in the background section should not be assumed to be prior art merely because it is mentioned in or associated with the background section. The background section may include information that describes one or more aspects of the subject technology.
SUMMARY
An exemplary method for slice-level localization of intracranial hemorrhage (ICH) on head computed tomography (CT) scans includes configuring a deep learning (DL) model including a convolutional neural network (CNN) and a recurrent neural network (RNN).
The CNN is initialized with first trained data for extracting important features from a CT scan for detecting an ICH. The RNN is initialized with second trained data for detecting an ICH in a CT scan. A CT scan is input into the CNN. Convolutional neural network processing of the CT scan is performed by the CNN for extracting important features from the CT scan for detecting an ICH. The processed CT scan including the extracted important features is output from the CNN to the RNN. Slices of the CT scan are processed, in parallel, by corresponding sequences of three fully connected layers of the RNN. Outputs of the parallel sequences of three fully connected layers are processed by a bi-directional long-short term memory module. Outputs of the bi-directional long-short term memory module are processed by an attention module to determine an ICH diagnosis of the CT scan. The ICH diagnosis is output. Initializing the CNN with first trained data may include training the CNN on a first dataset of CT scans including slice-level labels indicating presence status of ICH. The presence status may identify the ICH as either being present or not present in the CT scan. The presence status may identify the ICH as either being present or not present in the CT slice. The presence status may identify a location within the CT slice where the ICH is present and/or an extent of the presence of the ICH. Initializing the RNN with second trained data may include freezing the trained weight values of the CNN and training the RNN on a second dataset of CT scans including scan-level labels indicating presence status of ICH. The method may further include fine-tuning the CNN and RNN by performing end-to-end DL model training with a third dataset of CT scans including scan-level labels indicating presence status of ICH. 
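As a rough illustration of the per-slice processing and attention step summarized above, the following sketch runs each slice's feature vector through three serial fully connected groupings and then pools the slices with softmax attention. It is a simplified sketch under stated assumptions, not the disclosed model: the bi-directional long-short term memory module that sits between the pipelines and the attention module is omitted for brevity, dropout is treated as the identity (inference mode), the same layer weights are shared across slices, and the dot-product scoring vector `score_vec`, the output vector `out_vec`, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, biases) of one FC layer


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Leaky rectified linear unit with an assumed negative slope."""
    return x if x > 0.0 else slope * x


def fc_pipeline(features: Sequence[float], layers: Sequence[Layer]) -> Vector:
    """Run one slice through serial FC -> LeakyReLU groupings.

    Dropout is the identity at inference time, so it is omitted here.
    """
    vec = list(features)
    for weights, biases in layers:
        vec = [
            leaky_relu(sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, biases)
        ]
    return vec


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over per-slice scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: Sequence[Sequence[float]],
    score_vec: Sequence[float],
    out_vec: Sequence[float],
    out_bias: float = 0.0,
) -> Tuple[Vector, float]:
    """Return (per-slice attention weights, scan-level probability)."""
    scores = [sum(h * v for h, v in zip(vec, score_vec)) for vec in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [
        sum(w * vec[d] for w, vec in zip(weights, hidden)) for d in range(dim)
    ]
    logit = sum(c * o for c, o in zip(context, out_vec)) + out_bias
    return weights, 1.0 / (1.0 + math.exp(-logit))


# Toy example: three slices with 2-D features and identity FC layers.
identity: Layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
layers = [identity, identity, identity]
slice_features = [[0.0, 0.0], [2.0, -1.0], [0.0, 0.0]]
hidden = [fc_pipeline(f, layers) for f in slice_features]
weights, prob = attention_pool(hidden, [1.0, 0.0], [1.0, 0.0])
```

In this toy run the middle slice receives the largest attention weight, which mirrors the idea that attention weights can be read as slice-level indications while the pooled output gives the scan-level result.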
The method may further include generating heatmaps showing important features for diagnosing ICH in CT slices based on gradient information flowing back to the last convolutional layer of the CNN. The CNN may include an EfficientNet-B2 network. The method may further include preprocessing the CT scan prior to inputting the CT scan into the CNN, the preprocessing including applying a plurality of different window functions to the CT scan, concatenating results of applying the different window functions to the CT scan in a third dimension orthogonal to the image axes of the CT scan, and inputting the concatenated windowed CT scans into the CNN. Outputting the processed CT scan including the extracted important features from the CNN to the RNN may include reducing the feature size of the processed CT scan to a vector by a global average pooling (GAP) layer and outputting the vector to the RNN as extracted features for each slice. The method may further include outputting probabilities of ICH by a last fully-connected layer of the CNN followed by a sigmoid activation function. An exemplary method for slice-level localization of intracranial hemorrhage (ICH) on head computed tomo