US-12620069-B2 - Processing multiplex images and analysis of immune enriched spatial proteomic data
Abstract
Techniques are disclosed herein that encompass image pre-processing and a semi-supervised clustering for optimization and analysis of immune-enriched single-cell proteomics data generated via multiplexed imaging technologies. This is achieved through an image pre-processing pipeline, which converts image data contained in one type of file (e.g., .mcd) into another type of file (e.g., .tiff) and removes artifact signals from the image data using various algorithms to generate improved image data. Thereafter, a semi-supervised clustering pipeline analyzes the improved image data using various techniques, including implementing a supervised algorithm to identify metaclusters such as general immune phenotypes (e.g., CD4−T-cells, Macrophages, Neutrophils, etc.) as well as non-immune phenotypes while implementing an unsupervised algorithm that enables the identification of specific subclusters and a more in-depth cellular status characterization.
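As a rough illustration of the two-stage idea described above (a supervised pass that assigns metaclusters, followed by an unsupervised pass that splits each metacluster into subclusters), the following Python sketch may help. It is not the disclosed pipeline: the gating rules in `LINEAGE_RULES`, the `POSITIVE` cut-off, the function names, and the toy cells are all hypothetical, and a two-centre 1-D k-means stands in for whatever unsupervised algorithm is actually used.

```python
from collections import defaultdict
from typing import Dict, List

Cell = Dict[str, float]  # marker name -> normalised expression (assumed 0..1)

# Hypothetical gating rules: metacluster -> lineage markers that must all be positive.
LINEAGE_RULES = {
    "CD4 T cell": ["CD3e", "CD4"],
    "Macrophage": ["CD68"],
    "Neutrophil": ["CD66b"],
}
POSITIVE = 0.5  # assumed positivity cut-off, not taken from the source


def assign_metacluster(cell: Cell) -> str:
    """Supervised step: return the first rule whose markers are all positive."""
    for name, markers in LINEAGE_RULES.items():
        if all(cell.get(m, 0.0) >= POSITIVE for m in markers):
            return name
    return "Other"


def two_means(values: List[float], rounds: int = 10) -> List[int]:
    """Unsupervised step: 1-D k-means with k=2, seeded at the min and max."""
    lo, hi = min(values), max(values)
    groups = [0] * len(values)
    for _ in range(rounds):
        # Assign each value to the nearer centre (ties go to the low centre).
        groups = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low = [v for v, g in zip(values, groups) if g == 0]
        high = [v for v, g in zip(values, groups) if g == 1]
        lo = sum(low) / len(low)
        hi = sum(high) / len(high) if high else hi
    return groups


def cluster(cells: List[Cell], state_marker: str) -> List[str]:
    """Gate cells into metaclusters, then split each one on a state marker."""
    by_meta = defaultdict(list)
    for i, cell in enumerate(cells):
        by_meta[assign_metacluster(cell)].append(i)
    labels = [""] * len(cells)
    for name, idx in by_meta.items():
        split = two_means([cells[i].get(state_marker, 0.0) for i in idx])
        for i, g in zip(idx, split):
            labels[i] = f"{name} / {state_marker} {'high' if g else 'low'}"
    return labels


# Toy single-cell expression records (fabricated for illustration only).
cells = [
    {"CD3e": 0.9, "CD4": 0.8, "Ki-67": 0.1},
    {"CD3e": 0.8, "CD4": 0.9, "Ki-67": 0.9},
    {"CD68": 0.7, "Ki-67": 0.2},
    {"CD66b": 0.9, "Ki-67": 0.0},
]
labels = cluster(cells, "Ki-67")
for label in labels:
    print(label)
```

With the toy data, the two CD4 T cells are separated into Ki-67 low and Ki-67 high subclusters. A metacluster containing a single cell is always labelled low, which is a limitation of this toy split rather than a property of the disclosed method.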
Inventors
- Sarah Bangerth
- Arianna Barbetta
- Juliet Ann Emamaullee
Assignees
- UNIVERSITY OF SOUTHERN CALIFORNIA
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-03-07
Claims (20)
- 1 . A computer-implemented method comprising: accessing an image file of a specimen stained with a panel of antibodies, wherein: the image file comprises regions of interest files of the specimen, the regions of interest files comprise individual signal files corresponding to each antibody in the panel of antibodies used to stain the specimen, and the individual signal files comprise artifact signals corresponding to background noise; performing an image pre-processing method to remove the artifact signals from the individual signal files, wherein the image pre-processing method comprises: performing an iterative process comprising: (a) applying, to a first individual signal file of the individual signal files from a first region of interest file of the regions of interest files using a denoising filter, a first denoising threshold value to generate a first noise signal and a second denoising threshold value to generate a second noise signal, (b) removing, from the first individual signal file, the first noise signal and the second noise signal to generate a denoised image, (c) comparing the first individual signal file to the denoised image to determine the performance quality of the denoising filter, and (d) choosing, based on the comparing, to: (i) repeat steps (a)-(c) on the first individual file from the first region of interest file by modifying the first denoising threshold value and the second denoising threshold value, (ii) repeat steps (a)-(c) on the first individual signal file from a second or subsequent region of interest file of the regions of interest files, or (iii) end the iterative process for the first individual signal file, and repeating the iterative process on a second or subsequent individual signal file of the individual signal files from the first region of interest to generate a set of denoised images for the specimen; and outputting the set of denoised images.
- 2 . The computer-implemented method of claim 1 , wherein the image file is obtained from imaging mass cytometry.
- 3 . The computer-implemented method of claim 1 , wherein: (i) the panel of antibodies comprise two or more antibodies that recognize CD66b, CD20, CD28, CD16, CD163, CD11b, CD45, CD4, CD31, CD279, CD68, Foxp3, CK7, Ki-67, CD8a, Collagen Type I, CD3e, CD138, HLA-DR, Granzyme B, DNA1, DNA2, or any combination thereof, and (ii) the panel of antibodies are labeled with metal tags and wherein the metal tags comprise 139La, 142Nd, 144Nd, 146Nd, 147Sm, 149Sm, 152Sm, 153Eu, 154Sm, 156Gd, 159Tb, 160Gd, 164Dy, 167Er, 168Er, 169Tm, 170Er, 172Yb, 174Yb, 175Lu, 191Ir, 193Ir, or any combination thereof.
- 4 . The computer-implemented method of claim 1 , wherein: the first denoising threshold value is a minimum filter value dependent upon the antibody panel and corresponding to a signal level below a designated minimum threshold, and the second denoising threshold value is a uniform filter value used to average pixel intensities, and (i) the minimum filter value is set to a desired integer and the uniform threshold value is set to null, (ii) the minimum filter value is set to a null value and the uniform threshold value is set to a desired integer value, (iii) the minimum filter value is set to a desired integer and the uniform threshold value is set to a desired integer value, or (iv) the minimum filter value is set to a null value and the uniform threshold value is set to a null value.
- 5 . The computer-implemented method of claim 1 , wherein repeating steps (a)-(c) on the first individual file from the second or subsequent regions of interest files comprises: (i) applying the first denoising threshold value and the second denoising threshold value to all the first individual files in the second or subsequent regions of interest files, (ii) applying new minimum threshold values and uniform threshold values to each of the first individual signal files in the second or subsequent regions of interest files, or (iii) a combination of (i) and (ii).
- 6 . The computer-implemented method of claim 1 , wherein the image pre-processing further comprises: performing another iterative process starting with a first denoised image from the set of denoised images, wherein the other iterative process comprises: (e) processing the first denoised image using a spillover correction filter to generate a spillover corrected image, (f) processing the spillover corrected image using an aggregate removal filter to generate an aggregate removal image, and (g) repeating steps (e) and (f) for a second or subsequent denoised image from the set of denoised images to generate a set of stacked images comprising the aggregate removal images.
- 7 . The computer-implemented method of claim 6 , further comprising performing downstream analysis on the set of stacked images, wherein the downstream analysis comprises: generating, by a cell segmentation tool using the set of stacked images, single-cell masks and a marker-expression matrix; generating, by a cell-phenotype identification pipeline using the single-cell masks and the marker-expression matrix, subclusters of cells based on their expression of lineage markers; generating, by an extraction algorithm using the expression of lineage markers associated with each subcluster of cells, a labeled dataset comprising a list of the subclusters of cells and their corresponding expression patterns of the lineage markers; and determining, by inputting the labeled dataset into a machine learning model, a clinical outcome based on the subclusters of cells.
- 8 . A system comprising: one or more processors; and one or more computer-readable media storing instructions which, when executed by the one or more processors, cause the system to perform operations comprising: accessing an image file of a specimen stained with a panel of antibodies, wherein: the image file comprises regions of interest files of the specimen, the regions of interest files comprise individual signal files corresponding to each antibody in the panel of antibodies used to stain the specimen, and the individual signal files comprise artifact signals corresponding to background noise; performing an image pre-processing method to remove the artifact signals from the individual signal files, wherein the image pre-processing method comprises: performing an iterative process comprising: (a) applying, to a first individual signal file of the individual signal files from a first region of interest file of the regions of interest files using a denoising filter, a first denoising threshold value to generate a first noise signal and a second denoising threshold value to generate a second noise signal, (b) removing, from the first individual signal file, the first noise signal and the second noise signal to generate a denoised image, (c) comparing the first individual signal file to the denoised image to determine the performance quality of the denoising filter, and (d) choosing, based on the comparing, to: (i) repeat steps (a)-(c) on the first individual file from the first region of interest file by modifying the first denoising threshold value and the second denoising threshold value, (ii) repeat steps (a)-(c) on the first individual signal file from a second or subsequent region of interest file of the regions of interest files, or (iii) end the iterative process for the first individual signal file, and repeating the iterative process on a second or subsequent individual signal file of the individual signal files from the first region of interest to generate a set of denoised images for the specimen; and outputting the set of denoised images.
- 9 . The computer-implemented method of claim 8 , wherein the image file is obtained from imaging mass cytometry.
- 10 . The system of claim 8 , wherein: (i) the panel of antibodies comprise two or more antibodies that recognize CD66b, CD20, CD28, CD16, CD163, CD11b, CD45, CD4, CD31, CD279, CD68, Foxp3, CK7, Ki-67, CD8a, Collagen Type I, CD3e, CD138, HLA-DR, Granzyme B, DNA1, DNA2, or any combination thereof, and (ii) the panel of antibodies are labeled with metal tags and wherein the metal tags comprise 139La, 142Nd, 144Nd, 146Nd, 147Sm, 149Sm, 152Sm, 153Eu, 154Sm, 156Gd, 159Tb, 160Gd, 164Dy, 167Er, 168Er, 169Tm, 170Er, 172Yb, 174Yb, 175Lu, 191Ir, 193Ir, or any combination thereof.
- 11 . The system of claim 8 , wherein: the first denoising threshold value is a minimum filter value dependent upon the antibody panel and corresponding to a signal level below a designated minimum threshold, and the second denoising threshold value is a uniform filter value used to average pixel intensities, and (i) the minimum filter value is set to a desired integer and the uniform threshold value is set to null, (ii) the minimum filter value is set to a null value and the uniform threshold value is set to a desired integer value, (iii) the minimum filter value is set to a desired integer and the uniform threshold value is set to a desired integer value, or (iv) the minimum filter value is set to a null value and the uniform threshold value is set to a null value.
- 12 . The system of claim 8 , wherein repeating steps (a)-(c) on the first individual file from the second or subsequent regions of interest files comprises: (i) applying the first denoising threshold value and the second denoising threshold value to all the first individual files in the second or subsequent regions of interest files, (ii) applying new minimum threshold values and uniform threshold values to each of the first individual signal files in the second or subsequent regions of interest files, or (iii) a combination of (i) and (ii).
- 13 . The system of claim 8 , wherein the image pre-processing further comprises: performing another iterative process starting with a first denoised image from the set of denoised images, wherein the other iterative process comprises: (e) processing the first denoised image using a spillover correction filter to generate a spillover corrected image, (f) processing the spillover corrected image using an aggregate removal filter to generate an aggregate removal image, and (g) repeating steps (e) and (f) for a second or subsequent denoised image from the set of denoised images to generate a set of stacked images comprising the aggregate removal images.
- 14 . The system of claim 13 , further comprising performing downstream analysis on the set of stacked images, wherein the downstream analysis comprises: generating, by a cell segmentation tool using the set of stacked images, single-cell masks and a marker-expression matrix; generating, by a cell-phenotype identification pipeline using the single-cell masks and the marker-expression matrix, subclusters of cells based on their expression of lineage markers; generating, by an extraction algorithm using the expression of lineage markers associated with each subcluster of cells, a labeled dataset comprising a list of the subclusters of cells and their corresponding expression patterns of the lineage markers; and determining, by inputting the labeled dataset into a machine learning model, a clinical outcome based on the subclusters of cells.
- 15 . One or more non-transitory computer-readable media storing instructions which, when executed by one or more processors, cause a system to perform operations comprising: accessing an image file of a specimen stained with a panel of antibodies, wherein: the image file comprises regions of interest files of the specimen, the regions of interest files comprise individual signal files corresponding to each antibody in the panel of antibodies used to stain the specimen, and the individual signal files comprise artifact signals corresponding to background noise; performing an image pre-processing method to remove the artifact signals from the individual signal files, wherein the image pre-processing method comprises: performing an iterative process comprising: (a) applying, to a first individual signal file of the individual signal files from a first region of interest file of the regions of interest files using a denoising filter, a first denoising threshold value to generate a first noise signal and a second denoising threshold value to generate a second noise signal, (b) removing, from the first individual signal file, the first noise signal and the second noise signal to generate a denoised image, (c) comparing the first individual signal file to the denoised image to determine the performance quality of the denoising filter, and (d) choosing, based on the comparing, to: (i) repeat steps (a)-(c) on the first individual file from the first region of interest file by modifying the first denoising threshold value and the second denoising threshold value, (ii) repeat steps (a)-(c) on the first individual signal file from a second or subsequent region of interest file of the regions of interest files, or (iii) end the iterative process for the first individual signal file, and repeating the iterative process on a second or subsequent individual signal file of the individual signal files from the first region of interest to generate a set of denoised images for the specimen; and outputting the set of denoised images.
- 16 . The one or more non-transitory computer-readable media of claim 15 , wherein: (i) the panel of antibodies comprise two or more antibodies that recognize CD66b, CD20, CD28, CD16, CD163, CD11b, CD45, CD4, CD31, CD279, CD68, Foxp3, CK7, Ki-67, CD8a, Collagen Type I, CD3e, CD138, HLA-DR, Granzyme B, DNA1, DNA2, or any combination thereof, and (ii) the panel of antibodies are labeled with metal tags and wherein the metal tags comprise 139La, 142Nd, 144Nd, 146Nd, 147Sm, 149Sm, 152Sm, 153Eu, 154Sm, 156Gd, 159Tb, 160Gd, 164Dy, 167Er, 168Er, 169Tm, 170Er, 172Yb, 174Yb, 175Lu, 191Ir, 193Ir, or any combination thereof.
- 17 . The one or more non-transitory computer-readable media of claim 15 , wherein: the first denoising threshold value is a minimum filter value dependent upon the antibody panel and corresponding to a signal level below a designated minimum threshold, and the second denoising threshold value is a uniform filter value used to average pixel intensities, and (i) the minimum filter value is set to a desired integer and the uniform threshold value is set to null, (ii) the minimum filter value is set to a null value and the uniform threshold value is set to a desired integer value, (iii) the minimum filter value is set to a desired integer and the uniform threshold value is set to a desired integer value, or (iv) the minimum filter value is set to a null value and the uniform threshold value is set to a null value.
- 18 . The one or more non-transitory computer-readable media of claim 15 , wherein repeating steps (a)-(c) on the first individual file from the second or subsequent regions of interest files comprises: (i) applying the first denoising threshold value and the second denoising threshold value to all the first individual files in the second or subsequent regions of interest files, (ii) applying new minimum threshold values and uniform threshold values to each of the first individual signal files in the second or subsequent regions of interest files, or (iii) a combination of (i) and (ii).
- 19 . The one or more non-transitory computer-readable media of claim 15 , wherein the image pre-processing further comprises: performing another iterative process starting with a first denoised image from the set of denoised images, wherein the other iterative process comprises: (e) processing the first denoised image using a spillover correction filter to generate a spillover corrected image, (f) processing the spillover corrected image using an aggregate removal filter to generate an aggregate removal image, and (g) repeating steps (e) and (f) for a second or subsequent denoised image from the set of denoised images to generate a set of stacked images comprising the aggregate removal images.
- 20 . The one or more non-transitory computer-readable media of claim 19 , further comprising performing downstream analysis on the set of stacked images, wherein the downstream analysis comprises: generating, by a cell segmentation tool using the set of stacked images, single-cell masks and a marker-expression matrix; generating, by a cell-phenotype identification pipeline using the single-cell masks and the marker-expression matrix, subclusters of cells based on their expression of lineage markers; generating, by an extraction algorithm using the expression of lineage markers associated with each subcluster of cells, a labeled dataset comprising a list of the subclusters of cells and their corresponding expression patterns of the lineage markers; and determining, by inputting the labeled dataset into a machine learning model, a clinical outcome based on the subclusters of cells.
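The iterative denoising recited in the independent claims, with the optional (null) threshold settings of claims 4, 11, and 17, could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the claimed implementation: it interprets the minimum filter value as an intensity floor and the uniform filter value as the width of an averaging window used to flag isolated pixels, and it replaces the choice in step (d) with a simple retained-signal rule. The names `denoise`, `tune_channel`, and `keep_at_least`, and the toy image, are hypothetical. Passing `None` models a null setting.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # one individual signal file (one channel of one ROI)


def window_sum(img: Image, r: int, c: int, size: int) -> float:
    """Sum over the size x size window centred on (r, c), clipped at borders.

    Dividing by the window area would give the uniform (mean) filter output.
    """
    half = size // 2
    total = 0.0
    for i in range(max(0, r - half), min(len(img), r + half + 1)):
        for j in range(max(0, c - half), min(len(img[0]), c + half + 1)):
            total += img[i][j]
    return total


def denoise(img: Image, min_value: Optional[int],
            uniform_size: Optional[int]) -> Image:
    """Steps (a)-(b): flag two kinds of noise and zero the flagged pixels."""
    out = [row[:] for row in img]
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            # First noise signal (assumed): pixels below the intensity floor.
            below_floor = min_value is not None and v < min_value
            # Second noise signal (assumed): an isolated pixel that carries
            # more than half of the total signal in its own window.
            speckle = (uniform_size is not None and v > 0
                       and v > 0.5 * window_sum(img, r, c, uniform_size))
            if below_floor or speckle:
                out[r][c] = 0.0
    return out


def retained_fraction(original: Image, denoised: Image) -> float:
    """Step (c): fraction of the original total signal that survives."""
    total = sum(map(sum, original))
    return sum(map(sum, denoised)) / total if total else 1.0


def tune_channel(img: Image, min_value: Optional[int],
                 uniform_size: Optional[int], keep_at_least: float = 0.5,
                 max_rounds: int = 5) -> Tuple[Image, Optional[int]]:
    """Step (d), automated stand-in: lower the floor while too much is lost."""
    cleaned = denoise(img, min_value, uniform_size)
    for _ in range(max_rounds):
        if retained_fraction(img, cleaned) >= keep_at_least or not min_value:
            break  # option (iii): accept the current thresholds
        min_value -= 1  # option (i): modify the threshold and repeat
        cleaned = denoise(img, min_value, uniform_size)
    return cleaned, min_value


# Toy 5x5 channel: a 2x2 cell-like blob, one hot pixel (9), faint background.
roi = [
    [1, 0, 0, 0, 9],
    [0, 6, 7, 0, 0],
    [0, 7, 8, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
cleaned, used_min = tune_channel(roi, min_value=2, uniform_size=3)
print(used_min, round(retained_fraction(roi, cleaned), 3))
```

In this toy run the blob survives while the hot pixel and the faint background are removed. Option (ii) of step (d), moving on to the same channel in another region of interest, would amount to calling `tune_channel` again on that region's image, either with the accepted thresholds or with new ones.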
Description
CROSS-REFERENCE TO RELATED APPLICATION
The present application is a non-provisional application of, and claims the benefit of and priority under 35 U.S.C. § 119(e) to, U.S. Provisional Application No. 63/562,886, filed on Mar. 8, 2024, the entire contents of which are incorporated herein by reference for all purposes.
STATEMENT OF GOVERNMENT SUPPORT
This invention was made with government support under Grant No. CA245220 awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
FIELD
The present disclosure is directed generally to image processing, and in particular to techniques for removing artifact signals to improve image quality for data analysis.
BACKGROUND
High-throughput spatial imaging technologies involve advanced methodologies designed to concurrently detect and analyze multiple biomolecules or cellular components within tissue samples while preserving their spatial context. These technologies are instrumental in unraveling the intricate organization and interactions of cells within tissues, offering valuable insights into various biological processes and disease mechanisms.
An example of a high-throughput spatial imaging technology is imaging mass cytometry. Imaging mass cytometry combines mass spectrometry with metal tags to simultaneously detect numerous proteins or markers in tissue sections at subcellular resolution. Using antibodies labeled with stable isotopes, typically metal isotopes, imaging mass cytometry enables the targeted identification and quantification of specific biomolecules or cellular components through mass spectrometry. In a typical imaging mass cytometry workflow, tissue sections are prepared and treated with a panel of metal-conjugated antibodies, each designed to target a distinct protein or marker of interest. Subsequently, laser ablation is employed to analyze the tissue, where each laser pulse removes a small portion of the sample. The ablated material undergoes ionization, and the resulting ions are subjected to mass spectrometry analysis, revealing the presence and abundance of the labeled proteins.
Applications of high-throughput spatial imaging technologies span various domains, with significant contributions to cancer research, particularly in studying tumor microenvironments, heterogeneity, and immune cell interactions. In neuroscience, high-throughput spatial imaging has proven instrumental in investigating the molecular composition of brain tissues, aiding the understanding of neural circuits and neurodegenerative diseases. Furthermore, high-throughput spatial imaging finds application in immunology, enabling the detailed study of immune responses and the distribution of different immune cell types within tissues. In essence, these high-throughput spatial imaging technologies, exemplified by imaging mass cytometry, significantly contribute to advancing our comprehension of complex biological systems by providing detailed, multiplexed information while retaining the spatial context within tissues.
SUMMARY
Disclosed herein are image processing techniques (e.g., a computer-implemented method, a system and operations thereof, and a non-transitory computer-readable medium storing code or instructions executable by one or more processors) for removing artifact signals to improve image quality for data analysis. More specifically, these techniques encompass image pre-processing and semi-supervised clustering for optimization and analysis of immune-enriched single-cell proteomics data generated via multiplexed imaging technologies. This is achieved through an image pre-processing pipeline (described herein as the IMClean pipeline), which converts image data contained in one type of file (e.g., .mcd) into another type of file (e.g., .tiff) and removes artifact signals from the image data using various algorithms to generate improved image data. Thereafter, a semi-supervised clustering pipeline (described herein as the IMmuneCite clustering pipeline) analyzes the improved image data using various techniques, including implementing a supervised algorithm to identify metaclusters such as general immune phenotypes (e.g., CD4−T-cells, Macrophages, Neutrophils, etc.) as well as non-immune phenotypes while implementing an unsupervised algorithm that enables the identification of specific subclusters and a more in-depth cellular status characterization.
Advantageously, the image pre-processing pipeline facilitates downstream cell classification and identification of different cell phenotypes, while the semi-supervised clustering pipeline offers a robust and detailed description of the wide spectrum of clusters such as immune cell phenotypes associated with each tissue pathology in samples (e.g., human liver tissue). Lastly, described herein are algorithms and models tha