EP-4736138-A2 - SYSTEM AND METHOD FOR CELL-OF-ORIGIN CLASSIFICATION BASED ON INTERPRETABLE CELLULAR FEATURES
Abstract
A method of classifying a tissue sample by a classification system includes identifying, by the classification system, a plurality of tiles corresponding to whole-slide image data of the tissue sample; generating, by the classification system, a plurality of semantic masks corresponding to the plurality of tiles, each one of the plurality of semantic masks identifying a cell boundary and a cell type of each cell within a corresponding tile of the plurality of tiles; generating, by the classification system, a plurality of cellular features for each tile of the plurality of tiles based on a corresponding one of the plurality of semantic masks; and classifying, by the classification system, the tissue sample based on the plurality of cellular features for each one of the plurality of tiles.
Inventors
- GU, Qiangqiang
- SHAIKH, Nazim
- LIN, PING-CHANG
- Jayachandran, Srinath
- PORWAL, Prasanna
- LI, XIAO
- NIE, YAO
Assignees
- Ventana Medical Systems, Inc.
- Genentech, Inc.
Dates
- Publication Date
- 20260506
- Application Date
- 20240701
Claims (20)
- 1. A method of classifying a tissue sample by a classification system based on machine learning, the method comprising: identifying, by the classification system, a plurality of tiles corresponding to whole-slide image data of the tissue sample; generating, by the classification system, a plurality of semantic masks corresponding to the plurality of tiles, each one of the plurality of semantic masks identifying a cell boundary and a cell type of each cell within a corresponding tile of the plurality of tiles; generating, by the classification system, a plurality of cellular features for each tile of the plurality of tiles based on a corresponding one of the plurality of semantic masks; and classifying, by the classification system, the tissue sample based on the plurality of cellular features for each one of the plurality of tiles.
- 2. The method of claim 1, wherein the identifying the plurality of tiles comprises: receiving, by the classification system, the whole-slide image data corresponding to the tissue sample; and extracting, by the classification system, the plurality of tiles from the whole-slide image data.
- 3. The method of claim 1, wherein the whole-slide image data comprises at least one digitized image of the tissue sample of a patient that is stained with hematoxylin and eosin (H&E) dyes or a region-of-interest (ROI) map.
- 4. The method of claim 1, wherein the identifying the plurality of tiles further comprises: performing stain normalizing, by the classification system, based on the plurality of tiles to generate a plurality of normalized tiles.
- 5. The method of claim 4, wherein the performing stain normalizing comprises: generating, by a first model of the classification system, the plurality of normalized tiles based on the plurality of tiles, wherein the first model comprises a fully convolutional neural network.
- 6. The method of claim 1, wherein the generating the semantic masks comprises: encoding, by an encoder of the classification system, a tile of the plurality of tiles to generate encoded data corresponding to the tile; generating, by a segmentation decoder of the classification system, a segmentation mask corresponding to the tile based on the encoded data, the segmentation mask identifying the cell boundary of each cell within the tile; classifying, by a classification decoder of the classification system, the cell type of each cell within the segmentation mask as one of a plurality of cell type categories; and generating a semantic mask of the plurality of semantic masks to indicate the cell boundary and the cell type of each cell within the tile.
- 7. The method of claim 6, wherein the plurality of cell type categories comprises a tumor cell, a lymphocyte cell, and other.
- 8. The method of claim 1, wherein the generating the plurality of cellular features for each tile of the plurality of tiles comprises: generating a plurality of nuclear-level features for the tile based on the corresponding one of the plurality of semantic masks; generating a plurality of tile-level features by aggregating the plurality of nuclear-level features; and extracting the plurality of cellular features from the plurality of tile-level features.
- 9. The method of claim 8, wherein the generating the plurality of nuclear-level features for the tile comprises: computing a plurality of nuclear morphology features for each cell having a tumor cell type within the corresponding one of the plurality of semantic masks, wherein the plurality of nuclear morphology features comprises at least one of: basic geometric features comprising shape, size, and circularity of a nucleus of the cell having the tumor cell type; first-order statistics of gray-level intensity inside the nucleus; texture features derived from gray-level co-occurrence matrix of the nucleus; advanced morphology features for characterizing irregularity of the nucleus; chromatin distribution features of the nucleus; nuclear boundary signature of the nucleus; or curvature features of the nucleus, and wherein the plurality of nuclear-level features comprises a collection of nuclear morphology features of all nuclei of cells having the tumor cell type within the corresponding one of the plurality of semantic masks.
- 10. The method of claim 8, wherein the generating the plurality of tile-level features comprises: calculating a statistical mean of ones of the nuclear-level features associated with cells having a tumor cell type within the corresponding one of the plurality of semantic masks to generate a mean vector; calculating a standard deviation of the ones of the nuclear-level features associated with cells having the tumor cell type to generate a standard deviation vector; and determining spatial distribution features of cells within the corresponding one of the plurality of semantic masks to generate one or more spatial distribution vectors, wherein the plurality of tile-level features comprises the mean vector, the standard deviation vector, and the one or more spatial distribution vectors.
- 11. The method of claim 10, wherein the spatial distribution features comprise: density of each type of cell of cells within the corresponding one of the plurality of semantic masks; and average distances between cells within the corresponding one of the plurality of semantic masks.
- 12. The method of claim 8, wherein the extracting the plurality of cellular features comprises: removing one or more of the plurality of tile-level features that have low variance across cells or high correlation with other ones of the plurality of tile-level features; and normalizing remaining ones of the tile-level features to generate the plurality of cellular features.
- 13. The method of claim 1, wherein the classifying the tissue sample comprises: generating, by an attention-based aggregator of the classification system, slide-level features based on the plurality of cellular features for each one of the plurality of tiles; and classifying, by a slide classifier of the classification system, the tissue sample based on the slide-level features.
- 14. The method of claim 1, wherein the classifying of the tissue sample comprises: identifying the tissue sample as containing a first subtype of diffuse large B-cell lymphoma (DLBCL) or a second subtype of DLBCL.
- 15. The method of claim 14, wherein the first subtype comprises a germinal center B-cell-like (GCB) subtype, and wherein the second subtype comprises an activated B-cell-like (ABC) subtype.
- 16. A classification system for classifying a tissue sample, the classification system comprising: a processor; and a memory storing instructions that, when executed on the processor, cause the processor to perform: identifying a plurality of tiles corresponding to whole-slide image data of the tissue sample; generating a plurality of semantic masks corresponding to the plurality of tiles, each one of the plurality of semantic masks identifying a cell boundary and a cell type of each cell within the tile; generating a plurality of cellular features for each tile of the plurality of tiles based on a corresponding one of the plurality of semantic masks; and classifying the tissue sample based on the plurality of cellular features for each one of the plurality of tiles.
- 17. The classification system of claim 16, wherein the generating the semantic masks comprises: encoding, by an encoder of the classification system, a tile of the plurality of tiles to generate encoded data corresponding to the tile; generating, by a segmentation decoder of the classification system, a segmentation mask corresponding to the tile based on the encoded data, the segmentation mask identifying the cell boundary of each cell within the tile; classifying, by a classification decoder of the classification system, the cell type of each cell within the segmentation mask as one of a plurality of cell type categories; and generating a semantic mask of the plurality of semantic masks to indicate the cell boundary and the cell type of each cell within the tile, and wherein the plurality of cell type categories comprises a tumor cell, a lymphocyte cell, and other.
- 18. The classification system of claim 16, wherein the generating the plurality of cellular features for each tile of the plurality of tiles comprises: generating a plurality of nuclear-level features for the tile based on the corresponding one of the plurality of semantic masks; generating a plurality of tile-level features by aggregating the plurality of nuclear-level features; and extracting the plurality of cellular features from the plurality of tile-level features.
- 19. The classification system of claim 8, wherein the generating the plurality of tile-level features comprises: calculating a statistical mean of ones of the nuclear-level features associated with cells having a tumor cell type within the corresponding one of the plurality of semantic masks to generate a mean vector; calculating a standard deviation of the ones of the nuclear-level features associated with cells having the tumor cell type to generate a standard deviation vector; and determining spatial distribution features of cells within the corresponding one of the plurality of semantic masks to generate one or more spatial distribution vectors, wherein the plurality of tile-level features comprises the mean vector, the standard deviation vector, and the one or more spatial distribution vectors.
- 20. The classification system of claim 16, wherein the classifying the tissue sample comprises: generating, by an attention-based aggregator of the classification system, slide-level features based on the plurality of cellular features for each one of the plurality of tiles; and classifying, by a slide classifier of the classification system, the tissue sample based on the slide-level features.
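As an illustration only of the tile-level aggregation recited in claims 10 and 11, and not the applicants' implementation, the following Python sketch computes a mean vector, a standard deviation vector, per-type cell density, and an average inter-cell distance for one tile. The input layout (a list of dictionaries with hypothetical `type`, `centroid`, and `features` keys), the use of the population standard deviation, density as cell count per unit tile area, and Euclidean distance averaged over all centroid pairs are all assumptions; the claims do not specify them.

```python
import math
from itertools import combinations
from statistics import fmean, pstdev


def tile_level_features(cells, tile_area, tumor_label="tumor"):
    """Aggregate per-nucleus features of one tile into tile-level features.

    cells: list of dicts with hypothetical keys
        'type'     -> cell type label from the semantic mask
        'centroid' -> (x, y) position of the nucleus
        'features' -> list of nuclear-level feature values
    tile_area: area of the tile in the same units as the centroids, squared.
    """
    # Nuclear-level feature rows of tumor cells only.
    tumor_rows = [c["features"] for c in cells if c["type"] == tumor_label]
    # Transpose so that each column holds one feature across all tumor nuclei.
    columns = list(zip(*tumor_rows))
    mean_vec = [fmean(col) for col in columns]
    # Assumption: population standard deviation (defined even for one nucleus).
    std_vec = [pstdev(col) for col in columns]

    # Density of each cell type: count divided by tile area (assumed definition).
    counts = {}
    for c in cells:
        counts[c["type"]] = counts.get(c["type"], 0) + 1
    density = {label: n / tile_area for label, n in counts.items()}

    # Average Euclidean distance over all pairs of cell centroids.
    pairs = list(combinations([c["centroid"] for c in cells], 2))
    avg_distance = fmean(math.dist(p, q) for p, q in pairs) if pairs else 0.0

    return {
        "mean": mean_vec,
        "std": std_vec,
        "density": density,
        "avg_distance": avg_distance,
    }


example_cells = [
    {"type": "tumor", "centroid": (0.0, 0.0), "features": [1.0, 2.0]},
    {"type": "tumor", "centroid": (3.0, 4.0), "features": [3.0, 6.0]},
    {"type": "lymphocyte", "centroid": (0.0, 4.0), "features": [9.0, 9.0]},
]
example_features = tile_level_features(example_cells, tile_area=100.0)
```

In this sketch the lymphocyte contributes to the spatial features but not to the mean and standard deviation vectors, mirroring the claims' restriction of those statistics to cells having the tumor cell type.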
Description
SYSTEM AND METHOD FOR CELL-OF-ORIGIN CLASSIFICATION BASED ON INTERPRETABLE CELLULAR FEATURES
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application claims priority to, and the benefit of, Indian Provisional Application No. 202311044011 ("INTERPRETABLE FEATURE BASED NETWORK FOR CLASSIFYING CELL-OF-ORIGIN FROM WHOLE SLIDE IMAGES IN DIFFUSE LARGE B-CELL LYMPHOMA PATIENTS"), filed on June 30, 2023 with the Indian Patent Office, the entire content of which is incorporated herein by reference.
FIELD
[0002] Aspects of some embodiments of the present disclosure relate to a system and method for tissue sample classification.
BACKGROUND
[0003] Cancers in their various forms have become one of the leading causes of death worldwide. Diffuse large B-cell lymphoma (DLBCL), which accounts for about 25% to 30% of all non-Hodgkin lymphomas, is aggressive and is the most common type of lymphoma. Although about two-thirds of DLBCL patients can be cured with standard treatment, research has focused on determining which patients have a less favorable prognosis so that they can be considered for novel targeted-treatment strategies. Germinal center B-cell-like (GCB) and activated B-cell-like (ABC) are two major biologically distinct molecular subtypes of DLBCL. Patients with ABC DLBCL generally have a worse prognosis than GCB DLBCL patients when treated with the combined therapy R-CHOP (i.e., a combination of chemotherapy and targeted therapy drugs used to treat cancer). Therefore, cell-of-origin (COO) classification or its surrogates have been incorporated into clinical practice and clinical trials to help better understand DLBCL biological heterogeneity and enable researchers to develop more accurate therapeutic targeting strategies.
[0004] The well-established COO classification algorithm uses gene expression profiling (GEP). However, as GEP is not widely accessible, researchers and pathologists in clinical practice approximate molecular subtypes using immunohistochemical (IHC) patterns, such as the most widely used Hans algorithm, in which expert visual assessment of multiple IHC assays is required. Due to the imperfection of IHC in assessing molecular subtype, more precise strategies are under development.
[0005] The above information disclosed in this Background section is only for enhancement of understanding of the background and therefore the information discussed in this Background section does not necessarily constitute prior art.
SUMMARY
[0006] Aspects of some embodiments of the present disclosure are directed to a system and method for standardized and automated cell-of-origin (COO) classification based on hematoxylin and eosin (H&E) stained whole-slide images (WSIs). Because such WSIs are readily available from primary diagnosis, the approach saves tissue and is potentially more efficient by shortening the turnaround time. In some embodiments, the classification system leverages both interpretable cellular features derived from image tiles and an attention-based multi-instance learning (AMIL) framework to provide classifications based on a single WSI of a tissue sample.
[0007] According to some embodiments, the classification system first performs nuclei segmentation and classification to identify each nucleus in each tile of a WSI and to classify the nuclei into different phenotypes. Then, the classification system derives interpretable cellular features from the nuclei in each image tile and uses them to generate a tile-level histopathological representation for the image tile. Lastly, the classification system utilizes the AMIL framework to aggregate all tile-level histopathological representations from a WSI to form the slide-level representation and to classify the whole-slide image.
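The aggregation of tile-level representations into a slide-level representation can be sketched as follows. This is a minimal illustration that assumes one commonly used form of attention pooling, in which each tile receives a score w · tanh(V h), the scores are normalized with a softmax, and the slide-level vector is the weighted sum of tile vectors. The function name `attention_pool`, the parameters `V` and `w`, and the choice of scoring function are assumptions made for illustration; the disclosure does not specify them here.

```python
import math


def attention_pool(tile_features, V, w):
    """Pool tile-level vectors into one slide-level vector with attention.

    tile_features: list of K tile vectors, each a list of D floats.
    V: hypothetical projection matrix, a list of H rows of D floats.
    w: hypothetical scoring vector of H floats.
    Returns (slide_vector, attention_weights).
    """
    scores = []
    for h in tile_features:
        # Hidden activation tanh(V h), one value per row of V.
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        # Scalar attention score w . tanh(V h) for this tile.
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax over the tile scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide-level vector: attention-weighted sum of the tile vectors.
    dim = len(tile_features[0])
    slide = [
        sum(a * h[j] for a, h in zip(weights, tile_features))
        for j in range(dim)
    ]
    return slide, weights


tiles = [[1.0, 0.0], [3.0, 2.0]]
# With an all-zero projection every tile scores 0, so pooling reduces to a mean.
uniform_slide, uniform_weights = attention_pool(tiles, V=[[0.0, 0.0]], w=[1.0])
# A non-zero projection lets the second tile dominate.
skewed_slide, skewed_weights = attention_pool(tiles, V=[[1.0, 0.0]], w=[2.0])
```

In a trained system `V` and `w` would be learned jointly with the slide classifier, and the resulting attention weights indicate which tiles contributed most to the slide-level prediction.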
[0008] According to some embodiments of the present disclosure, there is provided a method of classifying a tissue sample by a classification system based on machine learning, the method including: identifying, by the classification system, a plurality of tiles corresponding to whole-slide image data of the tissue sample; generating, by the classification system, a plurality of semantic masks corresponding to the plurality of tiles, each one of the plurality of semantic masks identifying a cell boundary and a cell type of each cell within a corresponding tile of the plurality of tiles; generating, by the classification system, a plurality of cellular features for each tile of the plurality of tiles based on a corresponding one of the plurality of semantic masks; and classifying, by the classification system, the tissue sample based on the plurality of cellular features for each one of the plurality of tiles.
[0009] In some embodiments, the identifying the plurality of tiles includes: receiving, by the classification system, the whole-slide image data corresponding to the tissue sample; and extracting, by the classification system, the plurality of tiles from the whole-slide image data.
[0010] In some e