EP-4738279-A2 - MULTI-SCALE SPATIAL TRANSCRIPTOMICS ANALYSIS

Abstract

The present disclosure provides methods for identifying cells in an image. An apparatus for identifying cells in an image is also provided by the present disclosure. Further provided herein is a non-transitory computer-readable storage medium for performing the methods disclosed herein. Methods of diagnosing a disease or disorder and of treating a disease or disorder in a subject using the methods disclosed are also provided herein.

Inventors

HE, Yichun
LIU, Jia
WANG, Xiao

Assignees

The Broad Institute Inc.
Massachusetts Institute of Technology
President And Fellows Of Harvard College

Dates

Publication Date: 2026-05-06
Application Date: 2022-02-18

Claims (15)

1. A method of identifying cells in an image, the method comprising: receiving, for each of a plurality of spots in the image, a spatial location of the spot in the image and genetic information associated with the spot, wherein each spot corresponds to one or more pixels in the image; filtering a superset of the plurality of spots to provide the plurality of spots and to remove background noise, wherein filtering the superset of the plurality of spots comprises: identifying a first region having at least some of the superset of the plurality of spots disposed therein; identifying a second region having at least some of the superset of the plurality of spots disposed therein; determining, for the first region, a first regional density of spots; determining, for the second region, a second regional density of spots; comparing the first regional density to the second regional density to obtain a density comparison result; based on the density comparison result: providing each spot of the superset disposed in the first region as the plurality of spots; and discarding each spot of the superset disposed in the second region; based on the spatial location and the genetic information for each of the plurality of spots: determining at least one spot that represents a cell center; and for each spot determined to represent a cell center, identifying, as representing a cell in the image, a set of spots from the plurality of spots, wherein the set of spots belong to a same cell as the spot determined to represent the cell center; and outputting an indication of the set of spots determined for each of the cells identified in the image.
2. A method of identifying cells in an image, the method comprising: receiving, for each of a plurality of spots in the image, a spatial location of the spot in the image and genetic information associated with the spot, wherein each spot corresponds to one or more pixels in the image; based on the spatial location and the genetic information for each of the plurality of spots: determining at least one spot that represents a cell center; and for each spot determined to represent a cell center, identifying, as representing a cell in the image, a set of spots from the plurality of spots, wherein the set of spots belong to a same cell as the spot determined to represent the cell center; outputting an indication of the set of spots determined for each of the cells identified in the image; and rejecting noise in the set of the spots determined for each of the cells identified in the image, wherein rejecting noise in the set of the spots determined for each of the cells identified in the image comprises, for each cell: determining a border region of the cell having disposed therein a subset of spots of the set of spots determined for the cell; identifying, within the border region, a spot of the subset having a highest border region density of spots; determining, for each of one or more spots of the set of spots determined for the cell, a point density of spots; comparing, for each of the one or more spots, the point density and the highest border region density to obtain a density comparison result; based on each density comparison result: keeping, in the set of spots determined for the cell, each of the one or more spots having a higher point density than the highest border region density; and discarding, from the set of spots determined for the cell, each of the one or more spots having a lower point density than the highest border region density.
3. The method of claim 1 or 2, wherein determining, based on the spatial location and the genetic information for each of the plurality of spots, at least one spot that represents a cell center comprises: for each of the plurality of spots: calculating, based on the spatial location and the genetic information, a local density of the spot; and calculating, based on the spatial location and the genetic information, a minimum distance to another spot of the plurality of spots having a higher local density; and determining the at least one spot that represents the cell centers based, at least in part, on the calculated local densities and the minimum distances of the plurality of spots.
4. The method of claim 3, wherein calculating, based on the spatial location and the genetic information, a minimum distance to another spot of the plurality of spots having a higher local density comprises: for a spot having a highest local density among the plurality of spots, calculating, based on the spatial location and the genetic information, a distance to another spot of the plurality of spots having a highest minimum distance.
5. The method of claim 3, wherein determining the at least one spot that represents a cell center based on the local densities and the minimum distances comprises, for each spot: calculating a product of the local density of the spot and the minimum distance of the spot; and determining that the spot represents a cell center if the product has a value greater than a threshold value.
6. The method of claim 3, wherein calculating the local density for the spot and the minimum distance for the spot comprises: counting, within a region having a first radius around the spot, numbers of spots corresponding to different gene types.
7. The method of claim 6, wherein calculating the local density for the spot and the minimum distance for the spot comprises calculating a first parameter based on a spatial distance and a genetic correlation, wherein: the spatial distance represents a spatial distance between a spatial location of the spot and a spatial location of another spot; and the genetic correlation represents a correlation between numbers of spots corresponding to different gene types within the first region having a first radius around the spot and numbers of spots corresponding to different gene types within a second region having the first radius around the other spot.
8. The method of claim 6, wherein the first radius is approximately equal to an average size of the cells.
9. The method of claim 1 or 2, wherein the spots comprise RNA spots.
10. The method of claim 1 or 2, further comprising, for each spot included in the set of spots, counting, within a region having a second radius around the spot, numbers of spots corresponding to different gene types.
11. The method of claim 10, further comprising: for each spot included in the set of spots, calculating a second parameter based on a spatial location of the spot and the numbers of spots corresponding to the different gene types within the region having the second radius around the spot; and for at least one cell identified in the image, based on the second parameter, segmenting the cell into subcellular components.
12. The method of claim 11, wherein the subcellular components are subcellular organelles, and the subcellular organelles comprise a nucleus and a cytoplasm.
13. The method of claim 1 or 2, further comprising: for each of the identified cells in the image, classifying the cell into a cell type; and counting, within a region having a third radius around the cell, numbers of cells corresponding to different cell types.
14. The method of claim 13, further comprising: for each of the identified cells in the image, calculating a third parameter based on a spatial location of the cell and the numbers of cells corresponding to the different cell types within the region having the third radius around the cell; and clustering at least some of the identified cells in the image into tissue regions based on the third parameter.
15. At least one non-transitory computer-readable storage medium encoded with a plurality of instructions that, when executed by at least one computer processor, perform the method of any one of the preceding claims.
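
The center-finding steps recited in the claims (a local density per spot, a minimum distance to a denser spot, and a threshold on the product of the two) can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the disclosed implementation: it uses spatial location only and omits the genetic-information term, the function names `find_cell_centers` and `assign_spots` are hypothetical, ties in density are broken by spot index, the densest spot takes its largest distance to any other spot as its minimum distance (a common convention, assumed here), and the remaining spots are assigned to the nearest center, which is one possible rule that the claims do not prescribe.

```python
import math


def find_cell_centers(points, radius, threshold):
    """Return indices of spots whose (local density * minimum distance) > threshold.

    points:    list of (x, y) spot coordinates (spatial location only;
               the genetic term is omitted as a simplifying assumption).
    radius:    neighborhood radius used to count neighbors.
    threshold: hypothetical cut-off on the density-distance product.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other spots within `radius` of the spot.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
           for i in range(n)]

    # Minimum distance to a denser spot. Ties in density are broken by
    # index (assumption) so that equally dense spots are totally ordered.
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        # The densest spot has no denser neighbor; use its largest
        # distance to any other spot instead (assumed convention).
        delta.append(min(denser) if denser else max(dist[i]))

    return [i for i in range(n) if rho[i] * delta[i] > threshold]


def assign_spots(points, centers):
    """Label every spot with the index of its nearest center (assumed rule)."""
    return [min(centers, key=lambda c: math.dist(p, points[c])) for p in points]


# Two synthetic, well-separated groups of five spots each.
spots = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
         (10, 0), (11, 0), (9, 0), (10, 1), (10, -1)]
centers = find_cell_centers(spots, radius=1.2, threshold=5.0)
labels = assign_spots(spots, centers)
print(centers)  # [0, 5]
print(labels)   # [0, 0, 0, 0, 0, 5, 5, 5, 5, 5]
```

In this toy input the middle spot of each group has four neighbors within the radius and lies far from any denser spot, so only those two spots exceed the threshold. The pairwise distance table makes the sketch quadratic in the number of spots, which is acceptable for illustration only.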

Description

RELATED APPLICATIONS
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application, U.S.S.N. 63/151,374, filed February 19, 2021, which is incorporated herein by reference.
BACKGROUND OF THE INVENTION
Quantifying RNAs in their spatial context is crucial to understanding gene expression and regulation in complex tissues. Tissue functions arise from the orchestrated interactions of multiple cell types, which are shaped by differential gene expression in three-dimensional (3D) space. To chart the spatial heterogeneity of gene expression in cells and tissues, a myriad of image-based in situ transcriptomics methods (e.g., STARmap, FISSEQ, pciSeq, MERFISH, seqFISH, osmFISH, etc.) have been developed [1-8], providing an atlas of subcellular RNA localization in intact tissues. In situ transcriptomic methods generate spatially resolved RNA profiles in intact tissues; however, it has proven challenging to directly extract low-dimensional representations of biological patterns from high-dimensional spatial transcriptomic data. One of the main challenges is achieving precise and automated cell segmentation that accurately assigns RNAs into individual cells for single-cell analysis. The most common cell segmentation strategy is labelling cell nuclei or cell bodies by fluorescent staining [9-11] (e.g., DAPI, Nissl, WGA, etc.) and then segmenting the continuous fluorescent signals by conventional or machine learning (ML)-based methods [12]. However, conventional methods, such as distance-transformed watershed [13], require manual curation to achieve optimal segmentation results. On the other hand, while ML-based methods [14,15] can automatically detect the targets (cells) in fluorescent stainings, they still require manually annotated datasets for model training. A unified computational framework for integrative analysis of in situ transcriptomic data is needed to address these challenges.
SUMMARY OF THE INVENTION
Disclosed herein is an unsupervised and annotation-free framework, termed ClusterMap, which incorporates physical proximity and gene identity of RNAs, formulates the task as a point pattern analysis problem, and defines biologically meaningful structures and groups (e.g., cells, and organelles within cells). Specifically, ClusterMap can precisely cluster RNAs into cells, as well as subcellular structures, cell bodies, and tissue regions in both two- and three-dimensional space and consistently perform on diverse tissue types, including brain, placenta, gut, and cardiac tissues. ClusterMap is broadly applicable to a variety of in situ transcriptomic measurements to uncover gene expression patterns, cell-cell interactions, and tissue organization principles from high-dimensional transcriptomic images. ClusterMap is also useful in the diagnosis and treatment of disease (e.g., Alzheimer's disease, cancer).
Here, instead of using fluorescent staining, patterns of spatially resolved RNAs that intrinsically encode high-dimensional gene expression information were utilized for subcellular and cellular segmentation, followed by cell-type mapping. To leverage the spatial heterogeneity of RNA-defined cell types, the same strategy was applied to cluster discrete cells into tissue regions. It was demonstrated that this computational framework (ClusterMap) can identify subcellular structures, cells, and tissue regions in a way that bypasses auxiliary cell staining, hyperparameter tuning, and manual labeling (FIG. 1).
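
To make concrete how physical proximity and gene identity can be combined, the claims describe counting, within a radius around each spot, the numbers of spots of each gene type, and then correlating those counts between two spots. The following Python sketch computes only these two ingredients. It is an illustration under assumptions rather than the disclosed method: the function names `neighborhood_gene_counts` and `pearson` are hypothetical, the gene labels "A", "B", and "C" are placeholders, each spot is counted in its own neighborhood, and Pearson correlation is assumed as the correlation measure. How the spatial distance and the genetic correlation are merged into a single parameter is not specified in this section, so the sketch stops short of that step.

```python
import math


def neighborhood_gene_counts(locations, genes, radius):
    """Count, for every spot, the spots of each gene type within `radius`.

    locations: list of (x, y) spot coordinates.
    genes:     list of gene labels, one per spot.
    Returns (gene_types, counts), where counts[i][k] is the number of spots
    of gene_types[k] within `radius` of spot i. The spot itself is
    included in its own neighborhood (assumption).
    """
    gene_types = sorted(set(genes))
    column = {g: k for k, g in enumerate(gene_types)}
    counts = []
    for p in locations:
        vec = [0] * len(gene_types)
        for q, g in zip(locations, genes):
            if math.dist(p, q) <= radius:
                vec[column[g]] += 1
        counts.append(vec)
    return gene_types, counts


def pearson(u, v):
    """Pearson correlation of two equal-length count vectors.

    Returns 0.0 when either vector is constant, since the correlation
    is undefined in that case (assumed fallback).
    """
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


# Two synthetic groups of spots with different gene compositions.
locations = [(0, 0), (0.5, 0), (0, 0.5), (10, 0), (10.5, 0), (10, 0.5)]
genes = ["A", "A", "B", "C", "C", "B"]
gene_types, counts = neighborhood_gene_counts(locations, genes, radius=1.0)

same_group = pearson(counts[0], counts[1])   # spots 0 and 1 share a neighborhood
other_group = pearson(counts[0], counts[3])  # spots 0 and 3 do not
print(gene_types, counts[0], counts[3], same_group, other_group)
```

Spots in the same synthetic group see identical gene compositions and correlate perfectly, while spots from the two groups are anti-correlated, which is the kind of signal that lets gene identity separate neighboring cells even when spatial distance alone is ambiguous.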
In one aspect, the present disclosure provides methods of identifying cells in an image comprising the steps of: receiving, for each of a plurality of spots in the image, a spatial location of the spot in the image and genetic information associated with the spot, wherein each spot corresponds to one or more pixels in the image; based on the spatial location and the genetic information for each of the plurality of spots: determining at least one spot that represents a cell center; and for each spot determined to represent a cell center, identifying, as representing a cell in the image, a set of spots from the plurality of spots, wherein the set of spots belong to a same cell as the spot determined to represent the cell center; and outputting an indication of the set of spots determined for each of the cells identified in the image. In some embodiments, determining one spot that represents a cell center comprises: for each of the plurality of spots: calculating, based on the spatial location and the genetic information, a local density of the spot; calculating, based on the spatial location and the genetic information, a minimum distance to another spot of the plurality of spots having a higher local density; and determining the at least one spot that represents the cell centers based, at least in part, on the calculated local densities and the minimum distances of the plurality of spots. In another aspect, the present disclosure provides an apparatus comprising: at least one computer processor; and at leas