US-12626779-B2 - Determining a targeted therapy using virtual inference of protein activity by regulon enrichment analysis of an individual subject tissue sample

Abstract

Methods for determining regulon enrichment in gene expression signatures are disclosed herein. An example method can include obtaining a set of transcriptional targets of a regulon. The method can include obtaining a gene expression signature by comparing a gene expression profile of a test sample to gene expression profiles of a plurality of samples representing control phenotypes. The method can include calculating a regulon enrichment score for each regulon in the gene expression signature. The method can include determining whether a number of control samples in the control phenotypes is above a predetermined threshold to support evaluation of statistical significance using permutation analysis. The method can include, in response to determining that the number of control samples is above the predetermined threshold, calculating a significance value by comparing each regulon enrichment score to a null model.
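
The workflow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The z-score signature, the mean-based enrichment score, the leave-one-out sample null, and the names `regulon_significance`, `min_controls`, and `n_iter` are all assumptions chosen for brevity. The gene-permutation branch mirrors the fallback described in claim 7 for small control sets.

```python
import random
import statistics


def signature(test, controls):
    """Per-gene z-score of the test profile against the control profiles.

    `test` is a dict gene -> expression; `controls` is a list of such dicts.
    """
    sig = {}
    for gene, value in test.items():
        ref = [c[gene] for c in controls]
        sd = statistics.stdev(ref) or 1e-9  # guard against zero variance
        sig[gene] = (value - statistics.mean(ref)) / sd
    return sig


def enrichment(sig, targets):
    """Toy enrichment score (assumption): mean signature value over the targets."""
    return statistics.mean(sig[g] for g in targets)


def regulon_significance(test, controls, targets,
                         min_controls=5, n_iter=1000, seed=0):
    """Return (enrichment score, empirical two-sided p-value) for one regulon."""
    rng = random.Random(seed)
    sig = signature(test, controls)
    observed = enrichment(sig, targets)
    null = []
    if len(controls) > min_controls:
        # Enough controls: build the null from the samples themselves.
        # Assumed scheme: score a randomly chosen control against the rest.
        for _ in range(n_iter):
            i = rng.randrange(len(controls))
            rest = controls[:i] + controls[i + 1:]
            null.append(enrichment(signature(controls[i], rest), targets))
    else:
        # Too few controls: permute genes within the signature instead.
        genes = list(sig)
        for _ in range(n_iter):
            null.append(enrichment(sig, rng.sample(genes, len(targets))))
    hits = sum(abs(s) >= abs(observed) for s in null)
    return observed, (hits + 1) / (n_iter + 1)  # +1 avoids a p-value of zero
```

The threshold check decides only how the null model is built. The observed score is computed the same way in both branches, so results from the two branches stay on the same scale.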

Inventors

  • Andrea Califano
  • Mariano Javier Alvarez

Assignees

  • THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK

Dates

Publication Date: 2026-05-12
Application Date: 2020-09-25

Claims (20)

  1. A method for determining targeted therapy using a cell or tissue sample from an individual subject having a disease or disorder, comprising: identifying the cell or tissue sample from the individual subject having the disease or disorder; quantifying a protein activity of each of a plurality of regulator proteins (RP) in the individual subject's sample to provide a subject sample-specific RP activity signature comprising a plurality of activated and/or deactivated RPs characteristic of the disease or disorder, wherein quantifying the protein activity of each RP comprises computationally inferring each RP activity of the individual subject's sample based, at least in part, on measured expression levels of a plurality of transcriptional targets of the respective RP's (regulon) not including the expression level of the RP, for the individual subject's sample and for a plurality of samples representing a control phenotype, in the context of a tissue-specific regulatory model; and determining a targeted therapy for the subject, based on a ranking of the activated and/or deactivated RPs of the sample specific RP activity signature of the individual subject's sample, wherein at least one of the ranked RPs is not differentially expressed relative to the control phenotype.
  2. The method of claim 1, wherein computationally inferring each RP activity of the individual subject's sample based, at least in part, on the measured expression levels of the plurality of transcriptional targets (regulon) comprises using a comparison method that generates a quantitative measurement of difference between the test sample and the control samples.
  3. The method of claim 2, wherein the comparison method can include one or more of a fold change, a Student's t-test, and a Mann-Whitney U test analysis.
  4. The method of claim 1, wherein quantifying the protein activity of each RP comprises: calculating a regulon enrichment score for each regulon in the subject sample-specific RP activity signature; determining whether the number of control samples in the control phenotype is above a predetermined threshold to support evaluation of statistical significance using permutation analysis; and in response to determining that the number of control samples is above the predetermined threshold, calculating a significance value by comparing each regulon enrichment score to a null model.
  5. The method of claim 4, wherein the significance value includes one or more of a P value and a normalized enrichment score.
  6. The method of claim 4, wherein the null model is generated by randomly permuting the control samples for a preset number of iterations.
  7. The method of claim 4, wherein in response to determining that the number of control samples is below the predetermined threshold, calculating the significance value by performing permutation of the genes in at least one or more control gene expression signatures and providing an analytic approximation of the gene expression signatures.
  8. The method of claim 4, wherein the subject sample-specific RP activity signature is obtained by comparing the expression levels of each regulon in the individual subject's test sample against the control samples.
  9. The method of claim 4, wherein the enrichment value of each regulon in the subject sample-specific RP activity signature is calculated using an analytic rank-based enrichment analysis configured to determine whether a shift in the positions of each regulon gene occurs when each regulon gene is projected on a corresponding rank-sorted gene expression signature.
  10. The method of claim 9, wherein the analytic rank-based enrichment analysis further comprises: (a) calculating a first regulon enrichment score by using a one-tail approach based on an absolute value of the gene expression signature; (b) calculating a second regulon enrichment score by using a two-tail approach; (c) generating the regulon enrichment score by combining the first and the second regulon enrichment scores; (d) determining a weighting of the first and the second regulon enrichment scores in the regulon enrichment score based on an estimated mode of regulation using a three-tail approach; and (e) calculating a statistical significance for the regulon enrichment score by comparison to the null model.
  11. The method of claim 10, further comprising determining a contribution of each target gene from a given regulon to the regulon enrichment score based on at least one or more of a regulator-target gene interaction confidence, direction of regulation, and pleotropic correction.
  12. The method of claim 10, wherein the first and the second regulon enrichment scores are calculated for the given regulon.
  13. The method of claim 10, wherein the two-tail approach further comprises inverting positions of genes whose expression can be repressed by a regulator in the gene expression signature before determining the second regulon enrichment score.
  14. A method for determining a targeted therapy using a tissue sample from an individual subject, comprising: identifying a cell or tissue sample from an individual subject having a disease or disorder; obtaining a gene expression signature by comparing the test sample to a plurality of samples representing a distinctive or control phenotype; calculating, in the context of a tissue-specific regulatory model, a regulon enrichment score of each regulon in the gene expression signature by combining a first regulon enrichment score calculated using a one-tail approach and a second regulon enrichment score calculated using a two-tail approach; calculating a significance value by comparing each regulon enrichment score to a null model to provide a subject sample-specific regulatory protein (RP) activity signature; and determining a targeted therapy for the subject, based on a ranking of the activated and/or deactivated RPs of the sample specific RP activity signature of the individual subject's sample, wherein at least one of the ranked RPs is not differentially expressed relative to the control phenotype.
  15. The method of claim 14, wherein the first regulon enrichment score is calculated based on an absolute value of the gene expression signature.
  16. The method of claim 14, wherein the significance value is used to perform an assessment of regulatory protein (RP) activity from the gene expression data.
  17. The method of claim 14, wherein the significance value is used to identify a mechanism of action of at least one of a small molecule, an antibody, and a perturbagen.
  18. The method of claim 14, wherein the significance value is used to evaluate the functional relevance of genetic alterations in regulatory proteins across different samples.
  19. The method of claim 14, wherein the subject's sample comprises a tumor, and wherein the significance value is used to identify tumors with aberrant activity of druggable oncoproteins having a lack of mutations.
  20. The method of claim 19, comprising: determining a differential activity for the druggable oncoproteins; assigning a statistical significance value to the differential activity by comparing a specific sample against a distribution of all available samples; and identifying druggable proteins that are aberrantly activated in the tumor by prioritizing each druggable protein of the plurality of druggable proteins with a statistically significant aberrant expression on an individual patient basis using a predefined significance threshold as potentially relevant pharmacological targets for that specific patient.
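
The combination of one-tail and two-tail enrichment scores recited in claims 10, 13, and 14 can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the claimed implementation. The rank-to-normal-quantile transform, the `(mor, weight)` encoding of each target, and the function names are hypothetical. Here `mor` is an assumed mode-of-regulation estimate in [-1, 1], and `weight` is an assumed interaction confidence.

```python
from statistics import NormalDist


def rank_quantiles(signature):
    """Map each gene to the normal quantile of its rank in the sorted signature."""
    order = sorted(signature, key=signature.get)
    n = len(order)
    inv = NormalDist().inv_cdf
    return {gene: inv((i + 1) / (n + 1)) for i, gene in enumerate(order)}


def combined_enrichment(signature, regulon):
    """Combine two-tail and one-tail scores for one regulon.

    `signature` maps gene -> differential expression value.
    `regulon` maps target gene -> (mor, weight), both assumed inputs.
    """
    two_tail_q = rank_quantiles(signature)
    # One-tail approach: rank genes by the absolute value of the signature.
    one_tail_q = rank_quantiles({g: abs(v) for g, v in signature.items()})
    total_w = sum(w for _, w in regulon.values())

    # Two-tail score: multiplying by a negative mor flips the quantile of a
    # repressed target, which plays the role of inverting its position.
    two = sum(mor * w * two_tail_q[g]
              for g, (mor, w) in regulon.items()) / total_w
    # One-tail score: weighted toward targets whose mode of regulation is unclear.
    one = sum((1 - abs(mor)) * w * one_tail_q[g]
              for g, (mor, w) in regulon.items()) / total_w

    sign = 1 if two >= 0 else -1
    return (abs(two) + max(one, 0.0)) * sign
```

In this sketch, targets with a clear mode of regulation (|mor| near 1) contribute mainly through the two-tail term. Targets with an ambiguous mode (mor near 0) contribute through the one-tail term, which cannot change the sign of the final score.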

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/248,975, filed Aug. 26, 2016, now U.S. Pat. No. 10,790,040, which claims the benefit of and priority to U.S. Provisional Patent Application No. 62/211,373, filed on Aug. 28, 2015, U.S. Provisional Patent Application No. 62/211,562, filed on Aug. 28, 2015, and U.S. Provisional Patent Application No. 62/253,342, filed on Nov. 10, 2015, the disclosures of which are incorporated by reference herein in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with government support under grants CA121852 and CA168426 awarded by the National Institutes of Health. The Government has certain rights in this invention.

BACKGROUND

Cancer initiation and progression can be driven by aberrant activity of oncoproteins working in concert to regulate critical tumor hallmark programs. Pharmacological inhibition of aberrantly activated oncoproteins can elicit oncogene dependency, which can motivate the development and use of targeted inhibitors in precision cancer medicine. While activating genetic alterations can allow identification of candidate drug targets, activating mutations can represent only one of several mechanisms that dysregulate the activity of an oncoprotein. Genetic and epigenetic events in cognate binding partners, competitive endogenous RNAs and upstream regulators can contribute to the aberrant activity of oncoproteins. Thus, although cells harboring activating mutations in a specific oncogene can be generally more sensitive to corresponding targeted inhibitors, cells lacking such mutations can present equivalent sensitivity. Conversely, an activating mutation cannot be guaranteed to induce aberrant protein activity due to autoregulatory mechanisms and epigenetic allele silencing.

Thus, there is a need for a more universal and systematic methodology for the accurate and reproducible assessment of protein activity to complement the ability to identify targeted therapy responders based on mutational analysis, especially because many cancer patients have no actionable oncogene mutations. In addition, change of protein activity following treatment of a tissue with a perturbagen can be critically relevant to determining whether that perturbagen has therapeutic value in that specific tissue context. Perturbagens can include, without limitation, small molecules, biologics, biophysical perturbations, and antibodies. For example, determining that small molecule A can inhibit protein kinase B, which can be aberrantly activated or mutated in cancer C, can be used as the basis to develop A as a targeted drug for tumor C. While gene expression data are ubiquitous in cancer research, certain methods to measure protein abundance based on arrays or mass spectrometry technologies can be labor-intensive, costly, and either cover a small fraction of the proteomic landscape or require large amounts of tissue. More importantly, these methods provide only an indirect measure of protein activity, because the latter is determined by a complex cascade of events, including protein synthesis, degradation, post-translational modification, complex formation and subcellular localization. It is ultimately unclear whether protein activity can be directly and systematically assessed by certain individual assays. One issue is a dearth of experimentally validated methods to accurately assess the activity of arbitrary proteins in individual samples based on the expression of their regulon genes. Reasons for this include a lack of accurate and context-specific protein regulon models, the largely pleiotropic nature of transcriptional regulation, and a lack of methodologies to assess statistical significance from single samples.

This can limit the ability to understand the functional effect of mutations on protein activity and to identify candidate responders to targeted inhibitors based on aberrant protein activity rather than mutations. Accordingly, there is a need to develop an experimentally validated method to accurately assess the activity of arbitrary proteins in individual samples based on the expression of their regulon genes.

SUMMARY

The disclosed subject matter provides systems and methods to infer protein activity from gene expression profile data. This can be used (a) to determine the functional impact of genetic mutations, (b) to identify the key regulator(s) responsible for implementing the transition between two phenotypic states in a physiological (e.g., tissue differentiation or reprogramming) and/or pathological (e.g., transition between normal and disease related state) context, (c) to identify the non-oncogene driver genes in cancer, both for a single patient, and also at the single cell level, and (d) to characterize the cell context-specific mechanism of action of different types of perturbation to the cell, and in particular of those implemented by perturbagens