EP-4742250-A2 - OPTIMIZED BURDEN TEST BASED ON NESTED T-TESTS THAT MAXIMIZE SEPARATION BETWEEN CARRIERS AND NON-CARRIERS

Abstract

A computer-implemented method of performing an optimized burden test for a particular gene, in which a grid search protocol determines the optimal combination of a maximum allele count and a minimum pathogenicity score threshold that maximizes the significance of burden testing for rare deleterious variants. Each combination of maximum allele count and minimum pathogenicity score threshold is tested with a t-test to obtain an effect size and a p-value. The combination of allele count and pathogenicity score threshold with the most significant p-value is selected as the optimal parameter set for the rare deleterious variant burden test for that gene.
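The grid search described in the abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the data layout, and the toy cohort are all hypothetical, and the p-value uses a large-sample normal approximation to Welch's t-test so that the sketch needs only the Python standard library. A real analysis would likely use the Student's t distribution instead.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic with a two-sided p-value.

    Assumption: the p-value uses a normal approximation, not the
    Student's t distribution. Returns None if the standard error is zero.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return None
    t = (mean(a) - mean(b)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal tail
    return t, p


def grid_search_burden(variants, genotypes, phenotype, max_acs, min_scores):
    """Pick the (max allele count, min pathogenicity score) pair whose
    carrier vs. non-carrier t-test has the smallest p-value.

    variants:  variant_id -> (allele_count, pathogenicity_score)
    genotypes: individual_id -> set of variant_ids carried in this gene
    phenotype: individual_id -> quantitative phenotype value
    """
    best = None
    for max_ac in max_acs:
        for min_score in min_scores:
            # Variants that count as rare and deleterious under this pair.
            qualifying = {v for v, (ac, score) in variants.items()
                          if ac <= max_ac and score >= min_score}
            carriers = [phenotype[i] for i, vs in genotypes.items()
                        if vs & qualifying]
            non_carriers = [phenotype[i] for i, vs in genotypes.items()
                            if not vs & qualifying]
            if len(carriers) < 2 or len(non_carriers) < 2:
                continue  # a variance needs at least two observations
            result = welch_t(carriers, non_carriers)
            if result is None:
                continue
            t, p = result
            if best is None or p < best["p_value"]:
                best = {"max_allele_count": max_ac, "min_score": min_score,
                        "effect_size": mean(carriers) - mean(non_carriers),
                        "t": t, "p_value": p}
    return best


# Hypothetical toy cohort: only v1 is both rare and high-scoring.
variants = {"v1": (3, 0.9), "v2": (50, 0.2), "v3": (2, 0.1), "v4": (50, 0.9)}
genotypes = {"i1": {"v1"}, "i2": {"v1"}, "i3": {"v1"}, "i4": {"v2"},
             "i5": {"v3"}, "i6": {"v4"}, "i7": set(), "i8": set(),
             "i9": set(), "i10": set()}
phenotype = {"i1": 5.0, "i2": 5.2, "i3": 4.8, "i4": 1.0, "i5": 1.1,
             "i6": 0.9, "i7": 1.0, "i8": 1.1, "i9": 0.9, "i10": 1.0}

best = grid_search_burden(variants, genotypes, phenotype,
                          max_acs=[3, 100], min_scores=[0.0, 0.5])
```

In this toy cohort the strictest pair isolates the three individuals whose phenotype is shifted, so it yields the most significant separation. Because many parameter pairs are tested per gene, the selected p-value is optimistically biased and would need correction before being interpreted.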

Inventors

  • FIZIEV, PETKO PLAMENOV
  • MCRAE, JEREMY FRANCIS
  • FARH, Kai-How

Assignees

  • ILLUMINA, INC.

Dates

Publication Date
20260513
Application Date
20221228

Claims (15)

  1. A method for determining a weighted variant polygenic risk score (PRS), comprising: accessing or acquiring genomic data and phenotypic data for a cohort of individuals, wherein the genomic data comprises gene sequence data for each individual and the phenotypic data comprises phenotypic measurement data for each individual, wherein the gene sequence data further comprises a respective carrier status for each variant and for each individual; for each gene of a plurality of genes, obtaining a plurality of gene-specific effective strength scores by: modeling a relationship between genotypes determined from the genomic data and phenotypes determined from the phenotypic data; determining, based on the relationship, an effective strength score of a strength of association in the cohort between carrier status of a plurality of variants of each gene and a phenotypic response; determining a weighted burden score based on the plurality of gene-specific effective strength scores, wherein the weighted burden score comprises a variant polygenic risk score determined by summing respective products of effect size and carrier status for each gene of the plurality of genes.
  2. The method of claim 1, wherein the variant polygenic risk score comprises a rare variant polygenic risk score, wherein a respective variant is determined to be rare based on an occurrence rate of the respective variant in a population being below a threshold.
  3. The method of claim 2, wherein the threshold is gene-specific.
  4. The method of any one of claims 1-3 wherein the phenotypic measurement data comprises quantitative biomarker phenotype measurement data.
  5. The method of claim 4, wherein the quantitative biomarker phenotype is burden tested using a two-tailed t-test.
  6. The method of any one of claims 1-4 wherein the phenotypic measurement data comprises categorical clinical diagnosis phenotype data.
  7. The method of any one of claims 1-6, wherein the respective carrier status for each variant and for each individual is determined on a gene-resolution basis.
  8. The method of any one of claims 1-7, wherein a respective individual that possesses at least one rare deleterious variant within a particular gene is a carrier at the particular gene.
  9. The method of any one of claims 1-8, wherein a respective individual that does not possess at least one rare deleterious variant within a particular gene is not a carrier at the particular gene.
  10. The method of any one of claims 1-9, wherein determining the effective strength score comprises applying a t-test.
  11. The method of claim 10, wherein a respective t-test is performed for the plurality of genes to obtain a plurality of gene-specific effective strength scores for a particular shared phenotypic response.
  12. The method of claim 11, wherein the weighted burden score is determined based on the plurality of gene-specific effective strength scores.
  13. The method of any one of claims 1-12, wherein the genomic data further comprises allele counts corresponding to groups of variants observed in a particular gene across the cohort of individuals.
  14. A system including one or more processors coupled to memory, the memory loaded with computer instructions to determine a weighted variant polygenic risk score (PRS), the instructions, when executed on the processors, implement actions comprising the method according to any one of claims 1 to 13.
  15. A non-transitory computer readable storage medium impressed with computer program instructions to determine a weighted variant polygenic risk score (PRS), the instructions, when executed on a processor, implement a method according to any one of claims 1 to 13.
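The weighted score recited in claim 1, together with the carrier definitions in claims 2, 3, 8 and 9, can be sketched as below. This is an illustrative reading only: the function names, gene names, frequencies, and effect sizes are hypothetical, and the variants are assumed to have already been filtered for deleteriousness upstream.

```python
def carrier_status(individual_variants, gene_variants, rare_threshold):
    """Return 1 if the individual carries at least one variant in the gene
    whose population frequency is below the threshold, else 0.

    gene_variants: variant_id -> population frequency (hypothetical layout)
    """
    return int(any(v in individual_variants and freq < rare_threshold
                   for v, freq in gene_variants.items()))


def rare_variant_prs(individual_variants, genes, effect_sizes, thresholds):
    """Sum, over genes, of effect size times gene-level carrier status.

    thresholds holds one rarity cutoff per gene, mirroring the
    gene-specific threshold of claim 3.
    """
    return sum(effect_sizes[g]
               * carrier_status(individual_variants, genes[g], thresholds[g])
               for g in genes)


# Hypothetical example data.
genes = {"GENE_A": {"a1": 0.0001, "a2": 0.05}, "GENE_B": {"b1": 0.0005}}
effect_sizes = {"GENE_A": 0.8, "GENE_B": -0.3}
thresholds = {"GENE_A": 0.001, "GENE_B": 0.001}

prs_rare = rare_variant_prs({"a1", "b1"}, genes, effect_sizes, thresholds)
prs_common = rare_variant_prs({"a2"}, genes, effect_sizes, thresholds)
```

The first individual is a carrier at both genes, so the score is the sum of both effect sizes. The second individual carries only a common variant and therefore contributes nothing.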

Description

PRIORITY APPLICATIONS

This application claims the benefit of and priority to the following:

  • U.S. Provisional Patent Application No. 63/294,813, titled "PERIODIC MASK PATTERN FOR REVELATION LANGUAGE MODELS," filed December 29, 2021 (Attorney Docket No. ILLM 1063-1/IP-2296-PRV);
  • U.S. Provisional Patent Application No. 63/294,816, titled "CLASSIFYING MILLIONS OF VARIANTS OF UNCERTAIN SIGNIFICANCE USING PRIMATE SEQUENCING AND DEEP LEARNING," filed December 29, 2021 (Attorney Docket No. ILLM 1064-1/IP-2297-PRV);
  • U.S. Provisional Patent Application No. 63/294,820, titled "IDENTIFYING GENES WITH DIFFERENTIAL SELECTIVE CONSTRAINT BETWEEN HUMANS AND NON-HUMAN PRIMATES," filed December 29, 2021 (Attorney Docket No. ILLM 1065-1/IP-2298-PRV);
  • U.S. Provisional Patent Application No. 63/294,827, titled "DEEP LEARNING NETWORK FOR EVOLUTIONARY CONSERVATION," filed December 29, 2021 (Attorney Docket No. ILLM 1066-1/IP-2299-PRV);
  • U.S. Provisional Patent Application No. 63/294,828, titled "INTER-MODEL PREDICTION SCORE RECALIBRATION," filed December 29, 2021 (Attorney Docket No. ILLM 1067-1/IP-2301-PRV);
  • U.S. Provisional Patent Application No. 63/294,830, titled "SPECIES-DIFFERENTIABLE EVOLUTIONARY PROFILES," filed December 29, 2021 (Attorney Docket No. ILLM 1068-1/IP-2302-PRV);
  • U.S. Provisional Patent Application No. 63/351,283, titled "OPTIMIZED BURDEN TEST BASED ON NESTED T-TESTS THAT MAXIMIZE SEPARATION BETWEEN CARRIERS AND NON-CARRIERS," filed June 10, 2022 (Attorney Docket No. ILLM 1070-1/IP-2368-PRV);
  • U.S. Provisional Patent Application No. 63/351,299, titled "RARE VARIANT POLYGENIC RISK SCORES," filed June 10, 2022 (Attorney Docket No. ILLM 1071-1/IP-2378-PRV); and
  • U.S. Provisional Patent Application No. 63/351,317, titled "COVARIATE CORRECTION INCLUDING DRUG USE FROM TEMPORAL DATA," filed June 10, 2022 (Attorney Docket No. ILLM 1073-1/IP-2387-PRV).

The priority applications are incorporated by reference as if fully set forth herein.
FIELD OF THE TECHNOLOGY DISCLOSED

The technology disclosed relates to artificial intelligence type computers and digital data processing systems and corresponding data processing methods and products for emulation of intelligence (i.e., knowledge based systems, reasoning systems, and knowledge acquisition systems); and including systems for reasoning with uncertainty (e.g., fuzzy logic systems), adaptive systems, machine learning systems, and artificial neural networks. In particular, the technology disclosed relates to using deep convolutional neural networks to analyze ordered data.

RELATED APPLICATIONS

This application is related to US Nonprovisional Patent Application titled "RARE VARIANT POLYGENIC RISK SCORES" (Attorney Docket No. ILLM 1071-2/IP-2378-US), filed contemporaneously. The related application is hereby incorporated by reference for all purposes.

This application is related to US Nonprovisional Patent Application titled "COVARIATE CORRECTION INCLUDING DRUG USE FROM TEMPORAL DATA" (Attorney Docket No. ILLM 1073-2/IP-2387-US), filed contemporaneously. The related application is hereby incorporated by reference for all purposes.

INCORPORATIONS BY REFERENCE

The following are incorporated by reference for all purposes as if fully set forth herein:

  • Sundaram, L. et al. Predicting the clinical impact of human mutation with deep neural networks. Nat. Genet. 50, 1161-1170 (2018);
  • Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535-548 (2019);
  • US Patent Application No. 62/573,144, titled "TRAINING A DEEP PATHOGENICITY CLASSIFIER USING LARGE-SCALE BENIGN TRAINING DATA," filed October 16, 2017 (Attorney Docket No. ILLM 1000-1/IP-1611-PRV);
  • US Patent Application No. 62/573,149, titled "PATHOGENICITY CLASSIFIER BASED ON DEEP CONVOLUTIONAL NEURAL NETWORKS (CNNs)," filed October 16, 2017 (Attorney Docket No. ILLM 1000-2/IP-1612-PRV);
  • US Patent Application No. 62/573,153, titled "DEEP SEMI-SUPERVISED LEARNING THAT GENERATES LARGE-SCALE PATHOGENIC TRAINING DATA," filed October 16, 2017 (Attorney Docket No. ILLM 1000-3/IP-1613-PRV);
  • US Patent Application No. 62/582,898, titled "PATHOGENICITY CLASSIFICATION OF GENOMIC DATA USING DEEP CONVOLUTIONAL NEURAL NETWORKS (CNNs)," filed November 7, 2017 (Attorney Docket No. ILLM 1000-4/IP-1618-PRV);
  • US Patent Application No. 16/160,903, titled "DEEP LEARNING-BASED TECHNIQUES FOR TRAINING DEEP CONVOLUTIONAL NEURAL NETWORKS," filed on October 15, 2018 (Attorney Docket No. ILLM 1000-5/IP-1611-US);
  • US Patent Application No. 16/160,986, titled "DEEP CONVOLUTIONAL NEURAL NETWORKS FOR VARIANT CLASSIFICATION," filed on October 15, 2018 (Attorney Docket No. ILLM 1000-6/IP-1612-US);
  • US Patent Application No. 16/160,968, titled "SEMI-SUPERVISED LEARNING FOR TRAINING AN ENSEMBLE OF DEEP CONVOLUTIONAL NEURAL NETWORKS," filed on October 15, 2018 (Attorney Docket No. ILLM 1000-7/IP-1613-US);
  • US Patent Application No. 16/160,978, titled "DEEP LEARNING-BASED SPLICE SITE CLASSIFICATION," filed on October 15, 2018 (Attorney Docket No.