EP-4741514-A2 - METHODS AND REAGENTS FOR EFFICIENT GENOTYPING OF LARGE NUMBERS OF SAMPLES VIA POOLING


Abstract

Methods and associated reagents for efficient genotyping of large numbers of samples via pooling are disclosed herein. Some embodiments of the technology are directed to utilizing Duplex Sequencing for efficient genotyping of large numbers of samples (e.g., nucleic acid samples, patient samples, tissue samples, blood samples, etc.) and associated applications. Various aspects of the present technology have many applications in both pre-clinical and clinical disease assessment, screening large sample numbers where relatively infrequent variants are being sought, and others.

Inventors

SALK, Jesse J.
DANAHER, Patrick
VALENTINE, Charles Clinton, III

Assignees

Twinstrand Biosciences, Inc.

Dates

Publication Date: 20260513
Application Date: 20191016

Claims (10)

A method for screening biological sources for variant allele(s), the method comprising: aliquoting a plurality of biological samples derived from the biological sources into a unique combination of sub-pools, wherein each biological sample comprises target double-stranded DNA molecules, and wherein each biological sample is aliquoted into more than one sub-pool; generating an error-corrected sequence read for each of a plurality of the target double-stranded DNA molecules in the sub-pools; identifying a presence of one or more variant allele(s) from the error-corrected sequence reads; and determining the biological source containing the variant allele(s) by identifying the unique combination of sub-pools containing the variant allele(s), wherein generating the error-corrected sequence reads for a target DNA molecule comprises: (i) single-stranded consensus sequencing; (ii) a combination of single-stranded and duplex consensus sequencing; or (iii) ligating adapter molecules to the target double-stranded DNA molecules to generate a plurality of adapter-DNA molecules; for each of a plurality of adapter-DNA molecules, generating a set of copies of an original first strand of the adapter-DNA molecule and a set of copies of an original second strand of the adapter-DNA molecule; sequencing one or more copies of the original first and second strands to provide a first strand sequence and a second strand sequence; and comparing the first strand sequence and the second strand sequence to identify one or more correspondences between the first and second strand sequences.
The method of claim 1, wherein generating error-corrected sequence reads for the target DNA molecule further comprises selectively enriching one or more targeted genomic regions prior to sequencing.
The method of claim 2, wherein the one or more targeted genomic regions comprise genes known to harbor disease-causing mutations, optionally wherein a disease-causing mutation is or includes a loss of function mutation, a gain of function mutation, or a dominant negative mutation.
The method of claims 2 or 3, wherein the one or more targeted genomic regions comprise genetic loci known to be associated with a disease or disorder, optionally wherein the disease or disorder: (a) is a rare genetic disorder; (b) is a single-gene disorder or a complex disorder involving mutations in two or more genes; (c) is associated with an autosomal recessive mutation; (d) is associated with an autosomal dominant mutation; or (e) comprises Phenylketonuria (PKU), Cystic fibrosis, Sickle-cell anemia, Albinism, Huntington's disease, Myotonic dystrophy type 1, Hypercholesterolemia, Neurofibromatosis, Polycystic kidney disease 1 and 2, Hemophilia A, Muscular dystrophy (Duchenne type), Hypophosphatemic rickets, Rett's syndrome, Tay-Sachs disease, Wilson disease, and/or Spermatogenic failure.
The method of any one of claims 1-4, wherein identifying a presence of one or more variant allele(s) from the error-corrected sequence reads comprises comparing the error-corrected sequence reads to a reference genome DNA sequence.
The method of any one of claims 1-5, further comprising determining a frequency of the one or more variants among the plurality of target double-stranded DNA molecules in each sub-pool, optionally further determining if a biological source donor of the biological sample comprising the variant allele(s) is heterozygous or homozygous for the variant allele.
The method of any one of claims 3-6, wherein the one or more targeted genomic regions comprise a cancer driver, a proto-oncogene, a tumor suppressor gene and/or an oncogene, optionally wherein the cancer driver comprises ABL, ACC, BCR, BLCA, BRCA, CESC, CHOL, COAD, DLBC, DNMT3A, EGFR, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LAML, LGG, LIHC, LUAD, LUSC, MESO, OV, PAAD, PCPG, PI3K, PIK3CA, PRAD, PTEN, RAS, READ, SARC, SKCM, STAD, TGCT, THCA, THYM, TP53, UCEC, UCS, and/or UVM.
The method of claim 2, wherein the one or more targeted genomic regions comprise (a) a gene associated with a rare autoimmune, metabolic or neurological genetic disorder or disease; or (b) a genetic locus associated with rare genetic disorders of obesity, optionally wherein the rare genetic disorders of obesity are or include Proopiomelanocortin (POMC) Deficiency Obesity, Alström syndrome, Leptin Receptor (LEPR) Deficiency Obesity, Prader-Willi syndrome (PWS), Bardet-Biedl syndrome (BBS), and high-impact Heterozygous Obesity.
The method of any one of claims 1-8, wherein the biological sample comprises double-stranded DNA molecules extracted from tissue and/or a blood sample.
The method of any one of claims 1-9, wherein the number of sub-pools is or comprises: (a) 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 42, 45, 47, 50, 52, 55, 57, 60, 62, 65, 67, or 70 sub-pools; or (b) between about 15 and about 40 sub-pools, between about 30 and about 50 sub-pools, between about 35 and about 55 sub-pools, between about 40 and about 60 sub-pools, or over 60 sub-pools.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of U.S. Provisional Patent Application No. 62/746,543, filed October 16, 2018, the disclosure of which is hereby incorporated by reference in its entirety.

BACKGROUND

Next generation DNA sequencing (NGS) makes it possible to sequence trillions of DNA bases in a single sequencer run. Although nucleotide throughput of sequencing has increased enormously in the last decade, cost-efficient technologies for multiplexing hundreds or thousands of samples together to capitalize on NGS' massive capacity have lagged. For some applications where the sequencing needs per sample are large (e.g., whole mammalian genomes or exomes), modest multiplexing capacity is sufficient. For samples where the sequencing needs per sample are small (e.g., panels of a few thousand or tens of thousands of base pairs in size), the cost of filling out a sequencer run becomes high, not because of the sequencing itself, but because of the cost and effort of preparing such large numbers of samples, individually labeling each with a unique index sequence, and then pooling them for multiplexed sequencing. For example, in applications involving population sequencing for rare inherited variants within a relatively small targeted gene panel, preparing hundreds or thousands of parallel libraries is taxing, expensive, and often rate limiting.

SUMMARY

The present technology relates generally to methods and associated reagents for efficient genotyping of multiple samples via pooling. In particular, some embodiments of the technology are directed to utilizing Duplex Sequencing for efficient genotyping of large numbers of samples (e.g., nucleic acid samples, patient samples, tissue samples, blood samples, plasma samples, serum samples, swabbing samples, scraping samples, cell culture samples, microbial samples, etc.) and associated applications.
For example, various embodiments of the present technology include performing Duplex Sequencing methods on pooled nucleic acid samples (e.g., patient DNA samples) to simultaneously sequence all, or targeted sections, of the genome in a manner that is efficient (e.g., cost efficient, time efficient) and with high accuracy and sensitivity. Such embodiments allow for screening for variant alleles (i.e., genetic variants such as SNVs, MNVs, SNPs, MNPs, INDELs, mutations, structural variants, copy number variants, inversions, rearrangements, etc.) from a pool of a modest or large number of original pooled samples, as well as the identification of an individual sample (or samples) having the variant allele. Various aspects of this technology have many applications in both pre-clinical and clinical disease assessment, screening large sample numbers where relatively infrequent variants are being sought, and others. In some embodiments, the present disclosure provides methods for genotyping a plurality of biological samples via pooling that comprise the steps of pooling the plurality of biological samples, or nucleic acid derivatives of biological samples, into a unique combination of sub-pools, wherein each biological sample comprises target double-stranded DNA molecules; and generating an error-corrected sequence read for each of a plurality of the target double-stranded DNA molecules in the sub-pools.
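By way of illustration only, the pooling scheme described above can be sketched in Python. The sketch assigns each sample a distinct combination of sub-pools and then recovers candidate carrier samples from the set of sub-pools in which a variant allele was observed. The function names (`assign_subpools`, `candidate_carriers`), the fixed number of sub-pools per sample, and the use of lexicographically ordered combinations are assumptions made for this example; the disclosure does not prescribe a particular combinatorial design.

```python
from itertools import combinations


def assign_subpools(sample_ids, n_subpools, pools_per_sample):
    """Map each sample to a distinct set of sub-pool indices.

    Hypothetical design: every sample goes into exactly
    `pools_per_sample` of the `n_subpools` sub-pools, and no two
    samples share the same combination.
    """
    sample_ids = list(sample_ids)
    combos = combinations(range(n_subpools), pools_per_sample)
    # zip stops at the shorter input, so a shortfall in available
    # combinations is detected by the length check below.
    assignment = {
        sample_id: frozenset(combo)
        for sample_id, combo in zip(sample_ids, combos)
    }
    if len(assignment) < len(sample_ids):
        raise ValueError("not enough unique sub-pool combinations")
    return assignment


def candidate_carriers(assignment, positive_subpools):
    """Return samples whose sub-pools all contain the variant allele.

    A sample is a candidate carrier when every sub-pool it was
    aliquoted into is among the variant-positive sub-pools.
    """
    positive = frozenset(positive_subpools)
    return [
        sample_id
        for sample_id, pools in assignment.items()
        if pools <= positive
    ]


if __name__ == "__main__":
    samples = [f"S{i}" for i in range(1, 21)]
    # 6 sub-pools taken 3 at a time give 20 distinct combinations,
    # enough for 20 samples in this toy example.
    design = assign_subpools(samples, n_subpools=6, pools_per_sample=3)
    observed = design["S5"]  # pretend only S5 carries the variant
    print(candidate_carriers(design, observed))
```

Under these assumptions, a single carrier with error-free variant calls yields exactly one candidate, because every sample occupies the same number of sub-pools and no two combinations coincide. When more than one sample carries the variant, the union of positive sub-pools may also be consistent with non-carrier samples, so the candidate list would need further resolution (for example, by variant frequency within each sub-pool).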
In certain embodiments, generating an error-corrected sequence read comprises the steps of ligating adapter molecules to the plurality of target double-stranded DNA molecules to generate a plurality of adapter-DNA molecules; for each of a plurality of adapter-DNA molecules, generating a set of copies of an original first strand of the adapter-DNA molecule and a set of copies of an original second strand of the adapter-DNA molecule; sequencing one or more copies of the original first and second strands to provide a first strand sequence and a second strand sequence; and comparing the first strand sequence and the second strand sequence to identify one or more correspondences between the first and second strand sequences. In one embodiment, the method further comprises identifying a donor source of nucleic acid present in the mixture of nucleic acid by deconvolving the error-corrected sequence reads into individual genotypes. For example, the method can include identifying a presence of one or more variant alleles from the error-corrected sequence reads; and determining the original biological sample containing the variant allele(s) by identifying the unique combination of sub-pools containing the variant allele(s). In another embodiment, the present technology provides a method for screening biological sources for a genetic variant that includes aliquoting a plurality of biological samples derived from the biological sources into a unique combination of sub-pools, wherein each biological sample comprises target double-stranded DNA molecules, and wherein each biological sample is aliquoted into more than one sub-pool. The method further includes gener