US-20260128122-A1 - ACCURATE ALLELE-SPECIFIC SOMATIC COPY NUMBER CALLING FROM PICOGRAM QUANTITIES OF DNA


Abstract

The invention relates to a method of determining somatic allele-specific copy number alterations (CNAs) in the genomes of cells in a test-sample from a subject, the method comprising: i) providing an indexed-DNA library of DNA fragments resulting from whole-genome amplification of genomic DNA from cells of the test-sample, ii) providing whole genome sequencing data of reference non-cancer cells from a reference-sample from the subject; and iii) determining somatic allele-specific copy number alterations in the genome(s) of the cells of the test-sample; and associated methods and uses in cancer therapy.

Inventors

Joel NULSEN
Ahmed Ahmed

Assignees

OXFORD UNIVERSITY INNOVATION LIMITED

Dates

Publication Date: 20260507
Application Date: 20230929
Priority Date: 20221003

Claims (15)

1. A method of determining somatic allele-specific copy number alterations (CNAs) in the genomes of cells in a test-sample from a subject, the method comprising: i) providing an indexed-DNA library of DNA fragments resulting from whole-genome amplification of genomic DNA from cells of the test-sample, wherein fragments of DNA in the indexed-DNA library are distributed among, and are indexed to, individual wells, and the fragments are at least 40 kb in length; ii) providing whole genome sequencing data of reference non-cancer cells from a reference-sample from the subject; and iii) determining somatic allele-specific copy number alterations in the genome(s) of the cells of the test-sample by: a) determining a read depth ratio (RDR), which is the ratio of read counts of an allele in the test-sample relative to the reference-sample, or a log ratio (LogR) thereof, thereby providing the total copy number of the allele, wherein the RDR or LogR value is derived from counts of the allele in reconstructed fragments of DNA that have been reconstructed in silico, and wherein the counts are within a bin of between 100 kb and 5 Mb, or between 80 kb and 120 kb, on the genome; and b) determining a B-allele frequency (BAF) value, which is the allelic ratio of parental alleles determined by the ratio of the wells supporting a reference allele versus a non-reference allele and averaged over phased groups of heterozygous single nucleotide polymorphisms (SNPs); and c) using the determined RDR or LogR, and the BAF value to call allele-specific somatic CNAs via an allele-specific CNA calling algorithm.
2. The method according to claim 1, wherein the allele-specific CNA-calling algorithm comprises or consists of a mean-squared error (MSE)-minimisation algorithm or TITAN algorithm.
3. The method according to claim 2, wherein the TITAN algorithm has a tuneable segment length parameter set within a range of 10^20 to 10^23.
4. The method according to any preceding claim, wherein two or more sequences of DNA from the same well may be determined to be a read pair from the same in silico reconstructed fragment if there is a gap between them shorter than 10 kb in length.
5. The method according to any preceding claim, wherein providing an indexed-DNA library resulting from whole-genome amplification of genomic DNA from cells in the test-sample is by carrying out, or obtaining results from, a method of whole genome sequencing of the cells comprising the steps of: i) providing a multi-well array plate comprising rows and columns of reaction wells; ii) providing genomic DNA of cells of a test-sample, wherein the genomic DNA is distributed into a plurality of reaction wells on the multi-well array plate, such that there is no more than one single-stranded genomic DNA molecule of any given locus per reaction well, iii) carrying out whole genome amplification (WGA) of each genomic DNA molecule to provide multiple copies of the genomic DNA molecule in each reaction well; iv) fragmenting the DNA molecules of each reaction well and ligating a pair of looped adapters at each end or tagmenting using transposase-delivered adapters to form adapted-DNA fragments, wherein the looped adapters or transposase-delivered adapters comprise either a Column Index (Ci) sequence or a Row Index (Ri) sequence, wherein the Ci sequence is common to each looped adapter or transposase-delivered adapter of every reaction well in a column of the multi-well array plate, or wherein each Ri sequence is common to each looped adapter or transposase-delivered adapter of every reaction well in a row of the multi-well array plate; vi) providing the indexed DNA library by performing indexing PCR on the adapted-DNA fragments, wherein the adapted-DNA fragments are amplified to form indexed PCR products using forward and reverse indexing primers, wherein either a Row Index (Ri) sequence or Column Index (Ci) sequence is introduced by each forward and reverse indexing primers onto each end of the adapted-DNA fragments, such that the resulting indexed PCR products comprise both a pair of flanking Column Index (Ci) sequences that are common to each well of a column and a pair of flanking Row Index (Ri) sequences that are common to each well of a row, and vii) sequencing the indexed DNA library.
6. The method according to any preceding claim, wherein the cells are cancerous cells, or pre-cancerous cells.
7. The method according to any preceding claim, wherein the cells are laser-captured micro-dissected cells or circulating tumour cells.
8. The method according to any preceding claim, wherein the cells from the reference-sample are cells from non-tumour-containing tissue or blood from the subject.
9. The method according to any preceding claim, wherein the method further comprises sequencing the indexed-DNA library of DNA fragments to provide data for determining any single nucleotide polymorphisms (SNPs) in the genome of cells.
10. A method for monitoring cancer in a subject, or monitoring or detecting minimal residual disease (MRD) or cancer relapse in a subject, the method comprising: determining somatic allele-specific copy number alterations (CNAs) in the genomes of cells in a test-sample from the subject in accordance with any of claims 1-9.
11. The method according to claim 10, wherein the subject has received previous cancer therapy; optionally wherein the subject is in a state of remission.
12. The method according to any of claims 10-12, wherein the method further comprises treatment of the subject, such as administration of cancer therapy, if they are determined to have a progressing cancer, MRD or cancer relapse.
13. A method of cancer therapy for a subject, the method comprising obtaining or receiving results of a method carried out in accordance with any of claims 1-10; wherein if the subject is determined to have a progressing cancer, MRD or cancer relapse, treating the subject, such as by administration of cancer therapy.
14. The method according to claim 12 or 13, wherein the cancer therapy comprises one or more of chemotherapy, radiotherapy, surgery, CAR-T therapy, and anti-cancer vaccination.
15. Use of the method according to any of claims 1-9 for monitoring cancer in a subject, such as monitoring progression of cancer; or monitoring or detecting minimal residual disease (MRD) or cancer relapse in a subject.
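
By way of illustration only, the quantities recited in claims 1 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the (well, chromosome, start, end) read tuples, the library-size normalisation in log_r, and the alt_on_b phase flag are all hypothetical choices made for the example. The 10 kb gap threshold is taken from claim 4.

```python
import math
from collections import defaultdict

MAX_GAP = 10_000  # claim 4: reads in one well less than 10 kb apart are joined


def reconstruct_fragments(reads, max_gap=MAX_GAP):
    """Merge reads into in silico fragments, separately per (well, chromosome).

    reads: iterable of (well, chrom, start, end) tuples (hypothetical format).
    Returns a list of (well, chrom, start, end) fragments.
    """
    by_key = defaultdict(list)
    for well, chrom, start, end in reads:
        by_key[(well, chrom)].append((start, end))

    fragments = []
    for (well, chrom), spans in sorted(by_key.items()):
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start - cur_end < max_gap:
                # Gap shorter than the threshold: extend the current fragment.
                cur_end = max(cur_end, end)
            else:
                fragments.append((well, chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        fragments.append((well, chrom, cur_start, cur_end))
    return fragments


def log_r(test_count, ref_count, test_total, ref_total):
    """log2 read depth ratio for one bin.

    Counts are divided by library totals before taking the ratio; this
    normalisation is an assumption of the sketch.
    """
    rdr = (test_count / test_total) / (ref_count / ref_total)
    return math.log2(rdr)


def phased_block_baf(snps):
    """Mean per-SNP fraction of wells supporting haplotype B in one phased block.

    snps: list of (ref_wells, alt_wells, alt_on_b), where alt_on_b says whether
    the non-reference allele lies on haplotype B (hypothetical encoding).
    """
    fractions = []
    for ref_wells, alt_wells, alt_on_b in snps:
        total = ref_wells + alt_wells
        if total == 0:
            continue  # no well covers this SNP
        b_wells = alt_wells if alt_on_b else ref_wells
        fractions.append(b_wells / total)
    return sum(fractions) / len(fractions) if fractions else float("nan")
```

In this sketch, BAF is computed from the number of wells supporting each allele rather than from raw read counts, so that uneven amplification within a single well does not dominate the estimate. The resulting LogR and BAF values would then be passed to an allele-specific CNA caller such as those named in claim 2.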

Description

This invention relates to a method of determining somatic allele-specific copy number alterations (CNAs) in the genomes of cells in a test-sample from a subject.
Cancer is a genomic disease. The driving force behind each cancer is a repertoire of genomic changes known as somatic alterations {Bailey, 2018 #6; Consortium, 2020 #7}. Understanding how these alterations drive cancer is one of the central aims of cancer genomics {Vogelstein, 2013 #13}, and efforts in this field have brought about clinical benefits including improved patient stratification, new prognostic biomarkers and an arsenal of new therapies {Berger, 2018 #30; Malone, 2020 #29}.
Copy number alterations (CNAs) are an important class of somatic alterations in cancer in which large segments of the genome are either amplified or deleted. They have been found to drive the cancer phenotype {Sondka, 2018 #9; Wang, 2020 #10}, play a prominent role in cancer evolution {Gerstung, 2020 #8}, and hold substantial prognostic value {Hieronymus, 2018 #12; Smith, 2018 #11}. They are particularly important in genomically unstable cancer types such as ovarian {Penner-Goeke, 2017 #31}, oesophageal {Paulson, 2009 #32} and gastric cancers {Maleki, 2017 #33}. Due to this importance, researchers have developed numerous algorithms to call somatic CNAs from sequencing data, including ASCAT {Van Loo, 2010 #1}, ABSOLUTE {Carter, 2012 #26}, Control-FREEC {Boeva, 2012 #27}, OncoSNP-SEQ {Yau, 2013 #28}, and TITAN {Ha, 2014 #2}, among many others. In addition to determining the total number of copies at each locus in the genome, many of these algorithms also determine the relative numbers of the two parental alleles. Such methods are said to call allele-specific CNAs. This added allele-specific information allows researchers to identify subtle CNAs such as copy number neutral loss of heterozygosity (LOH) {Van Loo, 2010 #1}, which can have important clinical implications in cancer {Ryland, 2015 #23}.
While early efforts tended to analyse bulk tumour samples, recent research has become more directed towards microscopic samples. For example, studies have investigated the therapeutic opportunities associated with circulating tumour cells {Lin, 2021 #39}, minimal residual disease {Artibani, 2021 #42; Luskin, 2018 #40} and cancer sub-populations such as tumour initiating cells {Qureshi-Baig, 2017 #41}. One approach for characterising the genomes of such samples is single cell sequencing. Several CNA-calling algorithms have been developed that combine shallow whole genome sequencing (WGS) data across hundreds of cells to call CNAs at single-cell resolution. Established algorithms for calling total copy number include Ginkgo {Garvin, 2015 #14} and AneuFinder {Bakker, 2016 #3}. More recently, researchers developed allele-specific single cell algorithms, including CHISEL {Zaccaria, 2021 #44} and Alleloscope {Wu, 2021 #43}. However, while they may provide accurate CNA calls, single cell methods are still limited in the accuracy with which they genotype single nucleotide variants (SNVs) {Dong, 2017 #45}, which are complementary to CNAs and can be highly consequential for therapeutics {Morotti, 2021 #46}. An ideal platform for sequencing microscopic tumour samples would generate both accurate SNVs and accurate CNAs for a complete genomic characterisation. Moreover, many single cell methods rely on aggregating data across hundreds or even thousands of cells, and are therefore unsuitable when total sample size is limited to tens of cells. To investigate the genomes of tumour samples comprising tens of cells (e.g. obtained by laser capture microdissection), we recently developed a linked-read sequencing platform called DigiPico, which is described in International Patent Application Publication No. WO2021116677A1, which is incorporated herein by reference.
DigiPico allows accurate identification of clonal SNVs and short insertions and deletions (indels) from picogram quantities of DNA {KaramiNejadRanjbar, 2020 #4}. In the DigiPico protocol, we first distribute large fragments of genomic DNA, on the order of 100 kb in length, across one or more 384-well plates. Within each well separately, we then amplify, fragment and barcode the material. Performing the amplification step independently in each well allows us to computationally identify and remove artefactual mutations arising from oxidation and spontaneous deamination. Intuitively, real mutations have support from a larger proportion of wells than artefactual mutations, which tend to appear in only a single well due to the random nature of the DNA damage. We developed a machine learning algorithm, MutLX, to exploit this effect to reliably identify the real mutations in a DigiPico sample {KaramiNejadRanjbar, 2020 #4}. Given the biological and prognostic importance of somatic CNAs, we sought to develop a method to accurately call somatic allele-specific CNAs from DigiPico sequencing data. We noted that existing CNA calling methods for linked-reads only pr