EP-3924510-B1 - METHOD OF DETERMINING THE ORIGIN OF NUCLEIC ACIDS IN A MIXED SAMPLE
Inventors
- KOUMBARIS, George
- ACHILLEOS, Achilleas
- IOANNIDES, Marios
- PATSALIS, Philippos
Dates
- Publication Date: 20260506
- Application Date: 20200211
Claims (10)
- A method for determining the origin of a nucleic acid fragment, or detecting a nucleic acid fragment, in a mixture of nucleic acid fragments, wherein the mixture comprises maternal and fetal cell-free DNA, or tumor and non-tumor cell-free DNA, the method comprising the steps of a) providing a mixture of fragmented nucleic acids stemming from a human subject, b) preparing a sequencing library from the mixture of fragmented nucleic acids, c) hybridizing one or more probes to at least one location in said library, wherein the mixture of fragmented nucleic acids comprises a genomic region comprising, at a distance of less than 300 bp, genomic bases at which the frequency of being an end-point of a read is significantly different, with a p-value of less than 0.05, between two tissue types present in a mixture of cell-free DNA (cfDNA) and said probe covers said genomic region, wherein the probes are double-stranded probes and, i. each probe is between 100-500 base pairs in length, ii. each denatured probe has a 5'-end and a 3'-end, iii. each probe binds to a genomic region comprising, at a distance of less than 300 bp, genomic bases differentiating two tissue types present in a mixture of cfDNA at least 10 base pairs away, on both the 5'-end and the 3'-end, from regions harboring copy number variations (CNVs), segmental duplications or repetitive DNA elements, and iv. the GC content of each probe is between 10% and 80%, d) isolating one or more fragmented nucleic acids from the mixture that are bound by the one or more probes, e) sequencing the enriched library wherein a duplication rate of the sequencing library from the template DNA fragments is more than 5%, f) determining the fragment size of a fragmented nucleic acid without alignment on any reference genome, by using the sequence similarity of sequenced reads by detecting the amount of overlap and/or homology of paired-reads or the length of single reads, g) determining the sequence of the, at least 20 bp, outermost nucleotides of the fragmented nucleic acid, and h) utilizing the information from steps (f) and (g) in order to i. determine whether the size of the nucleic acid fragment is less than 150 bp, or ii. determine whether the size of the nucleic acid fragment is greater than or equal to 150 bp and the sequence derived from step (g) overlaps said genomic region, and i) categorizing the fragment as of placental or tumor origin if the criteria in step (h)(i) or step (h)(ii) are met or categorizing the fragment as of maternal or non-tumor origin if neither of the criteria in step (h)(i) and step (h)(ii) are met.
- The method according to claim 1, wherein the nucleic acid fragment is circulating cell-free DNA or RNA.
- The method according to any of the claims 1 and 2, wherein the nucleic acid fragments are selected from the groups comprising: i. embryonic DNA and maternal DNA, ii. tumor derived DNA and non-tumor derived DNA, iii. pathogen DNA and host DNA, iv. DNA derived from a transplanted organ and DNA derived from the host.
- The method according to any of the preceding claims, wherein the nucleic acid fragment to be detected or the origin of which is to be determined is present in the mixture at a concentration lower than a nucleic acid fragment from the same genetic locus but of different origin.
- The method according to claim 4, wherein the nucleic acid fragment to be detected or the origin of which is to be determined and the nucleic acid fragment from the same genetic locus but of different origin are present in the mixture at a ratio selected from the group of, 1:2, 1:4, 1:10, 1:20, 1:50, 1:100, 1:200, 1:500, 1:1000, 1:2000 and 1:5000.
- The method according to any of the preceding claims, wherein the probes are fixed to a support.
- The method according to any of the preceding claims, wherein the probes are biotinylated and are bound to streptavidin-coated magnetic beads.
- The method according to any of the preceding claims, wherein the GC content of the probe or probes is between 10% and 70%, preferably 15% and 60%, more preferably 20% and 50%.
- A method for isolating one or more nucleic acid fragments from a mixture of nucleic acid fragments, comprising the steps of: a. providing a mixture of fragmented nucleic acids, preferably DNAs, stemming from a human subject; b. hybridizing one or more probes to at least one location in the nucleic acid fragments, where a genomic region comprising, at a distance of less than 300 bp, genomic bases at which the frequency of being an end-point of a read is significantly different, with a p-value of less than 0.05, between two tissue types present in a mixture of cfDNA lies, wherein: i. each probe is between 100-500 base pairs in length; ii. each denatured probe has a 5'-end and a 3'-end; iii. each probe binds to a genomic region comprising, at a distance of less than 300 bp, genomic bases differentiating two tissue types present in a mixture of cfDNA at least 10 base pairs away, on both the 5'-end and the 3'-end, from regions harboring copy number variations (CNVs), segmental duplications or repetitive DNA elements; and iv. the GC content of each probe is between 10% and 80%, or c. amplifying one or more locations from the nucleic acid fragments, wherein the primers for the amplification lie adjacent to said genomic region.
- Kit for determining the origin of a nucleic acid fragment in a mixture of nucleic acid fragments for the use in a method according to claims 1 to 9, comprising: a. probes that hybridize to at least one location in the nucleic acid fragment, wherein said at least one location partially or completely encompasses a genomic region comprising, at a distance of less than 300 bp, genomic bases at which the frequency of being an end-point of a read is significantly different, with a p-value of less than 0.05, between two tissue types present in a mixture of cfDNA, wherein i. each probe is between 100-500 base pairs in length; ii. each denatured probe has a 5'-end and a 3'-end; iii. each probe binds to a genomic region comprising, at a distance of less than 300 bp, genomic bases differentiating two tissue types present in a mixture of cfDNA at least 10 base pairs away, on both the 5'-end and the 3'-end, from regions harboring copy number variations (CNVs), segmental duplications or repetitive DNA elements; and iv. the GC content of each probe is between 10% and 80%; and, optionally, b. reagents and/or software for performing the method described according to claims 1 to 9 and a determination and/or detection method.
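As a non-authoritative illustration of steps (f) to (i) of claim 1, the following Python sketch infers fragment size from the overlap of paired reads without any reference genome, extracts the outermost 20 bases of the fragment, and applies the 150 bp rule. All function names are hypothetical. The sketch assumes adapter-trimmed reads, exact (error-free) overlaps, and a simple substring test against a caller-supplied region sequence to decide whether a fragment end overlaps the genomic region; none of these choices is specified by the claims.

```python
from typing import Optional, Tuple

SIZE_CUTOFF = 150  # bp threshold from step (h)
END_LENGTH = 20    # minimum number of outermost bases from step (g)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fragment_length_from_pair(read1: str, read2: str,
                              min_overlap: int = 10) -> Optional[int]:
    """Estimate fragment length from the overlap of two paired reads.

    Assumption: reads are adapter-trimmed and overlap exactly. Returns
    None when no overlap of at least min_overlap bases is found, i.e.
    the fragment is too long to be sized from this read pair.
    """
    mate = reverse_complement(read2)  # read 2 on the read-1 strand
    # Try the longest candidate overlap first (a simplification).
    for ov in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-ov:] == mate[:ov]:
            return len(read1) + len(mate) - ov
    return None


def outermost_ends(read1: str, read2: str,
                   k: int = END_LENGTH) -> Tuple[str, str]:
    """Return the k outermost bases of each fragment end (read-1 strand)."""
    return read1[:k], reverse_complement(read2[:k])


def end_in_region(end_seq: str, region_seq: str) -> bool:
    """Hypothetical overlap test: exact substring match on either strand."""
    return end_seq in region_seq or reverse_complement(end_seq) in region_seq


def classify_fragment(length: Optional[int], ends: Tuple[str, str],
                      region_seq: str) -> str:
    """Apply the rule of steps (h) and (i) to one fragment."""
    if length is None:
        return "undetermined"  # size unknown, so the rule cannot be applied
    if length < SIZE_CUTOFF:
        return "placental_or_tumor"  # criterion (h)(i)
    if any(end_in_region(e, region_seq) for e in ends):
        return "placental_or_tumor"  # criterion (h)(ii)
    return "maternal_or_non_tumor"
```

A caller would pass each read pair through `fragment_length_from_pair` and `outermost_ends`, then hand both results to `classify_fragment` together with the sequence of the targeted region. A production implementation would likely need mismatch-tolerant overlap detection and a more careful definition of region overlap.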
Description
FIELD OF THE INVENTION
The invention is in the field of biology, medicine and chemistry, in particular in the field of molecular biology and more in particular in the field of molecular diagnostics.
BACKGROUND OF THE INVENTION
The discovery of cell-free fetal DNA (cffDNA) in maternal plasma has greatly promoted the development of non-invasive prenatal diagnosis. However, the concentration of cffDNA in maternal plasma varies among individuals and is extremely low, accounting in most cases for 2-19% of the total maternal plasma cell-free DNA (cfDNA). When the proportion of cffDNA in the maternal circulation is below 4%, obtaining sufficient accuracy for non-invasive prenatal testing (NIPT) is challenging, even with highly sensitive next generation sequencing (NGS) technology. Methods for detecting and monitoring diseases, such as cancer, based on the analysis of cfDNA fragmentation patterns are described in the international patent applications published as WO 2018/081130 A1 and WO 2017/181146 A1.
SUMMARY OF THE INVENTION
Eukaryotic genomes are organized into chromatin, which not only compacts DNA but also regulates DNA metabolism (replication, transcription, repair, recombination). A current challenge is thus to understand (i) how functional chromatin domains are established in the nucleus, (ii) how chromatin structure/information is dynamic through assembly, disassembly, modifications and remodeling mechanisms and (iii) how these events participate in and/or maintain disease establishment, progression and relapse. Understanding these events will allow identification of novel mechanisms of disease progression and new therapeutic targets, as well as controlling the effect of therapeutic molecules.
It has been shown that signatures of chromatin structure in eukaryotic organisms, in particular the nucleosome arrangement, can be used to identify rare nucleic acid fragments in complex mixtures present in eukaryotic organisms (Heitzer E. et al. Nat. Rev. Genet. 2019 Feb;20(2):71-88). Such complex mixtures may be, for example, cffDNA and maternal cfDNA, or DNA derived from circulating tumor cells (CTCs) or tissue and DNA derived from healthy circulating cells.
In particular, the inventors discovered a new method of isolating and identifying rare nucleic acids in mixed samples, employing a novel targeted approach that utilizes long synthetic TArget Capture Sequences (TACS) (probes) and novel bioinformatics, and discovered that non-random fragmentation patterns exist in regions spanned by these TACS (probes). If fragmentation were random, it would be equally likely to identify DNA fragments with start and/or stop positions at any base position spanned by the TACS (probes). This would lead to uniform-like coverage of fragments' start and/or stop positions across a probe location. Deviations from such coverage illustrate non-random fragmentation positions. In order to identify such deviations, the following was performed:
1. A vector of genomic coordinates of all start and/or stop locations of all fragments that align within a probe-specific region (i.e. one probe) was created;
2. A density of start and/or stop locations was created from the vector obtained in step 1 (i.e. a plot where the y-axis is the frequency of occurrence and the x-axis is the coordinates spanning a single probe);
3. The deviation from uniform coverage of the density created in step 2 was assessed, as such deviation implies a non-random fragmentation mechanism.
A number of such positions per chromosome were discovered. The mechanism hypothesized to be responsible for the increased frequency of non-random fragmentation positions in certain regions is the protection of the DNA by the nucleosome. That is, deviation from uniform-like coverage may imply a reduced presence of a type of nucleic acid at such positions (e.g. an abundant nucleic acid comprising a complex mixture of nucleic acids) due to the protection conferred by the nucleosomal arrangement, and by extension an increased presence of other types of nucleic acid (e.g. a rare nucleic acid present in a complex mixture of nucleic acids), permitting the detection of regions with increased frequency of non-random fragmentation positions (referred to as hot spots for non-random fragmentation [HSNRF]).
An HSNRF is hereby defined as a genomic region comprising, at a distance of less than 300 bp (preferably less than 200 bp, more preferably less than 100 bp), preferred sites differentiating two tissue types present in a mixture of cfDNA, where said preferred sites are present at a higher frequency in HSNRF regions than in other, non-HSNRF regions. Preferred sites are hereby defined as genomic bases at which the frequency of being an end-point of a read is significantly different (p-value of less than 0.05) between the two tissue types present in a mixture of cfDNA.
In one embodiment, deviation from uniform coverage was assessed by quantifying the number of modes in the distribution created from the