EP-4740211-A1 - ROBUST QUANTIFICATION OF CIRCULATING TUMOUR DNA THROUGH FRAGMENT LENGTH ANALYSIS

Abstract

Systems and methods for quantifying circulating tumour DNA (ctDNA) in a plasma sample by performing fragment length analysis of cell-free DNA (cfDNA) sequencing data using a ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample, wherein the ctDNA quantification model comprises a neural-network-based system trained using a training dataset generated using control cfDNA data augmented with ctDNA data obtained from known cancer samples to generate artificial training dataset records with a known ctDNA metric.
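
The augmentation described in the abstract can be pictured as an in silico mixture: fragment lengths from a control sample are combined with fragment lengths from a known cancer sample at a chosen proportion, so that the resulting record carries a known label. The following is a minimal sketch of that idea only, not the applicants' implementation. The function name `make_training_record`, its parameters, and the assumption that the label equals the mixing proportion scaled by the cancer sample's own known ctDNA fraction are all hypothetical.

```python
import random


def make_training_record(control_lengths, cancer_lengths, mix_proportion,
                         cancer_ctdna_fraction, n_fragments=10_000, seed=0):
    """Build one artificial record of fragment lengths with a known label.

    Assumption (not from the source): the label is the share of fragments
    drawn from the cancer sample multiplied by that sample's own known
    ctDNA fraction.
    """
    if not 0.0 <= mix_proportion <= 1.0:
        raise ValueError("mix_proportion must be between 0 and 1")
    if not 0.0 <= cancer_ctdna_fraction <= 1.0:
        raise ValueError("cancer_ctdna_fraction must be between 0 and 1")

    rng = random.Random(seed)
    n_cancer = round(n_fragments * mix_proportion)
    n_control = n_fragments - n_cancer

    # Sample with replacement from each source, then shuffle the mixture.
    mixed = (rng.choices(cancer_lengths, k=n_cancer)
             + rng.choices(control_lengths, k=n_control))
    rng.shuffle(mixed)

    label = (n_cancer / n_fragments) * cancer_ctdna_fraction
    return mixed, label
```

Sweeping `mix_proportion` over a grid of values would yield a family of records spanning low to high ctDNA levels from a single pair of samples, which is the property that makes such artificial records usable as labelled training data.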

Inventors

  • SKANDERUP, Anders Martin Jacobsen
  • ZHU, Guanhua
  • RAHMAN, Chowdhury Rafeed
  • IAN, Tan Bee Huat

Assignees

  • Agency for Science, Technology and Research
  • Singapore Health Services Pte Ltd

Dates

Publication Date
2026-05-13
Application Date
2024-07-04

Claims (20)

  1. A system for quantifying circulating tumour DNA (ctDNA) in a plasma sample, the system comprising at least one processor configured to: receive low-pass cell free DNA (cfDNA) sequencing data obtained by processing the plasma sample using a sequencing platform; and perform fragment length analysis on the sequencing data using a ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample; wherein the ctDNA quantification model comprises a machine learning integrative model trained using a training dataset generated using cfDNA data augmented with cfDNA data obtained from known cancer and healthy samples to generate artificial training dataset records with a known ctDNA metric.
  2. The system of claim 1, wherein the ctDNA quantification model comprises a low ctDNA burden (LT) model and a high ctDNA burden (HT) model; the LT model being trained to estimate a ctDNA metric for low tumour burden samples; and the HT model being trained to estimate a ctDNA metric for high tumour burden samples.
  3. The system of claim 2, wherein the records with a tumour burden greater than or equal to 3% are considered as records with a high tumour burden; and wherein the records with a tumour burden less than 3% are considered as records with a low tumour burden.
  4. The system of claim 1, wherein the one processor is further configured to: allocate the cfDNA sequencing data into a plurality of whole genome data bins; generate a histogram for each of the plurality of data bins; and process the generated histograms by a local branch of the ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample.
  5. The system of claim 4, wherein the at least one processor is further configured to: process the cfDNA sequencing data to generate a global fragment length histogram; and process the global fragment length histogram using a second branch of the ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample.
  6. The system of claim 5, wherein the at least one processor is further configured to: concatenate output of the first branch and second branch of the ctDNA quantification model to obtain a concatenated intermediate output; and process the concatenated intermediate output by a block of the ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample.
  7. The system of claim 1, wherein at least one processor is further configured to: process a subset of the cfDNA sequencing data relating to cancer specific transcription start sites (TSS) and surrounding regions of the genome to generate a cancer fragment length histogram; process a subset of the cfDNA sequencing data relating to blood specific transcription start sites (TSS) and surrounding regions of the genome to generate a blood fragment length histogram; and process the cancer fragment length histogram and the blood fragment length histogram by the ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample.
  8. The system of claim 1, wherein the received cfDNA sequencing data is obtained by sequencing the plasma sample to a depth of coverage of 0.05x or greater.
  9. A system for quantifying circulating tumour DNA (ctDNA) in a plasma sample, the system comprising at least one processor configured to: receive targeted sequencing data obtained by processing the plasma sample by a sequencing platform, the targeted sequencing data comprising a fragment length histogram feature of an off-target portion of the targeted sequencing data; and feeding the fragment length histogram feature to a ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample; wherein the ctDNA quantification model comprises a neural network integrative model trained using a training dataset comprising labelled sequencing data of off-target read associated fragments originating from healthy and cancer afflicted individuals.
  10. The system of claim 9, wherein the received targeted sequencing data is obtained by sequencing the plasma sample to a depth of coverage of 60x or greater.
  11. The system of claim 9 or 10, wherein the targeted sequencing data comprises off-target reads.
  12. A method for quantifying circulating tumour DNA (ctDNA) in a plasma sample, the method comprising: processing the plasma sample by a sequencing platform to obtain low-pass cell free DNA (cfDNA) sequencing data; and performing fragment length analysis of the sequencing data using a ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample; wherein the ctDNA quantification model comprises a machine learning integrative model trained using a training dataset generated using cfDNA data augmented with ctDNA data obtained from known cancer samples to generate artificial training dataset records with a known ctDNA metric.
  13. The method of claim 12, wherein the ctDNA quantification model comprises a low ctDNA (LT) model and a high ctDNA (HT) model; the LT model being trained to estimate a ctDNA metric using records corresponding to low tumour burden; and the HT model being trained to estimate a ctDNA metric using records corresponding to high tumour burden.
  14. The method of claim 13, wherein the records with a tumour burden greater than or equal to 3% are considered as records with a high tumour burden; and wherein the records with a tumour burden less than 3% are considered as records with a low tumour burden.
  15. The method of claim 12, wherein the method further comprises: allocating the cfDNA sequencing data into a plurality of whole genome data bins; generating a histogram of each of the plurality of data bins; and processing the generated histograms by a first branch of the ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample.
  16. The method of claim 15, wherein the method further comprises: processing the cfDNA sequencing data to generate a global fragment length histogram; and processing the global fragment length histogram using a second branch of the ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample.
  17. The method of claim 15, wherein the method further comprises: concatenating output of the first branch and second branch of the ctDNA quantification model to obtain a concatenated intermediate output; and processing the concatenated intermediate output by a block of the ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample.
  18. The method of claim 12, wherein the method further comprises: processing a subset of the cfDNA sequencing data relating to cancer specific transcription start sites (TSS) and surrounding regions of the genome to generate a cancer fragment length histogram; processing a subset of the cfDNA sequencing data relating to blood specific transcription start sites (TSS) and surrounding regions of the genome to generate a blood fragment length histogram; and processing the cancer fragment length histogram and the blood fragment length histogram by the ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample.
  19. The method of claim 12, wherein the plasma sample is sequenced to a depth of coverage of 0.05X or greater.
  20. A method for quantifying circulating tumour DNA (ctDNA) in a plasma sample, the method comprising: receiving targeted sequencing data obtained by processing the plasma sample by a sequencing platform, the targeted sequencing data comprising a fragment length histogram feature of an off-target portion of the targeted sequencing data; and feeding the fragment length histogram feature to a ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample; wherein the ctDNA quantification model comprises a neural network integrative model trained using a training dataset comprising labelled sequencing data of off-target read associated fragments originating from healthy and cancer afflicted individuals.
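
The feature-building steps recited in the claims (per-bin histograms, a global fragment length histogram, and the 3% boundary between low and high tumour burden records) can be illustrated with a short sketch. It covers only the inputs to the model; the neural network branches and the concatenation of their outputs are not shown. The bin size, the length range, the `(chrom, start, length)` input layout, and all function names are hypothetical choices made for illustration, while the 3% threshold is taken from claims 3 and 14.

```python
from collections import Counter

MIN_LEN, MAX_LEN = 50, 400        # hypothetical fragment length range (bp)
BIN_SIZE = 5_000_000              # hypothetical genome bin width (bp)
HIGH_BURDEN_THRESHOLD = 0.03      # 3% boundary from claims 3 and 14


def length_histogram(lengths):
    """Return a normalised histogram over MIN_LEN..MAX_LEN inclusive."""
    counts = Counter(n for n in lengths if MIN_LEN <= n <= MAX_LEN)
    total = sum(counts.values())
    return [counts[n] / total if total else 0.0
            for n in range(MIN_LEN, MAX_LEN + 1)]


def build_features(fragments):
    """Build per-bin and global histograms from (chrom, start, length) tuples."""
    per_bin = {}
    all_lengths = []
    for chrom, start, length in fragments:
        per_bin.setdefault((chrom, start // BIN_SIZE), []).append(length)
        all_lengths.append(length)

    # One histogram per genome bin (input to a local branch) ...
    local = {key: length_histogram(vals) for key, vals in sorted(per_bin.items())}
    # ... and one histogram over all fragments (input to a global branch).
    global_hist = length_histogram(all_lengths)
    return local, global_hist


def burden_class(ctdna_fraction):
    """Label a training record as high (HT) or low (LT) tumour burden."""
    return "HT" if ctdna_fraction >= HIGH_BURDEN_THRESHOLD else "LT"
```

In this sketch `burden_class` only labels training records whose ctDNA fraction is already known; how an unseen sample is routed between the LT and HT models at inference time is not stated in the claims and is left open here.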

Description

ROBUST QUANTIFICATION OF CIRCULATING TUMOUR DNA THROUGH FRAGMENT LENGTH ANALYSIS
Technical Field
[0001] This disclosure generally relates to methods and systems for analysis of cell free DNA sequencing data from blood plasma.
Background
[0002] This background description is provided for the purpose of generally presenting the context of the disclosure. Contents of this background section are neither expressly nor impliedly admitted as prior art against the present disclosure.
[0003] The death of non-malignant cells, primarily of the hematopoietic lineage, releases cell-free DNA (cfDNA) into the blood circulation. In cancer patients, the blood plasma also carries circulating tumour DNA (ctDNA), enabling non-invasive diagnostics and disease surveillance. The ability to monitor tumour growth dynamics based on ctDNA levels in the blood provides a promising non-invasive approach to track disease progression during therapy and clinical trials.
[0004] Ultra-deep targeted cfDNA sequencing assays are often preferred in the clinic due to their ability to identify actionable mutations. While mutation variant allele frequencies (VAFs) can be used to approximate ctDNA levels, not all tumours will have mutations covered by a given targeted sequencing gene panel. Furthermore, the accuracy of this approximation depends on sample-specific and treatment-dynamic properties such as mutation clonality, copy number, as well as potential confounding noise from clonal hematopoiesis.
[0005] Existing methods developed for ctDNA quantification are not directly compatible with targeted sequencing panels. These methods require either low-pass whole genome sequencing (lpWGS) data, DNA methylation profiling, or modifications to the targeted sequencing panel. Thus, there is an unmet need to develop accurate and orthogonal approaches for ctDNA quantification that can generalize across patients, tumour types, and sequencing modalities.
[0006] The fragment length distribution of cfDNA in plasma has a mode of ~166 base pairs (bp) as nucleosome-bound cfDNA molecules display increased protection from DNA degradation. cfDNA fragments from cancer patients tend to be shorter and more variably sized than those from healthy individuals. These observations have motivated studies exploring how cfDNA fragment length properties can be used to classify cfDNA samples from cancer patients and healthy individuals.
[0007] It is challenging to quantify circulating tumour DNA (ctDNA) in blood plasma using cell-free DNA (cfDNA) sequencing. Currently there are limited solutions tailored for measuring ctDNA fraction. Thus, there is an unmet need to develop approaches for quantification of ctDNA.
Summary
[0008] Disclosed is a system for quantifying circulating tumour DNA (ctDNA) in a plasma sample, the system comprising at least one processor configured to: receive low-pass cell free DNA (cfDNA) sequencing data obtained by processing the plasma sample using a sequencing platform; and perform fragment length analysis on the sequencing data using a ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample; wherein the ctDNA quantification model comprises a machine learning integrative model trained using a training dataset generated using cfDNA data augmented with cfDNA data obtained from known cancer and healthy samples to generate artificial training dataset records with a known ctDNA metric.
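
The fragment length properties noted in paragraph [0006] (a mode near 166 bp, and a shift towards shorter fragments in cancer patients) can be summarised from a list of fragment lengths as in the sketch below. This is an illustration only: the function name `fragment_length_summary` and the 150 bp cutoff for "short" fragments are hypothetical and are not taken from the disclosure.

```python
from collections import Counter


def fragment_length_summary(lengths, short_max=150):
    """Return the modal fragment length and the share of short fragments.

    `short_max` is a hypothetical cutoff (bp) chosen for illustration.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")

    counts = Counter(lengths)
    # Most frequent length; ties are resolved in favour of the shorter length.
    mode_length, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    short = sum(c for length, c in counts.items() if length <= short_max)
    return {"mode": mode_length, "short_fraction": short / len(lengths)}
```

Summary statistics of this kind discard most of the distribution's shape, which is one reason a model might instead take the full histogram as input, as the disclosed system does.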
[0009] Also disclosed is a system for quantifying circulating tumour DNA (ctDNA) in a plasma sample, the system comprising at least one processor configured to: receive targeted sequencing data obtained by processing the plasma sample by a sequencing platform, the targeted sequencing data comprising a fragment length histogram feature of an off-target portion of the targeted sequencing data; and feed the fragment length histogram feature to a ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample; wherein the ctDNA quantification model comprises a neural network integrative model trained using a training dataset comprising labelled sequencing data of off-target read associated fragments originating from healthy and cancer afflicted individuals.
[0010] As used herein, "off-target read" refers to those reads that are obtained during high-throughput targeted sequencing (such as next-generation sequencing) and are not aligned with the intended target region or gene. They spread throughout the whole genome and provide an effect similar to that of shallow whole genome sequencing.
[0011] Also disclosed is a method for quantifying circulating tumour DNA (ctDNA) in a plasma sample, the method comprising: processing the plasma sample by a sequencing platform to obtain low-pass cell free DNA (cfDNA) sequencing data; and performing fragment length analysis of the sequencing data using a ctDNA quantification model to estimate a ctDNA metric of the blood plasma sample; wherein the ctDNA quantification model comprises a machine learning integrati