CN-122029293-A - Method for determining arm aneuploidy scores

Abstract

A method for determining an arm aneuploidy score for a tumor sample genome includes selectively amplifying nucleic acid sequences at target locations in the tumor genome using a targeted panel to generate sequence reads. The genomic positions are then divided into segments of homogeneous copy number based on the log odds of heterozygous SNPs and the CNV log ratios of the sequence reads. Segments showing a gain or a loss relative to a reference copy number are identified; these segments intersect corresponding chromosome arms. For each arm, the cell abundance of these segments is compared to a minimum threshold. The longest segment on the arm that satisfies the minimum cell abundance is retained, and its bases are summed to give a total number of bases. This total is divided by the number of bases in the arm to give a fraction. If the fraction meets a minimum threshold, the segment is filtered based on its fold change, and a gain or a loss is called. Arms called as gained or lost are counted to produce the arm aneuploidy score.
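The per-arm arithmetic in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Segment` type, the function names, and the default thresholds (`min_cell_abundance=0.2`, `min_arm_fraction=0.5`) are assumptions made for illustration, and the fold change filter and the amplicon-based call are deliberately left out. It only shows how the longest gained (increased) or lost (deleted) segment that passes the cell abundance check is turned into a fraction of the arm, and how qualifying arms are counted.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Hypothetical record for one gain/loss segment on a chromosome arm."""
    arm: str               # arm label, e.g. "1p"
    start: int             # 0-based start position, inclusive
    end: int               # end position, exclusive
    kind: str              # "gain" or "loss"
    cell_abundance: float  # estimated abundance of cells carrying the event

def longest_segment_fraction(segments, arm_length, min_cell_abundance):
    """Return (segment, fraction of the arm it covers), or (None, 0.0)."""
    # Keep only segments that meet the minimum cell abundance.
    eligible = [s for s in segments if s.cell_abundance >= min_cell_abundance]
    if not eligible:
        return None, 0.0
    # Pick the longest eligible segment and express its length
    # as a fraction of the bases in the arm.
    longest = max(eligible, key=lambda s: s.end - s.start)
    return longest, (longest.end - longest.start) / arm_length

def count_candidate_arms(segments_by_arm, arm_lengths,
                         min_cell_abundance=0.2, min_arm_fraction=0.5):
    """Count arms whose longest eligible segment covers enough of the arm.

    Thresholds are illustrative assumptions, not values from the source.
    """
    count = 0
    for arm, segments in segments_by_arm.items():
        segment, fraction = longest_segment_fraction(
            segments, arm_lengths[arm], min_cell_abundance)
        if segment is not None and fraction >= min_arm_fraction:
            count += 1
    return count
```

In the method as described, an arm counted here would still have to pass the fold change filter and the final gain/loss call before contributing to the arm aneuploidy score.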

Inventors

Y. Jin
M. Gupta
S. Sardis

Assignees

Life Technologies Corporation (生命科技股份有限公司)

Dates

Publication Date: 2026-05-12
Application Date: 2024-10-23
Priority Date: 2023-10-24

Claims (20)

1. A method for determining an arm aneuploidy score for a tumor sample genome, comprising: selectively amplifying nucleic acid sequences at target locations in the tumor sample genome with a targeted panel to generate a plurality of nucleic acid sequence reads; dividing positions of the genome into segments having homogeneous copy numbers using log odds of heterozygous single nucleotide polymorphisms (SNPs) and copy number variation (CNV) log ratios determined for the plurality of nucleic acid sequence reads, wherein the heterozygous SNPs are distributed across the genome; identifying segments exhibiting a gain relative to a reference copy number as gain segments and segments exhibiting a loss relative to the reference copy number as loss segments, wherein each identified gain/loss segment has a position intersecting a corresponding arm of a chromosome; for each of a plurality of arms, comparing the cell abundance of the gain/loss segment to a minimum cell abundance relative to the sample cell abundance of the tumor sample; summing the number of bases in the longest gain/loss segment to yield a total number of bases, wherein the longest gain/loss segment has at least the minimum cell abundance; dividing the total number of bases in the longest gain/loss segment by the number of bases in the arm to yield a fraction; if the fraction has a value of at least a minimum fraction threshold, retaining the longest gain/loss segment; filtering the retained longest gain/loss segment based on the fold change of the gain/loss; determining a called gain or loss based on the copy numbers of the amplicons contained in the retained longest gain/loss segment; and counting the arms with a called gain or loss to give the arm aneuploidy score for the tumor sample.
2. The method of claim 1, wherein the step of determining a called gain or loss further comprises calculating a p-value based on the copy numbers of the amplicons contained in the retained longest gain/loss segment.
3. The method of claim 2, wherein the step of determining a called gain or loss further comprises applying a p-value threshold to the p-value to call a gain or loss of the arm if the p-value is less than or equal to the p-value threshold, wherein no gain or loss of the arm is called if the p-value is greater than the p-value threshold.
4. The method of claim 1, wherein the step of comparing the cell abundance further comprises calculating a ratio of the cell abundance of the gain/loss segment to the sample cell abundance.
5. The method of claim 4, further comprising comparing the ratio to a segment cell abundance threshold.
6. The method of claim 5, further comprising, if the ratio is less than the segment cell abundance threshold, not using the gain/loss segment in further analysis.
7. The method of claim 1, further comprising identifying, for a given arm, a gap between two flanking gain/loss segments, wherein the two flanking gain/loss segments have the same copy number and the gap has a copy number that is different from the copy number of the two flanking gain/loss segments.
8. The method of claim 7, further comprising merging the gap with the two flanking gain/loss segments by joining the gap with the two flanking gain/loss segments to form a merged segment.
9. The method of claim 8, wherein the step of comparing the cell abundance of the gain/loss segment to a minimum cell abundance is applied to the merged segment.
10. The method of claim 1, wherein filtering the retained longest gain/loss segment based on the fold change of the gain/loss further comprises applying a minimum fold change filter to a given gain segment.
11. The method of claim 10, further comprising comparing the fold change of the given gain segment to a minimum gain threshold.
12. The method of claim 11, further comprising confirming the gain of the given gain segment if the fold change of the given gain segment is greater than or equal to the minimum gain threshold.
13. The method of claim 1, wherein filtering the retained longest gain/loss segment based on the fold change of the gain/loss further comprises applying a maximum fold change filter to a given loss segment.
14. The method of claim 13, further comprising comparing the fold change of the given loss segment to a maximum loss threshold.
15. The method of claim 14, further comprising confirming the loss of the given loss segment if the fold change of the given loss segment is less than or equal to the maximum loss threshold.
16. A system for determining an arm aneuploidy score for a tumor sample genome, comprising a processor and a memory communicatively connected to the processor, the processor configured to execute instructions that, when executed by the processor, cause the system to perform a method comprising: receiving a plurality of nucleic acid sequence reads generated by selectively amplifying nucleic acid sequences at target locations in the tumor sample genome with a targeted panel; dividing positions of the genome into segments having homogeneous copy numbers using log odds of heterozygous single nucleotide polymorphisms (SNPs) and copy number variation (CNV) log ratios determined for the plurality of nucleic acid sequence reads, wherein the heterozygous SNPs are distributed across the genome; identifying segments exhibiting a gain relative to a reference copy number as gain segments and segments exhibiting a loss relative to the reference copy number as loss segments, wherein each identified gain/loss segment has a position intersecting a corresponding arm of a chromosome; for each of a plurality of arms, comparing the cell abundance of the gain/loss segment to a minimum cell abundance relative to the sample cell abundance of the tumor sample; summing the number of bases in the longest gain/loss segment to yield a total number of bases, wherein the longest gain/loss segment has at least the minimum cell abundance; dividing the total number of bases in the longest gain/loss segment by the number of bases in the arm to yield a fraction; if the fraction has a value of at least a minimum fraction threshold, retaining the longest gain/loss segment; filtering the retained longest gain/loss segment based on the fold change of the gain/loss; determining a called gain or loss based on the copy numbers of the amplicons contained in the retained longest gain/loss segment; and counting the arms with a called gain or loss to give the arm aneuploidy score for the tumor sample.
17. The system of claim 16, wherein the step of determining a called gain or loss further comprises calculating a p-value based on the copy numbers of the amplicons contained in the retained longest gain/loss segment.
18. The system of claim 17, wherein the step of determining a called gain or loss further comprises applying a p-value threshold to the p-value to call a gain or loss of the arm if the p-value is less than or equal to the p-value threshold, wherein no gain or loss of the arm is called if the p-value is greater than the p-value threshold.
19. A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform a method for determining an arm aneuploidy score for a tumor sample genome, the method comprising: receiving, at the processor, a plurality of nucleic acid sequence reads generated by selectively amplifying nucleic acid sequences at target locations in the tumor sample genome with a targeted panel; dividing positions of the genome into segments having homogeneous copy numbers using log odds of heterozygous single nucleotide polymorphisms (SNPs) and copy number variation (CNV) log ratios determined for the plurality of nucleic acid sequence reads, wherein the heterozygous SNPs are distributed across the genome; identifying segments exhibiting a gain relative to a reference copy number as gain segments and segments exhibiting a loss relative to the reference copy number as loss segments, wherein each identified gain/loss segment has a position intersecting a corresponding arm of a chromosome; for each of a plurality of arms, comparing the cell abundance of the gain/loss segment to a minimum cell abundance relative to the sample cell abundance of the tumor sample; summing the number of bases in the longest gain/loss segment to yield a total number of bases, wherein the longest gain/loss segment has at least the minimum cell abundance; dividing the total number of bases in the longest gain/loss segment by the number of bases in the arm to yield a fraction; if the fraction has a value of at least a minimum fraction threshold, retaining the longest gain/loss segment; filtering the retained longest gain/loss segment based on the fold change of the gain/loss; determining a called gain or loss based on the copy numbers of the amplicons contained in the retained longest gain/loss segment; and counting the arms with a called gain or loss to give the arm aneuploidy score for the tumor sample.
20. The non-transitory computer-readable medium of claim 19, wherein the step of determining a called gain or loss further comprises calculating a p-value based on the copy numbers of the amplicons contained in the retained longest gain/loss segment.
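
The dependent claims describe several small decision rules: the cell abundance ratio check (claims 4-6), gap merging (claims 7-9), the fold change filters (claims 10-15), and the p-value threshold (claims 2-3). The following Python sketch shows one way these rules might look. All function names, the tuple layout `(start, end, copy_number)`, and the `reference_cn=2` default are assumptions for illustration; the claims do not say how the p-value is computed or whether the gap size is bounded, so the p-value is taken as an input and no gap size limit is applied.

```python
def passes_cell_abundance(segment_abundance, sample_abundance, min_ratio):
    """Claims 4-6: keep a segment only if its abundance ratio meets the threshold."""
    if sample_abundance <= 0:
        return False
    return segment_abundance / sample_abundance >= min_ratio

def merge_gaps(segments, reference_cn=2):
    """Claims 7-9: join flank + gap + flank when both flanks share a copy number.

    `segments` is a list of (start, end, copy_number) tuples sorted by start.
    Flanks must be gain/loss segments, i.e. differ from `reference_cn`.
    """
    merged = []
    i = 0
    while i < len(segments):
        start, end, cn = segments[i]
        # Absorb each following gap whose far flank has the same copy number.
        while (cn != reference_cn
               and i + 2 < len(segments)
               and segments[i + 1][2] != cn
               and segments[i + 2][2] == cn):
            end = segments[i + 2][1]
            i += 2
        merged.append((start, end, cn))
        i += 1
    return merged

def passes_fold_change(kind, fold_change, min_gain, max_loss):
    """Claims 10-15: gains need a large enough fold change, losses a small enough one."""
    if kind == "gain":
        return fold_change >= min_gain
    if kind == "loss":
        return fold_change <= max_loss
    raise ValueError(f"unknown segment kind: {kind!r}")

def is_called(p_value, p_threshold):
    """Claims 2-3: the arm is called gained or lost only if p <= threshold."""
    return p_value <= p_threshold
```

For example, `merge_gaps([(0, 40, 3), (40, 45, 2), (45, 100, 3)])` joins the two copy number 3 flanks and the short gap between them into a single segment spanning positions 0 to 100, which would then be the segment subjected to the cell abundance comparison.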

Description

Method for determining arm aneuploidy scores

Cross Reference to Related Applications

The present application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 63/592,732, filed on October 24, 2023. The entire contents of the foregoing application are incorporated herein by reference.

Technical Field

The present disclosure relates to methods, systems, and computer-readable media for determining arm aneuploidy scores, and more particularly, to methods, systems, and computer-readable media for determining arm aneuploidy scores for tumor sample genomes using nucleic acid sequencing data from targeted sequencing panels and next generation sequencing (NGS) technologies.

Drawings

FIGS. 1A and 1B are block diagrams of an example process for analyzing a sample genome to determine arm aneuploidy scores.

FIG. 2 is a schematic diagram of an exemplary system for reconstructing a nucleic acid sequence according to various embodiments.

FIG. 3 is an example block diagram of an analysis pipeline for signal data obtained from a nucleic acid sequencing instrument.

Detailed Description

In accordance with the teachings and principles embodied in the present application, novel methods, systems, and non-transitory machine-readable storage media are provided for determining arm aneuploidy scores by analyzing nucleic acid sequence reads from a tumor sample genome.

In various embodiments, DNA (deoxyribonucleic acid) may be referred to as a nucleotide chain consisting of 4 types of nucleotides, A (adenine), T (thymine), C (cytosine), and G (guanine), and RNA (ribonucleic acid) consists of 4 types of nucleotides, A, U (uracil), G, and C. Certain nucleotide pairs specifically bind to each other in a complementary manner (known as complementary base pairing). That is, adenine (A) pairs with thymine (T) (in the case of RNA, however, adenine (A) pairs with uracil (U)), and cytosine (C) pairs with guanine (G). When a first nucleic acid strand binds to a second nucleic acid strand consisting of nucleotides complementary to the nucleotides in the first strand, the two strands bind to form a double strand.

In various embodiments, "nucleic acid sequencing data," "nucleic acid sequencing information," "nucleic acid sequence," "genomic sequence," "gene sequence," "fragment sequence," "nucleic acid sequence read," or "nucleic acid sequencing read" refers to any information or data that indicates the order of nucleotide bases (e.g., adenine, guanine, cytosine, and thymine/uracil) in a DNA or RNA molecule (e.g., whole genome, whole transcriptome, exome, oligonucleotide, polynucleotide, fragment, etc.). It should be understood that the present teachings contemplate sequence information obtained using all available techniques, platforms, or technologies, including, but not limited to, capillary electrophoresis, microarrays, ligation-based systems, polymerase-based systems, hybridization-based systems, direct or indirect nucleotide identification systems, pyrosequencing, ion- or pH-based detection systems, electronic signature-based systems, and the like.

"Polynucleotide," "nucleic acid," or "oligonucleotide" refers to a linear polymer of nucleosides (including deoxyribonucleosides, ribonucleosides, or analogs thereof) joined by internucleoside linkages. Typically, a polynucleotide comprises at least three nucleosides. Oligonucleotides typically range in size from a few monomer units, e.g., 3-4, to hundreds of monomer units. Whenever a polynucleotide (e.g., an oligonucleotide) is represented as a series of letters, such as "ATGCCTG," it is understood that the nucleotides are in 5'→3' order from left to right and that "A" represents deoxyadenosine, "C" represents deoxycytidine, "G" represents deoxyguanosine, and "T" represents thymidine, unless otherwise indicated. The letters A, C, G, and T may be used to refer to the base itself, the nucleoside, or the nucleotide comprising the base, as is standard in the art.

The phrase "next generation sequencing" or NGS refers to sequencing technologies that have increased throughput compared to traditional Sanger and capillary electrophoresis-based methods, e.g., that have the ability to produce hundreds of thousands of relatively small sequence reads at a time. Some examples of next generation sequencing technologies include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization.

The phrase "genomic variants" or "genome variants" means a single sequence or a set of sequences (in DNA or RNA) that has changed relative to a particular species, or a subpopulation within a particular species, due to mutation, recombination/crossover, or genetic drift. Examples of types of genomic variants include, but are not limited to, single nucleotide polymorphisms (SNPs), copy number variations (CNVs), indels, inversions, and the like. In various embodiments, genomic variants may be detec