US-20260126245-A1 - SYSTEMS AND METHODS FOR ALIGNING SEQUENCES TO PERSONALIZED REFERENCES
Abstract
Techniques for generating a personalized reference sequence construct for an individual to align sequence reads obtained for the individual. The techniques include: obtaining a plurality of sequence reads for an individual; obtaining information identifying a plurality of locations; genotyping the plurality of sequence reads for the plurality of locations to obtain a first set of variants for the individual for at least some of the plurality of locations; identifying a second set of variants associated with the first set of variants; generating a personalized reference sequence construct using the second set of variants; and aligning the plurality of sequence reads to the personalized reference sequence construct.
Inventors
- Yongan Zhao
- Wan-Ping Lee
Assignees
- SEVEN BRIDGES GENOMICS INC.
Dates
- Publication Date
- 20260507
- Application Date
- 20251007
Claims (2)
- 1. A system, comprising: at least one hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform: obtaining a plurality of sequence reads for an individual; obtaining information identifying a plurality of locations; genotyping the plurality of sequence reads for the plurality of locations to obtain a first set of variants for the individual for at least some of the plurality of locations; identifying a second set of variants associated with the first set of variants; generating a personalized reference sequence construct using the second set of variants; and aligning the plurality of sequence reads to the personalized reference sequence construct.
- 2-20. (canceled)
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority from U.S. Provisional Patent Application Ser. No. 62/420,585, filed on Nov. 11, 2016, entitled “SYSTEMS AND METHODS FOR ALIGNING SEQUENCES TO PERSONALIZED REFERENCES”, which is hereby incorporated by reference.
FIELD
Aspects of the technology described herein relate to systems and methods for generating personalized reference constructs and aligning sequence reads to the generated personalized reference constructs.
BACKGROUND
Advances in sequencing technology, including the development of next generation sequencing methods, have made sequencing an important tool used both in research and in medicine. Some applications of sequencing technology include aligning the sequence reads obtained by sequencing techniques against a reference sequence construct, and identifying the differences, sometimes termed “variants,” between the sequence reads and the reference sequence construct. In turn, the identified differences may be used for diagnostic, therapeutic, research, and/or other purposes.
There are different types of reference sequence constructs to which sequence reads may be aligned. For example, sequence reads may be aligned against a linear reference sequence construct such as, for example, the hg19 or hg38 human reference genomes. As another example, sequence reads may be aligned against a reference sequence construct that accounts for one or more known variants at one or more respective locations. One example of such a reference sequence construct is a graph-based reference sequence construct (sometimes referred to herein as a “graph reference”). A graph reference may include a graph (e.g., a directed acyclic graph) through which there may be multiple paths, each of which may represent one or multiple known variants.
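As an illustration only, the graph reference described above can be pictured as a small directed acyclic graph whose nodes hold sequence segments, so that each path through the graph spells one sequence. The node names, the segments, and the helper `spell_paths` below are hypothetical and are not taken from the application; this is a minimal sketch of the idea, not the construct actually used.

```python
from typing import Dict, List

# Hypothetical toy graph reference: node id -> sequence segment.
nodes: Dict[str, str] = {
    "n1": "ACGT",  # segment shared by all paths
    "n2": "A",     # reference allele at one location
    "n3": "G",     # known variant (a SNP) at the same location
    "n4": "TTCA",  # segment shared by all paths
}

# Directed edges; the graph is acyclic, so every walk ends at a sink.
edges: Dict[str, List[str]] = {
    "n1": ["n2", "n3"],
    "n2": ["n4"],
    "n3": ["n4"],
    "n4": [],
}


def spell_paths(node: str) -> List[str]:
    """Return every sequence spelled by a path from `node` to a sink."""
    successors = edges[node]
    if not successors:
        return [nodes[node]]
    return [
        nodes[node] + tail
        for nxt in successors
        for tail in spell_paths(nxt)
    ]


paths = spell_paths("n1")
print(paths)  # ['ACGTATTCA', 'ACGTGTTCA']
```

Here the two paths differ only at the branching node, which is how a single known variant would show up as an alternative route through the graph.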
SUMMARY
Some embodiments are directed to a system, comprising: at least one hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform: obtaining a plurality of sequence reads for an individual; obtaining information identifying a plurality of locations; genotyping the plurality of sequence reads for the plurality of locations to obtain a first set of variants for the individual for at least some of the plurality of locations; identifying a second set of variants associated with the first set of variants; generating a personalized reference sequence construct using the second set of variants; and aligning the plurality of sequence reads to the personalized reference sequence construct.
Some embodiments are directed to at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one hardware processor, cause the at least one hardware processor to perform: obtaining a plurality of sequence reads for an individual; obtaining information identifying a plurality of locations; genotyping the plurality of sequence reads for the plurality of locations to obtain a first set of variants for the individual for at least some of the plurality of locations; identifying a second set of variants associated with the first set of variants; generating a personalized reference sequence construct using the second set of variants; and aligning the plurality of sequence reads to the personalized reference sequence construct.
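The sequence of acts recited above (genotype at given locations, look up associated variants, build a personalized reference, align) can be sketched on toy data as follows. Everything concrete here is an assumption made for illustration: the context-matching genotyper, the `linked` association table, the restriction to single-base substitutions on a linear reference, and the exhaustive Hamming-distance aligner are stand-ins, not the techniques of the application.

```python
from typing import Dict, List, Tuple

Variant = Tuple[int, str]  # (0-based position, alternate base); SNVs only


def genotype(reads: List[str], reference: str,
             locations: Dict[int, str], flank: int = 2) -> List[Variant]:
    """Toy alignment-free genotyper (an assumption of this sketch): call the
    alternate allele when more reads contain the alternate context than the
    reference context. Assumes each location is at least `flank` bases from
    either end of the reference."""
    called: List[Variant] = []
    for pos, alt in sorted(locations.items()):
        left = reference[pos - flank:pos]
        right = reference[pos + 1:pos + 1 + flank]
        alt_hits = sum((left + alt + right) in r for r in reads)
        ref_hits = sum((left + reference[pos] + right) in r for r in reads)
        if alt_hits > ref_hits:
            called.append((pos, alt))
    return called


def associated_variants(first_set: List[Variant],
                        linked: Dict[Variant, List[Variant]]) -> List[Variant]:
    """Collect variants that the hypothetical `linked` table associates
    with the genotyped variants."""
    second: List[Variant] = []
    for variant in first_set:
        for other in linked.get(variant, []):
            if other not in second:
                second.append(other)
    return second


def personalize(reference: str, variants: List[Variant]) -> str:
    """Apply single-base substitutions to a linear reference."""
    seq = list(reference)
    for pos, alt in variants:
        seq[pos] = alt
    return "".join(seq)


def align(read: str, reference: str) -> Tuple[int, int]:
    """Return (best start, mismatches) by exhaustive Hamming-distance search;
    ties go to the leftmost start."""
    best = (0, len(read) + 1)
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best[1]:
            best = (start, mismatches)
    return best


# Toy inputs (all hypothetical).
reference = "ACGTACGTTAGCCGATAGGCTA"
reads = ["GTATGTTA", "CCGAGAGG", "TATGTT"]
locations = {5: "T"}               # known variant site to genotype
linked = {(5, "T"): [(15, "G")]}   # assumed association between variants

first_set = genotype(reads, reference, locations)
second_set = associated_variants(first_set, linked)
personal_ref = personalize(reference, second_set)
alignments = [align(r, personal_ref) for r in reads]
```

In this toy run the read `CCGAGAGG` has one mismatch against the original reference and none against the personalized one, because the associated variant at position 15 was never genotyped directly but was pulled in through the association table. Whether the genotyped variants themselves are also written into the construct is not stated in the passage above; the sketch applies only the second set.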
Some embodiments are directed to a method, comprising: using at least one hardware processor to perform: obtaining a plurality of sequence reads for an individual; obtaining information identifying a plurality of locations; genotyping the plurality of sequence reads for the plurality of locations to obtain a first set of variants for the individual for at least some of the plurality of locations; identifying a second set of variants associated with the first set of variants; generating a personalized reference sequence construct using the second set of variants; and aligning the plurality of sequence reads to the personalized reference sequence construct.
Some embodiments are directed to a system, comprising at least one hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform: obtaining a plurality of sequence reads for an individual; obtaining information identifying a plurality of locations and information about variant occurrence at the plurality of locations for each of at least some of a plurality of subpopulations; genotyping the plurality of sequence reads for the plurality of locations; identifying, using results of the genotyping and the information about variant occurrence, at least one subpopulation in the plurality of subpopulations to which the individual likely belongs; generatin