US-20260128127-A1 - GENOMIC INFRASTRUCTURE FOR ON-SITE OR CLOUD-BASED DNA AND RNA PROCESSING AND ANALYSIS

Abstract

A system, method and apparatus include one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations for hardware-accelerated execution of a genomic data processing pipeline based on one or more user-selectable options presented via a graphical user interface. The operations include obtaining first data representing a selection of one or more of a plurality of user-selectable options submitted via the graphical user interface. One or more of the plurality of user-selectable options identify a particular genomic data processing pipeline. The operations further include configuring, using the application programming interface executed by the one or more computers, an integrated circuit to perform one or more hardware accelerated steps of a primary, secondary, and/or tertiary processing protocol of the particular genomic data processing pipeline.
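
The sequence of operations summarized above (obtain a selection, configure the integrated circuit, obtain genomic data, execute, provide output) can be pictured with a minimal software sketch. Everything named here is a hypothetical stand-in introduced for illustration: `SimulatedAccelerator`, `PIPELINES`, `map_reads`, `sort_reads`, and `run_pipeline` do not appear in the source, and the "integrated circuit" is mocked in plain Python rather than driven through any real FPGA, ASIC, or GPU interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical software stand-ins for steps the source says run in hardware.
def map_reads(reads: List[str]) -> List[str]:
    """Placeholder 'mapping' step: annotates each read with its length."""
    return [f"{r}:mapped(len={len(r)})" for r in reads]


def sort_reads(reads: List[str]) -> List[str]:
    """Placeholder 'sorting' step: lexicographic order."""
    return sorted(reads)


# Hypothetical catalogue of pipelines a user could pick in the GUI.
PIPELINES: Dict[str, List[Callable[[List[str]], List[str]]]] = {
    "mapping_only": [map_reads],
    "sort_then_map": [sort_reads, map_reads],
}


@dataclass
class SimulatedAccelerator:
    """Software mock of the configurable integrated circuit (an assumption)."""
    steps: List[Callable[[List[str]], List[str]]] = field(default_factory=list)

    def configure(self, pipeline_name: str) -> None:
        # Stands in for the API call that configures the circuit.
        self.steps = list(PIPELINES[pipeline_name])

    def execute(self, data: List[str]) -> List[str]:
        if not self.steps:
            raise RuntimeError("accelerator has not been configured")
        for step in self.steps:
            data = step(data)
        return data


def run_pipeline(selection: dict, genomic_data: List[str]) -> dict:
    """Selection -> configure -> execute -> output, mirroring the abstract."""
    device = SimulatedAccelerator()
    device.configure(selection["pipeline"])      # configure the circuit
    result = device.execute(genomic_data)        # run the genomic operation
    return {"pipeline": selection["pipeline"], "records": result}


# 'selection' plays the role of the first data submitted via the GUI.
output = run_pipeline({"pipeline": "sort_then_map"}, ["GATTACA", "ACGT"])
print(output["records"])
```

The mock keeps configuration and execution as separate calls so that the ordering described in the abstract stays visible; a real system would replace both methods with device-specific calls.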

Inventors

Pieter Van Rooyen
Robert J. McMillen
Michael Ruehle
Rami Mehio

Assignees

ILLUMINA, INC.

Dates

Publication Date: 20260507
Application Date: 20251106

Claims (20)

1. A method for hardware-accelerated execution of a genomic data processing pipeline based on one or more user-selectable options presented via a graphical user interface, the method comprising: obtaining, by one or more computers, first data representing a selection of one or more of a plurality of user-selectable options submitted via a graphical user interface, wherein one or more of the plurality of user-selectable options identify a particular genomic data processing pipeline; configuring, using an application programming interface executed by the one or more computers, an integrated circuit to perform one or more hardware accelerated steps of a primary, secondary, and/or tertiary processing protocol of the particular genomic data processing pipeline; obtaining, by the one or more computers, second data representing a set of genomic data or a set of data derived from genomic data; instructing, by the one or more computers, the configured integrated circuit to execute a genomic processing operation of the particular genomic data processing pipeline on the obtained second data to generate result data; and providing, by the one or more computers, output data that is based on the result data, wherein the integrated circuit is formed of a set of hardwired digital logic circuits interconnected by physical electrical interconnects, the set of hardware digital logic circuits arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic units.
2. The method of claim 1, wherein the integrated circuit is a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC).
3. The method of claim 2, wherein the genomic processing operation executed on the FPGA or ASIC is a mapping operation, an alignment operation, or a variant calling operation.
4. The method of claim 2, wherein the integrated circuit is an FPGA, and wherein configuring the integrated circuit includes configuring one or more programmable hardware resources of the FPGA from a first state that does not include hardware resources configured as the particular genomic data processing pipeline to a second state that includes hardware resources configured as the particular genomic data processing pipeline.
5. The method of claim 1, wherein the integrated circuit is a graphics processing unit (GPU).
6. The method of claim 5, wherein the genomic processing operation executed on the GPU is a sorting operation, a deduplication operation, a compression operation, or a variant calling operation.
7. The method of claim 6, wherein the genomic processing operation is parallelized across multiple processing engines of the GPU.
8. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations for hardware-accelerated execution of a genomic data processing pipeline based on one or more user-selectable options presented via a graphical user interface, the operations comprising: obtaining, by one or more computers, first data representing a selection of one or more of a plurality of user-selectable options submitted via a graphical user interface, wherein one or more of the plurality of user-selectable options identify a particular genomic data processing pipeline; configuring, using an application programming interface executed by the one or more computers, an integrated circuit to perform one or more hardware accelerated steps of a primary, secondary, and/or tertiary processing protocol of the particular genomic data processing pipeline; obtaining, by the one or more computers, second data representing a set of genomic data or a set of data derived from genomic data; instructing, by the one or more computers, the configured integrated circuit to execute a genomic processing operation of the particular genomic data processing pipeline on the obtained second data to generate result data; and providing, by the one or more computers, output data that is based on the result data, wherein the integrated circuit is formed of a set of hardwired digital logic circuits interconnected by physical electrical interconnects, the set of hardware digital logic circuits arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic units.
9. The non-transitory computer-readable medium of claim 8, wherein the integrated circuit is a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC).
10. The non-transitory computer-readable medium of claim 9, wherein the genomic processing operation executed on the FPGA or ASIC is a mapping operation, an alignment operation, or a variant calling operation.
11. The non-transitory computer-readable medium of claim 9, wherein the integrated circuit is an FPGA, and wherein configuring the integrated circuit includes configuring one or more programmable hardware resources of the FPGA from a first state that does not include hardware resources configured as the particular genomic data processing pipeline to a second state that includes hardware resources configured as the particular genomic data processing pipeline.
12. The non-transitory computer-readable medium of claim 8, wherein the integrated circuit is a graphics processing unit (GPU).
13. The non-transitory computer-readable medium of claim 12, wherein the genomic processing operation executed on the GPU is a sorting operation, a deduplication operation, a compression operation, or a variant calling operation.
14. The non-transitory computer-readable medium of claim 13, wherein the genomic processing operation is parallelized across multiple processing engines of the GPU.
15. A system, comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations for hardware-accelerated execution of a genomic data processing pipeline based on one or more user-selectable options presented via a graphical user interface, the operations comprising: obtaining, by one or more computers, first data representing a selection of one or more of a plurality of user-selectable options submitted via a graphical user interface, wherein one or more of the plurality of user-selectable options identify a particular genomic data processing pipeline; configuring, using an application programming interface executed by the one or more computers, an integrated circuit to perform one or more hardware accelerated steps of a primary, secondary, and/or tertiary processing protocol of the particular genomic data processing pipeline; obtaining, by the one or more computers, second data representing a set of genomic data or a set of data derived from genomic data; instructing, by the one or more computers, the configured integrated circuit to execute a genomic processing operation of the particular genomic data processing pipeline on the obtained second data to generate result data; and providing, by the one or more computers, output data that is based on the result data, wherein the integrated circuit is formed of a set of hardwired digital logic circuits interconnected by physical electrical interconnects, the set of hardware digital logic circuits arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic units.
16. The system of claim 15, wherein the integrated circuit is a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC).
17. The system of claim 16, wherein the genomic processing operation executed on the FPGA or ASIC is a mapping operation, an alignment operation, or a variant calling operation.
18. The system of claim 16, wherein the integrated circuit is an FPGA, and wherein configuring the integrated circuit includes configuring one or more programmable hardware resources of the FPGA from a first state that does not include hardware resources configured as the particular genomic data processing pipeline to a second state that includes hardware resources configured as the particular genomic data processing pipeline.
19. The system of claim 15, wherein the integrated circuit is a graphics processing unit (GPU).
20. The system of claim 19, wherein: the genomic processing operation executed on the GPU is a sorting operation, a deduplication operation, a compression operation, or a variant calling operation; and the genomic processing operation is parallelized across multiple processing engines of the GPU.
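
As an informal illustration of the deduplication operation parallelized across multiple processing engines recited in the GPU claims above, the following sketch partitions reads among worker threads, deduplicates each partition independently, and merges the results. It is an assumption-laden CPU simulation, not GPU code and not the patented implementation: the `(chromosome, position, sequence)` record layout, the rule that two reads are duplicates when they share chromosome and position, and the names `partition`, `dedup_bucket`, and `parallel_dedup` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical read record: (chromosome, position, sequence).
Read = Tuple[str, int, str]


def partition(reads: List[Read], n_engines: int) -> List[List[Read]]:
    """Route reads to engines by position so duplicates share an engine."""
    buckets: List[List[Read]] = [[] for _ in range(n_engines)]
    for read in reads:
        buckets[read[1] % n_engines].append(read)
    return buckets


def dedup_bucket(bucket: List[Read]) -> List[Read]:
    """Keep the first read seen for each (chromosome, position) key."""
    seen = set()
    kept: List[Read] = []
    for read in bucket:
        key = (read[0], read[1])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def parallel_dedup(reads: List[Read], n_engines: int = 4) -> List[Read]:
    """Deduplicate each partition on its own worker, then merge and sort."""
    buckets = partition(reads, n_engines)
    # Threads stand in for the GPU's processing engines (an assumption).
    with ThreadPoolExecutor(max_workers=n_engines) as pool:
        results = list(pool.map(dedup_bucket, buckets))
    merged = [read for bucket in results for read in bucket]
    return sorted(merged, key=lambda r: (r[0], r[1]))


reads = [
    ("chr1", 100, "ACGT"),
    ("chr1", 100, "ACGT"),  # duplicate of the first read under the rule above
    ("chr2", 50, "GGCC"),
    ("chr1", 7, "TTAA"),
]
deduped = parallel_dedup(reads)
print(deduped)
```

Partitioning by position guarantees that reads with the same key are handled by the same worker, so no cross-worker coordination is needed before the final merge.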

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. non-provisional patent application Ser. No. 17/032,117, filed on Sep. 25, 2020, which is a continuation of U.S. non-provisional patent application Ser. No. 15/404,146, filed on Jan. 11, 2017, now U.S. Pat. No. 10,847,251, issued on Nov. 24, 2020, which is a continuation-in-part of U.S. non-provisional patent application Ser. No. 14/695,010, filed on Apr. 23, 2015, now U.S. Pat. No. 9,576,103, issued on Feb. 21, 2017, which is a continuation of U.S. non-provisional patent application Ser. No. 14/279,063, filed on May 15, 2014, now U.S. Pat. No. 9,679,104, issued on Jun. 13, 2017, which is a continuation-in-part of U.S. non-provisional patent application Ser. No. 14/158,758, filed on Jan. 17, 2014, now U.S. Pat. No. 9,483,610, issued on Nov. 1, 2016, which claims the benefit of U.S. provisional application No. 61/753,775, filed on Jan. 17, 2013, U.S. provisional application No. 61/822,101, filed on May 10, 2013, U.S. provisional application No. 61/823,824, filed on May 15, 2013, U.S. provisional application No. 61/826,381, filed on May 22, 2013, and U.S. provisional application No. 61/910,868, filed on Dec. 2, 2013, the contents of each of which are hereby incorporated by reference in their entireties.

U.S. patent application Ser. No. 14/279,063 is also a continuation-in-part of U.S. non-provisional patent application Ser. No. 14/180,248, filed on Feb. 13, 2014, now U.S. Pat. No. 9,014,989, issued on Apr. 21, 2015, which is a continuation of U.S. non-provisional patent application Ser. No. 14/158,758, filed on Jan. 17, 2014, now U.S. Pat. No. 9,483,610, issued on Nov. 1, 2016, the contents of each of which are hereby incorporated by reference in their entireties.

U.S. patent application Ser. No. 14/279,063 is also a continuation-in-part of U.S. provisional patent application Ser. No. 14/179,513, filed on Feb. 12, 2014, which is a continuation of U.S. non-provisional patent application Ser. No. 14/158,758, filed on Jan. 17, 2014, now U.S. Pat. No. 9,483,610, issued on Nov. 1, 2016, the contents of each of which are hereby incorporated by reference in their entireties.

U.S. patent application Ser. No. 14/279,063 also claims the benefit of provisional application No. 61/823,824, filed on May 15, 2013, U.S. provisional application No. 61/826,381, filed on May 22, 2013, U.S. provisional application No. 61/910,868, filed on Dec. 2, 2013, U.S. provisional application No. 61/943,870, filed on Feb. 24, 2014, U.S. provisional application No. 61/984,663, filed on Apr. 25, 2014, and U.S. provisional application No. 61/988,128, filed on May 2, 2014, the contents of each of which are hereby incorporated by reference in their entireties.

U.S. non-provisional patent application Ser. No. 15/404,146, filed on Jan. 11, 2017, also claims the benefit of U.S. provisional application No. 62/277,445, filed on Jan. 11, 2016, which is hereby incorporated herein by reference in its entirety.

TECHNICAL FIELD

The subject matter described herein relates to bioinformatics, and more particularly to systems, apparatuses, and methods for implementing bioinformatic protocols, such as performing one or more functions for analyzing genomic data on an integrated circuit, such as on a hardware processing platform.

BACKGROUND

A goal for health care researchers and practitioners is to improve the safety, quality, and effectiveness of health care for every patient. Personalized health care is directed to achieving these goals on an individual level. For instance, “genomics” and/or “bioinformatics” are fields of study that aim to facilitate the safety, the quality, and the effectiveness of prophylactic and therapeutic treatments on a personalized, individual level.

Accordingly, by employing genomics and/or bioinformatics techniques, an individual's genetic makeup, e.g., his or her genes, may be determined, and that knowledge may be used in the development of therapeutic and/or prophylactic regimens, including drug treatments, that are personalized to the individual, thus enabling medicine to be tailored to meet each person's individual needs. The desire to provide personalized care to individuals is transforming the health care system. This transformation of the health care system is likely to be powered by breakthrough innovations at the intersection of medical science and information technology, such as is represented by the fields of genomics and bioinformatics. Accordingly, genomics and bioinformatics are key foundations upon which this future will be built. Science has evolved dramatically since the first human genome was fully sequenced in 2000 at a total cost of over $1 billion. Today, we are on the verge of high resolution sequencing at a cost of less than $1,000 per genome, making it economically feasible for the first time to move out of the research lab and into widespread adoption for medical care. Genomic data, therefore, may become a vital input