EP-4158636-B1 - MACHINE LEARNING-BASED ANALYSIS OF PROCESS INDICATORS TO PREDICT SAMPLE REEVALUATION SUCCESS

Inventors

  • GIETZEN, Kimberly Jean
  • REZAEI, Naghmeh
  • FILIPE CRUZ, PEDRO MIGUEL

Dates

Publication Date
2026-05-13
Application Date
2021-05-28

Claims (18)

  1. A non-transitory computer readable storage medium impressed with computer program instructions to score whether to reevaluate a sample after one or more inconclusive sample evaluation runs, the instructions, when executed on a processor, implement a method comprising: scoring, from the one or more sample evaluation runs that produced inconclusive results, using a classifier trained, a combination of: one or more call rates indicating a percentage of sample locations with a quality score above a threshold, and a plurality of readouts of radiant signals from process probes during at least a first, pre-hybridization stage of the sample evaluation run in which sample DNA is in a liquid form, a second stage of the sample evaluation run in which sample DNA is hybridized to an image-generating chip, and a third stage of the sample evaluation run in which probe DNA is extended and the extension is labeled with a fluorescent label, wherein each process probe is configured to produce the radiant signals indicative of one or more processing conditions during the sample evaluation run; generating, from the trained classifier, at least a retry success confidence score indicative of whether a further sample evaluation run of the sample will produce a conclusive result; and reporting at least the retry success confidence score to an operator to evaluate when determining whether to conduct an additional sample evaluation run of the sample.
  2. The non-transitory computer readable storage medium of claim 1, wherein the readouts are from types of process probes that include: the readouts from a first, pre-hybridization stage of the sample evaluation run in which sample DNA is in a liquid form, the readouts of one or more probes selected from a group including: a plurality of first readouts from process probes that respond to non-human bacterial DNA not present in human DNA and produce signals indicative of contamination of the sample by non-human bacterial DNA, and a plurality of second readouts from four complementary process probes that respond to extensions of target bases at non-polymorphic sites of common human sample sequences, and produce radiant signals indicative of good extensions for each of four complementary extension reagents, and combinations thereof; the readouts from a second stage of the sample evaluation run in which sample DNA is hybridized to an image-generating chip, the readouts of one or more probes selected from a group including: a plurality of third readouts from process probes that respond to a common sequence known as wild-type allele in a human sample and produce high radiant signals indicative of good sample composition and binding conditions, a plurality of fourth readouts from process probes that include mismatched complementary bases, which respond to a common sequence in a human sample by binding weakly, resulting in separation of the common sequence from the probes with the mismatched complementary bases, and producing approximately background level radiant signals, and a plurality of fifth readouts from process probes that respond to synthetic sequences mixed with reagent in high, medium, and low concentration levels by respectively producing high, medium, and low radiant signals indicative of good reagent delivery, and combinations thereof; and the readouts from a third stage of the sample evaluation run in which probe DNA is extended and the extension is labeled with a fluorescent label, the readouts from one or more probes selected from a group including: a plurality of sixth readouts from process probes that include a hairpin complementary sequence that responds to chemicals mixed in reagent to perform single-base extensions to produce radiant signals indicative of good conditions for single base extensions, a plurality of seventh readouts from process probes engineered to block extensions on a 3' end of probe sequences, such that synthetic targets mixed in reagent and extensions of the synthetic targets are removed after extension and staining, and to produce low radiant signals indicative of good conditions for target removal, and a plurality of eighth readouts from process probes covered with chemicals that bind fluorescent labels mixed in reagent to produce high radiant signals indicative of a good quality staining process, and combinations thereof.
  3. A system including one or more processors coupled to memory, the memory loaded with the computer program instructions of claim 1 to score whether to reevaluate the sample after one or more inconclusive sample evaluation runs.
  4. The system of claim 3 wherein the readouts are from types of process probes that include: the readouts from a first, pre-hybridization stage of the sample evaluation run in which sample DNA is in a liquid form, the readouts of one or more probes selected from a group including: a plurality of first readouts from process probes that respond to non-human bacterial DNA not present in human DNA and produce signals indicative of contamination of the sample by non-human bacterial DNA, and a plurality of second readouts from four complementary process probes that respond to extensions of target bases at non-polymorphic sites of common human sample sequences, and produce radiant signals indicative of good extensions for each of four complementary extension reagents, and combinations thereof; the readouts from a second stage of the sample evaluation run in which sample DNA is hybridized to an image-generating chip, the readouts of one or more probes selected from a group including: a plurality of third readouts from process probes that respond to a common sequence known as wild-type allele in a human sample and produce high radiant signals indicative of good sample composition and binding conditions, a plurality of fourth readouts from process probes that include mismatched complementary bases, which respond to a common sequence in a human sample by binding weakly, resulting in separation of the common sequence from the probes with the mismatched complementary bases, and producing approximately background level radiant signals, and a plurality of fifth readouts from process probes that respond to synthetic sequences mixed with reagent in high, medium, and low concentration levels by respectively producing high, medium, and low radiant signals indicative of good reagent delivery, and combinations thereof; and the readouts from a third stage of the sample evaluation run in which probe DNA is extended and the extension is labeled with a fluorescent label, the readouts from one or more probes selected from a group including: a plurality of sixth readouts from process probes that include a hairpin complementary sequence that responds to chemicals mixed in reagent to perform single-base extensions to produce radiant signals indicative of good conditions for single base extensions, a plurality of seventh readouts from process probes engineered to block extensions on a 3' end of probe sequences, such that synthetic targets mixed in reagent and extensions of the synthetic targets are removed after extension and staining, and to produce low radiant signals indicative of good conditions for target removal, and a plurality of eighth readouts from process probes covered with chemicals that bind fluorescent labels mixed in reagent to produce high radiant signals indicative of a good quality staining process, and combinations thereof.
  5. The system of claim 3, wherein one of the process probes from the first stage responds to non-human contamination not present in human samples and produces radiant signals indicative of contamination of the sample by non-human sequences.
  6. The system of claim 3, wherein the probes from the first stage include four complementary process probes that respond to extensions of target bases at non-polymorphic sites of common human sample sequences, and produce radiant signals indicative of good extensions for each of four complementary extension reagents.
  7. The system of claim 3, wherein one of the process probes from the second stage responds to a common sequence known as wild-type allele in a human sample and produces high radiant signals indicative of good sample composition and binding conditions.
  8. The system of claim 3, wherein one of the process probes from the second stage includes mismatched complementary bases, which respond to a common sequence in a human sample by binding weakly, resulting in separation of the common sequence from the probes with the mismatched complementary bases, and producing approximately background level radiant signals.
  9. The system of claim 3, wherein one of the process probes from the second stage responds to synthetic sequences mixed with reagent in high, medium, and low concentration levels by respectively producing high, medium, and low radiant signals indicative of good reagent delivery.
  10. The system of claim 3, wherein one of the process probes from the third stage includes a hairpin complementary sequence that responds to chemicals mixed in reagent to perform single-base extensions to produce radiant signals indicative of good conditions for single base extensions.
  11. The system of claim 3, wherein one of the process probes from the third stage is engineered to block extensions on a 3' end of probe sequences, such that synthetic targets mixed in reagent and extensions of the synthetic targets are removed after extension and staining, and to produce low radiant signals indicative of good conditions for target removal.
  12. The system of claim 3, wherein one of the process probes from the third stage is covered with chemicals that bind fluorescent labels mixed in reagent to produce high radiant signals indicative of a good quality staining process.
  13. A method of scoring whether to reevaluate a sample after one or more inconclusive sample evaluation runs, including: scoring, from the one or more sample evaluation runs that produced inconclusive results, using a classifier trained, a combination of: one or more call rates indicating a percentage of sample locations with a quality score above a threshold, and a plurality of readouts of radiant signals from process probes during at least a first, pre-hybridization stage of the sample evaluation run in which sample DNA is in a liquid form, a second stage of the sample evaluation run in which sample DNA is hybridized to an image-generating chip, and a third stage of the sample evaluation run in which probe DNA is extended and the extension is labeled with a fluorescent label, wherein each process probe is configured to produce the radiant signals indicative of one or more processing conditions during the sample evaluation run; generating, from the trained classifier, at least a retry success confidence score indicative of whether a further sample evaluation run of the sample will produce a conclusive result; and reporting at least the retry success confidence score to an operator to evaluate when determining whether to conduct an additional sample evaluation run of the sample.
  14. The method of claim 13, wherein the readouts are from types of process probes that include: the readouts from a first, pre-hybridization stage of the sample evaluation run in which sample DNA is in a liquid form, the readouts of one or more probes selected from a group including: a plurality of first readouts from process probes that respond to non-human bacterial DNA not present in human DNA and produce signals indicative of contamination of the sample by non-human bacterial DNA, and a plurality of second readouts from four complementary process probes that respond to extensions of target bases at non-polymorphic sites of common human sample sequences, and produce radiant signals indicative of good extensions for each of four complementary extension reagents, and combinations thereof; the readouts from a second stage of the sample evaluation run in which sample DNA is hybridized to an image-generating chip, the readouts of one or more probes selected from a group including: a plurality of third readouts from process probes that respond to a common sequence known as wild-type allele in a human sample and produce high radiant signals indicative of good sample composition and binding conditions, a plurality of fourth readouts from process probes that include mismatched complementary bases, which respond to a common sequence in a human sample by binding weakly, resulting in separation of the common sequence from the probes with the mismatched complementary bases, and producing approximately background level radiant signals, and a plurality of fifth readouts from process probes that respond to synthetic sequences mixed with reagent in high, medium, and low concentration levels by respectively producing high, medium, and low radiant signals indicative of good reagent delivery, and combinations thereof; and the readouts from a third stage of the sample evaluation run in which probe DNA is extended and the extension is labeled with a fluorescent label, the readouts from one or more probes selected from a group including: a plurality of sixth readouts from process probes that include a hairpin complementary sequence that responds to chemicals mixed in reagent to perform single-base extensions to produce radiant signals indicative of good conditions for single base extensions, a plurality of seventh readouts from process probes engineered to block extensions on a 3' end of probe sequences, such that synthetic targets mixed in reagent and extensions of the synthetic targets are removed after extension and staining, and to produce low radiant signals indicative of good conditions for target removal, and a plurality of eighth readouts from process probes covered with chemicals that bind fluorescent labels mixed in reagent to produce high radiant signals indicative of a good quality staining process, and combinations thereof.
  15. A method of scoring whether to reevaluate a sample after one or more inconclusive sample evaluation runs, including: assembling a training set of sample evaluations, each including one or more inconclusive sample evaluation runs that produced inconclusive results for samples, followed by an additional sample evaluation run, wherein training data for each of the inconclusive sample evaluation runs includes one or more call rates indicating a percentage of sample locations with a quality score above a threshold and a plurality of readouts of radiant signals from process probes during at least a first, pre-hybridization stage of the sample evaluation run in which sample DNA is in a liquid form, a second stage of the sample evaluation run in which sample DNA is hybridized to an image-generating chip, and a third stage of the sample evaluation run in which probe DNA is extended and the extension is labeled with a fluorescent label, wherein each process probe is configured to produce the radiant signals indicative of one or more processing conditions during the sample evaluation run; wherein training data for each of the additional sample evaluation runs includes one or more ground truth indicators of a conclusive or inconclusive result; training a classifier, using the training data, to score whether an additional sample evaluation run for a particular sample is likely to produce the conclusive result; and saving parameters of the trained classifier for use determining whether to reevaluate production samples after one or more inconclusive sample evaluation runs.
  16. The method of claim 15 wherein the plurality of readouts are from types of process probes that include: the readouts from a first, pre-hybridization stage of the sample evaluation run in which sample DNA is in a liquid form, the readouts of one or more probes selected from a group including: a plurality of first readouts from process probes that respond to non-human bacterial DNA not present in human DNA and produce signals indicative of contamination of the sample by non-human bacterial DNA, and a plurality of second readouts from four complementary process probes that respond to extensions of target bases at non-polymorphic sites of common human sample sequences, and produce radiant signals indicative of good extensions for each of four complementary extension reagents, and combinations thereof; the readouts from a second stage of the sample evaluation run in which sample DNA is hybridized to an image-generating chip, the readouts of one or more probes selected from a group including: a plurality of third readouts from process probes that respond to a common sequence known as wild-type allele in a human sample and produce high radiant signals indicative of good sample composition and binding conditions, a plurality of fourth readouts from process probes that include mismatched complementary bases, which respond to a common sequence in a human sample by binding weakly, resulting in separation of the common sequence from the probes with the mismatched complementary bases, and producing approximately background level radiant signals, and a plurality of fifth readouts from process probes that respond to synthetic sequences mixed with reagent in high, medium, and low concentration levels by respectively producing high, medium, and low radiant signals indicative of good reagent delivery, and combinations thereof; and the readouts from a third stage of the sample evaluation run in which probe DNA is extended and the extension is labeled with a fluorescent label, the readouts from one or more probes selected from a group including: a plurality of sixth readouts from process probes that include a hairpin complementary sequence that responds to chemicals mixed in reagent to perform single-base extensions to produce radiant signals indicative of good conditions for single base extensions, a plurality of seventh readouts from process probes engineered to block extensions on a 3' end of probe sequences, such that synthetic targets mixed in reagent and extensions of the synthetic targets are removed after extension and staining, and to produce low radiant signals indicative of good conditions for target removal, and a plurality of eighth readouts from process probes covered with chemicals that bind fluorescent labels mixed in reagent to produce high radiant signals indicative of a good quality staining process, and combinations thereof.
  17. A non-transitory computer readable storage medium impressed with computer program instructions to train a classifier to score whether to reevaluate a sample after one or more inconclusive sample evaluation runs, the instructions, when executed on a processor, implement a method according to claim 15 or 16.
  18. A system including one or more processors coupled to memory, the memory loaded with computer program instructions of claim 17 to train a classifier to score whether to reevaluate the sample after one or more inconclusive sample evaluation runs.
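
The train-then-score flow recited in claims 13 and 15 can be illustrated with a minimal sketch. It is not the patented implementation: a small logistic regression stands in for the classifier (the drawings list a convolutional neural network as one example), and every function name, feature layout, and numeric value below is a hypothetical chosen for illustration. The sketch assumes one call rate plus one normalized probe readout per stage, and that each training row is an inconclusive run labeled 1 if the follow-up run was conclusive.

```python
import json
import math


def call_rate(quality_scores, threshold):
    """Percentage of sample locations whose quality score is above the threshold."""
    passing = sum(1 for q in quality_scores if q > threshold)
    return 100.0 * passing / len(quality_scores)


def build_features(call_rates, stage1, stage2, stage3):
    """Concatenate call rates (scaled to 0..1) with readouts from the three stages."""
    return [c / 100.0 for c in call_rates] + list(stage1) + list(stage2) + list(stage3)


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(rows, labels, epochs=500, lr=0.5):
    """Fit a logistic regression by stochastic gradient descent (assumed stand-in)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return {"weights": weights, "bias": bias}


def retry_success_confidence(params, features):
    """Score in [0, 1]: estimated chance that a further run is conclusive."""
    z = params["bias"] + sum(w * xi for w, xi in zip(params["weights"], features))
    return sigmoid(z)


# Hypothetical training set: values are made up and imply nothing about real assays.
# Feature order: call rate, stage 1 readout, stage 2 readout, stage 3 readout.
TRAIN_ROWS = [
    build_features([80.0], [0.05], [0.80], [0.20]),
    build_features([75.0], [0.10], [0.70], [0.30]),
    build_features([85.0], [0.00], [0.90], [0.25]),
    build_features([60.0], [0.90], [0.30], [0.80]),
    build_features([55.0], [0.80], [0.20], [0.90]),
    build_features([65.0], [0.85], [0.35], [0.85]),
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = the additional run was conclusive

params = train_classifier(TRAIN_ROWS, TRAIN_LABELS)
saved = json.dumps(params)   # save the trained parameters for later use
loaded = json.loads(saved)   # reload them when scoring a production sample

candidate = build_features([78.0], [0.05], [0.75], [0.25])
score = retry_success_confidence(loaded, candidate)
print(f"retry success confidence: {score:.2f}")
```

The printed score corresponds to the value that would be reported to an operator, who then decides whether to conduct an additional run. Keeping the parameters in a plain serializable dictionary is one simple way to separate training from production scoring.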

Description

PRIORITY APPLICATION

This application claims the benefit of U.S. Application 17/332,904, entitled "MACHINE LEARNING-BASED ANALYSIS OF PROCESS INDICATORS TO PREDICT SAMPLE REEVALUATION SUCCESS," filed May 27, 2021 (Attorney Docket No. ILLM 1027-2/IP-1973-US) which claims the benefit of U.S. Provisional Patent Application No.: 63/032,083, entitled "MACHINE LEARNING-BASED ANALYSIS OF PROCESS INDICATORS TO PREDICT SAMPLE REEVALUATION SUCCESS," filed May 29, 2020 (Attorney Docket No. ILLM 1027-1/IP-1973-PRV).

FIELD OF THE TECHNOLOGY DISCLOSED

The technology disclosed relates to evaluation of readouts from process controls for production rerun decisions.

BACKGROUND

The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.

Genotyping is a process that can take multiple days to complete. The process is vulnerable to process and sample errors. Collected samples for genotyping are extracted and distributed in sections and areas of image-generating chips. The samples are then chemically processed through multiple steps to generate fluorescing images. The process generates a quality score for each section analyzed. This quality cannot provide insight into the root cause of failure of a low-quality process.

The document Huang et al. BMC Bioinformatics 2004, 5:36 teaches a method of quality control of a genotyping assay using signal clustering and neural networks.

Accordingly, an opportunity arises to introduce new methods and systems to evaluate quality score and other outputs from the genotyping process to determine the root cause of failure.
BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, like reference characters generally refer to like parts throughout the different views. Also, the drawings are not necessarily to scale, with an emphasis instead generally being placed upon illustrating the principles of the technology disclosed. In the following description, various implementations of the technology disclosed are described with reference to the following drawings, in which:

FIG. 1 shows an architectural level schematic of a system in which readouts from process control probes and call rates for one or more genotyping sample evaluation runs are scored to predict further sample reevaluation.
FIG. 2 illustrates subsystem components of the feature generator of FIG. 1.
FIG. 3 presents process steps for an example genotyping process.
FIG. 4 presents images of sections of an image-generating chip after successful completion of a production run.
FIG. 5 is an example illustrating process probes positioned on a section of an image-generating chip.
FIG. 6 is an example grouping of process probes into sample-independent and sample-dependent process probes.
FIGs. 7A-1 and 7A-2 present examples of staining process probes and signal intensities from staining process probes.
FIGs. 7B-1 and 7B-2 present examples of extension process probes and signal intensities from extension process probes.
FIGs. 7C-1 and 7C-2 present examples of target removal process probes and signal intensities from target removal process probes.
FIGs. 7D-1 and 7D-2 present examples of hybridization process probes and signal intensities from hybridization process probes.
FIGs. 7E-1 and 7E-2 present examples of stringency process probes and signal intensities from stringency process probes.
FIGs. 7F-1 and 7F-2 present examples of non-polymorphic process probes and signal intensities from non-polymorphic process probes.
FIG. 7G presents signal intensities from non-specific binding process probes.
FIG. 8 is another example of grouping process probes according to different stages of the genotyping process.
FIG. 9 illustrates the training of a retry classifier using labeled training data comprising call rates and readouts from production runs.
FIG. 10 illustrates a process in which call rates and readout from process probes from inconclusive production runs are given as input to a retry classifier to generate a retry success confidence score indicative of whether a sample reevaluation will produce a conclusive result.
FIG. 11 is an example convolutional neural network to generate a retry success confidence score.
FIG. 12 is a block diagram illustrating the training of the convolutional neural network of FIG. 11.
FIG. 13 is a simplified block diagram of a computer system that can be used to implement the technology disclosed.

DETAILED DESCRIPTION

The following discussion is presented to enable any person skilled in the art to make and use the technology disclosed, and is provided in the context of a particular application and its requirements. The scope of protection is def