EP-3874513-B1 - GENERALIZED BIOMARKER MODEL
Inventors
- BIRNBAUM, Benjamin E.
- Ambwani, Geetu
Dates
- Publication Date
- 20260513
- Application Date
- 20191029
Claims (20)
- A model-assisted system (130) to determine whether an individual from a population is a candidate of a cohort based on whether the individual exceeds a first likelihood threshold of having been tested for a biomarker, the system comprising: at least one processor programmed to: access (510) a database (132) from which patient medical information including biomarkers associated with a population of individuals can be derived; provide (520), to a generalized biomarker model (330), a first biomarker associated with a cohort, the generalized biomarker model (330) being trained based on one or more second biomarkers (311) using the patient medical information, wherein the first biomarker is different from the one or more second biomarkers (311) and the model training is done based in that medical records (312) describe or discuss biomarkers (311) in similar ways, the generalized biomarker model (330) being trained to identify patients that have been tested for biomarkers other than the one or more second biomarkers (311); apply the generalized biomarker model (330) in relation to the first biomarker to the patient medical information associated with the population of individuals; receive (530), from the generalized biomarker model (330), a first output (350) indicating a first group of the population of individuals exceeding a first likelihood threshold of having been tested for the first biomarker; and determine (540), based on the first output, whether an individual from among the first group of the population of individuals is a candidate for the cohort, wherein candidacy in the cohort enables identification of candidates for one or more of a particular treatment and a particular clinical trial.
- The model-assisted system (130) of claim 1, wherein the patient medical information comprises medical records (210) associated with the population of individuals.
- The model-assisted system (130) of claim 2, wherein the medical records (210) include structured information (212) and unstructured information (211) associated with the population of individuals.
- The model-assisted system (130) of claim 3, wherein the unstructured information (211) includes text written by a health care provider, a radiology report, or a pathology report.
- The model-assisted system (130) of claim 4, wherein the generalized biomarker model is trained based on the unstructured information (211), preferably wherein at least a portion of the unstructured information (211) has been subject to an optical character recognition process.
- The model-assisted system (130) of claim 1, wherein determining whether the individual is a candidate for the cohort comprises verifying, based on a medical record associated with the individual, that the individual has been tested for the biomarker.
- The model-assisted system (130) of claim 1, wherein the at least one processor is further programmed to: receive, from the generalized biomarker model (330), a second output indicating a second group of the population of individuals exceeding a second likelihood threshold of having been tested positive for the first biomarker, the individual being included in the second group, preferably wherein determining whether the individual is a candidate for the cohort comprises verifying, based on a medical record associated with the individual, that the individual has tested positive for the biomarker.
- The model-assisted system (130) of claim 1, wherein the at least one processor is further programmed to store the first output (350) for access by a user of the generalized biomarker model (330).
- The model-assisted selection system (130) of claim 1, wherein the generalized biomarker model (330) generates the first output using a binary classification algorithm, preferably wherein the binary classification algorithm includes at least one of a logistic regression (416), a random forest, gradient boosted trees, support vector machines, or neural networks.
- The model-assisted system (130) of claim 1, wherein the generalized biomarker model (330) is developed at least in part based on feature vectors (414) extracted from the information based on the one or more second biomarkers, preferably wherein the feature vectors (414) comprise at least one biomarker token representing text associated with the at least one second biomarker.
- The model-assisted selection system (130) of claim 1, wherein the one or more second biomarkers appear in the information more than the first biomarker.
- The model-assisted system (130) of claim 1, wherein the at least one processor is further programmed to: provide the first biomarker to a biomarker specific model, the biomarker specific model being trained based on the first biomarker using the information; receive, from the biomarker specific model, a third output indicating a third group of the population of individuals exceeding a likelihood threshold of having been tested for the at least one biomarker; and verify the accuracy of the generalized biomarker model (330) by comparing the first output to the third output.
- The model-assisted system (130) of claim 1, wherein the at least one processor is further programmed to: search the information for the first biomarker to generate a fourth output indicating a fourth group of the population of individuals having been tested for the at least one biomarker; and verify the accuracy of the generalized biomarker model by comparing the first output to the fourth output.
- A computer-implemented method for determining whether an individual from a population is a candidate of a cohort based on whether the individual exceeds a first likelihood threshold of having been tested for a biomarker, the method comprising: accessing (510) a database (132) from which patient medical information including biomarkers associated with a population of individuals can be derived; providing (520), to a generalized biomarker model (330), a first biomarker associated with a cohort, the generalized biomarker model (330) being trained based on one or more second biomarkers (311) using the patient medical information, wherein the first biomarker is different from the one or more second biomarkers (311), wherein the model training is done based in that medical records (312) describe or discuss biomarkers (311) in similar ways, the generalized biomarker model (330) being trained to identify patients that have been tested for biomarkers other than the one or more second biomarkers (311); applying the generalized biomarker model (330) in relation to the first biomarker to the patient medical information associated with the population of individuals; receiving (530), from the generalized biomarker model (330), a first output (350) indicating a first group of the population of individuals exceeding a first likelihood threshold of having been tested for the first biomarker; and determining (540), based on the first output, whether an individual from among the first group of the population of individuals is a candidate for the cohort, wherein candidacy in the cohort enables identification of candidates for one or more of a particular treatment and a particular clinical trial.
- The computer-implemented method of claim 14, wherein the patient medical information comprises medical records (210) associated with the population of individuals.
- The computer-implemented method of claim 15, wherein the medical records include structured information (212) and unstructured information (211) associated with the population of individuals.
- The computer-implemented method of claim 16, wherein the unstructured information (211) includes text written by a health care provider, a radiology report, or a pathology report, preferably wherein the generalized biomarker model is trained based on the unstructured information (211).
- The computer-implemented method of claim 14, wherein determining whether the individual is a candidate for the cohort comprises verifying, based on a medical record (210) associated with the individual, that the individual has been tested for the biomarker.
- The computer-implemented method of claim 14, wherein the likelihood threshold is adjustable based on levels of efficiency and performance of the model.
- A model-assisted system (130) to determine whether an individual from a population is a candidate of a cohort based on whether the individual exceeds a first likelihood threshold of having been treated with a drug, the system comprising: at least one processor programmed to: access (510) a database (132) from which patient information identifying treatment of a population of individuals with drugs can be derived; provide (520), to a generalized treatment model, the identification of treatment with a first drug associated with a cohort, the generalized treatment model being trained based on identification of treatment of patients with one or more second drugs using the information, wherein the first drug is different from the one or more second drugs, and the model training is done based in that medical records describe or discuss treatment of patients using drugs in similar ways, the generalized treatment model being trained to identify patients that have been identified as having been treated with drugs other than the one or more second drugs; apply the generalized treatment model in relation to treatment with the first drug to the patient medical information associated with the population of individuals; receive (530), from the generalized treatment model, a first output indicating a first group of the population of individuals exceeding a first likelihood threshold of having been treated using the first drug; and determine (540), based on the first output, whether an individual from among the first group of the population of individuals is a candidate for the cohort, wherein candidacy in the cohort enables identification of candidates for one or more of a particular treatment and a particular clinical trial.
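Illustrative note (not part of the claims): the flow recited in claims 1, 9 and 10 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes that generalization is obtained by replacing each biomarker name with a generic biomarker token before building bag-of-words feature vectors, so that a logistic regression trained on records mentioning "second" biomarkers (here EGFR and ALK) can score records mentioning a different "first" biomarker (here KRAS). The placeholder token, the function names, the toy records and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter

BIOMARKER_TOKEN = "biomarkertoken"  # hypothetical placeholder, not from the source


def featurize(text, biomarker):
    """Mask the biomarker name with a generic token, then count words."""
    pattern = r"\b" + re.escape(biomarker) + r"\b"
    masked = re.sub(pattern, BIOMARKER_TOKEN, text, flags=re.IGNORECASE)
    return Counter(re.findall(r"[a-z0-9]+", masked.lower()))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, feats):
    """Likelihood that the record describes a test for the masked biomarker."""
    z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return sigmoid(z)


def train(examples, epochs=200, lr=0.5):
    """Plain stochastic gradient ascent on the logistic log-likelihood."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, label in examples:
            err = label - score(weights, bias, feats)
            bias += lr * err
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) + lr * err * v
    return weights, bias


# Toy training records for the "second" biomarkers (invented for illustration).
TRAINING = [
    ("EGFR testing was performed and the result was negative", "EGFR", 1),
    ("ALK testing was performed, result positive", "ALK", 1),
    ("Will consider EGFR testing if disease progresses", "EGFR", 0),
    ("ALK testing not ordered at this time", "ALK", 0),
]
MODEL = train([(featurize(text, name), label) for text, name, label in TRAINING])

# Toy population; KRAS never appears in the training data.
POPULATION = {
    "patient_a": "KRAS testing was performed, result positive per oncology",
    "patient_b": "Will consider KRAS testing if disease progresses further",
}


def select(population, biomarker, threshold=0.5):
    """Return ids of individuals whose likelihood exceeds the threshold."""
    weights, bias = MODEL
    return sorted(
        pid
        for pid, text in population.items()
        if score(weights, bias, featurize(text, biomarker)) > threshold
    )


print(select(POPULATION, "KRAS"))
```

Because the biomarker name is masked, the classifier only sees how a test is described ("was performed", "result"), which is the property the claims rely on when they state that medical records describe or discuss biomarkers in similar ways. Raising or lowering `threshold` trades the size of the selected group against its precision, in the spirit of the adjustable threshold of claim 19.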
Description
The present disclosure relates to the selection of cohorts and, more specifically, to the use of one or more generalized models to automatically select cohorts.
Background Information
In cancer treatment and in the treatment of various other diseases, there is an increasing drive to provide personalized treatment for patients. As one example, in order to provide a more effective treatment, patients with a particular form of cancer (e.g., lung cancer, breast cancer, etc.) may be provided an individualized treatment plan based on genomic markers of the individual's tumor cells. Each of the tumor cells may have a particular genetic profile defining how they interact with other cells in the body and defining the kinds of biological pathways that may allow for the most effective treatment.
Thus, as the medical industry moves towards more individualized treatment plans, it may be increasingly important to be able to identify patients having certain treatment histories and/or characteristics. Returning to the example of oncology patients, it may be desirable to identify patients exhibiting certain biomarkers. For example, patients may be identified as candidates for particular treatments, particular clinical trials, or other similar groups based on whether they have been tested for a particular biomarker and the results of the treatment.
However, identifying patients with particular biomarkers may be difficult when examining large groups of medical data. For example, this may require searching through thousands of medical records for an indication of whether a patient has been tested for a biomarker and to find the result of the tests. Complicating matters further, individual patients are often tested for hundreds of different biomarkers, many of which are not used as a basis for treatment of the patient. In addition, the medical records often contain handwritten notes or other text which may make automation of this process more difficult.
Some solutions may include developing a machine learning model to determine whether a patient has been tested for a specific biomarker. For example, the model may be trained based on a set of medical records where it is known whether the patient has been tested for a particular biomarker or not. But such solutions require individualized models for each biomarker, which may not be feasible due to the wide variety of biomarkers that may be tested for and the limited data available for certain biomarkers.
Thus, there is a need for an improved approach for identifying patients having particular treatment characteristics. Solutions should allow for development of a machine learning model that is not dependent on the particular biomarkers (or other characteristics) that were used to train the model. Accordingly, using a generalized biomarker model, patients associated with a particular biomarker may be identified, regardless of the availability of medical data associated with that particular biomarker.
US 2018/300640 discloses systems and methods for selecting cohorts. In one disclosed implementation, a model-assisted selection system for identifying candidates for placement into a cohort includes a data interface and at least one processing device. The at least one processing device is programmed to access, via the data interface, a database from which feature vectors associated with an individual from among a population of individuals can be derived; derive, for the individual, one or more feature vectors from the database; provide the one or more feature vectors to a model; receive an output from the model; and determine whether the individual from among the population of individuals is a candidate for the cohort based on the output received from the model.
US 2014/122126 discloses methods for processing data in order to assess the likelihood that a patient belongs within a specified cohort.
The disclosed method includes receiving a plurality of data elements from multiple data sets, wherein at least a portion of the plurality of data elements are unstructured data elements; and assessing the likelihood that the patient belongs within the specified cohort using at least a portion of the plurality of data elements including at least one unstructured data element. In some of the disclosed embodiments, the method further includes processing the unstructured data elements and may further include querying at least a portion of the plurality of data elements including at least one unstructured data element to assess the likelihood that the patient belongs within the specified cohort.
Summary
In accordance with one aspect of the present invention there is provided a model-assisted system to determine whether an individual from a population is a candidate of a cohort based on whether the individual exceeds a first likelihood threshold of having been tested for a biomarker in accordance with claim 1. In accordance with another aspect of the present invention there is provided