EP-4736184-A1 - MACHINE LEARNING (ML)-BASED SYSTEMS AND METHODS FOR PREDICTING DISEASE

Abstract

Machine Learning (ML)-based systems and methods are described for predicting cardiovascular disease of users of specific geographic regions. In various aspects, user-specific cardiovascular data of a user may be input into an ML model trained with data of a plurality of cardiovascular risk factors specific to a population of a given geographic region. The plurality of cardiovascular risk factors is subdivided into a first training data subset (preselected factors) and a second training data subset (remaining factors). The user-specific cardiovascular data of the user as input into the ML model is data of the user corresponding to the preselected subset of cardiovascular risk factors and the remaining subset of cardiovascular risk factors. The ML model outputs a user-specific cardiovascular prediction of the user. The user-specific cardiovascular prediction comprises a cardiovascular risk score of the user. The cardiovascular prediction is displayed on a graphical user interface (GUI).
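
The subdivision of risk factors described above can be illustrated with a minimal sketch, assuming a user record is a flat mapping from factor name to value. The helper `split_risk_factors`, the record layout, and the factor names are hypothetical and chosen only for illustration; the application does not specify a data format.

```python
from typing import Dict, Iterable, Optional, Tuple

Record = Dict[str, Optional[float]]

# Hypothetical factor names; a few echo examples listed in the claims.
PRESELECTED = ("age", "sex", "ldl_cholesterol", "hypertension")


def split_risk_factors(
    record: Record, preselected: Iterable[str] = PRESELECTED
) -> Tuple[Record, Record]:
    """Partition one record into (preselected subset, remaining subset)."""
    chosen = set(preselected)
    first = {name: value for name, value in record.items() if name in chosen}
    second = {name: value for name, value in record.items() if name not in chosen}
    return first, second


# Illustrative record only; the values are not taken from the application.
record = {
    "age": 67.0,
    "sex": 1.0,
    "ldl_cholesterol": 3.4,
    "hypertension": 1.0,
    "creatinine": 88.0,
    "hba1c": None,  # a missing value in the remaining subset
}
first_subset, second_subset = split_risk_factors(record)
```

Both subsets together make up the input to the model; the split only determines how each group of factors is handled during training.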

Inventors

  • Chui, Sze Ling Celine
  • LUO, RUIBANG
  • ZHOU, YEKAI
  • Wong, Ian Chi Kei

Assignees

  • Amgen Inc.

Dates

Publication Date
2026-05-06
Application Date
2024-08-16

Claims (1)

PATENT APPLICATION Attorney Docket No.: 32263/59457 PC
CLAIMS
What is claimed is:
1. A machine learning (ML)-based system for predicting cardiovascular disease, the ML-based system comprising: an ML model stored on a computer memory, the ML model trained with data of a plurality of cardiovascular risk factors, the plurality of cardiovascular risk factors subdivided into a first training data subset and a second training data subset prior to training the ML model, wherein the first training data subset comprises a preselected subset of cardiovascular risk factors, and wherein the second training data subset comprises a remaining subset of cardiovascular risk factors; a set of computing instructions stored on the computer memory and configured to access the ML model; a processor communicatively coupled to the computer memory, and the processor configured to access the set of computing instructions and the ML model, wherein the computing instructions, when executed by the processor, cause the processor to: input user-specific cardiovascular data of a user into the ML model, wherein the user is a member of a geographic region, wherein the user-specific cardiovascular data of the user as input into the ML model is data of the user corresponding to the preselected subset of cardiovascular risk factors and the remaining subset of cardiovascular risk factors, and wherein the ML model outputs a user-specific cardiovascular prediction of the user, the user-specific cardiovascular prediction comprising a cardiovascular risk score of the user; displaying, by a graphical user interface (GUI), the user-specific cardiovascular prediction.
2. The ML-based system of claim 1, wherein the ML model is a Cox proportional hazards model.
3. The ML-based system of claim 2, wherein the computing instructions are further configured, when executed by the processor, to implement or apply a gradient boosting algorithm to the second training data subset of the remaining subset of cardiovascular risk factors to enhance the Cox proportional hazards model.
4. The ML-based system of claim of any one of claims 1-3, wherein each of the plurality of cardiovascular risk factors is specific to a population of the geographic region.
5. The ML-based system of claim 4, wherein the geographic region defining the plurality of cardiovascular risk factors on which the ML model is trained comprises a plurality subregions or cohorts comprising individuals located within each respective subregion or cohort.
6. The ML-based system of any one of claims 1-5, wherein the preselected subset of cardiovascular risk factors comprises risk factors selected from one or more risk categories defining indications of cardiovascular health.
7. The ML-based system of claim 6, wherein the one or more risk categories comprise demographic factors, family history of disease, healthcare utilization, clinical laboratory testing, medication history, disease history, and drug use.
8. The ML-based system of any one of claims 1-6, wherein the preselected subset of cardiovascular risk factors have a linear relationship with the ML model, and wherein the remaining subset of cardiovascular risk factors have a non-linear relationship with the ML model.
9. The ML-based system of claim 8, wherein the preselected subset of cardiovascular risk factors comprises one or more of values related to: age, sex, family history of diabetes, accident and emergency visits per year, aspartate transaminase, alanine aminotransferase, low-density lipoprotein cholesterol, neutrophil, statins, myocardial infarction, angina, revascularization, atrial fibrillation, hypertension, and/or user history of diabetes.
10. The ML-based system of any one of claims 1-9, wherein at least a portion of the preselected subset of cardiovascular risk factors comprises imputed data generated to replace missing values, and wherein the remaining subset of cardiovascular risk factors are not imputed.
11. The ML-based system of claim 4, wherein the ML model is further trained with data defining one or more threshold risks, where each threshold risk defines a magnitude of a clinical health benefit to a user of the geographic region.
12. The ML-based system of any one of claims 1-11, wherein a C-statistic for the ML model has a value of at least 0.69.
13. The ML-based system of any one of claims 1-12, wherein the user-specific cardiovascular prediction is a cardiovascular disease (CVD) risk prediction for the user in a 10-year timeframe.
14. The ML-based system of any one of claims 1-13, wherein the ML model is further trained with data of one or more drug classes identified for reducing cardiovascular disease (CVD), and wherein the user-specific cardiovascular data of the user as input into the ML model further comprises a selection of one or more of the drug classes, and wherein the user-specific cardiovascular prediction of the user comprises a CVD risk prediction that predicts the user’s cardiovascular after using the one or more of the drug classes as selected.
15. The ML-based system of any one of claims 1-14, wherein the GUI is configured to receive the user-specific cardiovascular data of the user, and wherein the GUI is further configured to provide the user-specific cardiovascular data as input to the ML model.
16. The ML-based system of claim 15, wherein the GUI provides graphical fields or selections for selecting one or more types of drug classes for selection or generation of a user-specific plan to address the user’s cardiovascular health.
17. The ML-based system of any one of claims 1-16, wherein the user-specific cardiovascular prediction comprises a user-specific medical prescription predicted to reduce the user’s cardiovascular disease (CVD) risk.
18. The ML-based system of any one of claims 1-17, wherein the user-specific cardiovascular prediction causes generation of a user-specific activity predicted to reduce the user’s cardiovascular disease (CVD) risk.
19. A machine learning (ML)-based method for predicting cardiovascular disease, the ML-based method comprising: training, by one or more processors, an ML model with data of a plurality of cardiovascular risk factors, the plurality of cardiovascular risk factors subdivided into a first training data subset and a second training data subset prior to training the ML model, wherein the first training data subset comprises a preselected subset of cardiovascular risk factors, and wherein the second training subset comprises a remaining subset of cardiovascular risk factors; inputting, by the one or more processors, user-specific cardiovascular data of a user into the ML model, wherein the user is a member of a geographic region, and wherein the user-specific cardiovascular data of the user as input into the ML model is data of the user corresponding to the preselected subset of cardiovascular risk factors and the remaining subset of cardiovascular risk factors; outputting, by the one or more processors accessing the ML model, a user-specific cardiovascular prediction of the user, the user-specific cardiovascular prediction comprising a cardiovascular risk score of the user; and displaying, by the one or more processors, the user-specific cardiovascular prediction on a graphical user interface (GUI).
20. The ML-based method of claim 19, wherein the ML model is a Cox proportional hazards model.
21. The ML-based method of claim 20, further comprising implementing or applying a gradient boosting algorithm to the second training data subset of the remaining subset of cardiovascular risk factors to enhance the Cox proportional hazards model.
22. The ML-based method of any one of claims 19-21, wherein each of the plurality of cardiovascular risk factors is specific to a population of a geographic region.
23. The ML-based method of any one of claims 19-22, wherein the geographic region defining the plurality of cardiovascular risk factors on which the ML model is trained comprises a plurality subregions or cohorts comprising individuals located within each respective subregion or cohort.
24. The ML-based method of any one of claims 19-23, wherein the preselected subset of cardiovascular risk factors comprises risk factors selected from one or more risk categories defining indications of cardiovascular health.
25. The ML-based method of claim 24, wherein the one or more risk categories comprise demographic factors, family history of disease, healthcare utilization, clinical laboratory testing, medication history, disease history, and drug use.
26. The ML-based method of any one of claims 19-25, wherein the preselected subset of cardiovascular risk factors have a linear relationship with the ML model, and wherein the remaining subset of cardiovascular risk factors have a non-linear relationship with the ML model.
27. The ML-based method of claim 26, wherein the preselected subset of cardiovascular risk factors comprises one or more of values related to: age, sex, family history of diabetes, accident and emergency visits per year, aspartate transaminase, alanine aminotransferase, low-density lipoprotein cholesterol, neutrophil, statins, myocardial infarction, angina, revascularization, atrial fibrillation, hypertension, and/or user history of diabetes.
28. The ML-based method of any one of claims 19-27, wherein at least a portion of the preselected subset of cardiovascular risk factors comprises imputed data generated to replace missing values, and wherein the remaining subset of cardiovascular risk factors are not imputed.
29. The ML-based method of any one of claims 19-28, wherein the ML model is further trained with data defining one or more threshold risks, where each threshold risk defines a magnitude of a clinical health benefit to a user of the geographic region.
30. The ML-based method of any one of claims 19-29, wherein a C-statistic for the ML model has a value of at least 0.69.
31. The ML-based method of any one of claims 19-30, wherein the user-specific cardiovascular prediction is a cardiovascular disease (CVD) risk prediction for the user in a 10-year timeframe.
32. The ML-based method of any one of claims 19-31, wherein the ML model is further trained with data of one or more drug classes identified for reducing cardiovascular disease (CVD), and wherein the user-specific cardiovascular data of the user as input into the ML model further comprises a selection of one or more of the drug classes, and wherein the user-specific cardiovascular prediction of the user comprises a CVD risk prediction that predicts the user’s cardiovascular after using the one or more of the drug classes as selected.
33. The ML-based method of any one of claims 19-32, wherein the GUI is configured to receive the user-specific cardiovascular data of the user, and wherein the GUI is further configured to provide the user-specific cardiovascular data as input to the ML model.
34. The ML-based method of claim 33, wherein the GUI provides graphical fields or selections for selecting one or more types of drug classes for selection or generation of a user-specific plan to address the user’s cardiovascular health.
35. The ML-based method of any one of claims 19-34, wherein the user-specific cardiovascular prediction comprises a user-specific medical prescription predicted to reduce the user’s cardiovascular disease (CVD) risk.
36. The ML-based method of any one of claims 19-35, wherein the user-specific cardiovascular prediction causes generation of a user-specific activity predicted to reduce the user’s cardiovascular disease (CVD) risk.
37. A tangible, non-transitory computer-readable medium storing computing instructions for predicting cardiovascular disease, that when executed by one or more processors cause the one or more processors to: train an ML model with data of a plurality of cardiovascular risk factors, the plurality of cardiovascular risk factors subdivided into a first training data subset and a second training data subset prior to training the ML model, wherein the first training data subset comprises a preselected subset of cardiovascular risk factors, and wherein the second training subset comprises a remaining subset of cardiovascular risk factors, input user-specific cardiovascular data of a user into an ML model stored on a computer memory, wherein the user is a member of a geographic region, and wherein the user-specific cardiovascular data of the user as input into the ML model is data of the user corresponding to the preselected subset of cardiovascular risk factors and the remaining subset of cardiovascular risk factors, output, by the ML model, a user-specific cardiovascular prediction of the user, the user-specific cardiovascular prediction comprising a cardiovascular risk score of the user; and display, by a graphical user interface (GUI), the user-specific cardiovascular prediction.
38. The tangible, non-transitory computer-readable medium of claim 37, wherein the ML model is a Cox proportional hazards model.
39. The tangible, non-transitory computer-readable medium of claim 38, wherein the computing instructions are further configured, when executed by the processor, to implement or apply a gradient boosting algorithm to the second training data subset of the remaining subset of cardiovascular risk factors to enhance the Cox proportional hazards model.
40. The tangible, non-transitory computer-readable medium of any one of claims 37-39, wherein each of the plurality of cardiovascular risk factors is specific to a population of a geographic region.
41. The tangible, non-transitory computer-readable medium of any one of claims 37-40, wherein the geographic region defining the plurality of cardiovascular risk factors on which the ML model is trained comprises a plurality subregions or cohorts comprising individuals located within each respective subregion or cohort.
42. The tangible, non-transitory computer-readable medium of any one of claims 37-41, wherein the preselected subset of cardiovascular risk factors comprises risk factors selected from one or more risk categories defining indications of cardiovascular health.
43. The tangible, non-transitory computer-readable medium of claim 42, wherein the one or more risk categories comprise demographic factors, family history of disease, healthcare utilization, clinical laboratory testing, medication history, disease history, and drug use.
44. The tangible, non-transitory computer-readable medium of any one of claims 37-43, wherein the preselected subset of cardiovascular risk factors have a linear relationship with the ML model, and wherein the remaining subset of cardiovascular risk factors have a non-linear relationship with the ML model.
45. The tangible, non-transitory computer-readable medium of claim 44, wherein the preselected subset of cardiovascular risk factors comprises one or more of values related to: age, sex, family history of diabetes, accident and emergency visits per year, aspartate transaminase, alanine aminotransferase, low-density lipoprotein cholesterol, neutrophil, statins, myocardial infarction, angina, revascularization, atrial fibrillation, hypertension, and/or user history of diabetes.
46. The tangible, non-transitory computer-readable medium of any one of claims 37-45, wherein at least a portion of the preselected subset of cardiovascular risk factors comprises imputed data generated to replace missing values, and wherein the remaining subset of cardiovascular risk factors are not imputed.
47. The tangible, non-transitory computer-readable medium of any one of claims 37-46, wherein the ML model is further trained with data defining one or more threshold risks, where each threshold risk defines a magnitude of a clinical health benefit to a user of the geographic region.
48. The tangible, non-transitory computer-readable medium of any one of claims 37-47, wherein a C-statistic for the ML model has a value of at least 0.69.
49. The tangible, non-transitory computer-readable medium of any one of claims 37-48, wherein the user-specific cardiovascular prediction is a cardiovascular disease (CVD) risk prediction for the user in a 10-year timeframe.
50. The tangible, non-transitory computer-readable medium of any one of claims 37-49, wherein the ML model is further trained with data of one or more drug classes identified for reducing cardiovascular disease (CVD), and wherein the user-specific cardiovascular data of the user as input into the ML model further comprises a selection of one or more of the drug classes, and wherein the user-specific cardiovascular prediction of the user comprises a CVD risk prediction that predicts the user’s cardiovascular after using the one or more of the drug classes as selected.
51. The tangible, non-transitory computer-readable medium of any one of claims 37-50, wherein the GUI is configured to receive the user-specific cardiovascular data of the user, and wherein the GUI is further configured to provide the user-specific cardiovascular data as input to the ML model.
52. The tangible, non-transitory computer-readable medium of claim 51, wherein the GUI provides graphical fields or selections for selecting one or more types of drug classes for selection or generation of a user-specific plan to address the user’s cardiovascular health.
53. The tangible, non-transitory computer-readable medium of any one of claims 37-52, wherein the user-specific cardiovascular prediction comprises a user-specific medical prescription predicted to reduce the user’s cardiovascular disease (CVD) risk.
54. The tangible, non-transitory computer-readable medium of any one of claims 37-53, wherein the user-specific cardiovascular prediction causes generation of a user-specific activity predicted to reduce the user’s cardiovascular disease (CVD) risk.
55. A machine learning (ML)-based method for predicting disease, the ML-based method comprising: training, by one or more processors, an ML model with data of a plurality of disease risk factors specific to a population of a given geographic region, the plurality of disease risk factors subdivided into a first training data subset and a second training data subset prior to training the ML model, wherein the first training data subset comprises a preselected subset of disease risk factors, and wherein the second training subset comprises a remaining subset of disease risk factors, inputting, by the one or more processors, user-specific health data of a user into the ML model, wherein the user is a member of the geographic region, and wherein the user-specific health data of the user as input into the ML model is data of the user corresponding to the preselected subset of disease risk factors and the remaining subset of disease risk factors, outputting, by the one or more processors accessing the ML model, a user-specific disease prediction of the user, the user-specific disease prediction comprising a disease risk score of the user; and displaying, by the one or more processors, the user-specific disease prediction on a graphical user interface (GUI).
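
Several claims recite a C-statistic of at least 0.69. As a hedged illustration of what that metric measures, the sketch below computes Harrell's concordance index for right-censored data in plain Python. The function name `c_statistic` and the toy data are assumptions made for this example; the application does not state which estimator or software was used.

```python
from typing import Sequence


def c_statistic(
    times: Sequence[float], events: Sequence[int], scores: Sequence[float]
) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before subject j's follow-up time.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not from the application): follow-up years, event flags, risk scores.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_c = c_statistic(toy_times, toy_events, toy_scores)  # 4 of 5 pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the claimed threshold of 0.69 sits between the two.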

Description

PATENT APPLICATION Attorney Docket No.: 32263/59457 PC

MACHINE LEARNING (ML)-BASED SYSTEMS AND METHODS FOR PREDICTING DISEASE

RELATED APPLICATION

[0001] This application claims the benefit of U.S. Provisional Application No. 63/520,554 (filed on August 18, 2023), which is incorporated in its entirety by reference herein.

FIELD OF THE DISCLOSURE

[0002] The present disclosure generally relates to artificial intelligence (AI)-based systems and methods, and, more particularly, to machine learning (ML)-based systems and methods for predicting disease (e.g., cardiovascular disease) of users.

BACKGROUND

[0003] Predicting different types of diseases is important for personalized medicine. Cardiovascular disease (CVD) is a leading cause of mortality, especially in developing countries. Cardiovascular diseases, including coronary heart disease and stroke, are the leading cause of non-communicable deaths globally, with an estimated 18.6 million fatalities recorded in 2019. Cardiovascular diseases affect, and can be measured in, various geographic regions. For example, cardiovascular diseases are the leading cause of death and disease burden in China, contributing to 3.72 million deaths in 2013 and total hospitalization costs of approximately $14.5 billion (US) in 2016. As a further example, in Hong Kong, heart disease and cerebrovascular diseases were the third and fourth leading causes of death in 2021. However, according to a World Health Organization report, 80% of premature heart attacks and strokes are preventable.

BRIEF SUMMARY

[0004] As described herein, ML-based systems and methods are disclosed for predicting disease (e.g., cardiovascular disease) of users. The output of the ML-based systems and methods disclosed herein can be geographically specific, and therefore can account for risk factors of, and make predictions for, a given population of that geographic location or region. Further, the risk prediction model described herein can be specifically tailored to a specific population for disease prevention and provides dynamic medication treatment with drugs proven to reduce cardiovascular disease (CVD) risk. In this way, the ML-based systems and methods described herein can provide an important technology to identify and reduce the CVD healthcare burden for a specific geographic region.

[0005] In one aspect, a disclosed ML model is trained with data comprising cardiovascular risk factors specific to a particular geographic region in China, which includes one or more geographic regions of China (e.g., Hong Kong). In view of this, the disclosed ML model is referred to herein as the Personalized CARdiovascular DIsease risk Assessment for Chinese (P-CARDIAC) model, which is a specific ML model trained and validated on Chinese population data using Machine-Learning (ML) techniques as described herein. However, it is to be understood that the ML-based systems and methods as described herein may be used with respect to different datasets comprising cardiovascular risk factors specific to additional or different geographic regions having people with additional or different biodiversity.

[0006] The ML model (i.e., the P-CARDIAC model), as described herein, can be used to identify patterns in large data sets to enable delivery of healthcare services by facilitating effective patient-provider decision-making. The ML model (e.g., the P-CARDIAC model) can provide early intervention for patients at high risk of recurrent CVD by leveraging a rich data source of electronic health records (EHR). The ML model (i.e., P-CARDIAC) can estimate the 10-year risk of recurrent CVD for high-risk individuals with consideration of an array of risk variables captured in the EHR.
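
The claims name a Cox proportional hazards model as one form of the ML model. Under that model, a fixed-horizon risk is commonly computed as 1 − S0(t)^exp(lp), where S0(t) is the baseline survival at the horizon and lp is the linear predictor. The sketch below illustrates that standard formula only; the function `cox_risk`, the coefficient values, and the baseline survival are made-up placeholders and are not taken from the application.

```python
import math
from typing import Dict


def cox_risk(
    features: Dict[str, float],
    coefficients: Dict[str, float],
    baseline_survival: float,
) -> float:
    """Event risk at a fixed horizon under a Cox proportional hazards model.

    risk = 1 - S0(t) ** exp(lp), with lp = sum of coefficient * feature value.
    """
    if not 0.0 < baseline_survival <= 1.0:
        raise ValueError("baseline survival must lie in (0, 1]")
    linear_predictor = sum(
        beta * features[name] for name, beta in coefficients.items()
    )
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Placeholder values for illustration only (NOT fitted P-CARDIAC parameters).
example_coefficients = {"age_centered": 0.03, "hypertension": 0.25}
example_baseline_survival_10y = 0.80

reference_risk = cox_risk(
    {"age_centered": 0.0, "hypertension": 0.0},
    example_coefficients,
    example_baseline_survival_10y,
)
higher_risk = cox_risk(
    {"age_centered": 10.0, "hypertension": 1.0},
    example_coefficients,
    example_baseline_survival_10y,
)
```

With all features at their reference values the linear predictor is zero, so the risk reduces to 1 − S0(t); positive coefficients then scale the hazard up for higher feature values.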
[0007] The ML model (i.e., P-CARDIAC), as described herein, can provide predictions of CVD and guidance, treatments, or other output specific to a user, where the guidance, treatments, or other output can comprise information comprising a recommended prescription of one or more drugs or drug classes for treating CVD for a specific user, a user-specific activity for the user (e.g., increased visits to a medical professional), or other such guidance for providing early intervention for a user of the given geographic region (e.g., China) with a high risk of recurrent CVD.

[0008] The ML model (i.e., P-CARDIAC), as described herein, performs more accurately than known risk-score techniques for recurrent CVD risk prediction among individuals with established CVD. Such known techniques include TRS-2°P and SMART2. In particular, the ML model (i.e., the P-CARDIAC model) achieves a higher predictive accuracy than TRS-2°P and SMART2 from data cohorts (cohort data between 2004 and 2019) from Hong Kong, a city in Southeast Asia where over 90% of inhabitants are of Chinese ethnicity. In particular, the ML model (i.e., the P-CARDIAC model) ha