US-12620462-B2 - Information system providing explanation of models
Abstract
A health care information system generates information that describes how different inputs to a model affect the output of the model, by creating a localized model for a given entity, and determining how the output of the localized model for the given entity changes in response to different inputs. The computer system builds the localized model for a given entity based on the model and on data values for the given entity. The computer system inputs one or more different data values for selected input features of the localized model, while data values for the remaining input features of the localized model are fixed to data values for the given entity, and obtains corresponding outputs from the localized model. The results from this localized model for the given entity indicate which of the selected input features have the most impact on the output of the model for that entity.
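The procedure summarized in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes the model is a plain Python callable over a dictionary of feature values, and it builds the localized model by pinning every non-selected feature to the entity's own value. The names `toy_model`, `make_localized_model`, `sweep`, and `rank_by_impact`, the feature names, and the coefficients are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Features = Dict[str, float]
Model = Callable[[Features], float]


def toy_model(x: Features) -> float:
    """Hypothetical nonlinear stand-in for a trained model (not from the source)."""
    return 0.02 * x["age"] + 0.5 * x["smoker"] + 0.001 * x["bmi"] ** 2


def make_localized_model(model: Model, entity: Features,
                         selected: Iterable[str]) -> Model:
    """Return a model whose only inputs are the selected features.

    Every other feature stays fixed to the given entity's own value.
    """
    allowed = set(selected)

    def localized(inputs: Features) -> float:
        unexpected = set(inputs) - allowed
        if unexpected:
            raise KeyError(f"not a selected feature: {sorted(unexpected)}")
        # Selected features that are not overridden keep the entity's value.
        return model({**entity, **inputs})

    return localized


def sweep(localized: Model,
          candidates: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply each candidate value, one selected feature at a time."""
    return {feature: [localized({feature: value}) for value in values]
            for feature, values in candidates.items()}


def rank_by_impact(baseline: float,
                   outputs: Dict[str, List[float]]) -> List[str]:
    """Order features by the largest change they produce from the baseline."""
    impact = {feature: max(abs(out - baseline) for out in outs)
              for feature, outs in outputs.items()}
    return sorted(impact, key=lambda feature: impact[feature], reverse=True)


# Hypothetical patient record and candidate values.
patient = {"age": 60.0, "smoker": 1.0, "bmi": 30.0}
localized = make_localized_model(toy_model, patient, ["smoker", "bmi"])
baseline = localized({})  # output for the patient's own values
outputs = sweep(localized, {"smoker": [0.0], "bmi": [25.0, 27.5]})
print(rank_by_impact(baseline, outputs))  # ['smoker', 'bmi']
```

Ranking by the largest absolute change from the baseline is only one possible way to decide which feature has "the most impact"; the abstract does not prescribe a particular measure.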
Inventors
- Constantinos Ioannis Boussios
- Richard Gliklich
- Francis Thomas O'Donovan
Assignees
- OM1, Inc.
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-01-12
Claims (20)
- 1. A computer system, comprising: a. a processing system comprising a processing device and computer storage, wherein the computer storage stores: i. a first data structure including, for a plurality of entities, data values for input features for a trained computational model, wherein the data values for each entity are derived from a respective record for the entity in a data set, and ii. a second data structure including data identifying a selected subset of the input features for the trained computational model, the selected subset having fewer features than the input features; b. computer program code implementing the trained computational model which, when processed by the processing system, configures the processing system to apply input data values for the input features of a given entity to the trained computational model to output a predicted outcome for the given entity in response to the input data values for the input features of the entity; c. computer program code that, when processed by the processing system, configures the processing system to generate a localized model for the given entity, wherein the localized model is distinct from and simplified with respect to the trained computational model and approximates the trained computational model, wherein the processing system generates the localized model based on a. the trained computational model and b. data values for the given entity from the first data structure for input features other than the selected subset of the input features identified by the second data structure, wherein the localized model has inputs and an output, wherein the inputs of the localized model correspond to only the selected subset of input features identified by the second data structure and the output of the localized model provides a predicted outcome for the given entity; d. a sensitivity analysis module comprising computer program code that, when processed by the processing system, configures the processing system to analyze the localized model for the given entity by: applying a plurality of different data values for the input features identified in the selected subset of the input features to the inputs of the localized model for the given entity, wherein the plurality of different data values are different from the data values for the selected subset of input features for the given entity as stored in the first data structure, whereby the localized model outputs respective predicted outcomes for the given entity for the different data values, and storing, in the computer storage, the respective predicted outcomes output from the localized model for the given entity; and e. a graphical user interface comprising computer program code that, when processed by the processing system, is responsive to the sensitivity analysis module to provide an output including human-understandable content describing how data values for the selected subset of input features likely would affect the predicted outcome of the trained computational model as applied to the given entity based on the stored respective predicted outcomes output from the localized model.
- 2. The computer system of claim 1, wherein the data set comprises health care information for a plurality of patients, wherein each patient in the plurality of patients has a respective record in the data set, and wherein the given entity is a given patient in the plurality of patients.
- 3. The computer system of claim 1, wherein the data set comprises health care information for a plurality of health care providers, wherein each health care provider in the plurality of health care providers has a respective record in the data set, and wherein the given entity is a given health care provider in the plurality of health care providers.
- 4. The computer system of claim 1, wherein the trained computational model is trained using a training set derived from the data set.
- 5. The computer system of claim 1, wherein the trained computational model performs classification of the entities into categories.
- 6. The computer system of claim 1, wherein the trained computational model computes risk factors associated with the entities.
- 7. The computer system of claim 1, wherein the entities are patients and the trained computational model computes predictions of outcomes for patients based on the records for the patients in the data set.
- 8. The computer system of claim 1, wherein the entities are patients and the trained computational model computes outcome scores for the patients based on the records for the patients in the data set.
- 9. The computer system of claim 8, wherein outcome scores are represented using an integer in a range of integer values.
- 10. The computer system of claim 1, wherein the entities are patients and the trained computational model computes factor scores for the patients based on the records for the patients in the data set.
- 11. The computer system of claim 1, wherein the second data structure comprises a library stored in the computer data storage including data representing the selected subset of input features.
- 12. The computer system of claim 11, wherein the selected subset of input features represented by the second data structure further correspond to actions which can be performed for the given entity.
- 13. The computer system of claim 12, wherein the library further includes a set of actionable factors, wherein the library stores, for each of the actionable factors, a mapping between the actionable factor and one or more respective input features in the selected subset of input features.
- 14. The computer system of claim 13, wherein the library further includes, for each actionable factor in the set of actionable factors, human-understandable content describing actions which can be performed for the given entity for output by the graphical user interface.
- 15. The computer system of claim 13, wherein the actionable factors include medically modifiable factors.
- 16. The computer system of claim 12, wherein the given entity comprises a patient and the actions include specifying a treatment for the patient.
- 17. The computer system of claim 12, wherein the given entity comprises a patient and the actions include specifying a behavior change for the patient.
- 18. The computer system of claim 1, wherein the sensitivity analysis module performs a sensitivity analysis on the localized model for the given entity wherein the first set of input features are held constant and sensitivity of the output of the localized model to variations in data values for the second set of input features is determined.
- 19. The computer system of claim 18 wherein the sensitivity analysis uses a linear regression model.
- 20. The computer system of claim 18 wherein the sensitivity analysis uses a nonlinear regression model.
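As an illustration of how the library of actionable factors (claims 11 to 14) and a regression-based sensitivity analysis (claims 18 and 19) might fit together, the following is a minimal sketch under stated assumptions. The claims do not specify a fitting method; ordinary least squares over a small grid of values is assumed here. The `ActionableFactor` class, the `LIBRARY` contents, the feature names, and the `localized_model` formula are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Inputs = Dict[str, float]


@dataclass(frozen=True)
class ActionableFactor:
    """Hypothetical library entry: mapped input features plus display content."""
    features: Tuple[str, ...]
    content: str


# Hypothetical library: actionable factor -> selected input features and text.
LIBRARY: Dict[str, ActionableFactor] = {
    "weight management": ActionableFactor(
        ("bmi",), "Discuss a weight management plan."),
    "physical activity": ActionableFactor(
        ("exercise_hours",), "Discuss increasing weekly exercise."),
}


def localized_model(inputs: Inputs) -> float:
    """Hypothetical localized model for one patient.

    Its only inputs are the selected features; all other features are
    assumed to be already fixed to the patient's values.
    """
    return 1.2 + 0.001 * inputs["bmi"] ** 2 - 0.1 * inputs["exercise_hours"]


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct input values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def feature_sensitivity(localized: Callable[[Inputs], float], baseline: Inputs,
                        feature: str, grid: Sequence[float]) -> float:
    """Vary one selected feature over `grid`; hold the others at baseline."""
    outcomes = [localized({**baseline, feature: value}) for value in grid]
    return ols_slope(grid, outcomes)


def factor_sensitivities(localized: Callable[[Inputs], float], baseline: Inputs,
                         grids: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """For each actionable factor, report its largest-magnitude feature slope."""
    result = {}
    for name, factor in LIBRARY.items():
        slopes = [feature_sensitivity(localized, baseline, f, grids[f])
                  for f in factor.features]
        result[name] = max(slopes, key=abs)
    return result


# Hypothetical patient values and grids of alternative values to try.
patient = {"bmi": 30.0, "exercise_hours": 1.0}
grids = {"bmi": [24.0, 26.0, 28.0], "exercise_hours": [2.0, 3.0, 4.0]}
for name, slope in factor_sensitivities(localized_model, patient, grids).items():
    print(f"{name}: {slope:+.3f} per unit. {LIBRARY[name].content}")
```

The slopes are expressed per unit of each feature, so they are not directly comparable across features with different units; a real system would likely need some normalization before ranking factors against each other.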
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a non-provisional application of prior-filed provisional patent application Ser. No. 62/474,587, entitled “Health Care Information System Providing Explanations of Classifiers of Patient Risk”, filed Mar. 21, 2017, which is hereby incorporated by reference.
BACKGROUND
A challenge in the health care industry is providing meaningful data, on a regular basis, to health care providers, patients, insurers, care managers and other entities, regarding health care outcomes of patients, quality of care by health care providers, and a variety of other health-related metrics. Recently, computer systems have been able to provide measures of health care outcomes and other health-related metrics, such as various types of outcome measurements, scores, categorization and classification, risk identification, and risk factors, for patients, and quality of care and other metrics about health care providers. Such computer systems generally use models which perform analytical computations on patient data, which include, but are not limited to, mathematical operations applied to data, classifiers, clustering algorithms, predictive models, neural networks, deep learning algorithms and systems, artificial intelligence systems, machine learning algorithms, Bayesian systems, natural language processing, or other types of analytical models applied to small or large sets of data. Generally, such a model receives data values for an entity for a plurality of input features, and provides an output for the entity based on the received data values. Such models typically are trained using a training data set in which the outputs corresponding to a set of inputs are known. Such models are complex, are typically non-linear, and generally do not provide sufficient direct information to indicate what input data most affects the output of the model.
Without such information, it is difficult to explain and use the output of such models, whether for gaining deeper understanding of the degree to which the different components of entity data affect the output of the model, evaluating possible interventions for a patient with respect to such intervention's effect on the model output, predicting risk, analyzing quality of care, or for other uses.
SUMMARY
This Summary introduces a selection of concepts, in simplified form, which are described further in the Detailed Description below. This Summary is intended neither to identify key or essential features, nor to limit the scope, of the claimed subject matter.
A health care information system generates information that describes how different inputs to a model affect the output of the model, by creating a localized model for a given entity, and determining how the output of the localized model for the given entity changes in response to different inputs. The localized model is an approximation of the model which may or may not differ from the model. This technique can be used for different types of models, such as models that classify patients into categories or conditions, models that compute risk factors associated with patients, models that predict outcomes for patients, models that compute factor scores or outcome scores for patients, models that evaluate provider performance, models that predict costs, models that compute data values for fields with missing data, models that compute data values for quantities that are estimated based on other data, and yet other models that perform other analyses on health care information. In many cases, the model is trained using supervised machine learning techniques and is a type of machine learning model for which the behavior typically is nonlinear.
This technique can be applied to computer models that process other types of data, and is not limited to health care information or the specific types of models described herein. The computer system builds a localized model for a given entity based on the model and on data values for the given entity. The computer system inputs one or more different data values for selected input features of the localized model, while data values for the remaining input features of the localized model are fixed to data values for the given entity. In one implementation, the localized model is created by setting the data values for a set of the input features of the model, other than the selected input features, to data values for those input features for the given entity. In one implementation, the localized model is created by generating another, local model for the given entity, based on the model and on the data values for the given entity. The selected input features can be selected from among input features of the model which correspond to explainable factors for the model. An explainable factor is a factor for which variation in values for corresponding input features of the model may have a relevant effect on the output of the model. The computer system described