US-12626195-B2 - Predicting user attributes using uncertainty estimate modeling

Abstract

Methods and apparatuses are described for predicting user attributes using uncertainty estimate modeling. A server trains a plurality of machine learning (ML) models to predict a distribution of values for a plurality of user attributes. The server determines an uncertainty measure of the ML models for each user attribute based upon the predicted distributions of values. The server receives a request for prediction of user attributes from a client computing device and generates, for each user attribute, a first predicted distribution using one or more of the trained models. The server classifies, for each user attribute, an accuracy of the first predicted distribution based upon the uncertainty measure, and provides the first predicted distribution and the accuracy for each user attribute to the client computing device for presentation. The server then updates a first predicted value, taken from the first predicted distribution, for each user attribute based upon input received from the client computing device.
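The abstract does not say how the per-model distributions are merged into the first predicted distribution, or how client input changes the displayed value. The Python sketch below shows one possible reading, assuming each trained model reports a Gaussian summarized by a mean and a standard deviation and that all models are weighted equally. `Prediction`, `combine_predictions`, and `apply_client_input` are hypothetical names, and moment matching of an equal-weight mixture is a swapped-in choice, not necessarily the patented combination step.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-model output: a Gaussian summarized by mean and std."""
    mean: float
    std: float


def combine_predictions(preds: Sequence[Prediction]) -> Prediction:
    """Collapse per-model Gaussians into one distribution (assumed method).

    Matches the first two moments of an equal-weight mixture:
    mean = average of the means; variance = average of (std^2 + mean^2) - mean^2.
    """
    if not preds:
        raise ValueError("need at least one model prediction")
    n = len(preds)
    mean = sum(p.mean for p in preds) / n
    second_moment = sum(p.std ** 2 + p.mean ** 2 for p in preds) / n
    variance = max(second_moment - mean ** 2, 0.0)  # clamp tiny negative rounding error
    return Prediction(mean=mean, std=sqrt(variance))


def apply_client_input(
    predicted_value: float,
    replacement: Optional[float] = None,
    confirmed: bool = False,
) -> Tuple[float, bool]:
    """Hypothetical update step returning (value, user_verified).

    A replacement value (claims 9 and 19) overrides the prediction; a
    confirmation (claims 10 and 20) keeps the value and marks it verified.
    """
    if replacement is not None:
        return replacement, True
    return predicted_value, confirmed
```

For example, two models predicting a salary of 80,000 ± 5,000 and 90,000 ± 5,000 combine to a mean of 85,000 with a standard deviation of about 7,071. The combined distribution is wider than either input because the models disagree, which is the kind of signal the accuracy classification is meant to surface.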

Inventors

Christopher Fusting
Lei Zhang
Wenlu Yan

Assignees

FMR LLC

Dates

Publication Date: 2026-05-12
Application Date: 2022-11-29

Claims (20)

1. A system for predicting user attributes using uncertainty estimate modeling, the system comprising a server computing device with a memory for storing computer-executable instructions and a processor that executes the computer-executable instructions to: train each of a plurality of gradient boosting machine learning models using historical user profile data to predict a distribution of values for each of a plurality of user attributes, wherein the predicted distribution of values for each user attribute comprises a distribution with predicted summary statistics; determine an uncertainty measure of the plurality of gradient boosting machine learning models for each user attribute based upon the predicted distribution of values generated by each machine learning model for the user attribute; receive a request for prediction of one or more user attributes from a client computing device, the request including a user identifier of a user at the client computing device; generate, for each of the one or more user attributes, a first predicted distribution for the user attribute using one or more of the trained gradient boosting machine learning models, comprising: identifying a number of trained machine learning models to execute based upon the request received from the client computing device, determining one or more characteristics of the user based upon the user identifier, executing each of the identified machine learning models using the one or more characteristics as input to generate a predicted distribution for the user attribute, and combining the predicted distributions generated by each identified machine learning model to create the first predicted distribution; classify, for each of the one or more user attributes, an accuracy of the first predicted distribution for the user attribute based upon the uncertainty measure of the plurality of gradient boosting machine learning models for the user attribute; provide the first predicted distribution and the accuracy for each of the one or more user attributes to the client computing device, wherein a first predicted value of the user attribute from the first predicted distribution is displayed in a user interface on the client computing device and one or more sensory features of the displayed first value correspond to the accuracy; update, for each of the one or more user attributes, the first predicted value for the user attribute based upon input received from the client computing device.
2. The system of claim 1, wherein the uncertainty measure for the plurality of machine learning models comprises (i) an uncertainty measure of predicted means from each machine learning model for the given user attribute and (ii) a mean of predicted standard deviations from each machine learning model for the given user attribute.
3. The system of claim 2, wherein training each of a plurality of machine learning models using historical user profile data to predict a distribution of values for each of a plurality of user attributes comprises training each machine learning model on a different subset of the historical user profile data.
4. The system of claim 1, wherein the uncertainty measure of the plurality of machine learning models for each user attribute comprises a coefficient of variation based upon a predicted mean and a predicted standard deviation for each machine learning model.
5. The system of claim 4, wherein classifying, for each of the one or more user attributes, an accuracy of the first predicted distribution for the user attribute based upon the uncertainty measure of the plurality of machine learning models for the user attribute comprises: comparing the uncertainty measure of the plurality of machine learning models for the user attribute to a threshold value; selecting a prediction classifier based upon the comparison; and classifying the accuracy of the first predicted distribution for the user attribute using the prediction classifier.
6. The system of claim 1, wherein the historical user profile data comprises user demographic attributes, user asset values, user account information, and user behavior attributes.
7. The system of claim 6, wherein the plurality of user attributes comprise a salary, a net worth, and a retirement age.
8. The system of claim 1, wherein the one or more sensory features of the displayed value comprise a color that changes based upon the accuracy.
9. The system of claim 1, wherein the input received from the client computing device comprises a replacement value for the user attribute.
10. The system of claim 1, wherein the input received from the client computing device comprises an indicator confirming the accuracy of the first predicted value for the user attribute.
11. A computerized method of predicting user attributes using uncertainty estimate modeling, the method comprising: training, by a server computing device, each of a plurality of machine learning models using historical user profile data to predict a distribution of values for each of a plurality of user attributes, wherein the predicted distribution of values for each user attribute comprises a distribution with predicted summary statistics; determining, by the server computing device, an uncertainty measure of the plurality of machine learning models for each user attribute based upon the predicted distribution of values generated by each machine learning model for the user attribute; receiving, by the server computing device, a request for prediction of one or more user attributes from a client computing device, the request including a user identifier of a user at the client computing device; generating, by the server computing device for each of the one or more user attributes, a first predicted distribution for the user attribute using one or more of the trained gradient boosting machine learning models, comprising: identifying a number of trained machine learning models to execute based upon the request received from the client computing device, determining one or more characteristics of the user based upon the user identifier, executing each of the identified machine learning models using the one or more characteristics as input to generate a predicted distribution for the user attribute, and combining the predicted distributions generated by each identified machine learning model to create the first predicted distribution; classifying, by the server computing device for each of the one or more user attributes, an accuracy of the first predicted distribution for the user attribute based upon the uncertainty measure of the plurality of machine learning models for the user attribute; providing, by the server computing device, the first predicted distribution and the accuracy for each of the one or more user attributes to the client computing device, wherein a first predicted value of the user attribute from the first predicted distribution is displayed in a user interface on the client computing device and one or more sensory features of the displayed first value correspond to the accuracy; updating, by the server computing device for each of the one or more user attributes, the first predicted value for the user attribute based upon input received from the client computing device.
12. The method of claim 11, wherein the uncertainty measure for the plurality of machine learning models comprises (i) an uncertainty measure of predicted means from each machine learning model for the given user attribute and (ii) a mean of predicted standard deviations from each machine learning model for the given user attribute.
13. The method of claim 12, wherein training each of a plurality of machine learning models using historical user profile data to predict a distribution of values for each of a plurality of user attributes comprises training each machine learning model on a different subset of the historical user profile data.
14. The method of claim 11, wherein the uncertainty measure of the plurality of machine learning models for each user attribute comprises a coefficient of variation based upon a predicted mean and a predicted standard deviation for each machine learning model.
15. The method of claim 14, wherein classifying, for each of the one or more user attributes, an accuracy of the first predicted distribution for the user attribute based upon the uncertainty measure of the plurality of machine learning models for the user attribute comprises: comparing the uncertainty measure of the plurality of machine learning models for the user attribute to a threshold value; selecting a prediction classifier based upon the comparison; and classifying the accuracy of the first predicted distribution for the user attribute using the prediction classifier.
16. The method of claim 11, wherein the historical user profile data comprises user demographic attributes, user asset values, user account information, and user behavior attributes.
17. The method of claim 16, wherein the plurality of user attributes comprise a salary, a net worth, and a retirement age.
18. The method of claim 11, wherein the one or more sensory features of the displayed value comprise a color that changes based upon the accuracy.
19. The method of claim 11, wherein the input received from the client computing device comprises a replacement value for the user attribute.
20. The method of claim 11, wherein the input received from the client computing device comprises an indicator confirming the accuracy of the first predicted value for the user attribute.
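Claims 2, 4, 5, and 8 name the ingredients of the accuracy classification without fixing their formulas. The following is a minimal Python sketch under stated assumptions: the "uncertainty measure of predicted means" is taken to be their population standard deviation, the coefficient of variation is the average of per-model std/|mean| ratios, and the two prediction classifiers, the 0.25 threshold, the label cut-offs, and the color palette are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-model output for one user attribute."""
    mean: float
    std: float


@dataclass(frozen=True)
class Uncertainty:
    spread_of_means: float  # claim 2 (i): assumed to be the std of per-model means
    mean_of_stds: float     # claim 2 (ii): mean of per-model predicted stds
    cv: float               # claim 4: assumed mean of per-model std / |mean|


def uncertainty_measure(preds: Sequence[Prediction]) -> Uncertainty:
    """Summarize ensemble uncertainty for one attribute across all models."""
    if not preds:
        raise ValueError("need at least one model prediction")
    # A zero mean makes the ratio undefined; treat it as maximally uncertain.
    cvs = [p.std / abs(p.mean) if p.mean != 0 else float("inf") for p in preds]
    return Uncertainty(
        spread_of_means=pstdev([p.mean for p in preds]),
        mean_of_stds=fmean([p.std for p in preds]),
        cv=fmean(cvs),
    )


# Hypothetical prediction classifiers: each maps a coefficient of variation to a label.
def strict_classifier(cv: float) -> str:
    """Used once the ensemble is already uncertain; never returns 'high'."""
    return "low" if cv > 0.5 else "medium"


def default_classifier(cv: float) -> str:
    """Used when uncertainty is at or below the threshold."""
    return "high" if cv < 0.1 else "medium"


def classify_accuracy(u: Uncertainty, threshold: float = 0.25) -> str:
    """Claim 5 steps: compare to a threshold, select a classifier, classify."""
    classifier: Callable[[float], str] = (
        strict_classifier if u.cv > threshold else default_classifier
    )
    return classifier(u.cv)


# Claim 8: a display color that changes with the accuracy label (hypothetical palette).
ACCURACY_COLORS: Dict[str, str] = {"high": "green", "medium": "amber", "low": "red"}
```

Switching to a stricter classifier once the coefficient of variation crosses the threshold mirrors the compare, select, and classify steps of claims 5 and 15. A production system might use learned classifiers here instead of fixed cut-offs.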

Description

TECHNICAL FIELD

This application relates generally to methods and apparatuses, including computer program products, for predicting user attributes using uncertainty estimate modeling via an ensemble of gradient boosting machine learning models.

BACKGROUND

Large consumer-facing companies constantly face the challenge of providing customized, actionable information to their customers. Often, this information depends upon evaluating historical customer information to predict user attributes or characteristics, which are then used to generate the customized information. As one example, an organization may provide a retirement planning tool that generates a retirement forecast for a given user based upon, e.g., estimates of the user's salary, age, savings, and other relevant information. However, the uncertainty around the predictions generated by such systems is often high, which can result in flawed customer information that is based upon inaccurate predictions.

Existing systems attempt to leverage machine learning techniques to generate accurate predictions of user attributes. In some examples, systems use probabilistic regression methods when generating predictions, such as post-hoc variance, generalized additive models for location, scale, and shape (GAMLSS), Bayesian methods, or Bayesian deep learning. However, each of these methods has significant drawbacks: they are typically slow, inflexible, difficult to scale, and hard to use. In addition, such techniques may be affected by aleatoric uncertainty (that is, uncertainty inherent in the training data input to the model, such as noise) and/or epistemic uncertainty (that is, uncertainty in the predictions made by the model, arising from the model's limited knowledge). In these cases, it can be difficult to account adequately for such uncertainty in model predictions so that the application workflow can be modified in view of the relative accuracy or inaccuracy of the predictions.
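The distinction between aleatoric and epistemic uncertainty has a standard ensemble reading that connects to the uncertainty measure recited in claims 2 and 12. By the law of total variance, the variance of an equal-weight mixture of the models' predicted distributions equals the average predicted variance (often interpreted as aleatoric, or data, uncertainty) plus the variance of the predicted means (often interpreted as epistemic, or knowledge, uncertainty). The sketch below assumes this common decomposition, which the text here does not itself state; `decompose_uncertainty` is a hypothetical name.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def decompose_uncertainty(
    means: Sequence[float], stds: Sequence[float]
) -> Tuple[float, float, float]:
    """Split an ensemble's predictive variance into (aleatoric, epistemic, total).

    Assumed decomposition (law of total variance for an equal-weight mixture):
      aleatoric = average of the per-model predicted variances
      epistemic = variance of the per-model predicted means
      total     = aleatoric + epistemic
    """
    if not means or len(means) != len(stds):
        raise ValueError("means and stds must be non-empty and equally long")
    aleatoric = fmean([s * s for s in stds])
    epistemic = pvariance(means)
    return aleatoric, epistemic, aleatoric + epistemic
```

When every model predicts the same mean, the epistemic term is zero and all remaining uncertainty is attributed to the data; strong disagreement between models inflates the epistemic term even if each model is individually confident.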
SUMMARY

Therefore, what is needed are improved methods and systems for predicting user attributes using uncertainty estimate modeling that leverage gradient boosting machine learning models to determine whether a prediction made by a model is accurate. The techniques provided herein advantageously use uncertainty estimates to engage customers and improve data and predictions, and also use uncertainty estimates to target customers with little or poor-quality data for proactive data collection.

The invention, in one aspect, features a system for predicting user attributes using uncertainty estimate modeling. The system comprises a server computing device with a memory for storing computer-executable instructions and a processor that executes the computer-executable instructions. The server computing device trains each of a plurality of gradient boosting machine learning models using historical user profile data to predict a distribution of values for each of a plurality of user attributes. The server computing device determines an uncertainty measure of the plurality of machine learning models for each user attribute based upon the predicted distribution of values generated by each machine learning model for the user attribute. The server computing device receives a request for prediction of one or more user attributes from a client computing device, the request including a user identifier of a user at the client computing device. The server computing device generates, for each of the one or more user attributes, a first predicted distribution for the user attribute using one or more of the trained gradient boosting machine learning models. The server computing device classifies, for each of the one or more user attributes, an accuracy of the first predicted distribution for the user attribute based upon the uncertainty measure of the plurality of machine learning models for the user attribute. The server computing device provides the first predicted distribution and the accuracy for each of the one or more user attributes to the client computing device, where a first predicted value of the user attribute from the first predicted distribution is displayed in a user interface on the client computing device and one or more sensory features of the displayed first value correspond to the accuracy. The server computing device updates, for each of the one or more user attributes, the first predicted value for the user attribute based upon input received from the client computing device.

The invention, in another aspect, features a computerized method of predicting user attributes using uncertainty estimate modeling. A server computing device trains each of a plurality of machine learning models using historical user profile data to predict a distribution of values for each of a plurality of user attributes. The server computing device determines an uncertainty measure of the plurality of machine learning models for each user attribute based upon the predicted distribution of values generated by each machine learning model for the user att