US-12620008-B2 - Machine learning techniques for integrating distinct clustering schemes given temporal variations
Abstract
Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive data analysis operations configured to integrate distinct clustering schemes given temporal variations. For example, certain embodiments of the present invention utilize systems, methods, and computer program products that perform predictive data analysis operations by generating integrative predicted scores based at least in part on at least one of: within-cluster consistency scores determined for clusters determined using a first clustering scheme (e.g., a service clustering scheme), within-cluster consistency scores determined for clusters determined using a second clustering scheme (e.g., a recipient clustering scheme), cross-cluster consistency scores, and cross-temporal consistency scores.
Inventors
- Abhay Shukla
- Deepak Singh
- Srinjay Nath
- Ramprasad Anandam Gaddam
Assignees
- OPTUM, INC.
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-03-30
Claims (20)
- 1. A computer-implemented method comprising: generating, by one or more processors and using a machine learning model, an integrative predicted score for a predictive entity according to a first clustering scheme and a second clustering scheme, wherein (i) the predictive entity is associated with a plurality of performance records, (ii) the first clustering scheme and the second clustering scheme comprise different clustering schemes, (iii) the machine learning model is trained based at least in part on a set of training entries, (iv) a training entry of the set of training entries comprises (a) a sequence of temporal consistency score set and (b) a ground-truth consistency score determined based at least in part on a set of user-provided ratings, and (v) determining the integrative predicted score comprises: generating a plurality of first clusters, comprising the plurality of performance records, according to the first clustering scheme and a plurality of second clusters, comprising the plurality of performance records, according to the second clustering scheme for the predictive entity, determining a plurality of performance measures corresponding to the plurality of performance records, determining a plurality of first cluster variation measures corresponding to the plurality of first clusters, wherein a first cluster variation measure of the plurality of first cluster variation measures is associated with a first cluster of the plurality of first clusters and determined based at least in part on a variation of a performance measure associated with the first cluster, determining a plurality of second cluster variation measures corresponding to the plurality of second clusters, wherein a second cluster variation measure of the plurality of second cluster variation measures is associated with a second cluster of the plurality of second clusters and determined based at least in part on a variation of a performance measure associated with the second cluster, determining, based at least in part on the first cluster variation measure for a target first cluster of the plurality of first clusters, a first within-cluster consistency score, determining, based at least in part on the second cluster variation measure for a target second cluster of the plurality of second clusters, a second within-cluster consistency score, determining a first cross-cluster consistency score based at least in part on a first variation of the plurality of first cluster variation measures, determining a second cross-cluster consistency score based at least in part on a second variation of the plurality of second cluster variation measures, and determining the integrative predicted score based at least in part on (i) the first within-cluster consistency score, (ii) the second within-cluster consistency score, (iii) the first cross-cluster consistency score, and (iv) the second cross-cluster consistency score; and initiating, by the one or more processors, one or more prediction-based actions based at least in part on the integrative predicted score.
- 2. The computer-implemented method of claim 1, wherein: a performance record of the plurality of performance records is associated with a first identifier of a plurality of first identifiers and a second identifier of a plurality of second identifiers, generating the first clustering scheme comprises: (i) generating a plurality of first embeddings corresponding to the plurality of first identifiers, and (ii) generating the plurality of first clusters based at least in part on distances across the plurality of first embeddings, and generating the second clustering scheme comprises: (i) generating a plurality of second embeddings corresponding to the plurality of second identifiers, and (ii) generating the plurality of second clusters based at least in part on distances across the plurality of second embeddings.
- 3. The computer-implemented method of claim 2, wherein: the performance record is associated with a time unit of a sequence of time units, the time unit is associated with a temporal consistency score set of a plurality of temporal consistency score sets, and the integrative predicted score is determined based at least in part on a cross-temporal consistency score that is determined based at least in part on a variation of the plurality of temporal consistency score sets.
- 4. The computer-implemented method of claim 3, wherein a temporal consistency score set for a particular time unit comprises a per-time-unit service-based within-cluster consistency score for the time unit that is determined based at least in part on a plurality of per-time-unit service clusters for a subset of the plurality of performance records that is associated with the time unit.
- 5. The computer-implemented method of claim 3, wherein a temporal consistency score set for a particular time unit comprises a per-time-unit recipient-based within-cluster consistency score for the time unit that is determined based at least in part on a plurality of per-time-unit recipient clusters for a subset of the plurality of performance records that is associated with the time unit.
- 6. The computer-implemented method of claim 3, wherein a temporal consistency score set for a particular time unit comprises a per-time-unit service-based cross-cluster consistency score for the time unit that is determined based at least in part on a plurality of per-time-unit service clusters for a subset of the plurality of performance records that is associated with the time unit.
- 7. The computer-implemented method of claim 3, wherein a temporal consistency score set for a particular time unit comprises a per-time-unit recipient-based cross-cluster consistency score for the time unit that is determined based at least in part on a plurality of per-time-unit recipient clusters for a subset of the plurality of performance records that is associated with the time unit.
- 8. The computer-implemented method of claim 3, wherein determining the cross-temporal consistency score comprises: during an operational timestep of a sequence of operational timesteps, processing per-timestep input data determined based at least in part on a temporal consistency score set for a corresponding time unit and a preceding hidden state using a recurrent neural network machine learning model to determine an updated hidden state, and determining the cross-temporal consistency score based at least in part on the updated hidden state for a final operational timestep.
- 9. The computer-implemented method of claim 1, wherein determining the integrative predicted score comprises: determining, based at least in part on the first cross-cluster consistency score and the second cross-cluster consistency score, a cross-cluster consistency score for the predictive entity, and determining the integrative predicted score based at least in part on the first within-cluster consistency score, the second within-cluster consistency score, and the cross-cluster consistency score.
- 10. A system comprising: one or more processors; and at least one memory storing processor-executable instructions that, when executed by any one or more of the one or more processors, causes the one or more processors to perform operations comprising: generate, using a machine learning model, an integrative predicted score for a predictive entity according to a first clustering scheme and a second clustering scheme, wherein (i) the predictive entity is associated with a plurality of performance records, (ii) the first clustering scheme and the second clustering scheme comprise different clustering schemes, (iii) the machine learning model is trained based at least in part on a set of training entries, (iv) a training entry of the set of training entries comprises (a) a sequence of temporal consistency score set and (b) a ground-truth consistency score determined based at least in part on a set of user-provided ratings, and (v) determining the integrative predicted score comprises: generating a plurality of first clusters, comprising the plurality of performance records, according to the first clustering scheme and a plurality of second clusters, comprising the plurality of performance records, according to the second clustering scheme for the predictive entity, determining a plurality of performance measures corresponding to the plurality of performance records, determining a plurality of first cluster variation measures corresponding to the plurality of first clusters, wherein a first cluster variation measure of the plurality of first cluster variation measures is associated with a first cluster of the plurality of first clusters and determined based at least in part on a variation of a performance measure associated with the first cluster, determining a plurality of second cluster variation measures corresponding to the plurality of second clusters, wherein a second cluster variation measure of the plurality of second cluster variation measures is associated with a second cluster of the plurality of second clusters and determined based at least in part on a variation of a performance measure associated with the second cluster, determining, based at least in part on the first cluster variation measure for a target first cluster of the plurality of first clusters, a first within-cluster consistency score, determining, based at least in part on the second cluster variation measure for a target second cluster of the plurality of second clusters, a second within-cluster consistency score, determining a first cross-cluster consistency score based at least in part on a first variation of the plurality of first cluster variation measures, determining a second cross-cluster consistency score based at least in part on a second variation of the plurality of second cluster variation measures, and determining the integrative predicted score based at least in part on (i) the first within-cluster consistency score, (ii) the second within-cluster consistency score, (iii) the first cross-cluster consistency score, and (iv) the second cross-cluster consistency score; and initiate one or more prediction-based actions based at least in part on the integrative predicted score.
- 11. The system of claim 10, wherein: a performance record of the plurality of performance records is associated with a first identifier of a plurality of first identifiers and a second identifier of a plurality of second identifiers, generating the first clustering scheme comprises: (i) generating a plurality of first embeddings corresponding to the plurality of first identifiers, and (ii) generating the plurality of first clusters based at least in part on distances across the plurality of first embeddings, and generating the second clustering scheme comprises: (i) generating a plurality of second embeddings corresponding to the plurality of second identifiers, and (ii) generating the plurality of second clusters based at least in part on distances across the plurality of second embeddings.
- 12. The system of claim 11, wherein: the performance record is associated with a time unit of a sequence of time units, the time unit is associated with a temporal consistency score set of a plurality of temporal consistency score sets, and the integrative predicted score is determined based at least in part on a cross-temporal consistency score that is determined based at least in part on a variation of the plurality of temporal consistency score sets.
- 13. The system of claim 12, wherein a temporal consistency score set for a particular time unit comprises a per-time-unit service-based within-cluster consistency score for the time unit that is determined based at least in part on a plurality of per-time-unit service clusters for a subset of the plurality of performance records that is associated with the time unit.
- 14. The system of claim 12, wherein a temporal consistency score set for a particular time unit comprises a per-time-unit recipient-based within-cluster consistency score for the time unit that is determined based at least in part on a plurality of per-time-unit recipient clusters for a subset of the plurality of performance records that is associated with the time unit.
- 15. The system of claim 12, wherein a temporal consistency score set for a particular time unit comprises a per-time-unit service-based cross-cluster consistency score for the time unit that is determined based at least in part on a plurality of per-time-unit service clusters for a subset of the plurality of performance records that is associated with the time unit.
- 16. The system of claim 12, wherein a temporal consistency score set for a particular time unit comprises a per-time-unit recipient-based cross-cluster consistency score for the time unit that is determined based at least in part on a plurality of per-time-unit recipient clusters for a subset of the plurality of performance records that is associated with the time unit.
- 17. The system of claim 12, wherein determining the cross-temporal consistency score comprises: during an operational timestep of a sequence of operational timesteps, processing per-timestep input data determined based at least in part on a temporal consistency score set for a corresponding time unit and a preceding hidden state using a recurrent neural network machine learning model to determine an updated hidden state, and determining the cross-temporal consistency score based at least in part on the updated hidden state for a final operational timestep.
- 18. The system of claim 10, wherein determining the integrative predicted score comprises: determining, based at least in part on the first cross-cluster consistency score and the second cross-cluster consistency score, a cross-cluster consistency score for the predictive entity, and determining the integrative predicted score based at least in part on the first within-cluster consistency score, the second within-cluster consistency score, and the cross-cluster consistency score.
- 19. One or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors, cause the one or more processors to: generate, using a machine learning model, an integrative predicted score for a predictive entity according to a first clustering scheme and a second clustering scheme, wherein (i) the predictive entity is associated with a plurality of performance records, (ii) the first clustering scheme and the second clustering scheme comprise different clustering schemes, (iii) the machine learning model is trained based at least in part on a set of training entries, (iv) a training entry of the set of training entries comprises (a) a sequence of temporal consistency score set and (b) a ground-truth consistency score determined based at least in part on a set of user-provided ratings, and (v) determining the integrative predicted score comprises: generating a plurality of first clusters, comprising the plurality of performance records, according to the first clustering scheme and a plurality of second clusters, comprising the plurality of performance records, according to the second clustering scheme for the predictive entity, determining a plurality of performance measures corresponding to the plurality of performance records, determining a plurality of first cluster variation measures corresponding to the plurality of first clusters, wherein a first cluster variation measure of the plurality of first cluster variation measures is associated with a first cluster of the plurality of first clusters and determined based at least in part on a variation of a performance measure associated with the first cluster, determining a plurality of second cluster variation measures corresponding to the plurality of second clusters, wherein a second cluster variation measure of the plurality of second cluster variation measures is associated with a second cluster of the plurality of second clusters and determined based at least in part on a variation of a performance measure associated with the second cluster, determining, based at least in part on the first cluster variation measure for a target first cluster of the plurality of first clusters, a first within-cluster consistency score, determining, based at least in part on the second cluster variation measure for a target second cluster of the plurality of second clusters, a second within-cluster consistency score, determining a first cross-cluster consistency score based at least in part on a first variation of the plurality of first cluster variation measures, determining a second cross-cluster consistency score based at least in part on a second variation of the plurality of second cluster variation measures, and determining the integrative predicted score based at least in part on (i) the first within-cluster consistency score, (ii) the second within-cluster consistency score, (iii) the first cross-cluster consistency score, and (iv) the second cross-cluster consistency score; and initiate one or more prediction-based actions based at least in part on the integrative predicted score.
- 20. The one or more non-transitory computer-readable storage media of claim 19, wherein: a performance record of the plurality of performance records is associated with a first identifier of a plurality of first identifiers and a second identifier of a plurality of second identifiers, generating the first clustering scheme comprises: (i) generating a plurality of first embeddings corresponding to the plurality of first identifiers, and (ii) generating the plurality of first clusters based at least in part on distances across the plurality of first embeddings, and generating the second clustering scheme comprises: (i) generating a plurality of second embeddings corresponding to the plurality of second identifiers, and (ii) generating the plurality of second clusters based at least in part on distances across the plurality of second embeddings.
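By way of non-limiting illustration, the recurrent processing recited in claims 8 and 17 may be sketched as follows. The sketch assumes a simple Elman-style cell with a tanh activation and a sigmoid readout, neither of which is specified by the claims. It further assumes that each temporal consistency score set is a four-element vector (service-based within-cluster, recipient-based within-cluster, service-based cross-cluster, and recipient-based cross-cluster scores, per claims 4 through 7). The function names and the weight values are hypothetical; in practice the weights would be learned from training entries rather than set by hand.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def rnn_step(x: Sequence[float], h_prev: Sequence[float],
             w_in: Matrix, w_rec: Matrix, bias: Sequence[float]) -> Vector:
    """One operational timestep: h_t = tanh(W_in x_t + W_rec h_{t-1} + b).

    The tanh cell is an assumption; the claims only require a recurrent
    neural network that maps (input, preceding hidden state) to an
    updated hidden state.
    """
    h_new = []
    for j in range(len(h_prev)):
        total = bias[j]
        total += sum(w_in[j][i] * x[i] for i in range(len(x)))
        total += sum(w_rec[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def cross_temporal_consistency(score_sets: Sequence[Sequence[float]],
                               w_in: Matrix, w_rec: Matrix,
                               bias: Sequence[float],
                               w_out: Sequence[float], b_out: float) -> float:
    """Run the cell over the time units and score the final hidden state.

    The sigmoid readout (hypothetical) maps the final hidden state to a
    score in (0, 1).
    """
    h = [0.0] * len(bias)  # initial hidden state
    for x in score_sets:   # one temporal consistency score set per time unit
        h = rnn_step(x, h, w_in, w_rec, bias)
    logit = b_out + sum(w * v for w, v in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Hypothetical, hand-set weights for a hidden state of size 2.
    w_in = [[0.5, 0.5, 0.5, 0.5], [-0.5, -0.5, -0.5, -0.5]]
    w_rec = [[0.1, 0.0], [0.0, 0.1]]
    bias = [0.0, 0.0]
    w_out = [1.0, -1.0]
    # Two time units, each with four per-time-unit consistency scores.
    sets = [[0.9, 0.8, 0.7, 0.6], [0.5, 0.4, 0.3, 0.2]]
    print(cross_temporal_consistency(sets, w_in, w_rec, bias, w_out, 0.0))
```

Only the hidden state for the final operational timestep feeds the readout, which mirrors the claim language; other readouts (for example, pooling over all hidden states) would be a different design.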
Description
BACKGROUND
Various embodiments of the present invention address technical challenges related to performing predictive data analysis operations and address the efficiency and reliability shortcomings of various existing predictive data analysis solutions, in accordance with at least some of the techniques described herein.
BRIEF SUMMARY
In general, embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive data analysis operations. For example, certain embodiments of the present invention utilize systems, methods, and computer program products that perform predictive data analysis operations by generating integrative predicted scores based at least in part on at least one of: within-cluster consistency scores determined for clusters determined using a first clustering scheme (e.g., a service clustering scheme), within-cluster consistency scores determined for clusters determined using a second clustering scheme (e.g., a recipient clustering scheme), cross-cluster consistency scores, and cross-temporal consistency scores. In accordance with one aspect, a method is provided.
In one embodiment, the method comprises: identifying a service clustering scheme and a recipient clustering scheme for a predictive entity, wherein: (i) the service clustering scheme divides the plurality of performance records into a plurality of service clusters, and (ii) the recipient clustering scheme divides the plurality of performance records into a plurality of recipient clusters; for each performance record, determining a performance measure; for each service cluster, determining a service cluster variation measure based at least in part on variation of each performance measure for the service cluster; for each recipient cluster, determining a recipient cluster variation measure based at least in part on variation of each performance measure for the recipient cluster; determining, based at least in part on the service cluster variation measure for a target service cluster of the plurality of service clusters, a service-based within-cluster consistency score; determining, based at least in part on the recipient cluster variation measure for a target recipient cluster of the plurality of recipient clusters, a recipient-based within-cluster consistency score; determining a service-based cross-cluster consistency score based at least in part on variation of each service cluster aggregation measure; determining a recipient-based cross-cluster consistency score based at least in part on variation of each recipient cluster aggregation measure; determining the integrative predicted score based at least in part on the service-based within-cluster consistency score, the recipient-based within-cluster consistency score, the recipient-based cross-cluster consistency score, and the service-based cross-cluster consistency score; and performing one or more prediction-based actions. In accordance with another aspect, a computer program product is provided. 
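By way of non-limiting illustration, the method steps recited above may be sketched as follows. The sketch makes several assumptions that are not drawn from the text: each performance record is a dictionary carrying its service cluster, recipient cluster, and performance measure; the cluster variation measure is a population standard deviation; a variation v is mapped to a consistency score as 1 / (1 + v); and the cross-cluster consistency score is computed from the variation of the per-cluster variation measures. The final averaging step is only a placeholder for the trained machine learning model, and all function and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Record = Dict[str, object]


def cluster_variations(records: List[Record], cluster_key: str) -> Dict[str, float]:
    """Group performance measures by cluster and take their spread.

    Population standard deviation is an assumed choice of variation measure.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        groups[str(record[cluster_key])].append(float(record["measure"]))
    return {cluster: pstdev(values) for cluster, values in groups.items()}


def consistency(variation: float) -> float:
    """Map a non-negative variation to (0, 1]; lower variation scores higher."""
    return 1.0 / (1.0 + variation)


def scheme_scores(records: List[Record], cluster_key: str,
                  target_cluster: str) -> Tuple[float, float]:
    """Return (within-cluster, cross-cluster) consistency for one scheme."""
    variations = cluster_variations(records, cluster_key)
    within = consistency(variations[target_cluster])
    cross = consistency(pstdev(list(variations.values())))
    return within, cross


def integrative_score(records: List[Record], target_service: str,
                      target_recipient: str) -> float:
    """Combine the four scores; the plain mean stands in for a trained model."""
    s_within, s_cross = scheme_scores(records, "service_cluster", target_service)
    r_within, r_cross = scheme_scores(records, "recipient_cluster", target_recipient)
    return mean([s_within, r_within, s_cross, r_cross])


if __name__ == "__main__":
    # Hypothetical performance records for a single predictive entity.
    demo = [
        {"service_cluster": "A", "recipient_cluster": "X", "measure": 1.0},
        {"service_cluster": "A", "recipient_cluster": "Y", "measure": 1.0},
        {"service_cluster": "B", "recipient_cluster": "X", "measure": 2.0},
        {"service_cluster": "B", "recipient_cluster": "Y", "measure": 4.0},
    ]
    print(integrative_score(demo, target_service="A", target_recipient="X"))
```

The same records are clustered twice, once per scheme, so a single record contributes to both the service-based and the recipient-based scores.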
The computer program product may comprise at least one computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising executable portions configured to: identify a service clustering scheme and a recipient clustering scheme for a predictive entity, wherein: (i) the service clustering scheme divides the plurality of performance records into a plurality of service clusters, and (ii) the recipient clustering scheme divides the plurality of performance records into a plurality of recipient clusters; for each performance record, determine a performance measure; for each service cluster, determine a service cluster variation measure based at least in part on variation of each performance measure for the service cluster; for each recipient cluster, determine a recipient cluster variation measure based at least in part on variation of each performance measure for the recipient cluster; determine, based at least in part on the service cluster variation measure for a target service cluster of the plurality of service clusters, a service-based within-cluster consistency score; determine, based at least in part on the recipient cluster variation measure for a target recipient cluster of the plurality of recipient clusters, a recipient-based within-cluster consistency score; determine a service-based cross-cluster consistency score based at least in part on variation of each service cluster aggregation measure; determine a recipient-based cross-cluster consistency score based at least in part on variation of each recipient cluster aggregation measure; determine the integrative predicted score based at least in part on the service-based within-cluster consistency score, the recipient-based within-cluster consistency score, the recipient-based cross-cluster consistency score, and the recipient-based cross-cluster consistency sco