EP-4736078-A2 - SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR MODEL MONITORING USING BATCHED INFERENCES

EP 4736078 A2

Abstract

Systems, methods, and computer program products for model monitoring using batched inferences are provided. A system includes at least one processor configured to receive processing requests to execute a machine-learning model; execute the machine-learning model for each processing request to generate a decision; communicate the decision to a corresponding system; and further process the processing requests in a batch process by: generating a batch including the processing requests; assigning each of the processing requests to a processor of a plurality of processors; and further processing each processing request in parallel with at least one other processing request by executing the machine-learning model for the batch to generate a second output for each processing request, the second output associated with a corresponding decision.

Inventors

  • HE, Runxin
  • ZHAO, Yong
  • LOU, Mingji
  • CHETIA, Chiranjeet
  • KERSTING, Nicholas Stephen
  • YU, Junjun
  • GU, Yu
  • LIU, Can
  • AGRAWAL, Shubham

Assignees

  • Visa International Service Association

Dates

Publication Date
2026-05-06
Application Date
2024-06-25

Claims (20)

  1. A system comprising at least one processor configured to: receive a plurality of processing requests to execute at least one machine-learning model from a plurality of systems; execute the at least one machine-learning model for each processing request of the plurality of processing requests to generate a first output of the machine-learning model for each processing request, the first output comprising a decision; communicate the first output to a corresponding system of the plurality of systems; and further process the plurality of processing requests in a batch process by: generating at least one batch comprising the plurality of processing requests; assigning each of the plurality of processing requests in the at least one batch and/or a batch of the at least one batch to a processor of a plurality of processors; and further processing each processing request of the plurality of processing requests in parallel with at least one other processing request of the plurality of processing requests by executing the at least one machine-learning model for the at least one batch to generate a second output for each processing request, the second output associated with a corresponding decision of the first output.
  2. The system of claim 1, the at least one processor further configured to: receive a plurality of follow-on requests from at least a subset of systems of the plurality of systems, each follow-on request of the plurality of follow-on requests associated with a processing request of the plurality of processing requests; and for each follow-on request: determine the processing request of the plurality of processing requests associated with the follow-on request; retrieve the second output associated with the processing request; and communicate the second output to a corresponding system of the at least a subset of systems.
  3. The system of claim 1, wherein the second output comprises at least one parameter of a plurality of parameters having a higher impact on the decision of the first output.
  4. The system of claim 1, wherein the second output comprises at least one reason code based on inputted transaction data.
  5. The system of claim 1, wherein each processing request is associated with an authorization request for an electronic payment transaction, the decision comprises an authorization decision, and the second output comprises at least one parameter and/or reason code associated with transaction data of the electronic payment transaction.
  6. The system of claim 1, wherein generating the at least one batch comprises generating a matrix comprising data associated with the plurality of processing requests, wherein the matrix is input to the at least one machine-learning model.
  7. The system of claim 1, wherein executing the machine-learning model comprises recording model metadata to an audit log, and wherein executing the at least one machine-learning model for the at least one batch is based on model metadata for the corresponding processing requests of the at least one batch.
  8. The system of claim 1, wherein the at least one processor is further configured to process each processing request of the plurality of processing requests in parallel with at least one other processing request of the plurality of processing requests by: splitting a model decision tree associated with the processing request into a plurality of subtrees; and assigning each subtree to a processing thread of a plurality of parallel processing threads.
  9. The system of claim 1, wherein the at least one processor is further configured to: merge at least two layers of the at least one machine-learning model, resulting in a merged machine-learning model, wherein executing the at least one machine-learning model for the at least one batch comprises executing the merged machine-learning model.
  10. The system of claim 1, wherein processing each processing request of the plurality of processing requests in parallel with at least one other processing request of the plurality of processing requests comprises: monitoring model metrics associated with the at least one machine-learning model while the at least one machine-learning model is executed.
  11. The system of claim 10, wherein the at least one processor is further configured to: generate an alert while monitoring the model metrics based on the model metrics satisfying at least one alert threshold.
  12. The system of claim 10, wherein the at least one processor is further configured to: determine that the model metrics fail to satisfy a threshold; and in response to determining that the model metrics fail to satisfy the threshold, re-train the at least one machine-learning model.
  13. The system of claim 1, wherein generating the second output comprises generating a Shapley value for each parameter of a plurality of parameters associated with the processing request.
  14. The system of claim 1, wherein each of the assigned processors of the plurality of processors comprises a graphics processing unit (GPU) and/or a tensor processing unit (TPU).
  15. A computer-implemented method comprising: receiving, with at least one processor, a plurality of processing requests to execute at least one machine-learning model from a plurality of systems; executing, with at least one processor, the at least one machine-learning model for each processing request of the plurality of processing requests to generate a first output of the machine-learning model for each processing request, the first output comprising a decision; communicating, with at least one processor, the first output to a corresponding system of the plurality of systems; and further processing the plurality of processing requests in a batch process by: generating, with at least one processor, at least one batch comprising the plurality of processing requests; assigning, with at least one processor, each of the plurality of processing requests in the at least one batch and/or a batch of the at least one batch to a processor of a plurality of processors; and further processing, with at least one processor, each processing request of the plurality of processing requests in parallel with at least one other processing request of the plurality of processing requests by executing the at least one machine-learning model for the at least one batch to generate a second output for each processing request, the second output associated with a corresponding decision of the first output.
  16. The method of claim 15, further comprising: receiving, with at least one processor, a plurality of follow-on requests from at least a subset of systems of the plurality of systems, each follow-on request of the plurality of follow-on requests associated with a processing request of the plurality of processing requests; and for each follow-on request: determining, with at least one processor, the processing request of the plurality of processing requests associated with the follow-on request; retrieving, with at least one processor, the second output associated with the processing request; and communicating, with at least one processor, the second output to a corresponding system of the at least a subset of systems.
  17. The method of claim 15, wherein the second output comprises at least one parameter of a plurality of parameters having a higher impact on the decision of the first output.
  18. The method of claim 15, wherein the second output comprises at least one reason code based on inputted transaction data.
  19. The method of claim 15, wherein each processing request is associated with an authorization request for an electronic payment transaction, the decision comprises an authorization decision, and the second output comprises at least one parameter and/or reason code associated with transaction data of the electronic payment transaction.
  20. A computer program product comprising at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to: receive a plurality of processing requests to execute at least one machine-learning model from a plurality of systems; execute the at least one machine-learning model for each processing request of the plurality of processing requests to generate a first output of the machine-learning model for each processing request, the first output comprising a decision; communicate the first output to a corresponding system of the plurality of systems; and further process the plurality of processing requests in a batch process by: generating at least one batch comprising the plurality of processing requests; assigning each of the plurality of processing requests in the at least one batch and/or a batch of the at least one batch to a processor of a plurality of processors; and further processing each processing request of the plurality of processing requests in parallel with at least one other processing request of the plurality of processing requests by executing the at least one machine-learning model for the at least one batch to generate a second output for each processing request, the second output associated with a corresponding decision of the first output.
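
Claims 10 through 12 recite monitoring model metrics while the batched inference runs, generating an alert when the metrics satisfy an alert threshold, and re-training the model when the metrics fail a threshold. As a minimal illustration only, the Python sketch below monitors drift of the mean model score against a baseline; the choice of metric, the threshold values, and the alert and re-training actions are assumptions for illustration and are not specified by the application.

```python
import statistics

# Assumed thresholds; the application does not specify metrics or values.
ALERT_THRESHOLD = 0.05    # drift level that triggers an alert (claim 11)
RETRAIN_THRESHOLD = 0.10  # drift level that triggers re-training (claim 12)

def monitor_model_metrics(batch_scores, baseline_mean):
    """Monitor one model metric while a batch executes (claim 10).

    Here the metric is drift of the batch's mean score from a baseline;
    any model-based metric could be substituted.
    """
    drift = abs(statistics.fmean(batch_scores) - baseline_mean)
    if drift >= ALERT_THRESHOLD:
        print(f"ALERT: mean-score drift {drift:.3f} >= {ALERT_THRESHOLD}")
    if drift >= RETRAIN_THRESHOLD:
        print("Metrics fail threshold: scheduling re-training")
    return drift
```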

Description

SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR MODEL MONITORING USING BATCHED INFERENCES

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims the benefit of U.S. Provisional Application No. 63/523,419, filed June 27, 2023, the disclosure of which is hereby incorporated by reference in its entirety.

BACKGROUND

1. Field

[0002] This disclosure relates generally to machine-learning and, in some non-limiting embodiments or aspects, to systems, methods, and computer program products for model monitoring using batched inferences.

2. Technical Considerations

[0003] Real-time inference platforms for monitoring a machine-learning model in production have several technical limitations. For example, some metrics related to model performance may be difficult and resource-intensive to generate, since computing them can increase the model's latency beyond service level agreement (SLA) limits. Further, existing monitoring systems and dashboards lack the computational resources to run those model-based metrics.

SUMMARY

[0004] According to non-limiting embodiments or aspects, provided is a system including at least one processor configured to: receive a plurality of processing requests to execute at least one machine-learning model from a plurality of systems; execute the at least one machine-learning model for each processing request of the plurality of processing requests to generate a first output of the machine-learning model for each processing request, the first output including a decision; communicate the first output to a corresponding system of the plurality of systems; and further process the plurality of processing requests in a batch process by: generating at least one batch including the plurality of processing requests; assigning each of the plurality of processing requests in the at least one batch and/or a batch of the at least one batch to a processor of a plurality of processors; and further processing each processing request of the plurality of processing requests in parallel with at least one other processing request of the plurality of processing requests by executing the at least one machine-learning model for the at least one batch to generate a second output for each processing request, the second output associated with a corresponding decision of the first output.

[0005] In non-limiting embodiments or aspects, the at least one processor may be further configured to: receive a plurality of follow-on requests from at least a subset of systems of the plurality of systems, each follow-on request of the plurality of follow-on requests associated with a processing request of the plurality of processing requests; and for each follow-on request: determine the processing request of the plurality of processing requests associated with the follow-on request; retrieve the second output associated with the processing request; and communicate the second output to a corresponding system of the at least a subset of systems.

[0006] In non-limiting embodiments or aspects, the second output may include at least one parameter of a plurality of parameters having a higher impact on the decision of the first output.

[0007] In non-limiting embodiments or aspects, the second output may include at least one reason code based on inputted transaction data.
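
As a concrete reading of paragraphs [0004] and [0005], the sketch below serves each request immediately with the model's decision (the first output), then re-processes the accumulated requests as one batch to produce the second output, which is stored for follow-on requests. All names (Model, serve_request, run_batch) and the in-memory stores are hypothetical; they are not interfaces defined by the application.

```python
import numpy as np

class Model:
    """Hypothetical stand-in for the deployed machine-learning model."""

    def predict(self, features: np.ndarray) -> np.ndarray:
        # One decision per row; a toy rule in place of a real model.
        return (features.sum(axis=1) > 1.0).astype(int)

    def explain(self, features: np.ndarray) -> np.ndarray:
        # Second output, e.g., per-parameter impacts (paragraph [0006]).
        return features / features.sum(axis=1, keepdims=True)

model = Model()
pending = []          # requests awaiting the batch pass
second_outputs = {}   # request_id -> second output, for follow-on requests

def serve_request(request_id: str, features: np.ndarray) -> int:
    """Real-time path: execute the model once and return the decision."""
    decision = int(model.predict(features[np.newaxis, :])[0])
    pending.append((request_id, features))
    return decision

def run_batch() -> None:
    """Batch path: re-process all pending requests in one execution to
    generate the second output associated with each earlier decision."""
    if not pending:
        return
    ids, rows = zip(*pending)
    matrix = np.stack(rows)          # batch as a matrix (paragraph [0009])
    for rid, row in zip(ids, model.explain(matrix)):
        second_outputs[rid] = row    # retrieved on a follow-on request
    pending.clear()
```

A follow-on request would then be answered by looking up second_outputs by request identifier, as paragraph [0005] describes.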
[0008] In non-limiting embodiments or aspects, each processing request may be associated with an authorization request for an electronic payment transaction, the decision may include an authorization decision, and the second output may include at least one parameter and/or reason code associated with transaction data of the electronic payment transaction.

[0009] In non-limiting embodiments or aspects, generating the at least one batch may include generating a matrix including data associated with the plurality of processing requests, where the matrix is input to the at least one machine-learning model.

[0010] In non-limiting embodiments or aspects, executing the machine-learning model may include recording model metadata to an audit log, and executing the at least one machine-learning model for the at least one batch may be based on model metadata for the corresponding processing requests of the at least one batch.

[0011] In non-limiting embodiments or aspects, the at least one processor may be further configured to process each processing request of the plurality of processing requests in parallel with at least one other processing request of the plurality of processing requests by: splitting a model decision tree associated with the processing request into a plurality of subtrees; and assigning each subtree to a processing thread of a plurality of parallel processing threads.

[0012] In non-limiting embodiments or aspects, the at least one processor may be further configured to: merge at least two layers of the at least one machine-learning model, resulting in a merged machine-learning model, where executing the at least one machine-learning model for the at least one batch includes executing the merged machine-learning model.
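
Paragraph [0011] describes splitting a model decision tree into subtrees and assigning each subtree to a parallel processing thread. Below is a minimal sketch under two assumptions: the model is an additive tree ensemble (so partial scores sum), and each tree's predict call releases the GIL (as numpy-backed tree libraries typically do), so the threads run in parallel; the tree interface here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def score_subtree(subtree, matrix):
    # Hypothetical interface: each tree scores the whole batch matrix.
    return sum(tree.predict(matrix) for tree in subtree)

def parallel_score(trees, matrix, n_threads=4):
    """Split the ensemble into subtrees and score each on its own thread,
    then recombine the partial scores additively."""
    subtrees = [trees[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(score_subtree, subtrees, [matrix] * n_threads)
    return sum(partials)
```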