US-20260126765-A1 - Automatic Control Group Generation
Abstract
Techniques are disclosed for automatically generating and updating a control group. In disclosed techniques, a server computer system trains, using a plurality of transactions, a machine learning model. During training the machine learning model learns a feature distribution of both a current set of control group (CG) transactions and a current set of non-control group (non-CG) transactions included in the plurality of transactions. The system inputs the current set of CG transactions into the trained machine learning model. Based on the output of the trained machine learning model for the current set of CG transactions, the system modifies the current set of CG transactions to generate an updated set of CG transactions. Based on the updated set of CG transactions, the server performs one or more preventative measures for a transaction processing system. The disclosed techniques may advantageously improve the accuracy, e.g., of a transaction processing system.
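In rough outline, the abstract describes a refresh loop: score the current CG set with the trained model, prune it, augment it from the non-CG pool, and act on the result. The following minimal sketch illustrates that loop; the function names, the callable stand-ins for the trained model, and the thresholding scheme are illustrative assumptions, not taken from the patent:

```python
def refresh_control_group(cg_transactions, non_cg_transactions,
                          model_error, divergence_gain, error_threshold=1.0):
    """One illustrative refresh cycle over the control group (CG).

    model_error(t)     -> stand-in for the trained model's reconstruction
                          error on transaction t (higher = less typical)
    divergence_gain(t) -> stand-in for how much adding non-CG transaction t
                          would move the CG distribution toward the population
    """
    # Drop CG transactions the trained model flags as unrepresentative.
    kept = [t for t in cg_transactions if model_error(t) <= error_threshold]
    # Add non-CG transactions that make the CG set more representative.
    added = [t for t in non_cg_transactions if divergence_gain(t) > 0]
    return kept + added
```

The real system would derive `model_error` and `divergence_gain` from the trained model's outputs; here they are injected as callables so the control flow can be shown in isolation.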
Inventors
- Nitin S. Sharma
- Mozhdeh Rouhsedaghat
Assignees
- PAYPAL, INC.
Dates
- Publication Date
- 20260507
- Application Date
- 20251230
Claims (20)
- 1 . A method, comprising: executing, by a server system, a trained machine learning model, wherein the executing includes inputting a current set of control group (CG) transactions into the trained machine learning model, wherein the trained machine learning model is trained by: causing a first portion of the machine learning model to learn a feature distribution of the current set of CG transactions; concatenating output of a second portion of the machine learning model indicating classifications for CG transactions and transactions input to a third portion of the machine learning model for reconstructing CG transactions; and modifying, by the server system based on output of the trained machine learning model for the current set of CG transactions, the current set of CG transactions to generate an updated set of CG transactions; and performing, by the server system based on the updated set of CG transactions, one or more preventative measures.
- 2 . The method of claim 1 , wherein further during training, the machine learning model predicts whether transactions should be included in the current set of CG transactions or a current set of non-CG transactions.
- 3 . The method of claim 1 , wherein the modifying is further based on second output of a second, different trained machine learning model for the current set of CG transactions, wherein the second trained machine learning model executes a non-CG portion on the current set of CG transactions to generate the second output, and wherein the current set of CG transactions predicted by the machine learning model during training are predicted by a CG portion of the machine learning model.
- 4 . The method of claim 1 , wherein transactions in the current set of CG transactions include a first set of transactions used to train the machine learning model and a second, different set of transactions used to test the trained machine learning model.
- 5 . The method of claim 1 , wherein training the machine learning model further includes: concatenating output of the third portion of the machine learning model indicating classifications for non-CG transactions and transactions input to a portion of the machine learning model for reconstructing non-CG transactions.
- 6 . The method of claim 1 , wherein performing the one or more preventative measures includes: training, using the updated set of CG transactions, a machine learning classifier to generate an authorization decision for newly requested transactions.
- 7 . The method of claim 1 , wherein the first portion of the machine learning model is a CG portion of a Dragonnet model, and wherein the second portion of the machine learning model is a non-CG portion of the Dragonnet model.
- 8 . The method of claim 1 , wherein modifying the current set of CG transactions includes: determining reconstruction error of the first portion of the machine learning model by comparing reconstructions of CG transactions output by the first portion with corresponding CG transactions; and removing, based on the reconstruction error, one or more CG transactions from the current set of CG transactions.
- 9 . The method of claim 1 , wherein the machine learning model includes: a branch for reconstructing non-suspicious CG transactions and a branch for reconstructing suspicious CG transactions.
- 10 . A non-transitory computer-readable medium having instructions stored thereon that are executable by a computing device to perform operations comprising: executing a trained machine learning model, including inputting a current set of control group (CG) transactions into the trained machine learning model, wherein during training the machine learning model: learns a feature distribution of the current set of CG transactions; and concatenates output of a portion of the machine learning model indicating classifications for CG transactions and transactions input to a third portion of the machine learning model for reconstructing CG transactions; and modifying, based on output of the trained machine learning model for the current set of CG transactions, the current set of CG transactions to generate an updated set of CG transactions; and performing, based on the updated set of CG transactions, one or more preventative measures for a transaction processing system configured to process newly received transactions.
- 11 . The non-transitory computer-readable medium of claim 10 , wherein further during training the machine learning model predicts whether transactions should be included in the current set of CG transactions or a current set of non-CG transactions.
- 12 . The non-transitory computer-readable medium of claim 11 , wherein transactions in the current set of CG transactions include a first set of transactions used to train the machine learning model and a second, different set of transactions used to test the trained machine learning model.
- 13 . The non-transitory computer-readable medium of claim 10 , wherein the inputting includes inputting the current set of CG transactions into a non-CG portion of the machine learning model.
- 14 . The non-transitory computer-readable medium of claim 13 , wherein modifying the current set of CG transactions includes: determining reconstruction error of the non-CG portion of the machine learning model by comparing reconstructions of CG transactions output by the non-CG portion with corresponding CG transactions; and removing, based on the reconstruction error, one or more CG transactions from the current set of CG transactions.
- 15 . The non-transitory computer-readable medium of claim 14 , wherein modifying the current set of CG transactions further includes: performing a comparison using a divergence algorithm, including comparing transactions in the current set of CG transactions with non-CG transactions; and based on results of the comparison, adding one or more non-CG transactions to the updated set of CG transactions.
- 16 . The non-transitory computer-readable medium of claim 10 , wherein performing the one or more preventative measures includes: training, using the updated set of CG transactions, a machine learning classifier to generate an authorization decision for newly requested transactions.
- 17 . A system, comprising: at least one processor; and a memory having instructions stored thereon that are executable by the at least one processor to cause the system to: execute a trained machine learning model, wherein the executing includes inputting a current set of control group (CG) transactions into the trained machine learning model, and wherein the trained machine learning model is trained by: causing a first portion of the machine learning model to learn a feature distribution of the current set of CG transactions; and concatenating output of a second portion of the machine learning model indicating classifications for CG transactions and transactions input to a third portion of the machine learning model for reconstructing CG transactions; and modify, based on output of the trained machine learning model for the current set of CG transactions, the current set of CG transactions to generate an updated set of CG transactions; and perform, based on the updated set of CG transactions, one or more preventative measures.
- 18 . The system of claim 17 , wherein the inputting includes inputting the current set of CG transactions into a non-CG portion of the machine learning model.
- 19 . The system of claim 17 , wherein training the machine learning model further includes: concatenating output of the second portion of the machine learning model indicating classifications for non-CG transactions and transactions input to a fourth portion of the machine learning model for reconstructing non-CG transactions.
- 20 . The system of claim 17 , wherein the machine learning model is a Dragonnet model, and wherein both a CG portion and a non-CG portion of the Dragonnet model are executed using variational auto encoders (VAEs).
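Claims 8 and 14 prune the CG set by reconstruction error: each CG transaction is compared with the model's reconstruction of it, and transactions with anomalously high error are removed. The sketch below assumes the reconstructions have already been produced by the trained (e.g., VAE-based) portion of the model; the per-row mean-squared error and the quantile cutoff are illustrative choices, not specified in the claims:

```python
import numpy as np

def prune_control_group(cg_features, reconstructions, keep_quantile=0.9):
    """Remove CG transactions whose reconstruction error is anomalously high.

    cg_features:     (n, d) array of CG transaction feature vectors
    reconstructions: (n, d) array of the model's reconstructions of those vectors
    keep_quantile:   transactions with error above this quantile are removed
    """
    # Per-transaction mean-squared reconstruction error.
    errors = np.mean((cg_features - reconstructions) ** 2, axis=1)
    threshold = np.quantile(errors, keep_quantile)
    keep_mask = errors <= threshold
    return cg_features[keep_mask], keep_mask
```

A transaction the model reconstructs poorly is, under this view, unlike the distribution the model learned for the group, which is the claims' rationale for dropping it from the CG set.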
Description
The present application is a continuation of U.S. App. No. 17/644,692, entitled "Automatic Control Group Generation" and filed December 16, 2021, the disclosure of which is incorporated by reference herein in its entirety.

BACKGROUND

TECHNICAL FIELD

This disclosure relates generally to data security, and, more specifically, to techniques for automatically detecting anomalous behavior, e.g., for improved security.

DESCRIPTION OF THE RELATED ART

As more and more transactions are conducted electronically via online transaction processing systems, for example, these processing systems become more robust in managing transaction data as well as detecting suspicious and unusual behavior. Many transaction requests, for example, may be generated with malicious intent, which may result in wasted computer resources, network bandwidth, storage, CPU processing, monetary resources, etc., if those transactions are processed. Some transaction processing systems attempt to analyze various transaction data for previously processed and currently initiated transactions to identify and mitigate malicious behavior, such as requests for fraudulent transactions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an example system configured to automatically generate control groups, according to some embodiments.

FIG. 2 is a block diagram illustrating example training of a representation model, according to some embodiments.

FIG. 3 is a block diagram illustrating example execution of a trained representation model, according to some embodiments.

FIG. 4 is a block diagram illustrating an example divergence module, according to some embodiments.

FIG. 5 is a block diagram illustrating an example Dragonnet variational auto encoder (VAE) with multiple different branches, according to some embodiments.

FIG. 6 is a flow diagram illustrating a method for automatically updating a control group, according to some embodiments.

FIG. 7 is a block diagram illustrating an example computing device, according to some embodiments.

DETAILED DESCRIPTION

Traditionally, control groups have been used during experimentation for comparison purposes to test the overall effectiveness of a new feature, characteristic, drug, etc. being introduced to an experimental group. As such, the accuracy of such experimentation depends on the representativeness of the control group of examples relative to an overall population of examples. In the context of machine learning, control groups may be used to both train and test the overall accuracy of a machine learning model.

Over time, however, a control group representing a given population (e.g., of users, transactions, patients, etc.) may no longer be representative of the overall population. For example, populations are generally temporal in nature and, as such, change with time. As one specific example, a population of transactions may increase in volume (e.g., during holiday months, the number of online electronic transactions increases significantly relative to non-holiday months), the types of transactions being conducted may change, etc.

In addition to becoming less representative over time, in some situations, control groups may introduce loss. In the context of online electronic transactions, as the overall population of transactions grows, the potential for loss associated with transactions that are included in the control group for this overall population increases. For example, because fraudulent transactions are often included in the control group (to represent that fraudulent transactions occur within the overall population) and because transactions included in the control group are automatically approved (authorized to proceed), these transactions cause a system processing such transactions to incur loss (e.g., financial loss). In this example, if one or more fraudulent transactions included in the control group are for a high dollar amount relative to other transactions, these transactions cause the transaction processing system to incur even greater loss than if such transactions were for a lower dollar amount.

The disclosed techniques use machine learning techniques to automatically generate and update control groups such that these control groups accurately represent the overall population they are intended to represent. In addition, while updating a control group, the system selects examples for the control group based on a particular feature. In the context of online electronic transactions, the system selects transactions for a control group based on a dollar amount feature in addition to selecting transactions that are highly representative of the overall transaction population to avoid loss associated with this feature. In particular, the disclosed techniques combine a neural network (e.g., a Dragonnet) with a VAE to learn the feature distribution of both a current set of control group (CG) transactions and non-control group (non-CG) transactions (the rest of the transaction population).
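The distribution-learning goal above, together with claim 15's divergence-based comparison for adding non-CG transactions to the control group, can be illustrated with a simple sketch: fit the CG and overall-population feature distributions, then greedily admit the non-CG candidates whose addition most reduces the divergence between them. The univariate Gaussian fit and the closed-form KL divergence below are illustrative simplifications; the patent does not fix a particular divergence algorithm:

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    # Closed-form KL(P || Q) between two univariate Gaussians.
    return 0.5 * (np.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def divergence_to_population(values, population):
    # Fit 1-D Gaussians and measure how far `values` sits from the population.
    eps = 1e-9  # guard against zero variance
    return gaussian_kl(values.mean(), values.var() + eps,
                       population.mean(), population.var() + eps)

def add_by_divergence(cg, candidates, population, k=1):
    """Greedily move k non-CG candidates into the CG set, each step picking
    the candidate whose addition most reduces divergence from the population."""
    cg = list(cg)
    remaining = list(candidates)
    chosen = []
    for _ in range(k):
        best = min(remaining, key=lambda c: divergence_to_population(
            np.array(cg + [c]), population))
        remaining.remove(best)
        cg.append(best)
        chosen.append(best)
    return np.array(cg), chosen
```

For example, if the CG amounts cluster near 50 while the population clusters near 100, a candidate transaction of 100 reduces the fitted divergence more than one of 55, so the greedy step admits it first.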