US-12626260-B2 - Training a machine learning system for transaction data processing
Abstract
A method of training a supervised machine learning system to detect anomalies within transaction data is described. The method includes obtaining a training set of data samples; assigning a label indicating an absence of an anomaly to unlabelled data samples in the training set; partitioning the data of the data samples in the training set into two feature sets, a first feature set representing observable features and a second feature set representing context features; generating synthetic data samples by combining features from the two feature sets that respectively relate to two different uniquely identifiable entities; assigning a label indicating a presence of an anomaly to the synthetic data samples; augmenting the training set with the synthetic data samples; and training a supervised machine learning system with the augmented training set and the assigned labels.
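The labelling and augmentation steps summarised in the abstract can be sketched in ordinary code. The sketch below is illustrative only and is not part of the claims; the field names (`entity_id`, `observable`, `context`) and the 1:1 ratio of synthetic to real samples are assumptions made for the example:

```python
import random

def generate_synthetic_samples(samples, n_synthetic, rng=None):
    """Create anomalous samples by pairing the observable features of one
    entity with the context features of a different entity."""
    rng = rng or random.Random(0)
    synthetic = []
    while len(synthetic) < n_synthetic:
        a, b = rng.sample(samples, 2)
        if a["entity_id"] != b["entity_id"]:
            synthetic.append({
                "entity_id": a["entity_id"],
                "observable": a["observable"],  # observable features of entity A
                "context": b["context"],        # context features of entity B
                "label": 1,                     # label: presence of an anomaly
            })
    return synthetic

def build_training_set(samples, n_synthetic=None):
    # Label every (unlabelled) real sample as normal behaviour.
    labelled = [dict(s, label=0) for s in samples]
    # Generate mismatched synthetic samples labelled as anomalous and
    # augment the training set with them.
    n_synthetic = n_synthetic or len(samples)
    return labelled + generate_synthetic_samples(samples, n_synthetic)
```

The augmented set returned by `build_training_set` would then be passed, with its labels, to any supervised binary classifier.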
Inventors
- Kenny Wong
- David Sutton
- Iker Perez
- Alec Barns-Graham
Assignees
- Featurespace Limited
Dates
- Publication Date
- 20260512
- Application Date
- 20240911
Claims (10)
- 1 . A method of training a supervised machine learning system to detect anomalies within transaction data stored in a first data storage device using unlabelled data samples of the transaction data, the supervised machine learning system having a binary classifier that outputs a value within a predefined range representing a likelihood of an anomaly, the method comprising: obtaining, using a payment processor server coupled with the machine learning system, a training set of data samples, each data sample being derived, at least in part, from the transaction data and being associated with one of a set of uniquely identifiable entities, the set of uniquely identifiable entities comprising unique identifiers stored in a second data storage device and relating to a user or a merchant, wherein at least a portion of the training set is unlabelled in which data of said portion of the training set comprises feature vectors that do not have an assigned anomaly label and are thus not indicated as being associated with either normal behaviour or anomalous behaviour; assigning, using the payment processor server, a label indicating an absence of an anomaly to the unlabelled data samples in the training set and associating the unlabelled data samples with normal behaviour of data that is expected during transaction processing; partitioning, using the payment processor server, the data of the data samples in the training set into two feature sets, a first feature set representing observable features and a second feature set representing context features, the observable features being derived from a function of at least transaction data for a current transaction, the context features being derived from one or more of a function of historical transaction data that excludes the current transaction and retrieved data relating to the uniquely identifiable entity for the current transaction; generating, using the payment processor server, synthetic data samples by combining 
features from the two feature sets that respectively relate to two different entities of the set of uniquely identifiable entities; assigning, using the payment processor server, a label indicating a presence of an anomaly to the synthetic data samples; augmenting, using the payment processor server, the training set with the synthetic data samples; training, using the payment processor server, a supervised machine learning system with the augmented training set and the assigned labels to determine a set of parameters for the supervised machine learning system, wherein the trained supervised machine learning system is configured to output the value indicative of a presence of an anomaly when supplied with a new data sample by the payment processor server using the set of parameters, thereby training the binary classifier using the unlabelled data samples.
- 2 . The method of claim 1 , wherein the observable features are derived from a function of transaction data within a predefined temporal window for the current transaction and wherein the context features are derived from transaction data outside of the predefined temporal window.
- 3 . The method of claim 2 , wherein at least one of the observable features comprises aggregate metrics computed from transaction data for a predefined time period that is defined in relation to a time of the current transaction.
- 4 . The method of claim 3 , wherein the step of obtaining, using the payment processor server coupled with the machine learning system, the training set of data samples comprises, for a given data sample: obtaining transaction data for the current transaction, the transaction data comprising an identifier for a uniquely identifiable entity; and obtaining look-up data for the uniquely identifiable entity; wherein the obtained transaction data is used to derive the first feature set and the obtained look-up data is used to derive the second feature set.
- 5 . The method of claim 4 , wherein the step of obtaining, using the payment processor server coupled with the machine learning system, the training set of data samples further comprises, for the given data sample: obtaining transaction data for the uniquely identifiable entity for the predefined temporal window; and computing at least one aggregated metric from the transaction data, wherein the at least one aggregated metric is used to derive the first feature set.
- 6 . The method of claim 4 , wherein the step of obtaining, using the payment processor server coupled with the machine learning system, the training set of data samples further comprises, for the given data sample: obtaining the historical transaction data for the uniquely identifiable entity, the historical transaction data comprising transaction data that is outside the predefined temporal window; and computing at least one aggregated metric from the historical transaction data, wherein the at least one aggregated metric is used to derive the second feature set.
- 7 . The method of claim 1 , wherein the second feature set comprises metadata associated with a corresponding uniquely identifiable entity.
- 8 . The method of claim 1 , wherein the supervised machine learning system comprises a binary classifier that outputs a value within a predefined range representing a likelihood of an anomaly and wherein the labels indicating the absence or presence of an anomaly comprise two numeric values representing a binary output.
- 9 . The method of claim 1 , wherein the supervised machine learning system comprises an ensemble system based on a set of decision trees.
- 10 . The method of claim 1 , wherein the supervised machine learning system comprises a recurrent neural network.
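Claims 8 and 9 describe the classifier as an ensemble of decision trees whose output is a value within a predefined range representing the likelihood of an anomaly. A minimal, hypothetical sketch of such an ensemble, with one-level decision stumps standing in for learned trees and a mean vote standing in for a learned combination rule:

```python
class DecisionStump:
    """One-level decision tree: thresholds a single feature."""
    def __init__(self, feature_index, threshold):
        self.feature_index = feature_index
        self.threshold = threshold

    def predict(self, x):
        # Vote 1 (anomalous) if the feature exceeds the threshold, else 0.
        return 1 if x[self.feature_index] > self.threshold else 0

class StumpEnsemble:
    """Ensemble of stumps; the mean vote is a score in the predefined
    range [0.0, 1.0] representing the likelihood of an anomaly."""
    def __init__(self, stumps):
        self.stumps = stumps

    def score(self, x):
        votes = [s.predict(x) for s in self.stumps]
        return sum(votes) / len(votes)
```

In practice the tree structure, thresholds, and combination weights would be learned from the augmented training set rather than fixed by hand; the feature indices and thresholds above are placeholders.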
Description
CROSS-REFERENCE TO RELATED APPLICATION
This application is a continuation of U.S. patent application Ser. No. 17/420,159, filed Jul. 1, 2021, which is a national phase application under 35 U.S.C. § 371 of International Application No. PCT/EP2021/063767, filed May 24, 2021, which claims priority to U.S. Provisional Application No. 63/049,873, filed Jul. 9, 2020. The contents of each of the above-identified applications are hereby fully incorporated herein by reference.
TECHNICAL FIELD
The present invention relates to systems and methods for applying machine learning systems to transaction data. Certain examples relate to a machine learning system for use in real-time transaction processing. Certain examples relate to a method for training a machine learning system for use in real-time transaction processing.
BACKGROUND
Digital payments have exploded over the last twenty years, with more than three-quarters of global payments using some form of payment card or electronic wallet. Point of sale systems are progressively becoming digital rather than cash based. Put simply, global systems of commerce are now heavily reliant on electronic data processing platforms. This presents many engineering challenges that are primarily hidden from a lay user. For example, digital transactions need to be completed in real-time, i.e. with a minimal level of delay experienced by computer devices at the point of purchase. Digital transactions also need to be secure and resistant to attack and exploitation. The processing of digital transactions is also constrained by the historic development of global electronic systems for payments. For example, much infrastructure is still configured around models that were designed for mainframe architectures in use over 50 years ago. As digital transactions increase, new security risks also become apparent. Digital transactions present new opportunities for fraud and malicious activity.
In 2015, it was estimated that 7% of digital transactions were fraudulent, and that figure has only increased with the transition of more economic activity online. Fraud losses are estimated to be four times the population of the world (e.g., in US dollars) and are growing. While risks like fraud are an economic issue for companies involved in commerce, the implementation of technical systems for processing transactions is an engineering challenge. Traditionally, banks, merchants and card issuers developed “paper” rules or procedures that were manually implemented by clerks to flag or block certain transactions. As transactions became digital, one approach to building technical systems for processing transactions has been to supply computer engineers with these sets of developed criteria and to ask the computer engineers to implement them using digital representations of the transactions, i.e. to convert the hand-written rules into coded logic statements that may be applied to electronic transaction data. This traditional approach has run into several problems as digital transaction volumes have grown. First, any applied processing needs to take place in “real-time”, e.g. with millisecond latencies. Second, many thousands of transactions need to be processed every second (e.g., a common “load” may be 1000-2000 per second), with load varying unexpectedly over time (e.g., a launch of a new product or a set of tickets can easily increase an average load level by several multiples). Third, the digital storage systems of transaction processors and banks are often siloed or partitioned for security reasons, yet digital transactions often involve an interconnected web of merchant systems. Fourth, large-scale analysis of actual reported fraud and predicted fraud is now possible. This shows that traditional approaches to fraud detection are found wanting: accuracy is low and false positives are high.
This then has a physical effect on digital transaction processing: more genuine point-of-sale and online purchases are declined, and those seeking to exploit the new digital systems often get away with it. In the last few years, an increasingly machine-learning-based approach has been taken to the processing of transaction data. As machine learning models mature in academia, engineers have begun to attempt to apply them to the processing of transaction data. However, this again runs into problems. Even if engineers are provided with an academic or theoretical machine learning model and asked to implement it, this is not straightforward. For example, the problems of large-scale transaction processing systems come into play. Machine learning models do not have the luxury of unlimited inference time as in the laboratory. This means that it is simply not practical to implement certain models in a real-time setting, or that they need significant adaptation to allow real-time processing at the volume levels experienced by real-world servers. Moreover, engineers need to contend with the problem of implementing machine learning models on data that