US-12625859-B2 - Methods and systems for detecting spurious data patterns

Abstract

Disclosed are implementations that include a method for detecting anomalous data, including converting a set of data values representative of a multi-dimensional item into a nodes-and-edges graph representation of the item, applying a graph convolution process to the graph representation to generate a transformed graph representation for the item comprising a resultant transformed configuration of the nodes and edges representing the item, and determining, based on the transformed configuration, a probability that the item is anomalous. Another example method includes receiving input data at a neural network circuit comprising a plurality of node layers, with each of the plurality of node layers comprising respective one or more nodes, with the neural network circuit further comprising adjustable weighted connections connecting at least some nodes in different layers of the plurality of node layers. The method further includes removing one or more of the weighted connections at one or more time instances.

Inventors

Alexandros Louizos
Ayaan Chaudhry
Gary Plunkett
Oliver Clark
Cathy Ross
R. Whitney Anderson

Assignees

Fraud.net, Inc.

Dates

Publication Date: 2026-05-12
Application Date: 2024-03-05

Claims (20)

1 . A method for detection and classification of data, the method comprising: transforming by a plurality of multi-layer perceptrons (MLP's) each 1-dimensional data value of a multi-field record into a respective multi-dimensional vector of a plurality of multi-dimensional vectors, with each data value of each field from the multi-field record being transformed by a respective different MLP, from the plurality of MLP's, that was separately trained to transform the respective field from the multi-field record; converting the plurality of respective multi-dimensional vectors into a first graph representation comprising nodes and edges, wherein each of the plurality of the respective multi-dimensional vectors corresponds to a respective one of the nodes in the first graph representation, with the respective one of the nodes including value information representing feature information for the respective data value and positional information representing a relationship of the respective one of the nodes to other of the nodes in the first graph representation; transforming by a machine learning graph convolution system the first graph representation to a second, transformed, graph representation for the multi-field record, the second graph representation comprising a transformed configuration of transformed nodes and edges in the second graph representation in which the transformed nodes and edges are organized into a clustering configuration indicative of anomality of the nodes and edges of the first graph representation and of the multi-field record, wherein the machine learning graph convolution system comprises at least one edge MLP trained to transform each edge of the first graph representation into a transformed edge, at least one node MLP to transform each node of the first graph representation to a transformed node, and at least one global state MLP to generate a global state for the transformed configuration; and determining, based on the transformed configuration of the transformed nodes and edges of the second graph representation with the clustering configuration indicative of the anomality of the nodes and edges of the first graph representation and of the multi-field record, a probability that the multi-field record is anomalous.
2 . The method of claim 1 , wherein determining the probability that the multi-field record is anomalous comprises: processing the transformed configuration of the transformed nodes and edges for the multi-field record with a global attention module to generate a resultant vector of values; and applying a softmax module to the resultant vector of values to derive the probability that the multi-field record is anomalous.
3 . The method of claim 2 , wherein processing the transformed configuration of the transformed nodes and edges comprises: deriving an output node vector, $V_{\text{output}}$, as an average of weighted products of vector representations of the transformed nodes according to: $V_{\text{output}}(v_1, v_2, \ldots, v_d) = \frac{w_a (a_1, a_2, \ldots, a_d) + w_b (b_1, b_2, \ldots, b_d) + \cdots + w_n (n_1, n_2, \ldots, n_d)}{n}$, where $V_{\text{output}}$ is the output node vector, each of a, b, . . . , n is one of the transformed nodes of the transformed graph representation, and $w_a, \ldots, w_n$ are the respective weights applied to the d-dimensional vector representation of each of the transformed nodes.
4 . The method of claim 1 , wherein transforming by the machine learning graph convolution system the first graph representation comprises, for a particular edge of the edges of the first graph representation: generating a corresponding edge vector based on an edge value representing the particular edge, node values representative of a respective source node and destination node in the first graph representation for the particular edge, and a current global state vector associated with the first graph representation; and applying the edge MLP to the edge vector to generate a resultant transformed edge corresponding to the particular edge.
5 . The method of claim 1 , wherein transforming by the machine learning graph convolution system the first graph representation comprises, for a particular node of the nodes of the first graph representation: generating a corresponding node vector based on the particular node of the first graph representation and edges in the first graph representation connected to the particular node; and applying the node MLP to the corresponding node vector to generate a resultant transformed node corresponding to the particular node.
6 . The method of claim 1 wherein transforming by the machine learning graph convolution system the first graph representation comprises: generating an average node vector averaging the node representations of the nodes of the first graph representation; generating a global composite vector based on the average node vector and a current global state vector; and providing the global composite vector to the global state MLP to generate a resultant transformed global state vector.
7 . The method of claim 1 , wherein the machine learning graph convolution system comprises at least one graph neural network system.
8 . The method of claim 1 , further comprising: performing preprocessing on a received raw data record to produce the multi-field record, including performing one or more of: Gaussian normalization applied to the received raw data record, or removing one or more data elements of the received raw data record based on at least one of: entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, or a low-effect size associated with the one or more data elements.
9 . The method of claim 8 , wherein removing one or more data elements comprises: identifying a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, wherein the adjustable threshold number is adjusted based on likelihood of occurrence of anomalous values for the particular data element; and removing from runtime data records the particular data element identified as the rare element.
10 . A system for detection and classification of data, the system comprising: an input stage to receive one or more input data records; and a controller, implementing one or more learning engines, in communication with one or more memory devices storing programmable instructions, configured to: transform, by a plurality of multi-layer perceptrons (MLP's) of the controller, each 1-dimensional data value of a multi-field record into a respective multi-dimensional vector of a plurality of multi-dimensional vectors, with each data value of each field from the multi-field record being transformed by a respective different MLP, from the plurality of MLP's, that was separately trained to transform the respective field from the multi-field record; convert the plurality of respective multi-dimensional vectors into a first graph representation comprising nodes and edges, wherein each of the plurality of the respective multi-dimensional vectors corresponds to a respective one of the nodes in the first graph representation, with the respective one of the nodes including value information representing feature information for the respective data value and positional information representing a relationship of the respective one of the nodes to other of the nodes in the first graph representation; transform by a machine learning graph convolution system the first graph representation to a second, transformed, graph representation for the multi-field record, the second graph representation comprising a transformed configuration of transformed nodes and edges in the second graph representation in which the transformed nodes and edges are organized into a clustering configuration indicative of anomality of the nodes and edges of the first graph representation and of the multi-field record, wherein the machine learning graph convolution system comprises at least one edge MLP trained to transform each edge of the first graph representation into a transformed edge, at least one node MLP to transform each node of the first graph representation to a transformed node, and at least one global state MLP to generate a global state for the transformed configuration; and determine, based on the transformed configuration of the transformed nodes and edges of the second graph representation with the clustering configuration indicative of anomality of the nodes and edges of the first graph representation and of the multi-field record, a probability that the multi-field record is anomalous.
11 . The system of claim 10 , wherein the controller configured to determine the probability that the multi-field record is anomalous is configured to: process the transformed configuration of the transformed nodes and edges for the multi-field record with a global attention module to generate a resultant vector of values; and apply a softmax module to the resultant vector of values to derive the probability that the multi-field record is anomalous.
12 . The system of claim 11 , wherein the controller configured to process the transformed configuration of the transformed nodes and edges is configured to: derive an output node vector, $V_{\text{output}}$, as an average of weighted products of vector representations of the transformed nodes according to: $V_{\text{output}}(v_1, v_2, \ldots, v_d) = \frac{w_a (a_1, a_2, \ldots, a_d) + w_b (b_1, b_2, \ldots, b_d) + \cdots + w_n (n_1, n_2, \ldots, n_d)}{n}$, where $V_{\text{output}}$ is the output node vector, each of a, b, . . . , n is one of the transformed nodes of the transformed graph representation, and $w_a, \ldots, w_n$ are the respective weights applied to the d-dimensional vector representation of each of the transformed nodes.
13 . The system of claim 10 , wherein the controller configured to transform by the machine learning graph convolution system the first graph representation is configured to, for a particular edge of the edges of the first graph representation: generate a corresponding edge vector based on an edge value representing the particular edge, node values representative of a respective source node and destination node in the first graph representation for the particular edge, and a current global state vector associated with the first graph representation; and apply the at least one edge MLP to the edge vector to generate a resultant transformed edge corresponding to the particular edge.
14 . The system of claim 10 , wherein the controller configured to transform by the machine learning graph convolution system the first graph representation is configured to, for a particular node of the nodes of the first graph representation: generate a corresponding node vector based on the particular node of the first graph representation and edges in the first graph representation connected to the particular node; and apply the at least one node MLP to the corresponding node vector to generate a resultant transformed node corresponding to the particular node.
15 . The system of claim 10 , wherein the controller configured to transform by the machine learning graph convolution system the first graph representation is configured to: generate an average node vector averaging the node representations of the nodes of the first graph representation; generate a global composite vector based on the average node vector and a current global state vector; and provide the global composite vector to the global state MLP to generate a resultant transformed global state vector.
16 . The system of claim 10 , wherein the controller configured to transform by the machine learning graph convolution system the first graph representation is configured to apply the graph convolution process using at least one graph neural network system.
17 . The system of claim 10 , wherein the controller is further configured to: perform preprocessing on a received raw data record to produce the multi-field record, including to perform one or more of: Gaussian normalization applied to the received raw data record, or remove one or more data elements of the received raw data record based on at least one of: entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, or a low-effect size associated with the one or more data elements.
18 . The system of claim 17 , wherein the controller configured to remove one or more data elements is configured to: identify a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, wherein the adjustable threshold number is adjusted based on likelihood of occurrence of anomalous values for the particular data element; and remove from runtime data records the particular data element identified as the rare element.
19 . A non-transitory computer readable media storing a set of instructions, executable on at least one programmable device, to: transform by a plurality of multi-layer perceptrons (MLP's) each 1-dimensional data value of a multi-field record into a respective multi-dimensional vector of a plurality of multi-dimensional vectors, with each data value of each field from the multi-field record being transformed by a respective different MLP, from the plurality of MLP's, that was separately trained to transform the respective field from the multi-field record; convert the plurality of respective multi-dimensional vectors into a first graph representation comprising nodes and edges, wherein each of the plurality of the respective multi-dimensional vectors corresponds to a respective one of the nodes in the first graph representation, with the respective one of the nodes including value information representing feature information for the respective data value and positional information representing a relationship of the respective one of the nodes to other of the nodes in the first graph representation; transform by a machine learning graph convolution system the first graph representation to a second, transformed, graph representation for the multi-field record, the second graph representation comprising a transformed configuration of transformed nodes and edges in the second graph representation in which the transformed nodes and edges are organized into a clustering configuration indicative of anomality of the nodes and edges of the first graph representation and of the multi-field record, wherein the machine learning graph convolution system comprises at least one edge MLP trained to transform each edge of the first graph representation into a transformed edge, at least one node MLP to transform each node of the first graph representation to a transformed node, and at least one global state MLP to generate a global state for the transformed configuration; and determine, based on the transformed configuration of the transformed nodes and edges of the second graph representation with the clustering configuration indicative of anomality of the nodes and edges of the first graph representation and of the multi-field record, a probability that the multi-field record is anomalous.
20 . The computer readable media of claim 19 , wherein the set of instructions to determine the probability that the multi-field record is anomalous comprises one or more instructions to: process the transformed configuration of the transformed nodes and edges for the multi-field record with a global attention module to generate a resultant vector of values; and apply a softmax module to the resultant vector of values to derive the probability that the multi-field record is anomalous.
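
As an illustration of the readout recited in claims 2 and 3, the following minimal Python sketch computes the output node vector as the sum of weighted node vectors divided by the number of nodes, then applies a softmax to the result. The function names, the two-component output, and the choice of which component stands for the anomalous class are assumptions made for illustration only; the claims do not specify them, and in a real system the weights would presumably be produced by the trained global attention module.

```python
import math


def weighted_average_readout(nodes, weights):
    """Sum of w_i * node_i over all transformed nodes, divided by the node count n."""
    n = len(nodes)
    if n == 0 or n != len(weights):
        raise ValueError("need one weight per node and at least one node")
    d = len(nodes[0])
    total = [0.0] * d
    for vector, weight in zip(nodes, weights):
        for k in range(d):
            total[k] += weight * vector[k]
    return [component / n for component in total]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    denom = sum(exps)
    return [e / denom for e in exps]


def anomaly_probability(nodes, weights, anomalous_index=1):
    """Assumption: component `anomalous_index` of the readout is the anomalous class."""
    return softmax(weighted_average_readout(nodes, weights))[anomalous_index]


# Two transformed nodes with d = 2 and unit weights (all values are illustrative).
p = anomaly_probability([[1.0, 3.0], [3.0, 1.0]], [1.0, 1.0])
```

With these symmetric toy inputs the two readout components are equal, so the softmax assigns each class the same probability.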

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 17/100,195, filed Nov. 20, 2020, which claims priority to, and the benefit of, U.S. Provisional Application No. 62/939,236, entitled “METHODS AND SYSTEMS FOR DETECTING SPURIOUS DATA PATTERNS” and filed Nov. 22, 2019, the contents of all of which are incorporated herein by reference in their entireties.

BACKGROUND

The ever-growing volume of electronic business and economic activity has been accompanied by a similarly sharp increase in fraudulent and harmful electronic activity. Being able to robustly detect rare data patterns is beneficial in cases where anomalous behavior needs to be detected (e.g., through detection of data outliers) to prevent damage to devices or fraud in financial transactions.

SUMMARY

There is a need for robust detectors of spurious data patterns in a stream of data, whether the data source is sensors, financial transactions, or server logs. The present disclosure describes a method and an apparatus for robust, fast, real-time detection of spurious signals in data, using a graph network methodology and/or a novel neural network topology. An analytical description of the data preprocessing performed before the data are fed as input to the system is also provided.

Disclosed are systems, methods, and other implementations to identify outlier data records from a set of records processed by a learning machine. Examples of such records include transaction records (e.g., credit card records), with respect to which a learning system is configured to detect anomalous (outlying) activity or behavior. Such anomalous activity may be indicative of possible fraudulent activity. In the present disclosure, methods are described for using a neural network, or other types of learning machines, with specific configurations structured to robustly detect outliers in data streams.
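
The preprocessing mentioned in the summary, and recited in claims 8 and 9, can be sketched roughly as follows. This is only an illustration under stated assumptions: "Gaussian normalization" is interpreted here as z-score standardization, a field counts as present when its value is not `None`, and the per-field `thresholds` mapping is a hypothetical way of making the rarity threshold adjustable. None of the function names or field names come from the patent.

```python
import math


def fit_gaussian_normalizer(values):
    """Estimate (mean, std) of a numeric field from training data."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) or 1.0  # fall back to 1.0 for constant fields
    return mean, std


def normalize(value, mean, std):
    """Z-score a single value (assumed meaning of Gaussian normalization)."""
    return (value - mean) / std


def find_rare_fields(training_records, default_threshold, thresholds=None):
    """Fields present in fewer training records than their threshold are rare."""
    thresholds = thresholds or {}
    counts = {}
    for record in training_records:
        for field, value in record.items():
            counts.setdefault(field, 0)
            if value is not None:
                counts[field] += 1
    return {f for f, c in counts.items() if c < thresholds.get(f, default_threshold)}


def drop_fields(record, rare_fields):
    """Remove rare fields from a runtime record."""
    return {f: v for f, v in record.items() if f not in rare_fields}


# Hypothetical training data: "coupon" is populated in only one of three records.
training = [
    {"amount": 10.0, "coupon": None},
    {"amount": 20.0, "coupon": "X"},
    {"amount": 30.0, "coupon": None},
]
rare = find_rare_fields(training, default_threshold=2)
mean, std = fit_gaussian_normalizer([r["amount"] for r in training])
runtime = drop_fields({"amount": 20.0, "coupon": "Y"}, rare)
runtime["amount"] = normalize(runtime["amount"], mean, std)
```

Here the rare fields and the normalization statistics are learned once from training data and then applied unchanged to runtime records, which matches the train-then-remove order described in claim 9.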
The novel neural network architectures of the present disclosure can be combined with a unique data preprocessing methodology to reduce the dimensionality of the input data based on specific data filters that maximize the entropy of the input data. In some embodiments, the implementations described herein use graph network topologies and processing to identify outliers or anomalous data. A method is thus provided to combine a graph network topology of the processed data with a neural network to automatically cluster data in a topological way that separates the spurious data patterns from the normal data flow.

The example implementations also include apparatus comprising the neural networks (or other types of learning machines), the topological graphs, and the neural network filters described herein. The example implementations additionally include a non-transitory computer-readable medium having program code recorded thereon for filtering the input streaming data according to the preprocessing parameters and forwarding this data to the filtering neural network and to the graph-based topological calculator and neural network. The medium may include program code to, when executed by a processor, select at least one moment of an input of the data, along with the execution of the neural networks and graph topological transformers.

The methods and apparatus of the present disclosure include engineered features that are generated from the base streaming data. The engineered features measure many aspects of the data instance and may or may not be interesting or germane to human-based analysis. Inputting these features to the neural network modules might or might not supply them with data relationships humans intuitively find interesting.
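
The combination of a graph topology with neural networks described above corresponds, in claims 4 through 6, to separate edge, node, and global-state updates. The sketch below shows one such update pass in plain Python. It is a rough illustration, not the patented implementation: `make_mlp` is a hypothetical stand-in that uses a single random dense layer with tanh in place of a trained MLP, connected edges are aggregated by their mean (the claims do not name an aggregation), and all dimensions and values are invented.

```python
import math
import random


def make_mlp(in_dim, out_dim, rng):
    """Hypothetical stand-in for a trained MLP: one random dense layer plus tanh."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]

    def forward(x):
        if len(x) != in_dim:
            raise ValueError(f"expected {in_dim} inputs, got {len(x)}")
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return forward


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def graph_block(nodes, edges, global_state, edge_mlp, node_mlp, global_mlp):
    """One update pass. nodes: list of vectors; edges: list of (src, dst, vector)."""
    edge_dim = len(edges[0][2]) if edges else 0
    # Edge update (claim 4): edge value + source node + destination node + global state.
    new_edges = [
        (src, dst, edge_mlp(vec + nodes[src] + nodes[dst] + global_state))
        for src, dst, vec in edges
    ]
    # Node update (claim 5): the node plus its connected edges (mean aggregation assumed).
    new_nodes = []
    for i, node in enumerate(nodes):
        incident = [vec for src, dst, vec in edges if i in (src, dst)]
        new_nodes.append(node_mlp(node + mean_vector(incident, edge_dim)))
    # Global update (claim 6): average node vector + current global state.
    new_global = global_mlp(mean_vector(nodes, len(nodes[0])) + global_state)
    return new_nodes, new_edges, new_global


# Toy graph: 3 nodes of dimension 3, 2 edges of dimension 2, global state of dimension 2.
rng = random.Random(0)
nodes = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
edges = [(0, 1, [1.0, 0.0]), (1, 2, [0.0, 1.0])]
global_state = [0.0, 0.0]
new_nodes, new_edges, new_global = graph_block(
    nodes, edges, global_state,
    edge_mlp=make_mlp(2 + 3 + 3 + 2, 2, rng),
    node_mlp=make_mlp(3 + 2, 3, rng),
    global_mlp=make_mlp(3 + 2, 2, rng),
)
```

Following the claim language, all three updates in this sketch read from the first (untransformed) graph representation; a variant that feeds the freshly transformed edges into the node update would be a different design choice.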
The methods and apparatus include a methodology, device, and code to flag specific patterns in data for potential review by a human reviewer. For example, in the case of a financial transaction, if a feature such as the distance between the billing and shipping addresses for the transaction is above a certain threshold, the transaction will be automatically flagged for potential review.

In some variations, a method for robust detection and classification of data outliers is provided. The method includes converting a set of data values representative of a multi-dimensional item into a graph representation of the multi-dimensional item, with the graph representation comprising nodes and edges, applying a graph convolution process to the graph representation of the multi-dimensional item to generate a transformed graph representation for the multi-dimensional item comprising a resultant transformed configuration of the nodes and edges representing the multi-dimensional item, and determining, based on the transformed configuration of the nodes and edges representing the multi-dimensional item, a probability that the multi-dimensional item is anomalous. Embodiments of the method may include at least some of the features described in the present disclosure, including one or