US-12625949-B2 - Systems and methods for generating explainability for user classifications using motif embeddings


Abstract

Systems and methods for generating explainability for user classifications using motif embeddings. In some aspects, the system generates a first set of embeddings in an embedding space by inputting, to a motif encoder model, a set of event motifs associated with a set of labels. The system generates a second set of embeddings for a first user by providing, to the motif encoder model, an event sequence of the first user. The system selects a set of search anchor embeddings by determining a first set of distances between the second set of embeddings and the first set of embeddings. The system queries the embedding space to select a candidate profile for association with the first user and concurrently presents the explanatory label and a profile title associated with the candidate profile.

Inventors

Samuel Sharpe

Assignees

CAPITAL ONE SERVICES, LLC

Dates

Publication Date: 20260512
Application Date: 20240909

Claims (18)

1 . A system for using event motif embeddings as search anchors to perform an explainable search for malicious activities based on a user event sequence, the system comprising one or more non-transitory, machine-readable media storing program instructions that, when executed by one or more processors, cause operations comprising: obtaining a first occurrence count of a candidate motif and a second occurrence count of the candidate motif, the first occurrence count being based on an event history of a first training record associated with a user label indicating malicious behavior, and the second occurrence count being based on a set of event histories of a set of training records not associated with the user label; generating a set of event motifs to comprise the candidate motif based on a result indicating a threshold being satisfied by a ratio derived from the first occurrence count and the second occurrence count; generating a set of target motif embeddings in an event sequence embedding space representing event motifs by inputting, to a motif encoder model, the set of event motifs associated with a set of explanatory behavior labels indicating a malicious behavior category; generating a set of user embeddings for a first user by providing, to the motif encoder model, an event sequence of the first user; selecting a set of search anchor embeddings by determining a first set of distances between the set of user embeddings and the set of target motif embeddings; querying the event sequence embedding space to select a candidate profile for association with the first user by (i) determining a second set of distances between the set of search anchor embeddings and a set of candidate profile embeddings comprising a candidate profile embedding associated with the candidate profile and (ii) ranking the second set of distances to select an explanatory embedding of the set of search anchor embeddings, the explanatory embedding mapped to an explanatory label of the set of explanatory behavior labels; and concurrently presenting the explanatory label and a profile title associated with the candidate profile.
2 . A method comprising: obtaining a first occurrence count of a candidate motif and a second occurrence count of the candidate motif, the first occurrence count being based on an event history of a first training record associated with a user label indicating malicious behavior, and the second occurrence count being based on a set of event histories of a set of training records not associated with the user label; generating a set of event motifs to comprise the candidate motif based on a result indicating a threshold being satisfied by a ratio derived from the first occurrence count and the second occurrence count; generating a first set of embeddings in an embedding space by inputting, to a motif encoder model, the set of event motifs associated with a set of labels; generating a second set of embeddings for a first user by providing, to the motif encoder model, an event sequence of the first user; selecting a set of search anchor embeddings by determining a first set of distances between the second set of embeddings and the first set of embeddings; querying the embedding space to select a candidate profile for association with the first user by (i) determining a second set of distances between the set of search anchor embeddings and a set of candidate profile embeddings comprising a candidate profile embedding associated with the candidate profile and (ii) ranking the second set of distances to select an explanatory embedding of the set of search anchor embeddings, the explanatory embedding mapped to an explanatory label of the set of labels; and concurrently presenting the explanatory label and a profile title associated with the candidate profile.
3 . The method of claim 2 , further comprising: determining the first occurrence count of the candidate motif based on the event history of the first training record associated with the user label indicating malicious behavior; and determining the second occurrence count of the candidate motif based on the set of event histories of the set of training records not associated with the user label, wherein generating the set of event motifs comprises updating the set of event motifs to comprise the candidate motif based on the result indicating the threshold being satisfied by the ratio.
4 . The method of claim 3 , wherein the candidate motif is a first candidate motif, further comprising: determining an additional occurrence count of an additional candidate motif based on the event history; determining an additional result indicating that the additional occurrence count does not satisfy the threshold, wherein updating the set of event motifs comprises updating the set of event motifs such that the set of event motifs does not comprise the additional candidate motif based on the additional result.
5 . The method of claim 2 , wherein: generating the first set of embeddings comprises generating the explanatory label based on a first motif; and presenting the explanatory label comprises presenting the first motif.
6 . The method of claim 2 , further comprising: filtering the event history to remove a subset of events labeled with a set of event categories; detecting at least one candidate motif based on repeated event sequences of the filtered event history; and updating the set of event motifs to comprise the at least one candidate motif.
7 . The method of claim 2 , further comprising: detecting that a first candidate motif of the set of event motifs is present in a second candidate motif of the set of event motifs by skipping at least one event indicated by the second candidate motif; and removing the second candidate motif from the set of event motifs based on a detection that the first candidate motif is present in the second candidate motif.
8 . The method of claim 2 , wherein determining the second set of distances comprises: obtaining a demographic category associated with the candidate profile; and updating the second set of distances based on a distance penalty mapped to the demographic category.
9 . The method of claim 2 , wherein the candidate profile is a first candidate profile, further comprising: determining a demographic category based on the first user; retrieving a plurality of candidate profiles mapped to a superset of embeddings of the embedding space; filtering the plurality of candidate profiles to remove a second candidate profile associated with a restricted category, wherein selecting the candidate profile for association with the first user comprises ignoring at least one embedding mapped to the second candidate profile.
10 . The method of claim 2 , wherein the explanatory embedding is a first explanatory embedding, and wherein the candidate profile is a first candidate profile, and wherein the candidate profile embedding is a first candidate profile embedding, and wherein the second set of embeddings comprises a user embedding, further comprising: determining that a distance between the user embedding and a second candidate profile embedding of a second candidate profile satisfies a distance threshold; determining a third set of distances between the set of search anchor embeddings and the second candidate profile embedding; and ranking the third set of distances to select a second explanatory embedding associated with a second explanatory label; and concurrently presenting the second explanatory label and a second profile title associated with the second candidate profile.
11 . One or more non-transitory, machine-readable media storing program instructions that, when executed by one or more processors, cause operations comprising: obtaining a first occurrence count of a candidate motif and a second occurrence count of the candidate motif, the first occurrence count being based on an event history of a first training record associated with a user label indicating malicious behavior, and the second occurrence count being based on a set of event histories of a set of training records not associated with the user label; generating a set of motifs to comprise the candidate motif based on a result indicating a threshold being satisfied by a ratio derived from the first occurrence count and the second occurrence count; generating a first set of embeddings in an embedding space using an encoder model based on the set of motifs; generating a second set of embeddings using the encoder model based on an event sequence; selecting a set of anchor embeddings by determining a first set of distances between the second set of embeddings and the first set of embeddings; querying the embedding space to select a candidate profile for association with a first user by (i) determining a second set of distances between the set of anchor embeddings and a candidate profile embedding associated with the candidate profile and (ii) ranking the second set of distances to select a first embedding of the set of anchor embeddings, wherein the first embedding is mapped to a first label; and concurrently presenting the first label and a profile title associated with the candidate profile.
12 . The one or more non-transitory, machine-readable media of claim 11 , wherein: the set of motifs comprises a first subset of motifs and a second subset of motifs; the first set of embeddings comprises a first subset of motif embeddings and a second subset of motif embeddings, wherein the first subset of motif embeddings is mapped to the first subset of motifs, and wherein the second subset of motif embeddings is mapped to the second subset of motifs; and selecting the set of anchor embeddings comprises: obtaining an indication of a first category associated with the first subset of motifs via a request; and selecting the set of anchor embeddings from the first subset of motif embeddings based on the indication of the first category.
13 . The one or more non-transitory, machine-readable media of claim 11 , wherein the set of motifs comprises a first motif, the first motif comprising at least three distinct event type identifiers.
14 . The one or more non-transitory, machine-readable media of claim 11 , wherein: the set of motifs comprises a first subset of motifs and a second subset of motifs; the first set of embeddings comprises a first subset of motif embeddings and a second subset of motif embeddings, wherein the first subset of motif embeddings is mapped to the first subset of motifs, and wherein the second subset of motif embeddings is mapped to the second subset of motifs; and selecting the set of anchor embeddings comprises: selecting a first category associated with the first subset of motifs based on a category prioritization schedule; and selecting the set of anchor embeddings from the first subset of motif embeddings based on the first category.
15 . The one or more non-transitory, machine-readable media of claim 11 , wherein generating the set of motifs comprises updating the set of motifs to comprise the candidate motif based on the result indicating the threshold being satisfied by the ratio.
16 . The one or more non-transitory, machine-readable media of claim 11 , the operations further comprising: filtering the event history to remove a subset of events labeled with a set of restricted categories; detecting at least one candidate motif based on repeated event sequences of the filtered event history; and updating the set of motifs to comprise the at least one candidate motif.
17 . The one or more non-transitory, machine-readable media of claim 11 , the operations further comprising: detecting that a first candidate motif of the set of motifs is present in a second candidate motif of the set of motifs; and removing the second candidate motif from the set of motifs based on a detection that the first candidate motif is present in the second candidate motif.
18 . The one or more non-transitory, machine-readable media of claim 11 , wherein determining the second set of distances comprises: obtaining a category associated with the candidate profile; and updating the second set of distances based on a distance penalty associated with the category.
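The motif-selection steps recited in the claims (comparing occurrence counts between labeled and unlabeled histories, then dropping a motif that contains another motif once events are skipped) can be illustrated with a minimal Python sketch. This is one possible reading, not the claimed implementation. The function names, the contiguous-match counting rule, the add-one smoothing in the denominator, and the default threshold of 2.0 are all assumptions introduced here for illustration; the claims only require "a ratio derived from" the two counts.

```python
from typing import List, Sequence, Tuple

Motif = Tuple[str, ...]


def count_occurrences(history: Sequence[str], motif: Motif) -> int:
    """Count contiguous occurrences of `motif` in an event history (assumed counting rule)."""
    n, m = len(history), len(motif)
    return sum(1 for i in range(n - m + 1) if tuple(history[i:i + m]) == motif)


def select_motifs(
    candidates: List[Motif],
    labeled_history: Sequence[str],
    unlabeled_histories: List[Sequence[str]],
    threshold: float = 2.0,  # hypothetical value
) -> List[Motif]:
    """Keep candidates whose labeled-to-unlabeled count ratio satisfies the threshold."""
    selected = []
    for motif in candidates:
        first_count = count_occurrences(labeled_history, motif)
        second_count = sum(count_occurrences(h, motif) for h in unlabeled_histories)
        # Assumption: add-one smoothing avoids division by zero.
        ratio = first_count / (second_count + 1)
        if ratio >= threshold:
            selected.append(motif)
    return selected


def is_subsequence(short: Motif, long: Motif) -> bool:
    """True if `short` appears in `long` in order, allowing events of `long` to be skipped."""
    remaining = iter(long)
    return all(event in remaining for event in short)


def prune_redundant(motifs: List[Motif]) -> List[Motif]:
    """Remove any motif that contains a strictly shorter motif as a subsequence."""
    kept = []
    for motif in motifs:
        contains_shorter = any(
            len(other) < len(motif) and is_subsequence(other, motif) for other in motifs
        )
        if not contains_shorter:
            kept.append(motif)
    return kept
```

In this sketch the longer motif is the one removed, mirroring the claim language in which the second candidate motif (the one that contains the first) is dropped.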

Description

SUMMARY
Methods and systems are described herein for novel uses and/or improvements to artificial intelligence applications. As one example, methods and systems are described herein for generating explainability for user classifications based on an embedding space. For example, the system may use action-sequence motif embeddings as anchors to classify users. Conventional systems for assigning classification labels to users are often opaque with respect to the process of matching users to labels. Such systems often use deep learning neural networks or comparable techniques to generate classifications of users, resulting in low explainability for potentially impactful decisions made based on user classifications. This is a significant drawback in the design of the machine learning architecture. By contrast, the systems and methods described herein use motif embeddings as anchors to compare to user embeddings. This approach uses a user embedding that encodes the behavior sequences of the user to identify the closest motif embedding. Motif embeddings are representative archetypes corresponding to classifications of users. They offer explainability by identifying not only which classification most closely resembles a user but also the nature of that proximity in one or more aspects of the behavior sequences. By doing so, the system allows for in-depth comparisons of users with classification archetypes or with other users, and provides comprehensive explanations for decisions based on user classifications.
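The anchor idea described above, in which a user embedding is compared against labeled motif embeddings and the closest one supplies the explanation, can be sketched in a few lines of Python. This is a hedged illustration only: the source does not specify the encoder or the distance metric, so the vectors are assumed to be precomputed, Euclidean distance is an assumption, and the function name and labels are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def nearest_motif(
    user_embedding: Vector, motif_embeddings: Dict[str, Vector]
) -> Tuple[str, float]:
    """Return the label of the closest motif embedding and its distance.

    Assumes embeddings were already produced by some motif encoder model
    and uses Euclidean distance as a stand-in metric.
    """
    label = min(
        motif_embeddings,
        key=lambda name: math.dist(user_embedding, motif_embeddings[name]),
    )
    return label, math.dist(user_embedding, motif_embeddings[label])


if __name__ == "__main__":
    # Hypothetical two-dimensional embeddings keyed by explanatory label.
    motifs = {"rapid_transfers": (1.0, 0.0), "credential_probing": (0.0, 1.0)}
    label, distance = nearest_motif((0.9, 0.2), motifs)
    print(f"closest archetype: {label} (distance {distance:.3f})")
```

Because each motif embedding is tied to a human-readable label, the returned label doubles as the explanation for the match, which is the property an opaque end-to-end classifier lacks.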
In some aspects, methods and systems are described herein comprising generating a first set of embeddings in an embedding space by inputting, to a motif encoder model, a set of event motifs associated with a set of labels; generating a second set of embeddings for a first user by providing, to the motif encoder model, an event sequence of the first user; selecting a set of search anchor embeddings by determining a first set of distances between the second set of embeddings and the first set of embeddings; querying the embedding space to select a candidate profile for association with the first user by (i) determining a second set of distances between the set of search anchor embeddings and a set of candidate profile embeddings comprising a candidate profile embedding associated with the candidate profile and (ii) ranking the second set of distances to select an explanatory embedding of the set of search anchor embeddings, the explanatory embedding mapped to an explanatory label of the set of labels; and concurrently presenting the explanatory label and a profile title associated with the candidate profile.
Various other aspects, features, and advantages of the systems and methods described herein will be apparent through the detailed description and the drawings attached hereto. It is also to be understood that both the foregoing general description and the following detailed description are examples and are not restrictive of the scope of the systems and methods described herein. As used in the specification and in the claims, the singular forms of “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. In addition, as used in the specification and the claims, the term “or” means “and/or” unless the context clearly dictates otherwise. Additionally, as used in the specification, “a portion” refers to a part of, or the entirety of (i.e., the entire portion), a given item (e.g., data) unless the context clearly dictates otherwise.
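The two-stage query summarized in this section (first selecting search anchors near the user, then ranking anchor-to-profile distances to pick a candidate profile together with its explanatory label) might look roughly like the following Python sketch. It is an illustration under stated assumptions rather than the described system: embeddings are assumed to be precomputed, Euclidean distance and the top-k anchor rule are assumptions, and every function name, label, and profile title is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def select_anchors(
    user_embeddings: List[Vector], motif_embeddings: Dict[str, Vector], k: int = 2
) -> List[str]:
    """First set of distances: keep the k motif labels closest to any user embedding."""
    def closeness(label: str) -> float:
        return min(math.dist(u, motif_embeddings[label]) for u in user_embeddings)

    return sorted(motif_embeddings, key=closeness)[:k]


def explain_profile_match(
    anchor_labels: List[str],
    motif_embeddings: Dict[str, Vector],
    profile_embeddings: Dict[str, Vector],
) -> Tuple[str, str, float]:
    """Second set of distances: rank every (anchor, profile) pair.

    Returns (profile title, explanatory label, distance) for the closest pair.
    Taking the single smallest distance is one assumed ranking rule.
    """
    ranked = sorted(
        (math.dist(motif_embeddings[label], vector), label, title)
        for label in anchor_labels
        for title, vector in profile_embeddings.items()
    )
    distance, label, title = ranked[0]
    return title, label, distance


if __name__ == "__main__":
    motifs = {
        "rapid_transfers": (1.0, 0.0),
        "credential_probing": (0.0, 1.0),
        "dormant_then_burst": (-1.0, 0.0),
    }
    profiles = {"Account takeover": (0.1, 0.9), "Mule account": (0.7, -0.4)}
    anchors = select_anchors([(0.9, 0.1), (0.2, 0.8)], motifs, k=2)
    title, label, _ = explain_profile_match(anchors, motifs, profiles)
    # Present the profile title and the explanatory label together.
    print(f"{title}: explained by motif '{label}'")
```

Returning the label and the profile title from the same call reflects the requirement that the two be presented concurrently; a demographic or category distance penalty, as mentioned in the dependent claims, could be added to each distance before sorting.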
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows an illustrative diagram for a system for generating explainability for user classifications using motif embeddings, in accordance with one or more embodiments.
FIG. 2 shows real-valued embeddings corresponding to text tokens, in accordance with one or more embodiments.
FIG. 3 shows illustrative components for generating explainability for user classifications using motif embeddings, in accordance with one or more embodiments.
FIG. 4 shows a flowchart of the steps involved in generating explainability for user classifications using motif embeddings, in accordance with one or more embodiments.
DETAILED DESCRIPTION
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments described herein. It will be appreciated, however, by those having skill in the art that the embodiments may be practiced without these specific details or with an equivalent arrangement. In other cases, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the embodiments.
FIG. 1 shows an illustrative diagram for system 150, which contains hardware and software components used to provide responses to search queries based on adjacent keywords and filters generated using a machine learning model, in accordance with one or more embodiments. For example, Computer System 102, a part of