US-20260127426-A1 - FOUNDATIONAL GENERATIVE PRE-TRAINED TRANSFORMER MODEL WITH TIME-PRESERVING ENCODINGS


Abstract

Aspects of the disclosure include foundational generative pre-trained transformer (GPT) models with time-preserving encodings and methods of using the same. A method includes assigning a token to each activity of a plurality of activities and collecting, for each entity of a plurality of entities, a sequence of activities. Time-preserving encodings are applied to the collected sequences of activities. A training set including sequences of tokens and the time-preserving encodings is created, each sequence of tokens corresponding to a respective sequence of activities for an entity. A foundational GPT model is trained, using the training set, to generate an activity sequence embedding from an input sequence of tokens. During training, the time-preserving encodings preserve both the relative order of the input sequence of tokens and the amount of time between each input token in the sequence.

Inventors

  • Kun Qiu
  • Beibei Wang
  • Shubham Agarwal
  • Osaid Rehman Nasir

Assignees

  • MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date
2026-05-07
Application Date
2024-11-01

Claims (20)

  1. A method comprising: assigning a token to each activity of a plurality of activities; collecting, for each entity of a plurality of entities, a sequence of activities; applying a transformation to the collected sequences of activities, the transformation comprising an insertion of time-preserving encodings into the collected sequences of activities; creating a training set comprising sequences of tokens and the time-preserving encodings, each sequence of tokens corresponding to a respective sequence of activities for an entity; and training, using the training set, a model to generate an activity sequence embedding from an input sequence of tokens, wherein, during training, the time-preserving encodings encode a relative order of the input sequence of tokens and an amount of time between each input token in the input sequence of tokens.
  2. The method of claim 1, wherein, during training, the time-preserving encodings comprise positional embedding values derived from a continuous time space that are concatenated to tokens in the input sequence of tokens.
  3. The method of claim 1, wherein, during training, the time-preserving encodings comprise a plurality of time-preserving tokens that are inserted among the input sequence of tokens, each time-preserving token encoding a predetermined time duration.
  4. The method of claim 1, further comprising: during an inference phase, receiving a first sequence of activities for a first entity; generating, during the inference phase, a first sequence of tokens corresponding to the first sequence of activities by replacing each activity in the first sequence of activities with the respective token assigned to the activity; inputting, during the inference phase, the first sequence of tokens to the model; and receiving, during the inference phase, an output from the model, the output comprising a first activity embedding for the first sequence of activities.
  5. The method of claim 4, further comprising: training a secondary system to generate malicious activity predictions from input activity embeddings; inputting the first activity embedding to the secondary system; and generating, by the secondary system, a first malicious activity prediction.
  6. The method of claim 1, wherein each sequence of tokens is generated by replacing each activity in a respective sequence of activities with the respective token assigned to the activity.
  7. The method of claim 6, wherein the model comprises a foundational generative pre-trained transformer (GPT).
  8. A system comprising a memory, computer readable instructions, and one or more circuits for executing the computer readable instructions, the computer readable instructions controlling the one or more circuits to perform operations comprising: assign a token to each activity of a plurality of activities; collect, for each entity of a plurality of entities, a sequence of activities; apply a transformation to the collected sequences of activities, the transformation comprising an insertion of time-preserving encodings into the collected sequences of activities; create a training set comprising sequences of tokens and the time-preserving encodings, each sequence of tokens corresponding to a respective sequence of activities for an entity; and train, using the training set, a model to generate an activity sequence embedding from an input sequence of tokens, wherein, during training, the time-preserving encodings encode a relative order of the input sequence of tokens and an amount of time between each input token in the input sequence of tokens.
  9. The system of claim 8, wherein, during training, the time-preserving encodings comprise positional embedding values derived from a continuous time space that are concatenated to tokens in the input sequence of tokens.
  10. The system of claim 8, wherein, during training, the time-preserving encodings comprise a plurality of time-preserving tokens that are inserted among the input sequence of tokens, each time-preserving token encoding a predetermined time duration.
  11. The system of claim 8, further comprising: during an inference phase, receive a first sequence of activities for a first entity; generate, during the inference phase, a first sequence of tokens corresponding to the first sequence of activities by replacing each activity in the first sequence of activities with the respective token assigned to the activity; input, during the inference phase, the first sequence of tokens to the model; and receive, during the inference phase, an output from the model, the output comprising a first activity embedding for the first sequence of activities.
  12. The system of claim 11, further comprising: train a secondary system to generate malicious activity predictions from input activity embeddings; input the first activity embedding to the secondary system; and generate, by the secondary system, a first malicious activity prediction.
  13. The system of claim 8, wherein each sequence of tokens is generated by replacing each activity in a respective sequence of activities with the respective token assigned to the activity.
  14. The system of claim 8, wherein the model comprises a foundational generative pre-trained transformer (GPT).
  15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by one or more circuits to cause the one or more circuits to perform operations comprising: assign a token to each activity of a plurality of activities; collect, for each entity of a plurality of entities, a sequence of activities; apply a transformation to the collected sequences of activities, the transformation comprising an insertion of time-preserving encodings into the collected sequences of activities; create a training set comprising sequences of tokens and the time-preserving encodings, each sequence of tokens corresponding to a respective sequence of activities for an entity; and train, using the training set, a model to generate an activity sequence embedding from an input sequence of tokens, wherein, during training, the time-preserving encodings encode a relative order of the input sequence of tokens and an amount of time between each input token in the input sequence of tokens.
  16. The computer program product of claim 15, wherein, during training, the time-preserving encodings comprise positional embedding values derived from a continuous time space that are concatenated to tokens in the input sequence of tokens.
  17. The computer program product of claim 15, wherein, during training, the time-preserving encodings comprise a plurality of time-preserving tokens that are inserted among the input sequence of tokens, each time-preserving token encoding a predetermined time duration.
  18. The computer program product of claim 15, further comprising: during an inference phase, receive a first sequence of activities for a first entity; generate, during the inference phase, a first sequence of tokens corresponding to the first sequence of activities by replacing each activity in the first sequence of activities with the respective token assigned to the activity; input, during the inference phase, the first sequence of tokens to the model; and receive, during the inference phase, an output from the model, the output comprising a first activity embedding for the first sequence of activities.
  19. The computer program product of claim 18, further comprising: train a secondary system to generate malicious activity predictions from input activity embeddings; input the first activity embedding to the secondary system; and generate, by the secondary system, a first malicious activity prediction.
  20. The computer program product of claim 19, wherein each sequence of tokens is generated by replacing each activity in a respective sequence of activities with the respective token assigned to the activity.
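As a minimal, purely illustrative sketch of the tokenization and time-preserving token insertion recited in claims 1 and 3 (and depicted in FIG. 3), the following Python fragment shows one plausible implementation. The activity vocabulary, duration buckets, and token IDs are hypothetical assumptions chosen for illustration; the specification does not prescribe them.

```python
# Hypothetical sketch of claims 1 and 3: assign a token to each activity type,
# then insert time-preserving tokens (each encoding a predetermined duration)
# between consecutive activities in an entity's sequence. All activity names,
# bucket durations, and token IDs below are illustrative assumptions.

# Token vocabulary: one token per activity type (claim 1).
ACTIVITY_VOCAB = {"login": 0, "view_profile": 1, "send_message": 2}

# Time-preserving tokens: each encodes a predetermined duration bucket
# (claim 3). Hypothetical buckets: gap < 5 s, < 1 min, < 1 h, >= 1 h.
TIME_BUCKETS = [(5, 100), (60, 101), (3600, 102)]  # (max seconds, token id)
TIME_OVERFLOW_TOKEN = 103  # gap of one hour or more

def bucket_token(gap_seconds: float) -> int:
    """Map an inter-activity time gap to a time-preserving token."""
    for max_seconds, token_id in TIME_BUCKETS:
        if gap_seconds < max_seconds:
            return token_id
    return TIME_OVERFLOW_TOKEN

def tokenize_with_time(events: list[tuple[str, float]]) -> list[int]:
    """Convert (activity, timestamp) pairs into a token sequence with
    time-preserving tokens inserted among the activity tokens."""
    tokens: list[int] = []
    for i, (activity, timestamp) in enumerate(events):
        if i > 0:
            tokens.append(bucket_token(timestamp - events[i - 1][1]))
        tokens.append(ACTIVITY_VOCAB[activity])
    return tokens

# Example: a login, a profile view 3 s later, then a message 2 h later.
print(tokenize_with_time([("login", 0.0), ("view_profile", 3.0),
                          ("send_message", 7203.0)]))
# -> [0, 100, 1, 103, 2]
```

At inference (claims 4 and 5), a sequence tokenized this way would be fed to the trained model to obtain an activity embedding, which a secondary system could then consume to produce a malicious-activity prediction.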

Description

INTRODUCTION

The subject disclosure relates to machine learning and artificial intelligence, and specifically to a foundational generative pre-trained transformer (GPT) model with time-preserving encodings for detecting malicious activities in online platforms.

BRIEF DESCRIPTION OF THE DRAWINGS

The specifics of the exclusive rights described herein are particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the embodiments of the present disclosure are apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:

  • FIG. 1 depicts a block diagram for a foundational generative pre-trained transformer (GPT) model with time-preserving encodings in accordance with one or more embodiments;
  • FIG. 2 depicts an example transformer-type implementation for a foundational GPT model with time-preserving encodings in accordance with one or more embodiments;
  • FIG. 3 depicts an example tokenization of activity data with time-preserving tokens to preserve timing between activities in accordance with one or more embodiments;
  • FIG. 4 depicts an example positional encoding over continuous time to preserve timing between activities in accordance with one or more embodiments;
  • FIG. 5 depicts a block diagram of a process for leveraging a foundational GPT model with time-preserving encodings at inference to generate labels in accordance with one or more embodiments;
  • FIG. 6 depicts a block diagram of a computer system in accordance with one or more embodiments; and
  • FIG. 7 depicts a flowchart of a method in accordance with one or more embodiments.

The diagrams depicted herein are illustrative. There can be many variations to the diagrams or to the operations described therein without departing from the spirit of this disclosure. For instance, the actions can be performed in a differing order, or actions can be added, deleted, or modified. In the accompanying figures and the following detailed description of the described embodiments of this disclosure, the various elements illustrated in the figures are provided with two- or three-digit reference numbers. With minor exceptions, the leftmost digit(s) of each reference number corresponds to the figure in which its element is first illustrated.

DETAILED DESCRIPTION

Overview

Online platforms such as connections networks face significant challenges in detecting and preventing malicious or otherwise abusive in-network activities, such as fake accounts, account takeovers, and data scraping. Traditional methods for detecting these activities often rely on manually crafted features and heuristic-based detection architectures, which have inherent limitations in processing long sequences of user activities and in understanding complex behavioral patterns within those sequences, limiting their effectiveness against sophisticated abuse tactics. Recent advancements in artificial intelligence and machine learning offer new opportunities to address these limitations. For example, recurrent neural networks (RNNs) and long short-term memory (LSTM) architectures are better at capturing long-range dependencies and relationships within sequences, enabling a relatively deeper understanding of user behavior over extended timeframes. Unfortunately, these types of solutions require an individual model training scheme for each specific use case (e.g., abuse detection, account takeovers, phishing, etc.), which is resource-intensive and time-consuming.
There is a need for a more scalable and reusable approach to modeling user activities for detecting malicious activities in online platforms and connections networks. This disclosure introduces a foundational generative pre-trained transformer (GPT) model with time-preserving encodings for detecting malicious activities in online platforms.

The foundational GPT model described herein differs significantly from conventional large language models (LLMs) in its architecture, training, and application. Conventional LLMs learn to understand and generate human language by processing large corpora of text. Each word or sub-word in the text is converted into a token, and the model learns the relationships and dependencies between these tokens to generate coherent and contextually appropriate text. The positional encodings in these models capture the relative order of words in a sentence, but they do not account for actual time intervals between tokens, as the focus is on the syntactic and semantic structure of the language.

In contrast, the foundational GPT model described herein is trained directly on activity sequences rather than on word or sub-word tokens. Specifically, each type of user activity on a network, such as logging in to the network, viewing a profile, or sending a message, is treated as a separate token. The foundational GPT model is trained on these tokens (also referred to as non-text activity tokens, or simply activity tokens).
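Claim 2 and FIG. 4 describe the other time-preserving encoding variant: positional embedding values derived from a continuous time space and concatenated to the token embeddings. A minimal sketch of one such encoding follows, assuming a standard sinusoidal form evaluated at real-valued timestamps; the specification does not give a formula, so the function, dimensions, and example values here are assumptions.

```python
# Hypothetical sketch of claim 2 / FIG. 4: positional embedding values derived
# from a continuous time space and concatenated to the token embeddings.
# The sinusoidal form mirrors standard transformer positional encodings but is
# evaluated at real-valued timestamps instead of integer positions; the formula
# and dimensions are illustrative assumptions, not taken from the specification.
import numpy as np

def continuous_time_encoding(timestamps: np.ndarray, dim: int = 16) -> np.ndarray:
    """Return a (seq_len, dim) encoding; dim is assumed even. Timestamps are
    seconds since the first activity, so the encoding preserves the actual
    gaps between activities, not just their order."""
    positions = timestamps[:, None]                          # (seq_len, 1)
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    angles = positions * freqs                               # (seq_len, dim/2)
    enc = np.empty((len(timestamps), dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

# Concatenate (rather than add) the time encoding to the token embeddings,
# as recited in claim 2. Embedding values here are random placeholders.
token_emb = np.random.default_rng(0).normal(size=(3, 32))   # 3 activity tokens
times = np.array([0.0, 3.0, 7203.0])                        # seconds from start
model_input = np.concatenate([token_emb, continuous_time_encoding(times)], axis=1)
print(model_input.shape)  # (3, 48)
```

Under this assumed scheme, two sequences with identical activity orders but different inter-activity gaps produce different model inputs, and concatenation keeps the time signal in dedicated dimensions rather than mixing it additively into the token embedding as conventional transformers do.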