US-12619716-B2 - Multivariate threat detection for a CI/CD pipeline

Abstract

Example solutions protect a continuous integration/continuous deployment (CI/CD) pipeline. Examples collect data from a CI/CD pipeline execution data source and/or a CI/CD pipeline task data source. Based on the collected data, a feature group comprising a plurality of records is created. Each record in the feature group represents an execution of the CI/CD pipeline. An anomaly score is generated, using a model representing historical feature groups, for the feature group representing the execution of the CI/CD pipeline. If the anomaly score is above a threshold, an alert is generated to indicate that the collected data represents an anomalous activity.

Inventors

  • David TRIGANO
  • Moshe Israel

Assignees

  • MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date
2026-05-05
Application Date
2023-12-27

Claims (20)

  1. A system operable to protect a continuous integration/continuous deployment (CI/CD) pipeline, the system comprising: a processor; and a computer-readable medium storing instructions that are operative upon execution by the processor to: collect data from a CI/CD pipeline execution data source and a CI/CD pipeline task data source; based on the collected data, create a feature group comprising a plurality of records, each record in the feature group representing an execution of the CI/CD pipeline; generate, using a model representing historical feature groups of the CI/CD pipeline, an anomaly score for the feature group representing the execution of the CI/CD pipeline, the historical feature groups representing previous executions of the CI/CD pipeline; determine that the anomaly score is above a threshold, indicating that the collected data represents an anomalous activity; and upon determining that the anomaly score is above the threshold, perform a security mitigation action.
  2. The system of claim 1, wherein the instructions are further operative to: upon generating the anomaly score, update, using the feature group, the model representing the CI/CD pipeline.
  3. The system of claim 2, wherein the data is collected at a first frequency and the model is updated at a second frequency, the first frequency being larger than the second frequency.
  4. The system of claim 1, wherein a record in the CI/CD pipeline execution data source is created for each run of the CI/CD pipeline, the record including one or more of: a pipeline name, an identity of a user executing the CI/CD pipeline, an internet protocol (IP) address, and a date and time of executing the CI/CD pipeline.
  5. The system of claim 1, wherein a record in the CI/CD pipeline task data source is created for each run of the CI/CD pipeline, the record including one or more of: a task name and a coding language.
  6. The system of claim 1, wherein the instructions are further operative to display an alert in a user interface (UI).
  7. The system of claim 6, wherein the alert is displayed in a first portion of the UI or a second portion of the UI, based on a severity of the alert.
  8. A computer-implemented method for protecting a continuous integration/continuous deployment (CI/CD) pipeline, the method comprising: collecting data from one or more of a CI/CD pipeline execution data source and a CI/CD pipeline task data source; based on the collected data, creating a feature group comprising a plurality of records, each record in the feature group representing an execution of the CI/CD pipeline; generating, using a model representing historical feature groups of the CI/CD pipeline, an anomaly score for the feature group representing the execution of the CI/CD pipeline; determining that the anomaly score is above a threshold; and upon determining that the anomaly score is above the threshold, generating an alert indicating that the collected data represents an anomalous activity.
  9. The computer-implemented method of claim 8, further comprising: upon generating the anomaly score, updating, using the feature group, the model representing the CI/CD pipeline.
  10. The computer-implemented method of claim 9, wherein the data is collected at a first frequency and the model is updated at a second frequency, the first frequency being larger than the second frequency.
  11. The computer-implemented method of claim 8, wherein a record in the CI/CD pipeline execution data source is created for each run of the CI/CD pipeline, the record including one or more of: a pipeline name, an identity of a user executing the CI/CD pipeline, an internet protocol (IP) address, and a date and time of executing the CI/CD pipeline.
  12. The computer-implemented method of claim 8, wherein a record in the CI/CD pipeline task data source is created for each run of the CI/CD pipeline, the record including one or more of: a task name and a coding language.
  13. The computer-implemented method of claim 8, further comprising displaying the alert in a user interface (UI).
  14. The computer-implemented method of claim 13, wherein the alert is displayed in a first portion of the UI or a second portion of the UI, based on a severity of the alert.
  15. A computer storage device having computer-executable instructions stored thereon, which, on execution by a computer, cause the computer to protect a continuous integration/continuous deployment (CI/CD) pipeline by performing operations comprising: collecting data from one or more of a CI/CD pipeline execution data source and a CI/CD pipeline task data source; based on the collected data, creating a feature group comprising a plurality of records, each record in the feature group representing an execution of the CI/CD pipeline; generating, using a model representing historical feature groups of the CI/CD pipeline, an anomaly score for the feature group representing the execution of the CI/CD pipeline; determining that the anomaly score is above a threshold; and upon determining that the anomaly score is above the threshold, generating an alert indicating that the collected data represents an anomalous activity.
  16. The computer storage device of claim 15, wherein the operations further comprise: upon generating the anomaly score, updating, using the feature group, the model representing the CI/CD pipeline.
  17. The computer storage device of claim 16, wherein the data is collected at a first frequency and the model is updated at a second frequency, the first frequency being larger than the second frequency.
  18. The computer storage device of claim 15, wherein a record in the CI/CD pipeline execution data source is created for each run of the CI/CD pipeline, the record including one or more of: a pipeline name, an identity of a user executing the CI/CD pipeline, an internet protocol (IP) address, and a date and time of executing the CI/CD pipeline.
  19. The computer storage device of claim 15, wherein a record in the CI/CD pipeline task data source is created for each run of the CI/CD pipeline, the record including one or more of: a task name and a coding language.
  20. The computer storage device of claim 19, wherein the operations further comprise: displaying the alert in a user interface (UI), wherein the alert is displayed in a first portion of the UI or a second portion of the UI, based on a severity of the alert.
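Claims 2 and 3 (mirrored in claims 9-10 and 16-17) describe two cadences: each newly collected feature group is scored, and the model is then updated using scored feature groups at a lower frequency. Claims 7, 14 and 20 route an alert to one of two UI portions by severity. The following is a minimal Python sketch of both ideas. The five-minute and hourly intervals, both thresholds, the portion names `critical` and `warning`, and the toy `KnownPairsModel` baseline (which only remembers user/IP pairs) are hypothetical choices for illustration; the claims do not specify any of them.

```python
from dataclasses import dataclass, field

COLLECT_EVERY_MIN = 5     # hypothetical first (higher) frequency: collect every 5 minutes
UPDATE_EVERY_MIN = 60     # hypothetical second (lower) frequency: refresh model hourly
ALERT_THRESHOLD = 0.5     # hypothetical anomaly-score threshold
CRITICAL_THRESHOLD = 0.9  # hypothetical cutoff for the "critical" UI portion


@dataclass
class KnownPairsModel:
    """Toy baseline (hypothetical): the (user, ip) pairs seen in past executions."""
    known: set = field(default_factory=set)

    def score(self, group):
        # Fraction of executions in the feature group whose (user, ip) pair is new.
        if not group:
            return 0.0
        unseen = sum((r["user"], r["ip"]) not in self.known for r in group)
        return unseen / len(group)

    def update(self, groups):
        for group in groups:
            self.known.update((r["user"], r["ip"]) for r in group)


def run(collected_groups, model):
    """Simulate one collection tick per feature group; return alerts per UI portion."""
    ui = {"critical": [], "warning": []}
    pending = []  # feature groups already scored but not yet folded into the model
    for tick, group in enumerate(collected_groups):
        minute = tick * COLLECT_EVERY_MIN
        score = model.score(group)  # scored against the baseline as of the last refresh
        if score > ALERT_THRESHOLD:
            portion = "critical" if score >= CRITICAL_THRESHOLD else "warning"
            ui[portion].append((minute, score))
        pending.append(group)
        if (minute + COLLECT_EVERY_MIN) % UPDATE_EVERY_MIN == 0:
            model.update(pending)  # lower-frequency model update
            pending.clear()
    return ui


model = KnownPairsModel({("alice", "10.0.0.5")})
normal = [{"user": "alice", "ip": "10.0.0.5"}]
odd = [{"user": "mallory", "ip": "203.0.113.7"}]
mixed = normal + odd + [{"user": "eve", "ip": "198.51.100.9"}]
alerts = run([normal, odd, mixed] + [normal] * 9 + [odd], model)
```

In this run the new pair is flagged as critical at minute 5 and the mixed group as a warning at minute 10. After the hourly refresh at minute 55 the same pair no longer scores as new, so the final group raises no alert. Whether groups that raised alerts should be folded back into the baseline is a design choice the claims leave open.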

Description

BACKGROUND
A continuous integration/continuous deployment (CI/CD) pipeline is a software development approach that automates the process of integrating code changes into a shared repository, testing those changes, and deploying them to production environments quickly and regularly. The main goal of a CI/CD pipeline is to enable developers to deliver software updates more frequently and with greater reliability. As a result, CI/CD pipelines have been widely adopted across many solutions and products. This growing popularity has also made CI/CD pipelines a new threat vector, raising significant concerns for the organizations that rely on them. Malicious actors now actively target CI/CD pipelines to exploit vulnerabilities in the automation process, potentially compromising the entire software development and deployment lifecycle.
SUMMARY
The disclosed examples are described in detail below with reference to the accompanying drawing figures listed below. The following summary is provided to illustrate some examples disclosed herein. Example solutions protect a continuous integration/continuous deployment (CI/CD) pipeline. Examples collect data from one or more of a CI/CD pipeline execution data source and a CI/CD pipeline task data source; based on the collected data, create a feature group comprising a plurality of records, each record in the feature group representing an execution of the CI/CD pipeline; generate, using a model representing historical feature groups of the CI/CD pipeline, an anomaly score for the feature group representing the execution of the CI/CD pipeline; determine that the anomaly score is above a threshold; and upon determining that the anomaly score is above the threshold, generate an alert indicating that the collected data represents an anomalous activity.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosed examples are described in detail below with reference to the accompanying drawing figures listed below:
FIG. 1 illustrates an example architecture that advantageously protects a continuous integration/continuous deployment (CI/CD) pipeline;
FIG. 2 illustrates an exemplary flow between different components to protect a CI/CD pipeline;
FIG. 3 shows a flowchart illustrating exemplary operations that may be performed in an example architecture, such as that of FIG. 1; and
FIG. 4 shows a block diagram of an example computing device suitable for implementing some of the various examples disclosed herein.
Corresponding reference characters indicate corresponding parts throughout the drawings.
DETAILED DESCRIPTION
Aspects of the disclosure provide a multivariate threat detection engine for a continuous integration/continuous deployment (CI/CD) pipeline. Examples continuously identify and analyze activities within the CI/CD pipeline in real time to detect malicious activities or other threats. Examples of the disclosure detect anomalous behavior and block or mitigate potential threats in real time. These examples provide organizations with enhanced security measures to safeguard their CI/CD pipelines and ensure the integrity and reliability of their software development and deployment processes.
Examples of the disclosure collect relevant data, process the collected data, and trigger an alert to the organization's security team when suspicious activity is detected. In some examples, data is collected from a CI/CD pipeline execution data source and a CI/CD pipeline task data source. Based on the collected data, a feature group (also referred to as a feature set) comprising a plurality of records is created. Each record in the feature group represents a single execution of the CI/CD pipeline.
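One way to picture the creation of a feature group is as a join of the two data sources, producing one record per pipeline execution. The following is a minimal Python sketch under stated assumptions. The field names follow the record contents the disclosure lists for each source (pipeline name, user identity, IP address, and execution date and time; task name and coding language). The `run_id` join key, the class and function names, and the choice to allow several task rows per run are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ExecutionRow:
    """One row per pipeline run (hypothetical schema for the execution data source)."""
    run_id: str  # assumed join key; the disclosure does not name one
    pipeline_name: str
    user: str
    ip_address: str
    executed_at: datetime


@dataclass(frozen=True)
class TaskRow:
    """One row per task in a run (hypothetical schema for the task data source)."""
    run_id: str
    task_name: str
    language: str


@dataclass(frozen=True)
class FeatureRecord:
    """One record in the feature group, representing one pipeline execution."""
    pipeline_name: str
    user: str
    ip_address: str
    hour_of_day: int
    task_names: frozenset
    languages: frozenset


def build_feature_group(executions, tasks):
    """Join execution and task rows on run_id, yielding one record per execution."""
    tasks_by_run = defaultdict(list)
    for task in tasks:
        tasks_by_run[task.run_id].append(task)
    group = []
    for run in executions:
        run_tasks = tasks_by_run.get(run.run_id, [])
        group.append(FeatureRecord(
            pipeline_name=run.pipeline_name,
            user=run.user,
            ip_address=run.ip_address,
            hour_of_day=run.executed_at.hour,  # execution time reduced to hour of day
            task_names=frozenset(t.task_name for t in run_tasks),
            languages=frozenset(t.language for t in run_tasks),
        ))
    return group


executions = [
    ExecutionRow("r1", "build-web", "alice", "10.0.0.5", datetime(2023, 12, 27, 9, 30)),
    ExecutionRow("r2", "build-web", "bob", "10.0.0.8", datetime(2023, 12, 27, 23, 5)),
]
tasks = [
    TaskRow("r1", "compile", "TypeScript"),
    TaskRow("r1", "unit-tests", "TypeScript"),
    TaskRow("r2", "download-script", "PowerShell"),
]
feature_group = build_feature_group(executions, tasks)
```

Collapsing each run's task rows into sets keeps exactly one record per execution, which matches the statement that each record in the feature group represents a single execution of the pipeline.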
For example, a record in the CI/CD pipeline execution data source is created for each run of the CI/CD pipeline. This record includes one or more of a pipeline name, an identity of a user executing the CI/CD pipeline, an internet protocol (IP) address, and a date and time of executing the CI/CD pipeline. Similarly, a record in the CI/CD pipeline task data source is created for each run of the CI/CD pipeline. This record includes one or more of a task name and a coding language.
An anomaly score is generated for the feature group representing the execution of the CI/CD pipeline, using a model representing historical feature groups of the CI/CD pipeline. The model may be trained using historical feature groups that were created before the current feature group. In some examples, the model is trained using the historical feature groups up to a predefined time before the current feature group (e.g., the model may be trained or updated every hour). In some examples, the current feature group is not used to build the model that scores it, so that the anomaly score is not computed against data that is itself likely to be anomalous.
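The disclosure does not say which model produces the anomaly score. As an illustrative stand-in, not the disclosed method, the following Python sketch scores each execution by how rare its feature values are in the historical feature groups. It sums the per-feature surprisal, -log p(value), with Laplace smoothing so unseen values get a finite but large cost. Summing across features lets several moderately unusual values combine into a high score, which is one simple reading of "multivariate". It treats features as independent, which is a simplification. The feature list, `RarityModel`, the smoothing constant, the max-over-records group score, and the threshold are all hypothetical. The model is fit on historical groups only, so the current group is kept out of the baseline used to score it.

```python
import math
from collections import Counter, defaultdict

# Hypothetical encoding: each execution record is a dict of categorical features
# taken from the fields listed for the execution and task data sources.
FEATURES = ("pipeline_name", "user", "ip_address", "language")


class RarityModel:
    """Per-feature value frequencies learned from historical feature groups only."""

    def __init__(self, features=FEATURES, alpha=1.0):
        self.features = features
        self.alpha = alpha                  # Laplace smoothing constant
        self.counts = defaultdict(Counter)  # feature -> value -> occurrences
        self.total = 0                      # number of historical executions

    def fit(self, historical_groups):
        for group in historical_groups:
            for record in group:
                self.total += 1
                for f in self.features:
                    self.counts[f][record[f]] += 1
        return self

    def record_score(self, record):
        """Sum of per-feature surprisal -log p(value); higher means rarer."""
        score = 0.0
        for f in self.features:
            vocab = len(self.counts[f]) + 1  # +1 reserves probability for unseen values
            p = (self.counts[f][record[f]] + self.alpha) / (self.total + self.alpha * vocab)
            score -= math.log(p)
        return score

    def group_score(self, group):
        """Assumption: a feature group is as anomalous as its most anomalous execution."""
        return max(self.record_score(r) for r in group)


usual = {"pipeline_name": "build-web", "user": "alice",
         "ip_address": "10.0.0.5", "language": "TypeScript"}
history = [[usual] * 5 for _ in range(20)]  # 100 earlier executions
model = RarityModel().fit(history)          # current group deliberately excluded

current = [usual, dict(usual, user="mallory", ip_address="203.0.113.7",
                       language="PowerShell")]
THRESHOLD = 10.0  # hypothetical; would need tuning per pipeline
score = model.group_score(current)
is_anomalous = score > THRESHOLD
```

Here the routine execution scores close to zero. The second execution combines an unseen user, IP address, and language, so its three large surprisal terms push the group score above the threshold.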