US-12619734-B2 - Automatic classification of security vulnerabilities

Abstract

Example methods and systems are directed to the automated assessment of vulnerabilities in the context of information technology (IT) security. A data record of a vulnerability is accessed. The vulnerability includes a vulnerability description and may also identify an application in respect of which the vulnerability was detected by an IT security tool. An input vector is automatically generated based on the vulnerability description. A machine learning model uses the input vector to generate a probability score. A positivity classification for the vulnerability is automatically determined based on the probability score. Output data representing the positivity classification is caused to be presented in a user interface. The positivity classification may indicate whether the vulnerability is deemed to be a false positive or a true positive. Example methods and systems provide a unified dashboard for presenting multiple vulnerabilities and positivity classifications relating to one or more applications.

Inventors

Abhinav Srivastava
Krishna Prasad P
Anurag Negi
Pratim Milind Ugale

Assignees

SAP SE

Dates

Publication Date: 2026-05-05
Application Date: 2023-03-27

Claims (20)

1. A system comprising: a memory that stores instructions; and one or more processors configured by the instructions to perform operations comprising: accessing first data comprising a data record of a vulnerability, the data record generated by an information technology (IT) security tool and comprising a vulnerability description; automatically generating an input vector based on the vulnerability description in the first data; generating, by a machine learning model and using the input vector, a probability score for the vulnerability; automatically determining, based on the probability score for the vulnerability, a positivity classification for the vulnerability, wherein the positivity classification indicates that the vulnerability is a false positive; in response to determining the false positive: accessing second data comprising historical triaging data stored in a database, the historical triaging data comprising, for each of a plurality of vulnerabilities, a vulnerability identifier, a vulnerability description, a triaging status indicating a severity classification assigned to a corresponding vulnerability, and a triaging reason indicating a rationale as to why the severity classification was assigned to the corresponding vulnerability, and generating, based on the second data, a plurality of candidate triaging reasons for the false positive, each candidate triaging reason being a different rationale as to why the positivity classification indicates that the vulnerability is a false positive; causing presentation, in a user interface, of output data representing the positivity classification for the vulnerability, the output data indicating the false positive and including the plurality of candidate triaging reasons; receiving a user selection of a triaging reason from among the plurality of candidate triaging reasons presented in the user interface; and performing a triaging process to triage the vulnerability using the user selection of the triaging reason provided via the user interface.
2. The system of claim 1, wherein the data record of the vulnerability further comprises a severity score, the input vector being generated based on the vulnerability description and the severity score.
3. The system of claim 2, wherein the generating of the input vector comprises: applying a natural language processing algorithm to one or more text objects in the vulnerability description to obtain a numerical representation of the one or more text objects; and generating the input vector based on the severity score and the numerical representation of the one or more text objects.
4. The system of claim 3, wherein the natural language processing algorithm is a bag-of-words algorithm, the numerical representation of the one or more text objects representing each text object as an input feature and further representing a frequency of the text object within the vulnerability description as a feature value.
5. The system of claim 3, further comprising, prior to applying the natural language processing algorithm: pre-processing the vulnerability description to remove one or more predefined text objects from the vulnerability description to obtain a pre-processed vulnerability description, wherein the pre-processed vulnerability description is used to generate the input vector.
6. The system of claim 1, wherein the machine learning model comprises a logistic regression model.
7. The system of claim 1, wherein the causing presentation of the output data further comprises causing presentation, in the user interface, of a confidence indicator in association with the positivity classification for the vulnerability.
8. The system of claim 1, the operations further comprising: causing presentation of a triaging element in the user interface, the triaging element being user-selectable to adjust the positivity classification for the vulnerability.
9. The system of claim 1, wherein the data record of the vulnerability identifies an application, the vulnerability having been detected in the application by the IT security tool.
10. The system of claim 9, wherein the vulnerability is a first vulnerability, wherein the IT security tool is a first IT security tool, wherein the vulnerability description is a first vulnerability description, wherein the input vector is a first input vector, and wherein the output data is first output data, the operations further comprising: accessing a data record of a second vulnerability generated by a second IT security tool, the data record of the second vulnerability comprising a second vulnerability description and identifying the application, the second vulnerability having been detected in the application by the second IT security tool; automatically generating a second input vector based on the second vulnerability description in the data record of the second vulnerability; generating, by the machine learning model and using the second input vector, a probability score for the second vulnerability; automatically determining, based on the probability score for the second vulnerability, a positivity classification for the second vulnerability; and causing presentation, in the user interface, of second output data representing the positivity classification for the second vulnerability, the user interface presenting the first output data and the second output data in a unified dashboard.
11. The system of claim 1, the operations further comprising: selecting the plurality of candidate triaging reasons based on similarities in at least one of vulnerability descriptions or severity scores between the first data and the second data.
12. A method comprising: accessing first data comprising a data record of a vulnerability, the data record generated by an information technology (IT) security tool and comprising a vulnerability description; automatically generating an input vector based on the vulnerability description in the first data; generating, by a machine learning model and using the input vector, a probability score for the vulnerability; automatically determining, based on the probability score for the vulnerability, a positivity classification for the vulnerability, wherein the positivity classification indicates that the vulnerability is a false positive; in response to determining the false positive: accessing second data comprising historical triaging data stored in a database, the historical triaging data comprising, for each of a plurality of vulnerabilities, a vulnerability identifier, a vulnerability description, a triaging status indicating a severity classification assigned to a corresponding vulnerability, and a triaging reason indicating a rationale as to why the severity classification was assigned to the corresponding vulnerability, and generating, based on the second data, a plurality of candidate triaging reasons for the false positive, each candidate triaging reason being a different rationale as to why the positivity classification indicates that the vulnerability is a false positive; causing presentation, in a user interface, of output data representing the positivity classification for the vulnerability, the output data indicating the false positive and including the plurality of candidate triaging reasons; receiving a user selection of a triaging reason from among the plurality of candidate triaging reasons presented in the user interface; and performing a triaging process to triage the vulnerability using the user selection of the triaging reason provided via the user interface.
13. The method of claim 12, wherein the data record of the vulnerability further comprises a severity score, the input vector being generated based on the vulnerability description and the severity score.
14. The method of claim 13, wherein the generating of the input vector comprises: applying a natural language processing algorithm to one or more text objects in the vulnerability description to obtain a numerical representation of the one or more text objects; and generating the input vector based on the severity score and the numerical representation of the one or more text objects.
15. The method of claim 12, wherein the causing presentation of the output data further comprises causing presentation, in the user interface, of a confidence indicator in association with the positivity classification for the vulnerability.
16. The method of claim 12, further comprising: causing presentation of a triaging element in the user interface, the triaging element being user-selectable to adjust the positivity classification for the vulnerability.
17. A non-transitory computer-readable medium that stores instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: accessing first data comprising a data record of a vulnerability, the data record generated by an information technology (IT) security tool and comprising a vulnerability description; automatically generating an input vector based on the vulnerability description in the first data; generating, by a machine learning model and using the input vector, a probability score for the vulnerability; automatically determining, based on the probability score for the vulnerability, a positivity classification for the vulnerability, wherein the positivity classification indicates that the vulnerability is a false positive; in response to determining the false positive: accessing second data comprising historical triaging data stored in a database, the historical triaging data comprising, for each of a plurality of vulnerabilities, a vulnerability identifier, a vulnerability description, a triaging status indicating a severity classification assigned to a corresponding vulnerability, and a triaging reason indicating a rationale as to why the severity classification was assigned to the corresponding vulnerability, and generating, based on the second data, a plurality of candidate triaging reasons for the false positive, each candidate triaging reason being a different rationale as to why the positivity classification indicates that the vulnerability is a false positive; causing presentation, in a user interface, of output data representing the positivity classification for the vulnerability, the output data indicating the false positive and including the plurality of candidate triaging reasons; receiving a user selection of a triaging reason from among the plurality of candidate triaging reasons presented in the user interface; and performing a triaging process to triage the vulnerability using the user selection of the triaging reason provided via the user interface.
18. The non-transitory computer-readable medium of claim 17, wherein the data record of the vulnerability further comprises a severity score, the input vector being generated based on the vulnerability description and the severity score.
19. The non-transitory computer-readable medium of claim 17, wherein the causing presentation of the output data further comprises causing presentation, in the user interface, of a confidence indicator in association with the positivity classification for the vulnerability.
20. The non-transitory computer-readable medium of claim 17, further comprising: causing presentation of a triaging element in the user interface, the triaging element being user-selectable to adjust the positivity classification for the vulnerability.
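
The data-preparation steps recited in claims 2 to 5 and the similarity-based selection of claim 11 can be illustrated with a minimal Python sketch. It is not the patented implementation. The stop-word list, the record field names (`description`, `reason`), the function names, and the choice of cosine similarity over description counts are all assumptions made for illustration; the claims only require removal of predefined text objects, a bag-of-words representation combined with a severity score, and selection based on similarities.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the "predefined text objects" removed in claim 5.
STOP_WORDS = {"a", "an", "the", "in", "of", "is", "to", "and"}


def preprocess(description):
    """Lowercase, tokenize, and drop the predefined text objects."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(descriptions):
    """Map each distinct token to a feature index (sorted for stability)."""
    tokens = sorted({t for d in descriptions for t in preprocess(d)})
    return {token: i for i, token in enumerate(tokens)}


def bag_of_words(description, vocabulary):
    """One feature per token; the feature value is the token frequency."""
    counts = Counter(preprocess(description))
    vector = [0.0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:
            vector[vocabulary[token]] = float(count)
    return vector


def input_vector(description, severity, vocabulary):
    """Bag-of-words counts with the severity score appended as a last feature."""
    return bag_of_words(description, vocabulary) + [float(severity)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_reasons(description, history, vocabulary, top_k=3):
    """Distinct triaging reasons from the most similar historical records."""
    target = bag_of_words(description, vocabulary)
    scored = [
        (cosine(target, bag_of_words(h["description"], vocabulary)), h["reason"])
        for h in history
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    reasons = []
    for score, reason in scored:
        if score > 0 and reason not in reasons:
            reasons.append(reason)
    return reasons[:top_k]
```

In this sketch the severity score is appended to the vector but left out of the similarity ranking, so that a large severity value does not swamp the word counts. Claim 11 permits similarity on descriptions, on severity scores, or on both, so this is only one of the allowed choices.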

Description

TECHNICAL FIELD

The subject matter disclosed herein generally relates to information technology (IT) security. Specifically, but not exclusively, the present disclosure addresses systems and methods to assess, classify, and present vulnerabilities reported by IT security tools.

BACKGROUND

Finding security flaws or other vulnerabilities in computing devices, systems, or applications may require a variety of IT security tools. Analyzing and triaging these vulnerabilities, including initiating remediation tasks and finding false positives, are important tasks to ensure compliance with security standards and protocols. In some cases, each security tool has a separate user interface or Application Programming Interface (API) for reporting vulnerabilities. Reviewing vulnerabilities across different tools, triaging them to ensure that risks are prioritized and appropriate action is taken where it may be needed, and keeping track of the status of such vulnerabilities, may thus be difficult, time-consuming, or expensive.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.

FIG. 1 is a diagrammatic representation of a network environment suitable for automatic classification of vulnerabilities, according to some examples.
FIG. 2 is a block diagram of a vulnerability management system, according to some examples.
FIG. 3 is an interaction diagram showing certain interactions between a vulnerability management system, multiple security tools, and other components, according to some examples.
FIG. 4 is a flowchart illustrating operations of a method suitable for automatic classification of vulnerabilities, according to some examples.
FIG. 5 is a flowchart illustrating operations of a method suitable for initiating different triaging functions based on a positivity classification determined for a vulnerability, according to some examples.
FIG. 6 is a diagram illustrating aspects of a dashboard that may be presented in a user interface, according to some examples.
FIG. 7 diagrammatically illustrates training and use of a machine learning program, according to some examples.
FIG. 8 diagrammatically illustrates training and use of a logistic regression model to generate probability scores, according to some examples.
FIG. 9 is a block diagram showing a software architecture for a computing device, according to some examples.
FIG. 10 is a block diagram of a machine in the form of a computer system, according to some examples, within which instructions may be executed for causing the machine to perform any one or more of the methodologies discussed herein.

DETAILED DESCRIPTION

Example methods and systems are directed to the automatic classification of vulnerabilities that are detected by IT security tools. As used herein, the term “vulnerability” refers to any flaw or weakness in a computing device, system, application, or network, that may weaken security in the device, system, application, or network, or make it vulnerable to attack or exploitation. A vulnerability may be a flaw or weakness in software, hardware, security procedures, design, implementation, controls, or combinations thereof, that could lead to an undesirable event compromising the security of the device, system, application, or network.

As used herein, the term “IT security tool,” or simply “security tool,” refers to any one or more computer programs designed to assess a computing device, system, application, or network, to identify or detect a vulnerability. A security tool may be installed on a user device or accessed remotely, e.g., made available as a service. A vulnerability scanner is an example of a security tool. For instance, when developing a container-based application, multiple vulnerability scanners may be used to check for vulnerabilities that could be exploited by attackers, e.g., container scanners and static code analysis tools.

A user may have a need for an automated system suitable for analyzing vulnerabilities and classifying them automatically to facilitate triaging. As used herein, the term “triaging” refers to a process of prioritizing, categorizing, classifying, or assigning remediation action items to vulnerabilities, e.g., based on their severity and potential impact. Triaging objectives may include ensuring that the most important or significant vulnerabilities are addressed as soon as possible, and allocating resources efficiently to mitigate security risks.

Machine learning models are applications that provide computer systems the ability to perform tasks, without explicitly being programmed, by making inferences based on patterns found in the analysis of data. Machine learning explores the study and construction of algorithms, also referred to herein as models, that may learn from existing data and make predictions about new data. In some examples, a machine learning model is employed to provide automated classification functionality.
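
As a hedged illustration of how a model might learn from existing data and classify new data, the following Python sketch trains a small logistic regression model (the model type named in claim 6 and FIG. 8) with plain batch gradient descent and maps its probability score to a positivity classification. Several details are assumptions rather than statements from the disclosure: the score is read as the probability of a false positive, the threshold is 0.5, and the two toy features, the learning rate, and the epoch count are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def probability_score(vector, weights, bias):
    """Model output for one input vector: sigmoid of the weighted sum."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return sigmoid(z)


def train(vectors, labels, learning_rate=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(vectors[0])
    n_samples = len(vectors)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            error = probability_score(x, weights, bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def positivity_classification(score, threshold=0.5):
    """Assumed convention: a score at or above the threshold means false positive."""
    return "false positive" if score >= threshold else "true positive"


# Toy training data (hypothetical): feature 0 counts a token typical of
# false positives, feature 1 a token typical of true positives.
training_vectors = [[2.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 3.0]]
training_labels = [1, 1, 0, 0]  # 1 = historically triaged as a false positive

weights, bias = train(training_vectors, training_labels)
score = probability_score([2.0, 0.0], weights, bias)
print(positivity_classification(score), round(score, 3))
```

The score itself could also serve as the basis for the confidence indicator mentioned in claims 7, 15, and 19, although the text above does not say how that indicator is computed.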