US-12619736-B2 - Intelligent software composition management with performance and security alerts
Abstract
An example methodology includes, by a computing device, receiving information regarding a new application from another computing device and determining one or more relevant features from the information regarding the new application, the one or more relevant features influencing predictions of any potential performance issue and any potential security issue. The method also includes, by the computing device, generating, using a multi-target machine learning (ML) model, a first prediction of any potential performance issue for the new application and a second prediction of any potential security issue for the new application based on the determined one or more relevant features, and sending the first and second predictions to the another computing device.
Inventors
- Shamik Kacker
- Bijan Kumar Mohanty
- Hung Dinh
Assignees
- DELL PRODUCTS L.P.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2023-05-10
Claims (13)
- 1 . A method comprising: receiving, by a computing device, information regarding a new application from another computing device; determining, by the computing device, one or more relevant features from the information regarding the new application, the one or more relevant features influencing predictions of any potential performance issue and any potential security issue; simultaneously generating, by the computing device using a multi-target machine learning (ML) model that has been trained using a training dataset generated from historical software composition metadata and information about performance and security issues, wherein the training dataset comprises a plurality of training/testing samples, and wherein each training/testing sample of the plurality of training/testing samples includes one or more features extracted from the historical software composition metadata and information about performance and security issues, further wherein the one or more features includes a feature indicative of an open-source software (OSS) component used in an application: a first prediction, generated by a first parallel branch of the ML model, of any potential runtime performance issue for the new application based on the determined one or more relevant features; and a second prediction, generated by a second parallel branch of the ML model, of any potential security vulnerability issue for the new application based on the determined one or more relevant features; and sending, by the computing device, the first and second predictions to the another computing device.
- 2 . The method of claim 1 , wherein the multi-target ML model includes a multi-output deep neural network (DNN).
- 3 . The method of claim 2 , wherein the multi-output DNN predicts a first binary classification response and a second binary classification response, wherein the first binary classification response is the first prediction of any potential performance issue for the new application and the second binary classification response is the second prediction of any potential security issue for the new application.
- 4 . The method of claim 1 , wherein the one or more features includes a feature indicative of a hosting associated with an application.
- 5 . The method of claim 1 , wherein the one or more features includes a feature indicative of a consumption associated with an application.
- 6 . The method of claim 1 , wherein the software component includes a commercial off-the-shelf software (COTS) component.
- 7 . A system comprising: one or more non-transitory machine-readable mediums configured to store instructions; and one or more processors configured to execute the instructions stored on the one or more non-transitory machine-readable mediums, wherein execution of the instructions causes the one or more processors to carry out a process comprising: receiving information regarding a new application from a computing device; determining one or more relevant features from the information regarding the new application, the one or more relevant features influencing predictions of any potential performance issue and any potential security issue; simultaneously generating, using a multi-target machine learning (ML) model that has been trained using a training dataset generated from historical software composition metadata and information about performance and security issues, wherein the training dataset comprises a plurality of training/testing samples, and wherein each training/testing sample of the plurality of training/testing samples includes one or more features extracted from the historical software composition metadata and information about performance and security issues, further wherein the one or more features includes a feature indicative of an open-source software (OSS) component used in an application: a first prediction, generated by a first parallel branch of the ML model, of any potential runtime performance issue for the new application based on the determined one or more relevant features; and a second prediction, generated by a second parallel branch of the ML model, of any potential security vulnerability issue for the new application based on the determined one or more relevant features; and sending the first and second predictions to the computing device.
- 8 . The system of claim 7 , wherein the multi-target ML model includes a multi-output deep neural network (DNN).
- 9 . The system of claim 8 , wherein the multi-output DNN predicts a first binary classification response and a second binary classification response, wherein the first binary classification response is the first prediction of any potential performance issue for the new application and the second binary classification response is the second prediction of any potential security issue for the new application.
- 10 . The system of claim 7 , wherein the one or more features includes a feature indicative of one of a hosting associated with an application or a consumption associated with the application.
- 11 . The system of claim 7 , wherein the software component includes a commercial off-the-shelf software (COTS) component.
- 12 . A non-transitory machine-readable medium encoding instructions that when executed by one or more processors cause a process to be carried out, the process including: receiving information regarding a new application from a computing device; determining one or more relevant features from the information regarding the new application, the one or more relevant features influencing predictions of any potential performance issue and any potential security issue; simultaneously generating, using a multi-target machine learning (ML) model that has been trained using a training dataset generated from historical software composition metadata and information about performance and security issues, wherein the training dataset comprises a plurality of training/testing samples, and wherein each training/testing sample of the plurality of training/testing samples includes one or more features extracted from the historical software composition metadata and information about performance and security issues, further wherein the one or more features includes a feature indicative of an open-source software (OSS) component used in an application: a first prediction, generated by a first parallel branch of the ML model, of any potential runtime performance issue for the new application based on the determined one or more relevant features; and a second prediction, generated by a second parallel branch of the ML model, of any potential security vulnerability issue for the new application based on the determined one or more relevant features; and sending the first and second predictions to the computing device.
- 13 . The machine-readable medium of claim 12 , wherein the multi-target ML model includes a multi-output deep neural network (DNN), wherein the multi-output DNN predicts a first binary classification response and a second binary classification response, wherein the first binary classification response is the first prediction of any potential performance issue for the new application and the second binary classification response is the second prediction of any potential security issue for the new application.
Description
BACKGROUND

Organizations, such as software companies, typically create software using various combinations of custom code, commercial off-the-shelf software (COTS), and open-source software (OSS). The created software may be consumed internally and/or externally by customers. While OSS offers many benefits to organizations, these organizations are challenged with having to comply with the various licenses (e.g., open-source licenses) that govern the use of OSS, as failure to comply with these licenses can put the organization at significant risk of litigation as well as compromise its intellectual property (IP). For instance, the most recent annual Open-Source Security and Risk Analysis (OSSRA) report found that over 53% of the codebases audited contained open-source license conflicts, which typically involved the GNU General Public License (GPL). These conflicts can lead to serious implications with mergers and acquisitions, vendor disputes, and distribution problems for the organization.

Open-source vulnerabilities also pose significant risks to application security. Open-source vulnerabilities are security risks contained within or created by open-source components. These vulnerabilities arise primarily from the way OSS is developed, e.g., not being subject to the same level of scrutiny as custom-developed software. Open-source vulnerabilities can potentially expose an organization to threats such as malware injections, data breaches, and Denial-of-Service (DoS) attacks.

SUMMARY

This Summary is provided to introduce a selection of concepts in simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features or combinations of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. 
In accordance with one illustrative embodiment provided to illustrate the broader concepts, systems, and techniques described herein, a method includes, by a computing device, receiving information regarding a new application from another computing device and determining one or more relevant features from the information regarding the new application, the one or more relevant features influencing predictions of any potential performance issue and any potential security issue. The method also includes, by the computing device, generating, using a multi-target machine learning (ML) model, a first prediction of any potential performance issue for the new application and a second prediction of any potential security issue for the new application based on the determined one or more relevant features, and sending the first and second predictions to the another computing device. In some embodiments, the multi-target ML model includes a multi-output deep neural network (DNN). In one aspect, the multi-output DNN predicts a first classification response and a second classification response, wherein the first classification response is the first prediction of any potential performance issue for the new application and the second classification response is the second prediction of any potential security issue for the new application. In some embodiments, the multi-target ML model is generated using a training dataset generated from a corpus of historical software composition metadata and information about performance and security issues of an organization. In some embodiments, the training dataset comprises a plurality of training/testing samples, wherein each training/testing sample of the plurality of training/testing samples includes one or more features extracted from the historical software composition metadata and information about performance and security issues, wherein the one or more features includes a feature indicative of a hosting associated with an application. 
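As a minimal, non-authoritative sketch of the feature determination described above, metadata such as hosting, consumption, and component type could be encoded as a numeric feature vector. All category names and the dictionary layout here are hypothetical and are not drawn from the patent; a real pipeline would derive its categories from the organization's historical software composition metadata.

```python
# Hypothetical category vocabularies for the illustrative features
# (hosting, consumption, and software component type).
HOSTING = ["on-prem", "private-cloud", "public-cloud"]
CONSUMPTION = ["internal", "external", "hybrid"]
COMPONENT_TYPES = ["custom", "COTS", "OSS"]

def one_hot(value, categories):
    """Return a one-hot list marking `value` within `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]

def encode_features(app):
    """Flatten an application's metadata dict into a numeric feature vector."""
    vec = []
    vec += one_hot(app["hosting"], HOSTING)
    vec += one_hot(app["consumption"], CONSUMPTION)
    # Multi-hot encoding: an application may use several component types.
    vec += [1.0 if t in app["components"] else 0.0 for t in COMPONENT_TYPES]
    return vec

sample = {"hosting": "public-cloud",
          "consumption": "external",
          "components": ["custom", "OSS"]}
print(encode_features(sample))
# → [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Such a fixed-length vector is the form of input a multi-target model would consume, with one slot per known category so that new applications map onto the same feature space as the historical training samples.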
In some embodiments, the training dataset comprises a plurality of training/testing samples, wherein each training/testing sample of the plurality of training/testing samples includes one or more features extracted from the historical software composition metadata and information about performance and security issues, wherein the one or more features includes a feature indicative of a consumption associated with an application. In some embodiments, the training dataset comprises a plurality of training/testing samples, wherein each training/testing sample of the plurality of training/testing samples includes one or more features extracted from the historical software composition metadata and information about performance and security issues, wherein the one or more features includes a feature indicative of a software component used in an application. In one aspect, the software component includes a commercial off-the-shelf software (COTS) component. In one aspect, the software component includes an open-source software (OSS) component. According to another illustrative embodiment provided to illustrate the broader concepts descr
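The multi-output architecture described above, a shared representation feeding two parallel branches that each emit a binary classification, can be sketched as a single forward pass in pure Python. This is an illustrative toy, not the patented implementation: the weights are arbitrary fixed values, and a real multi-target DNN would learn its parameters from the historical training dataset.

```python
import math

def sigmoid(x):
    """Logistic activation, mapping a score to a (0, 1) probability."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, bias):
    """Single dense unit: weighted sum of inputs plus bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def predict(features, trunk, perf_head, sec_head):
    # Shared trunk: one hidden layer with ReLU activations.
    hidden = [max(0.0, dense(features, w, b)) for w, b in trunk]
    # Two parallel branches over the shared representation, so both
    # predictions are produced simultaneously in one forward pass.
    perf = sigmoid(dense(hidden, *perf_head))  # performance-issue risk
    sec = sigmoid(dense(hidden, *sec_head))    # security-issue risk
    return perf, sec

# Toy parameters: two hidden units, one weight/bias pair per branch.
trunk = [([0.5, -0.2], 0.1), ([0.3, 0.8], -0.1)]
perf_head = ([1.2, -0.7], 0.0)
sec_head = ([-0.4, 1.5], 0.2)

perf, sec = predict([1.0, 0.0], trunk, perf_head, sec_head)
print(round(perf, 3), round(sec, 3))
```

Thresholding each output (e.g., at 0.5) yields the two binary classification responses, one per target, which mirrors how a single model can alert on performance and security concurrently rather than requiring two separately trained classifiers.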