US-12619650-B2 - Techniques for securing computing interfaces using clustering
Abstract
A system and method for clustering computing interface calls. A method includes: determining a plurality of computing interface cluster definitions, the plurality of computing interface cluster definitions including a plurality of parameter type strings; and clustering a plurality of computing interface call instances into a plurality of clusters based on the plurality of computing interface cluster definitions, wherein a number of clusters among the plurality of clusters is fewer than a number of computing interface call instances among the plurality of computing interface call instances, wherein clustering the plurality of computing interface call instances includes determining a plurality of portions of the plurality of computing interface call instances which match types of parameters represented by respective parameter type strings of the plurality of parameter type strings.
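The clustering step summarized above can be illustrated with a minimal sketch. It assumes that cluster definitions are slash-delimited path templates and that parameter type strings look like `{int}` and `{uuid}`; these spellings, the regular expressions, and the function names are hypothetical and are not taken from the disclosure.

```python
import re
from collections import defaultdict

# Hypothetical parameter type strings and the value patterns they represent.
PARAM_TYPES = {
    "{uuid}": re.compile(
        r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    ),
    "{int}": re.compile(r"\d+"),
}


def matches(definition, call):
    """Return True if every segment of `call` fits the cluster definition."""
    def_segs = definition.strip("/").split("/")
    call_segs = call.strip("/").split("/")
    if len(def_segs) != len(call_segs):
        return False
    for d, c in zip(def_segs, call_segs):
        pattern = PARAM_TYPES.get(d)
        if pattern is None:
            # Literal segment: must be identical.
            if d != c:
                return False
        elif not pattern.fullmatch(c):
            # Typed segment: the call portion must match the parameter type.
            return False
    return True


def cluster_calls(definitions, calls):
    """Group call instances under the first definition they match."""
    clusters = defaultdict(list)
    for call in calls:
        for definition in definitions:
            if matches(definition, call):
                clusters[definition].append(call)
                break
        else:
            # Assumption: an unmatched call becomes its own cluster.
            clusters[call].append(call)
    return dict(clusters)


definitions = ["/users/{int}/orders/{uuid}", "/users/{int}"]
calls = [
    "/users/17",
    "/users/204",
    "/users/17/orders/123e4567-e89b-12d3-a456-426614174000",
    "/users/9/orders/9b2f6c1e-0d3a-4f7b-8c21-5e6f7a8b9c0d",
]
clusters = cluster_calls(definitions, calls)
```

In this example four call instances collapse into two clusters, which reflects the requirement that the number of clusters be fewer than the number of call instances.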
Inventors
- Adi Chen ARBIB
- Adi VARDI
- Shai Meir
- Yaniv GABAY
- Yuval Alkalai Tavori
- Idan TAGER
- Itzhak GERSHFELD
Assignees
- NONAME GATE LTD.
Dates
- Publication Date
- 20260505
- Application Date
- 20240514
Claims (20)
- 1. A method for clustering computing interface calls, comprising: determining a plurality of computing interface cluster definitions by identifying one or more clusterizers in a plurality of segments of a plurality of computing interface examples, wherein a clusterizer is at least a portion of a string that demonstrates a recurring pattern within the plurality of computing interface examples, the plurality of computing interface cluster definitions including a plurality of parameter type strings; and clustering a plurality of computing interface call instances into a plurality of clusters based on the plurality of computing interface cluster definitions, wherein a number of clusters among the plurality of clusters is fewer than a number of computing interface call instances among the plurality of computing interface call instances, wherein clustering the plurality of computing interface call instances includes determining a plurality of portions of the plurality of computing interface call instances which match types of parameters represented by respective parameter type strings of the plurality of parameter type strings.
- 2. The method of claim 1, further comprising: establishing baseline behavior for each of the plurality of clusters based on computing interface call data.
- 3. The method of claim 2, further comprising: detecting abnormal behavior based on at least one deviation from the established baseline behavior; and securing at least one computing environment by performing at least one mitigation action with respect to the detected abnormal behavior.
- 4. The method of claim 1, wherein determining the plurality of computing interface cluster definitions further comprises: matching clusterized string lists between segments of the plurality of segments, wherein each clusterized string list is an ordered list of clusterizers in one of the plurality of segments, wherein the plurality of cluster definitions are determined based on the matching.
- 5. The method of claim 4, further comprising: determining whether a set of clusterizers in each computing interface example is a cluster based on whether each of the clusterizers in the set of clusterizers in each computing interface example meets at least one minimum count condition.
- 6. The method of claim 5, further comprising: replacing at least one clusterizer among the set of clusterizers with a corresponding portion of a clusterized string list in order to create a replaced segment pattern, wherein each replaced segment pattern is determined as one of the plurality of computing interface cluster definitions.
- 7. The method of claim 4, further comprising: creating a character matrix based on the plurality of computing interface name examples, wherein the character matrix includes a plurality of entries representing potential combinations of characters; determining N-gram statistics for each of the plurality of computing interface name examples based on the character matrix, wherein the plurality of clusterizers is identified based on the determined N-gram statistics.
- 8. The method of claim 7, further comprising: determining a score of N-grams for each string among the plurality of computing interface examples; and determining whether the score for each string is above a threshold, wherein each string for which the score is below the threshold is identified as a clusterizer.
- 9. The method of claim 8, further comprising: extracting at least one N-gram from each of the computing interface name examples, wherein each N-gram is a contiguous sequence of N characters, wherein the score for each string is determined based on the at least one N-gram extracted from the computing interface name example including the string.
- 10. A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising: determining a plurality of computing interface cluster definitions by identifying one or more clusterizers in a plurality of segments of a plurality of computing interface examples, wherein a clusterizer is at least a portion of a string that demonstrates a recurring pattern within the plurality of computing interface examples, the plurality of computing interface cluster definitions including a plurality of parameter type strings; and clustering a plurality of computing interface call instances into a plurality of clusters based on the plurality of computing interface cluster definitions, wherein a number of clusters among the plurality of clusters is fewer than a number of computing interface call instances among the plurality of computing interface call instances, wherein clustering the plurality of computing interface call instances includes determining a plurality of portions of the plurality of computing interface call instances which match types of parameters represented by respective parameter type strings of the plurality of parameter type strings.
- 11. A system for efficiently clustering computing interface calls, comprising: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: determine a plurality of computing interface cluster definitions by identifying one or more clusterizers in a plurality of segments of a plurality of computing interface examples, wherein a clusterizer is at least a portion of a string that demonstrates a recurring pattern within the plurality of computing interface examples, the plurality of computing interface cluster definitions including a plurality of parameter type strings; and cluster a plurality of computing interface call instances into a plurality of clusters based on the plurality of computing interface cluster definitions, wherein a number of clusters among the plurality of clusters is fewer than a number of computing interface call instances among the plurality of computing interface call instances, wherein clustering the plurality of computing interface call instances includes determining a plurality of portions of the plurality of computing interface call instances which match types of parameters represented by respective parameter type strings of the plurality of parameter type strings.
- 12. The system of claim 11, wherein the system is further configured to: establish baseline behavior for each of the plurality of clusters based on computing interface call data.
- 13. The system of claim 12, wherein the system is further configured to: detect abnormal behavior based on at least one deviation from the established baseline behavior; and secure at least one computing environment by performing at least one mitigation action with respect to the detected abnormal behavior.
- 14. The system of claim 11, wherein the system is further configured to: match clusterized string lists between segments of the plurality of segments, wherein each clusterized string list is an ordered list of clusterizers in one of the plurality of segments, wherein the plurality of cluster definitions are determined based on the matching.
- 15. The system of claim 14, wherein the system is further configured to: determine whether a set of clusterizers in each computing interface example is a cluster based on whether each of the clusterizers in the set of clusterizers in each computing interface example meets at least one minimum count condition.
- 16. The system of claim 15, wherein the system is further configured to: replace at least one clusterizer among the set of clusterizers with a corresponding portion of a clusterized string list in order to create a replaced segment pattern, wherein each replaced segment pattern is determined as one of the plurality of computing interface cluster definitions.
- 17. The system of claim 14, wherein the system is further configured to: create a character matrix based on the plurality of computing interface name examples, wherein the character matrix includes a plurality of entries representing potential combinations of characters; determine N-gram statistics for each of the plurality of computing interface name examples based on the character matrix, wherein the plurality of clusterizers is identified based on the determined N-gram statistics.
- 18. The system of claim 17, wherein the system is further configured to: determine a score of N-grams for each string among the plurality of computing interface examples; and determine whether the score for each string is above a threshold, wherein each string for which the score is below the threshold is identified as a clusterizer.
- 19. The system of claim 18, wherein the system is further configured to: extract at least one N-gram from each of the computing interface name examples, wherein each N-gram is a contiguous sequence of N characters, wherein the score for each string is determined based on the at least one N-gram extracted from the computing interface name example including the string.
- 20. A method for clustering computing interface calls in a computing system having a computing interface call security system that secures computing interfaces, comprising: determining a plurality of computing interface cluster definitions, at least one computing interface cluster definition including a plurality of parameter type strings; clustering a plurality of computing interface call instances into a plurality of clusters based on the plurality of computing interface cluster definitions, wherein a number of clusters among the plurality of clusters is fewer than a number of computing interface call instances among the plurality of computing interface call instances, wherein clustering the plurality of computing interface call instances includes determining a plurality of portions of the plurality of computing interface call instances that match types of parameters represented by respective parameter type strings of the plurality of parameter type strings; establishing baseline behavior for each of the plurality of clusters based on computing interface call data; detecting abnormal behavior based on at least one deviation from the established baseline behavior; and responsive to detecting abnormal behavior based on at least one deviation from the established baseline behavior, the computing interface call security system performing at least one mitigation action in the computing system with respect to the detected abnormal behavior.
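The N-gram scoring recited in claims 7 to 9 and 17 to 19 might be sketched as follows. The claims do not specify N, the scoring formula, or the threshold, so this sketch assumes N = 2, represents the character matrix sparsely as a counter keyed by two-character pairs, scores a string as the mean count of its bigrams, and uses an arbitrary threshold. All function names and example paths are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Return the contiguous 2-character sequences (N-grams with N = 2)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def segments_of(example):
    """Split an interface example into its slash-delimited segments."""
    return [s for s in example.split("/") if s]


def build_char_matrix(examples):
    """Count each ordered character pair seen across all example segments.

    Assumption: a sparse Counter stands in for the character matrix.
    """
    matrix = Counter()
    for example in examples:
        for segment in segments_of(example):
            matrix.update(bigrams(segment))
    return matrix


def ngram_score(text, matrix):
    """Assumed score: mean matrix count over the bigrams of `text`."""
    grams = bigrams(text)
    if not grams:
        return 0.0
    return sum(matrix[g] for g in grams) / len(grams)


def find_clusterizers(examples, threshold):
    """Flag segment strings whose score falls below the threshold."""
    matrix = build_char_matrix(examples)
    return {
        segment
        for example in examples
        for segment in segments_of(example)
        if ngram_score(segment, matrix) < threshold
    }


examples = [
    "/users/a8f3k2/orders",
    "/users/q7z1m9/orders",
    "/users/p4x0w6/orders",
]
clusterizers = find_clusterizers(examples, threshold=2.0)
```

With these three examples, the literal segments `users` and `orders` score well above the threshold because their bigrams recur in every example, while the identifier-like segments score low and are flagged.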
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 18/297,789 filed on Apr. 10, 2023, now pending, the contents of which are hereby incorporated by reference.
TECHNICAL FIELD
The present disclosure relates generally to computing interface cybersecurity, and more specifically to cybersecurity related to call flows for computing interfaces.
BACKGROUND
The vast majority of cybersecurity breaches can be traced back to an issue with a computer interface such as an application programming interface (API). API abuses are expected to become the most frequent attack vector in the future, and insecure APIs have been identified as a significant threat to cloud computing.
An API is a computing interface. A computing interface is a shared boundary across which two or more separate components of a computer system exchange information. Computing interfaces therefore allow disparate computing components to effectively communicate with each other despite potential differences in communication format, content, and the like. An API defines interactions between software components.
A flawed API can lead to exposure of sensitive data, account takeovers, and even denial-of-service (DoS) attacks. As a result, securing APIs is a top priority of many computing services providers.
A call to an API typically includes some form of method verb representing an action to be taken via an API (e.g., GET, POST, PUT, DELETE, etc.), a domain, and a path. Certain portions of API calls may be divided into segments, each of which might include parameters defining paths (or portions thereof), query parameters, or a combination of path parameters and query parameters.
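As a minimal sketch of the call structure just described, the following splits an API call into its method verb, domain, path segments, and query parameters. The function name, the returned dictionary layout, and the example URL are hypothetical; only the Python standard library is used, and the path is assumed to be divided into segments at slash marks.

```python
from urllib.parse import parse_qs, urlsplit


def parse_call(method, url):
    """Break an API call into method verb, domain, segments, and query."""
    parts = urlsplit(url)
    # Split the path at slash marks and drop empty pieces from
    # leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "method": method.upper(),
        "domain": parts.netloc,
        "segments": segments,
        "query": parse_qs(parts.query),
    }


call = parse_call("get", "https://api.example.com/users/17/orders?status=open")
```

Here `call["segments"]` holds the path parameters in order, and `call["query"]` maps each query parameter name to its list of values.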
Segments are typically defined with respect to one or more bookend characters such as, but not limited to, a pair of slash marks (with one slash mark at the beginning of the segment and another slash mark at the end), a beginning slash mark with no further segments thereafter (i.e., even without an ending slash mark), or an end slash mark without a slash mark preceding it. Each bookend character marks either the beginning or end of a segment such that the bookend characters can be used collectively to define different segments within an API.
API calls may be made in malicious attempts to improperly access data. Accordingly, techniques which allow for identifying patterns in API behavior with respect to these API calls would be desirable.
SUMMARY
A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term “some embodiments” or “certain embodiments” may be used herein to refer to a single embodiment or multiple embodiments of the disclosure.
Certain embodiments disclosed herein include a method for clustering computing interface calls.
The method comprises: determining a plurality of computing interface cluster definitions, the plurality of computing interface cluster definitions including a plurality of parameter type strings; and clustering a plurality of computing interface call instances into a plurality of clusters based on the plurality of computing interface cluster definitions, wherein a number of clusters among the plurality of clusters is fewer than a number of computing interface call instances among the plurality of computing interface call instances, wherein clustering the plurality of computing interface call instances includes determining a plurality of portions of the plurality of computing interface call instances which match types of parameters represented by respective parameter type strings of the plurality of parameter type strings.
Certain embodiments disclosed herein also include a non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising: determining a plurality of computing interface cluster definitions, the plurality of computing interface cluster definitions including a plurality of parameter type strings; and clustering a plurality of computing interface call instances into a plurality of clusters based on the plurality of computing interface cluster definitions, wherein a number of clusters among the plurality of clusters is fewer than a number of computing interface call instances among the plurality of computing interface call instances, wherein clustering the plurality of computing interface call instances includes determining a plurality of portions of t