US-20260127280-A1 - USING FEATURE VECTOR WINDOWS TO INFER AND MITIGATE RANSOMWARE ACTIVITY
Abstract
A method, according to one embodiment, includes adding an incoming first feature vector to an outer window. In response to a determination that the first feature vector is a qualifying feature vector, the first feature vector is added into a voting window. In response to a determination that the outer window is full, a relatively oldest feature vector is removed from the outer window. The method further includes using feature vectors in the voting window to infer ransomware activity. A computer program product, according to another embodiment, includes one or more computer-readable storage media, and program instructions stored on the one or more storage media to perform the foregoing method. A computer system, according to another embodiment, includes a processor set, one or more computer-readable storage media, and program instructions stored on the one or more storage media to cause the processor set to perform the foregoing method.
Inventors
- Roman Alexander Pletka
- Dionysios Diamantopoulos
- Nicolás Hernán Reátegui Rodriguez
- Charalampos Pozidis
- Yves Alexandre Beraldo dos Santos
- Andrew D. Walls
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-11-04
Claims (20)
- 1. A method comprising: adding an incoming first feature vector to an outer window; in response to a determination that the first feature vector is a qualifying feature vector, adding the first feature vector into a voting window; in response to a determination that the outer window is full, removing a relatively oldest feature vector from the outer window; and using feature vectors in the voting window to infer ransomware activity.
- 2. The method of claim 1, further comprising: in response to the determination that the outer window is full, determining whether the relatively oldest feature vector is present in the voting window; and in response to a determination that the relatively oldest feature vector is present in the voting window, removing the relatively oldest feature vector from the voting window.
- 3. The method of claim 1, wherein the relatively oldest feature vector is a second feature vector.
- 4. The method of claim 1, wherein the first feature vector details feature information about operations performed within a storage system.
- 5. The method of claim 4, wherein the feature information is selected from the group consisting of: read transfer size, write transfer size, an entropy of writes, an application tag, a logical block address (LBA) of a write operation, and an LBA of a read operation.
- 6. The method of claim 4, further comprising: determining a response for mitigating a ransomware attack associated with the ransomware activity; and causing the response to be performed.
- 7. The method of claim 6, wherein the using feature vectors in the voting window to infer the ransomware activity comprises: determining whether a majority of inferred votes on the feature vectors in the voting window exceed a dynamically adjustable threshold; and in response to a determination that the majority of the inferred votes on the feature vectors in the voting window exceed the dynamically adjustable threshold, determining that the storage system is targeted by a ransomware attack.
- 8. The method of claim 6, further comprising: training an artificial intelligence (AI) engine to use training feature vectors in a training voting window to infer simulated ransomware activity; determining an accuracy of the AI engine; and in response to a determination that the AI engine exceeds a predetermined threshold of accuracy, causing the AI engine to use the feature vectors in the voting window to infer the ransomware activity and determine the response for mitigating the ransomware attack associated with the ransomware activity.
- 9. The method of claim 1, further comprising: determining whether the first feature vector is a qualifying feature vector, wherein the first feature vector is determined to be a qualifying feature vector in response to a determination that write operations are detected in a period covering the first feature vector.
- 10. A computer program product comprising: one or more computer-readable storage media; and program instructions stored on the one or more storage media to perform operations comprising: adding an incoming first feature vector to an outer window; in response to a determination that the first feature vector is a qualifying feature vector, adding the first feature vector into a voting window; in response to a determination that the outer window is full, removing a relatively oldest feature vector from the outer window; and using feature vectors in the voting window to infer ransomware activity.
- 11. The computer program product of claim 10, wherein the operations further comprise: in response to the determination that the outer window is full, determining whether the relatively oldest feature vector is present in the voting window; and in response to a determination that the relatively oldest feature vector is present in the voting window, removing the relatively oldest feature vector from the voting window.
- 12. The computer program product of claim 10, wherein the relatively oldest feature vector is a second feature vector.
- 13. The computer program product of claim 10, wherein the first feature vector details feature information about operations performed within a storage system.
- 14. The computer program product of claim 13, wherein the feature information is selected from the group consisting of: read transfer size, write transfer size, an entropy of writes, an application tag, a logical block address (LBA) of a write operation, and an LBA of a read operation.
- 15. The computer program product of claim 13, wherein the operations further comprise: determining a response for mitigating a ransomware attack associated with the ransomware activity; and causing the response to be performed.
- 16. The computer program product of claim 15, wherein the using feature vectors in the voting window to infer the ransomware activity comprises: determining whether a majority of inferred votes on the feature vectors in the voting window exceed a dynamically adjustable threshold; and in response to a determination that the majority of the inferred votes on the feature vectors in the voting window exceed the dynamically adjustable threshold, determining that the storage system is targeted by a ransomware attack.
- 17. The computer program product of claim 15, wherein the operations further comprise: training an artificial intelligence (AI) engine to use training feature vectors in a training voting window to infer simulated ransomware activity; determining an accuracy of the AI engine; and in response to a determination that the AI engine exceeds a predetermined threshold of accuracy, causing the AI engine to use the feature vectors in the voting window to infer the ransomware activity and determine the response for mitigating the ransomware attack associated with the ransomware activity.
- 18. The computer program product of claim 15, wherein the operations further comprise: determining whether the first feature vector is a qualifying feature vector, wherein the first feature vector is determined to be a qualifying feature vector in response to a determination that write operations are detected in a period covering the first feature vector.
- 19. A computer system comprising: a processor set; one or more computer-readable storage media; and program instructions stored on the one or more storage media to cause the processor set to perform operations comprising: adding an incoming first feature vector to an outer window; in response to a determination that the first feature vector is a qualifying feature vector, adding the first feature vector into a voting window; in response to a determination that the outer window is full, removing a relatively oldest feature vector from the outer window; and using feature vectors in the voting window to infer ransomware activity.
- 20. The computer system of claim 19, wherein the operations further comprise: in response to the determination that the outer window is full, determining whether the relatively oldest feature vector is present in the voting window; and in response to a determination that the relatively oldest feature vector is present in the voting window, removing the relatively oldest feature vector from the voting window.
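The window maintenance recited in claims 1, 2, and 9 can be sketched roughly as follows. This is a minimal illustration, not the implementation described in the application: the `FeatureVector` fields, the `FeatureWindows` class, and the reading of "full" as "the add pushed the outer window past its capacity" are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureVector:
    """Hypothetical feature vector; field names are illustrative only."""
    seq: int                    # arrival order, used here only for readability
    write_bytes: int            # total write transfer size in the covered period
    write_entropy: float = 0.0  # entropy of writes, in bits per byte


class FeatureWindows:
    """Outer window of recent vectors plus a voting window of qualifying ones."""

    def __init__(self, outer_capacity: int) -> None:
        if outer_capacity < 1:
            raise ValueError("outer_capacity must be at least 1")
        self.outer_capacity = outer_capacity
        self.outer: deque = deque()
        self.voting: deque = deque()

    @staticmethod
    def is_qualifying(fv: FeatureVector) -> bool:
        # Claims 9 and 18: a vector qualifies when write operations were
        # detected in the period it covers. Assumed here to mean write_bytes > 0.
        return fv.write_bytes > 0

    def add(self, fv: FeatureVector) -> None:
        # Every incoming vector enters the outer window.
        self.outer.append(fv)
        # Only qualifying vectors also enter the voting window.
        if self.is_qualifying(fv):
            self.voting.append(fv)
        # Assumption: "full" means the add pushed the window past capacity.
        if len(self.outer) > self.outer_capacity:
            oldest = self.outer.popleft()
            # Both windows keep arrival order, so if the evicted vector is in
            # the voting window it must be at the head.
            if self.voting and self.voting[0] is oldest:
                self.voting.popleft()


windows = FeatureWindows(outer_capacity=3)
for seq, write_bytes in enumerate([4096, 0, 8192, 0, 512]):
    windows.add(FeatureVector(seq=seq, write_bytes=write_bytes))

print([fv.seq for fv in windows.outer])   # [2, 3, 4]
print([fv.seq for fv in windows.voting])  # [2, 4]
```

Because the voting window is always an order-preserving subset of the outer window, checking only its head is enough to implement the removal in claim 2; no search through the voting window is needed in this sketch.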
Description
BACKGROUND
The present invention relates to storage systems, and more specifically, this invention relates to ransomware detection in storage systems.
Ransomware is a type of malware that holds a victim's sensitive data and/or device hostage, threatening to keep the sensitive data and/or device locked and/or exploited (e.g., leaked online to the public) unless the victim pays a ransom to the attacker. The earliest ransomware attacks simply demanded a ransom in exchange for the encryption key needed to regain access to the affected data or use of the infected device. However, these attacks have evolved to include double-extortion and triple-extortion tactics that expand the threat of a ransomware attack beyond the aforementioned victim to customers, family, friends, business partners, etc. Even victims who rigorously maintain data backups of a storage system or pay the initial ransom demand are at risk of ransomware attacks.
SUMMARY
A method, according to one embodiment, includes adding an incoming first feature vector to an outer window. In response to a determination that the first feature vector is a qualifying feature vector, the first feature vector is added into a voting window. In response to a determination that the outer window is full, a relatively oldest feature vector is removed from the outer window. The method further includes using feature vectors in the voting window to infer ransomware activity.
A computer program product, according to another embodiment, includes one or more computer-readable storage media, and program instructions stored on the one or more storage media to perform the foregoing method.
A computer system, according to another embodiment, includes a processor set, one or more computer-readable storage media, and program instructions stored on the one or more storage media to cause the processor set to perform the foregoing method.
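The summary leaves the inference step open; claims 7 and 16 describe it as a majority vote compared against a dynamically adjustable threshold. One possible reading is sketched below. The function names, the dictionary-based feature vectors, the `toy_vote` rule, and its 7.5 bits-per-byte cut-off are all hypothetical stand-ins, and the interpretation of "majority ... exceed a threshold" as "the fraction of positive per-vector votes exceeds the threshold" is an assumption rather than something the text confirms.

```python
from typing import Callable, Mapping, Sequence

# A per-vector vote: True means "this vector looks like ransomware activity".
Vote = Callable[[Mapping[str, float]], bool]


def toy_vote(fv: Mapping[str, float]) -> bool:
    # Stand-in for a trained model: flag near-random write payloads.
    # The 7.5 bits-per-byte cut-off is an illustrative assumption.
    return fv["write_entropy"] > 7.5


def infer_ransomware(
    voting_window: Sequence[Mapping[str, float]],
    vote: Vote,
    threshold: float = 0.5,
) -> bool:
    """Return True when the share of positive votes exceeds the threshold."""
    if not voting_window:
        # With no qualifying vectors there is nothing to vote on.
        return False
    positive = sum(1 for fv in voting_window if vote(fv))
    return positive / len(voting_window) > threshold


window = [
    {"write_entropy": 7.9},
    {"write_entropy": 7.8},
    {"write_entropy": 3.1},
]
print(infer_ransomware(window, toy_vote))                 # True  (2 of 3 votes)
print(infer_ransomware(window, toy_vote, threshold=0.7))  # False (2/3 < 0.7)
```

Passing the threshold as a parameter is one simple way to keep it adjustable at run time; how the threshold is actually tuned is not specified in this section.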
Other aspects and embodiments of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the drawings, illustrate by way of example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram of a computing environment, in accordance with one embodiment of the present invention.
FIG. 2 is a flowchart of a method, in accordance with one embodiment of the present invention.
FIG. 3 depicts a storage system infrastructure, in accordance with one embodiment of the present invention.
FIG. 4 is a flowchart of a method, in accordance with one embodiment of the present invention.
FIGS. 5A-5B depict plots of the I/O behavior of various ransomware samples, in accordance with several embodiments of the present invention.
FIGS. 6A-6C depict a time progression, in accordance with several embodiments of the present invention.
DETAILED DESCRIPTION
The following description is made for the purpose of illustrating the general principles of the present invention and is not meant to limit the inventive concepts claimed herein. Further, particular features described herein can be used in combination with other described features in each of the various possible combinations and permutations.
Unless otherwise specifically defined herein, all terms are to be given their broadest possible interpretation including meanings implied from the specification as well as meanings understood by those skilled in the art and/or as defined in dictionaries, treatises, etc.
It must also be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless otherwise specified.
It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The following description discloses several preferred embodiments of systems, methods and computer program products for using feature vector windows to infer and mitigate ransomware activity.
In one general embodiment, a method includes adding an incoming first feature vector to an outer window. In response to a determination that the first feature vector is a qualifying feature vector, the first feature vector is added into a voting window. In response to a determination that the outer window is full, a relatively oldest feature vector is removed from the outer window. The method further includes using feature vectors in the voting window to infer ransomware activity.
In another general embodiment, a computer program product includes one or more computer-readable storage media, and program instructions stored on the one or more storage media to perform the foregoing method.
In another general embodiment, a computer system includes a processor set, one or more computer-readable storage media, and progr