US-12619738-B2 - Generating test data

Abstract

A technique is directed to generating test data to be written in a predefined testing environment. The technique includes receiving an entropy value indicating an amount of entropy of write data of a completed write operation to be mimicked by the test data, and a deduplication value indicating a target deduplication ratio of the test data in the predefined testing environment. The technique further includes identifying, based on the entropy value, a particular data buffer from a predefined plurality of data buffers. The technique further includes, after identifying the particular data buffer, modifying at least part of the particular data buffer based on the deduplication value to generate a modified data buffer as the test data.
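The flow in the abstract (select a pre-generated buffer by entropy, then modify it to meet a deduplication target) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the 512-byte sector granularity, the two entropy ranges, the reading of the deduplication value as a fraction of sectors, and the helper names `make_pool`, `select_buffer`, `modify_for_dedup`, and `generate_test_data`.

```python
import itertools
import os

SECTOR = 512                  # assumed deduplication granularity, in bytes
SECTORS_PER_BUFFER = 8        # assumed buffer size of 4 KiB
BUFFER_SIZE = SECTOR * SECTORS_PER_BUFFER

# Hypothetical "predefined sequence": stands in for data already present in
# the testing environment, so sectors carrying it can be deduplicated.
SHARED_SECTOR = os.urandom(SECTOR)

# Hypothetical source of "unique sequences": a counter that never repeats
# within this process.
_unique_ids = itertools.count(1)


def make_pool():
    """Pre-generate (upper_bound, buffer) pairs; bounds are bits per byte."""
    return [
        # Half zeros, half random bytes: roughly 5 bits per byte.
        (6.0, bytes(BUFFER_SIZE // 2) + os.urandom(BUFFER_SIZE // 2)),
        # All random bytes: close to 8 bits per byte.
        (8.0, os.urandom(BUFFER_SIZE)),
    ]


def select_buffer(pool, entropy):
    """Return the buffer of the first range whose upper bound covers entropy."""
    for upper_bound, buffer in pool:
        if entropy <= upper_bound:
            return buffer
    raise ValueError(f"entropy {entropy} is outside every range")


def modify_for_dedup(buffer, dedup_fraction):
    """Make dedup_fraction of the sectors dedupable and the rest unique."""
    out = bytearray(buffer)
    shared = round(dedup_fraction * SECTORS_PER_BUFFER)
    for index in range(SECTORS_PER_BUFFER):
        start = index * SECTOR
        if index < shared:
            # Overwrite the sector with the shared, predefined content.
            out[start:start + SECTOR] = SHARED_SECTOR
        else:
            # Stamp the first 8 bytes of the sector with a fresh identifier.
            out[start:start + 8] = next(_unique_ids).to_bytes(8, "big")
    return bytes(out)


def generate_test_data(pool, entropy, dedup_fraction):
    """Select a buffer by entropy, then modify a copy for the dedup target."""
    return modify_for_dedup(select_buffer(pool, entropy), dedup_fraction)
```

Calling `generate_test_data(make_pool(), 7.5, 0.5)` twice would yield two buffers whose first four sectors are identical and whose last four differ. The sketch does not re-check entropy after modification; a fuller version would need to confirm the modified buffer stays in its range.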

Inventors

  • Rustem Rafikov
  • Christopher Jones
  • Philippe Armangau
  • Sathya Krishna Murthy
  • Bruce A. Zimmerman

Assignees

  • DELL PRODUCTS L.P.

Dates

Publication Date: 2026-05-05
Application Date: 2024-01-26

Claims (18)

  1. A method of generating test data to be written in a predefined testing environment, the method comprising: receiving an entropy value indicating an amount of entropy of write data of a completed write operation to be mimicked by the test data, and a deduplication value indicating a target deduplication ratio of the test data in the predefined testing environment; identifying, based on the entropy value, a particular data buffer from a predefined plurality of data buffers; and after identifying the particular data buffer, modifying at least part of the particular data buffer based on the deduplication value to generate a modified data buffer as the test data; wherein modifying at least part of the particular data buffer includes: replacing at least part of the particular data buffer with a predefined sequence of bits to increase an amount of deduplication available to be performed on the modified data buffer.
  2. The method of claim 1, further comprising: replacing at least part of a second data buffer with a unique sequence of bits to decrease an amount of deduplication available to be performed on the second data buffer as modified, the unique sequence of bits being unique from data stored in the predefined testing environment.
  3. The method of claim 1, further comprising: pre-generating the particular data buffer to include a first group of bits and a second group of bits, each bit in the first group of bits having a value of zero, a number of bits in the first group of bits being based on a target entropy range.
  4. The method of claim 3 wherein modifying at least part of the particular data buffer includes: adjusting the second group of bits while refraining from adjusting the first group of bits to maintain entropy of the particular data buffer within the target entropy range.
  5. The method of claim 1, further comprising: generating, as the predefined plurality of data buffers, a first set of data buffers and a second set of data buffers, the first set of data buffers having a respective first set of entropy values within a first range of entropy values, the second set of data buffers having a respective second set of entropy values within a second range of entropy values; wherein identifying the particular data buffer from the predefined plurality of data buffers includes: mapping the entropy value of the write data to the first range of entropy values; and in response to mapping the entropy value of the write data to the first range of entropy values, providing the particular data buffer from the first set of data buffers.
  6. The method of claim 1 wherein receiving the entropy value includes: obtaining the entropy value from an input/output (IO) trace of the completed write operation, the completed write operation being performed on a first platform running a first operating system (OS); and wherein the method further comprises: writing the test data on, as the predefined testing environment, a second platform running a second OS that is different from the first OS.
  7. The method of claim 1, further comprising: receiving an IO size of the write data to be mimicked by the test data, the IO size being larger than the particular data buffer; and combining the modified data buffer with a second modified data buffer to generate a combined data buffer that matches the IO size of the write data.
  8. The method of claim 1, further comprising: after modifying at least part of the particular data buffer to generate the modified data buffer, receiving an instruction to generate additional test data; in response to receiving the instruction to generate additional test data, identifying the particular data buffer from the predefined plurality of data buffers; and modifying the particular data buffer to generate a second modified data buffer as the additional test data, the second modified data buffer being different from the modified data buffer.
  9. The method of claim 1 wherein the completed write operation occurred as part of an actual ransomware attack; and wherein the method further comprises: issuing a write request to store the test data to storage of the testing environment as part of a simulated ransomware attack in the testing environment, the simulated ransomware attack being a simulation of the actual ransomware attack.
  10. An electronic apparatus, comprising: memory; and control circuitry coupled with the memory, the memory storing instructions that, when carried out by the control circuitry, cause the control circuitry to perform a method of generating test data to be written in a predefined testing environment, the method including: receiving an entropy value indicating an amount of entropy of write data of a completed write operation to be mimicked by the test data, and a deduplication value indicating a target deduplication ratio of the test data in the predefined testing environment; identifying, based on the entropy value, a particular data buffer from a predefined plurality of data buffers; and after identifying the particular data buffer, modifying at least part of the particular data buffer based on the deduplication value to generate a modified data buffer as the test data; wherein modifying at least part of the particular data buffer includes: replacing at least part of the particular data buffer with a unique sequence of bits to decrease an amount of deduplication available to be performed on the modified data buffer, the unique sequence of bits being unique from data stored in the predefined testing environment.
  11. The electronic apparatus of claim 10, further comprising: replacing at least part of a second data buffer with a predefined sequence of bits to increase an amount of deduplication available to be performed on the second data buffer as modified.
  12. The electronic apparatus of claim 10, further comprising: pre-generating the particular data buffer to include a first group of bits and a second group of bits, each bit in the first group of bits having a value of zero, a number of bits in the first group of bits being based on a target entropy range.
  13. The electronic apparatus of claim 12 wherein modifying at least part of the particular data buffer includes: adjusting the second group of bits while refraining from adjusting the first group of bits to maintain entropy of the particular data buffer within the target entropy range.
  14. The electronic apparatus of claim 10, further comprising: generating, as the predefined plurality of data buffers, a first set of data buffers and a second set of data buffers, the first set of data buffers having a respective first set of entropy values within a first range of entropy values, the second set of data buffers having a respective second set of entropy values within a second range of entropy values; wherein identifying the particular data buffer from the predefined plurality of data buffers includes: mapping the entropy value of the write data to the first range of entropy values; and in response to mapping the entropy value of the write data to the first range of entropy values, providing the particular data buffer from the first set of data buffers.
  15. The electronic apparatus of claim 10 wherein receiving the entropy value includes: obtaining the entropy value from an input/output (IO) trace of the completed write operation, the completed write operation being performed on a first platform running a first operating system (OS); and wherein the method further comprises: writing the test data on, as the predefined testing environment, a second platform running a second OS that is different from the first OS.
  16. The electronic apparatus of claim 10, further comprising: receiving an IO size of the write data to be mimicked by the test data, the IO size being larger than the particular data buffer; and combining the modified data buffer with a second modified data buffer to generate a combined data buffer that matches the IO size of the write data.
  17. A computer program product having a non-transitory computer readable medium that stores a set of instructions to generate test data to be written in a predefined testing environment, the set of instructions, when carried out by computerized circuitry, causes the computerized circuitry to perform a method of: receiving an entropy value indicating an amount of entropy of write data of a completed write operation to be mimicked by the test data, and a deduplication value indicating a target deduplication ratio of the test data in the predefined testing environment; identifying, based on the entropy value, a particular data buffer from a predefined plurality of data buffers; and after identifying the particular data buffer, modifying at least part of the particular data buffer based on the deduplication value to generate a modified data buffer as the test data; wherein modifying at least part of the particular data buffer includes: replacing at least part of the particular data buffer with a predefined sequence of bits to increase an amount of deduplication available to be performed on the modified data buffer.
  18. The method of claim 1, wherein modifying at least part of the particular data buffer further includes: prior to replacing at least part of the particular data buffer with the predefined sequence of bits, acquiring the predefined sequence of bits from bits previously written to storage of the predefined testing environment.
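The buffer layout recited in claims 3 and 4 (a zero-valued first group whose size sets the entropy, and a second group that alone is adjusted) and the size matching of claim 7 can be illustrated with a short Python sketch. It rests on assumptions that are not in the claims: groups are sized in whole bytes rather than bits, entropy is measured as Shannon entropy in bits per byte, the second group is adjusted by refilling it with random bytes, and the combined buffer is truncated to the IO size. The function names are hypothetical.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    total = len(data)
    return -sum(
        count / total * math.log2(count / total)
        for count in Counter(data).values()
    )


def pregenerate(size: int, zero_bytes: int) -> bytes:
    """Build a buffer with a zero-valued first group and a random second group.

    A larger zero group lowers the entropy of the whole buffer, so zero_bytes
    would be chosen from the target entropy range.
    """
    return bytes(zero_bytes) + os.urandom(size - zero_bytes)


def adjust_second_group(buffer: bytes, zero_bytes: int) -> bytes:
    """Replace only the second group, leaving the zero group untouched."""
    return buffer[:zero_bytes] + os.urandom(len(buffer) - zero_bytes)


def combine_to_io_size(buffers, io_size: int) -> bytes:
    """Concatenate modified buffers and trim to the requested IO size."""
    joined = b"".join(buffers)
    if len(joined) < io_size:
        raise ValueError("not enough buffers to reach the requested IO size")
    return joined[:io_size]
```

With a 4096-byte buffer and a 2048-byte zero group, the measured entropy lands near 5 bits per byte both before and after `adjust_second_group`, which is the property claim 4 relies on: the share of zeros, not the particular random content, dominates the entropy.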

Description

BACKGROUND

Providers of ransomware protection tools test their tools in a controlled environment in which IO operations are performed on data. The ransomware protection tools monitor the IO operations to detect and mitigate ransomware attacks. For lack of availability, security concerns, or otherwise, providers of ransomware protection tools use simulated user data during testing, rather than actual user data that was read or written during a ransomware attack. Such simulated user data contains random data that mimics the actual user data.

SUMMARY

Unfortunately, there are deficiencies in generating simulated user data that accurately reflects the actual user data that was read or written during a ransomware attack. In some cases, the simulated user data contains only some of the properties of the actual user data and fails to reflect other properties that are indicative of a ransomware attack (e.g., high entropy, low dedupability, etc.). Differences between the simulated user data and the actual user data detrimentally affect the development and effectiveness of ransomware protection tools.

In contrast to the above-described conventional data storage system, improved techniques are directed to generating test data by modifying a predefined data buffer to accurately reflect actual write data. Along these lines, an IO trace provides an entropy value of write data for a completed write operation. The predefined data buffer is selected based on the entropy value. The predefined data buffer is then modified to generate a modified data buffer that meets a target deduplication ratio. In this manner, test data is quickly and efficiently generated while accurately reflecting the actual write data.

One embodiment is directed to a method of generating test data to be written in a predefined testing environment. The method includes receiving an entropy value indicating an amount of entropy of write data of a completed write operation to be mimicked by the test data, and a deduplication value indicating a target deduplication ratio of the test data in the predefined testing environment. The method further includes identifying, based on the entropy value, a particular data buffer from a predefined plurality of data buffers. The method further includes, after identifying the particular data buffer, modifying at least part of the particular data buffer based on the deduplication value to generate a modified data buffer as the test data.

Another embodiment is directed to an electronic apparatus that includes memory and control circuitry coupled with the memory. The memory stores instructions that, when carried out by the control circuitry, cause the control circuitry to perform a method of generating test data to be written in a predefined testing environment, the method including:

  A. receiving an entropy value indicating an amount of entropy of write data of a completed write operation to be mimicked by the test data, and a deduplication value indicating a target deduplication ratio of the test data in the predefined testing environment;
  B. identifying, based on the entropy value, a particular data buffer from a predefined plurality of data buffers; and
  C. after identifying the particular data buffer, modifying at least part of the particular data buffer based on the deduplication value to generate a modified data buffer as the test data.

Yet another embodiment is directed to a computer program product having a non-transitory computer readable medium that stores a set of instructions to generate test data to be written in a predefined testing environment, the set of instructions, when carried out by computerized circuitry, causes the computerized circuitry to perform a method of:

  A. receiving an entropy value indicating an amount of entropy of write data of a completed write operation to be mimicked by the test data, and a deduplication value indicating a target deduplication ratio of the test data in the predefined testing environment;
  B. identifying, based on the entropy value, a particular data buffer from a predefined plurality of data buffers; and
  C. after identifying the particular data buffer, modifying at least part of the particular data buffer based on the deduplication value to generate a modified data buffer as the test data.

In some embodiments, modifying at least part of the particular data buffer includes replacing at least part of the particular data buffer with a predefined sequence of bits to increase an amount of deduplication available to be performed on the modified data buffer.

In some embodiments, modifying at least part of the particular data buffer includes replacing at least part of the particular data buffer with a unique sequence of bits to decrease an amount of deduplication available to be performed on the modified data buffer. The unique sequence of bits is unique from data stored in the predefined testing environment.

In some embodiments, the method further includes pre-generating the particular