DE-102025115506-A1 - OPTIMIZED KV METADATA STORAGE FOR MACHINE TRAINING APPLICATIONS

Abstract

Metadata generated during the lifetime of machine learning and artificial intelligence systems is valuable. However, such metadata can also be generated after the data itself and is therefore written to the storage device later. Additional support for KV databases at the storage device level can improve performance in terms of transfers per second. This results from removing the translation layer on the host that was previously required for data storage. Removing the translation layer eliminates two layers of mapping and transaction information. This increases the number of transactions per second, write gains, and read gains, while reducing latency. Additionally, future metadata additions are accommodated by reserving excess storage when the initial key value is stored. Future additions are then stored in the reserved storage.

Inventors

  • Alexander Bazarsky
  • David Avraham
  • Ran Zamir

Assignees

  • SanDisk Technologies, Inc.

Dates

Publication Date
20260513
Application Date
20250422
Priority Date
20241113

Claims (20)

  1. A data storage device comprising: a storage device; and a controller coupled to the storage device, the controller being configured to: receive data from a host, wherein the data is key-value pair data (KV pair data); store the data in a KV namespace, wherein: the KV namespace comprises a key and a value; the key addresses the value; and the value comprises a plurality of flash management units (FMUs); receive a write command from the host to write second data to the KV namespace, wherein the write command to write metadata is received after receiving the KV pair data; and store the second data in the KV namespace.
  2. The data storage device according to claim 1, wherein the second data is metadata and wherein a key of the metadata corresponds to a key of the data.
  3. The data storage device according to claim 1, wherein the controller is further configured to receive a command from the host to update at least one weight of a machine learning (ML) model.
  4. The data storage device according to claim 3, wherein the command further includes a command to update at least one model component that is associated with the at least one weight.
  5. The data storage device according to claim 4, wherein the controller is further configured to read only the FMUs of the plurality of FMUs that correspond to the at least one model component.
  6. The data storage device according to claim 5, wherein the controller is further configured to select, from the plurality of FMUs, the FMUs that include the at least one weight for writing and modification.
  7. A data storage device comprising: a storage device; and a controller coupled to the storage device, the controller being configured to: receive key-value pair data (KV pair data) from a host, wherein: the KV pair data comprises a key and a value; the key addresses the value; and the value comprises a plurality of flash management units (FMUs); create a KV namespace, the KV namespace comprising a key and a value; store the KV pair data in the KV namespace; receive a write command from the host to write metadata to the KV namespace, the write command to write metadata being received after receiving the KV pair data; and store the metadata in the KV namespace.
  8. The data storage device according to claim 7, wherein the size of the value of the KV pair data is smaller than the size of the value of the KV namespace.
  9. The data storage device according to claim 7, wherein the controller is further configured to reserve a portion of the value of the KV namespace for storing metadata, wherein the portion is a remainder of the value of the KV namespace after storing the KV pair data in the KV namespace.
  10. The data storage device according to claim 9, wherein the controller is further configured to determine whether a size of the metadata is less than or equal to the portion.
  11. The data storage device according to claim 10, wherein the controller is further configured, based on a determination that the size of the metadata is less than or equal to the portion, to read-modify-write the metadata in the KV namespace.
  12. The data storage device according to claim 10, wherein the controller is further configured, based on a determination that the size of the metadata is larger than the portion, to create a new KV namespace, the new KV namespace comprising a key and a value.
  13. The data storage device according to claim 12, wherein the controller is further configured, based on a determination that the size of the metadata is larger than the portion, to store the metadata in the new KV namespace.
  14. The data storage device according to claim 13, wherein the controller is further configured to internally link the key of the new KV namespace with the key of the KV namespace.
  15. The data storage device according to claim 7, wherein the controller is further configured to determine whether a write granularity is greater than or equal to a full FMU.
  16. The data storage device according to claim 15, wherein the controller is further configured, based on a determination that the write granularity is greater than or equal to a full FMU, to write the metadata to the KV namespace.
  17. The data storage device according to claim 16, wherein the controller is further configured to adjust the write granularity based on data storage patterns and usage patterns.
  18. The data storage device according to claim 7, wherein the size of the value of the KV pair data is between 4 bytes and 4 gigabytes.
  19. The data storage device according to claim 7, wherein the storage device is a non-volatile memory.
  20. A data storage device comprising: means for storing data; and a controller coupled to the means for storing data, the controller comprising: a metadata translation module configured to determine whether there is sufficient space in a value for metadata; and a flash translation layer communicatively coupled to the metadata translation module and configured to translate KV values or logical block addresses into physical block addresses; wherein the controller is configured to: receive key-value pair data (KV pair data) from a host, wherein: the KV pair data comprises a key and a value; the key addresses the value; and the value comprises a plurality of flash management units (FMUs); create a KV namespace, wherein the KV namespace comprises a key and a value; store the KV pair data in the KV namespace; receive a write command from the host to write metadata to the KV namespace, wherein the write command to write metadata is received after receiving the KV pair data; store the metadata in the KV namespace; and reserve a portion of the value of the KV namespace for storing metadata, wherein the portion is a remainder of the value of the KV namespace after storing the KV pair data in the KV namespace.
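
The decision flow recited in claims 9 to 14 can be illustrated with a minimal Python sketch. It is not the claimed controller implementation: the class names `KVNamespace` and `Controller`, the `#meta` suffix used for the overflow key, and the choice to shrink the reserved portion as metadata accumulates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class KVNamespace:
    """Hypothetical model of a KV namespace: a key addressing a fixed-size value."""
    key: bytes
    capacity: int                        # size of the namespace value, in bytes
    data: bytes = b""                    # KV pair data, written first
    metadata: bytes = b""                # metadata, written later
    linked_key: Optional[bytes] = None   # internal link to an overflow namespace

    @property
    def reserved(self) -> int:
        # Remainder of the value after the KV pair data (claim 9); this sketch
        # also subtracts metadata already stored, which is an assumption.
        return self.capacity - len(self.data) - len(self.metadata)


class Controller:
    """Hypothetical controller holding namespaces in a dict instead of flash."""

    def __init__(self) -> None:
        self.namespaces: Dict[bytes, KVNamespace] = {}

    def store_kv(self, key: bytes, value: bytes, capacity: int) -> None:
        if len(value) > capacity:
            raise ValueError("value does not fit in the namespace")
        self.namespaces[key] = KVNamespace(key, capacity, data=value)

    def write_metadata(self, key: bytes, metadata: bytes) -> bytes:
        """Store late-arriving metadata; return the key of the namespace used."""
        ns = self.namespaces[key]
        if len(metadata) <= ns.reserved:
            # Claims 10-11: the metadata fits in the reserved portion.
            ns.metadata += metadata
            return ns.key
        # Claims 12-14: create a new namespace, store the metadata there,
        # and link its key to the original key.
        overflow_key = key + b"#meta"    # hypothetical naming scheme
        self.namespaces[overflow_key] = KVNamespace(
            overflow_key, capacity=len(metadata), metadata=metadata
        )
        ns.linked_key = overflow_key
        return overflow_key
```

In this sketch the host addresses metadata with the same key as the data; the link to an overflow namespace is held by the controller, so no additional host-side mapping is needed.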

Description

BACKGROUND OF THE DISCLOSURE

Field of the Disclosure

Embodiments of the present disclosure generally relate to data storage devices, such as solid-state drives (SSDs), and in particular to optimizing the storage of key-value pair data and associated metadata in a data storage device.

Description of the Related Art

A key-value database (KV database) stores a unit of user data that is associated with a key and can be addressed as a complete entity. Examples of user data that can be stored in a KV database include photos, records, and files. From the perspective of a host device, the photo, record, or file can be retrieved using a single key/address, rather than using multiple addresses that together cover the data of the photo, record, or file. The data is stored as unstructured data and can be addressed using a variable-length key. Storage space on a storage device can be allocated to KV pair data in byte increments, where a length value of the KV pair data corresponds to the storage space required to hold the KV pair data.

Using a KV database in a data storage device can improve the device's performance. For example, the number of data transfers per second can be improved because the layer that translates KV pair data to physical storage locations can be removed from the host device. Furthermore, the number of commands sent over the bus can be reduced because complete KV pair data can be transmitted in a single transfer. However, the metadata associated with the KV pair data might not yet be available for storage when the KV pair data is transferred to the data storage device. In other words, the metadata associated with the KV pair data can be generated after the KV pair data has been programmed into the data storage device.
If the metadata is not transmitted together with the associated KV pair data, additional mappings are required to address the metadata and to map the metadata to the KV pair data. This increases the latency of processing commands relating to the KV pair data and associated metadata, and requires additional memory to store the additional mappings.

Therefore, there is a need in the art for optimized storage of the mappings between KV pair data and associated metadata.

SUMMARY OF THE DISCLOSURE

Metadata generated during the lifetime of machine learning and artificial intelligence systems is valuable. However, such metadata can also be generated after the data itself and is therefore written to the storage device later. Additional support for KV databases at the storage device level can improve performance in terms of transfers per second. This results from removing the translation layer on the host that was previously required for data storage. Removing the translation layer eliminates two layers of mapping and transaction information. This increases the number of transactions per second, write gains, and read gains, while reducing latency. Additionally, future metadata additions are accommodated by reserving excess storage when the initial key value is stored. Future additions are then stored in the reserved storage.

In one embodiment, a data storage device includes a storage device and a controller coupled to the storage device, the controller being configured to: receive data from a host, wherein the data is key-value pair data (KV pair data); store the data in a KV namespace, wherein the KV namespace comprises a key and a value, the key addresses the value, and the value comprises a plurality of flash management units (FMUs); receive a write command from the host to write second data to the KV namespace, wherein the write command to write metadata is received after receiving the KV pair data; and store the second data in the KV namespace.
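
Because the value comprises a plurality of FMUs, a later write of second data only needs to touch the FMUs that cover its byte range. The following Python sketch illustrates that arithmetic under stated assumptions: the 4 KiB FMU size, the function names `fmus_touched` and `read_modify_write`, and the in-memory list of `bytearray` objects standing in for flash are all hypothetical and do not come from the disclosure.

```python
from typing import List

FMU_SIZE = 4096  # assumed FMU size in bytes; the disclosure does not fix a value


def fmus_touched(offset: int, length: int, fmu_size: int = FMU_SIZE) -> range:
    """Return the indices of the FMUs covered by bytes [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // fmu_size
    last = (offset + length - 1) // fmu_size
    return range(first, last + 1)


def read_modify_write(
    fmus: List[bytearray], offset: int, payload: bytes, fmu_size: int = FMU_SIZE
) -> List[int]:
    """Patch payload into the value at offset, modifying only the covered FMUs.

    Returns the list of FMU indices that were modified.
    """
    touched: List[int] = []
    consumed = 0
    for idx in fmus_touched(offset, len(payload), fmu_size):
        # Byte position inside this FMU where the next chunk starts.
        start = (offset + consumed) - idx * fmu_size
        n = min(fmu_size - start, len(payload) - consumed)
        fmus[idx][start:start + n] = payload[consumed:consumed + n]
        consumed += n
        touched.append(idx)
    return touched
```

For example, a 10-byte write at offset 4090 of a four-FMU value spans the boundary between FMU 0 and FMU 1, so only those two FMUs are modified and the remaining FMUs are left as they are.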
In another embodiment, a data storage device includes a storage device and a controller coupled to the storage device, wherein the controller is configured to: receive key-value pair data (KV pair data) from a host, wherein the KV pair data includes a key and a value, the key addresses the value, and the value includes a plurality of flash management units (FMUs); create a KV namespace, wherein the KV namespace includes a key and a value; store the KV pair data in the KV namespace; receive a write command from the host to write metadata to the KV namespace, wherein the write command to write metadata is received after receiving the KV pair data; and store the metadata in the KV namespace.

In yet another embodiment, a data storage device includes means for storing data; and a controller coupled to the means for storing data, wherein the controller comprises: a metadata translation module configured to translate metadata; and a flash translation layer communicatively coupled to the metadata translation module and configured to translate KV values or logical block addresses into physical block addresses; wherein the controller is configured to: receive key-value pair data (KV pair data) from a host, wherein: the KV pair data comprises a key and a value; the key