US-20260127080-A1 - DATA RECOVERY AFTER DIE FAILURE IN MEMORY DEVICE
Abstract
An example method of performing data recovery after a die failure in a memory device includes: identifying, among a plurality of dies of the memory device, a first die experiencing a failure; identifying, by iterating over an address space associated with the first die, a valid data item stored on the first die; responsive to determining that a data recovery workload condition is satisfied, recovering the valid data item; and storing the valid data item on a second die of the plurality of dies.
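The method summarized in the abstract can be sketched as a single recovery pass over the failed die. The following Python sketch is illustrative only and is not the claimed implementation. The `Die` model, the `recover_failed_die` function, and its callbacks are hypothetical. The workload condition is reduced to a caller-supplied predicate, and recovered blocks are simply placed in the next free block of the target die. Valid items visited while the condition is not satisfied are returned as deferred rather than recovered.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Die:
    """Hypothetical die model: block index -> stored bytes (None marks no valid data)."""
    die_id: int
    blocks: Dict[int, Optional[bytes]] = field(default_factory=dict)
    failed: bool = False


def recover_failed_die(
    dies: List[Die],
    is_valid: Callable[[Die, int], bool],
    condition_satisfied: Callable[[], bool],
    recover: Callable[[Die, int], bytes],
    target: Die,
) -> Tuple[Dict[int, int], List[int]]:
    """Relocate valid blocks of the failed die onto `target`.

    Returns (relocated, deferred): `relocated` maps a source block to its new
    block on `target`; `deferred` lists valid blocks skipped because the
    workload condition was not satisfied when they were visited.
    """
    failed = next((d for d in dies if d.failed), None)  # identify the failed die
    relocated: Dict[int, int] = {}
    deferred: List[int] = []
    if failed is None:
        return relocated, deferred
    for block in sorted(failed.blocks):  # iterate over the die's address space
        if not is_valid(failed, block):
            continue  # no valid data to recover at this address
        if not condition_satisfied():
            deferred.append(block)  # revisit later, e.g. when the workload drops
            continue
        dest = max(target.blocks, default=-1) + 1  # next free block on the target die
        target.blocks[dest] = recover(failed, block)
        relocated[block] = dest
    return relocated, deferred


# Usage: die 0 has failed; its block 1 holds no valid data.
dies = [Die(0, {0: b"a", 1: None, 2: b"c"}, failed=True), Die(1), Die(2)]
budget = iter([True, False])  # hypothetical: allow one recovery, then throttle
relocated, deferred = recover_failed_die(
    dies,
    is_valid=lambda d, b: d.blocks[b] is not None,
    condition_satisfied=lambda: next(budget, True),
    recover=lambda d, b: d.blocks[b],  # stand-in; a real controller rebuilds from redundancy data
    target=dies[1],
)
print(relocated, deferred)  # {0: 0} [2]
```

Passing the condition and the recovery routine as callbacks keeps the address-space walk separate from the policy that decides when recovery may run.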
Inventors
- Ke-Cheng Liu
- Xue-Sen Yang
- Jing-Ping Lu
Assignees
- MICRON TECHNOLOGY, INC.
Dates
- Publication Date
- 20260507
- Application Date
- 20241104
Claims (20)
- 1. A system, comprising: a memory device; and a processing device, operatively coupled to the memory device, the processing device to: identify, among a plurality of dies of the memory device, a first die experiencing a failure; identify, by iterating over an address space associated with the first die, a valid data item stored on the first die; responsive to determining that a data recovery workload condition is satisfied, recover the valid data item; and store the valid data item on a second die of the plurality of dies.
- 2. The system of claim 1, wherein the valid data item is a block storing valid data.
- 3. The system of claim 1, wherein the data recovery workload condition is represented by a system error count comprising a count of read requests issued for data recovery.
- 4. The system of claim 3, wherein the system error count further comprises a count of system errors.
- 5. The system of claim 1, wherein the address space is a logical address space of the memory device.
- 6. The system of claim 1, wherein the address space is a physical address space of the first die.
- 7. The system of claim 1, wherein recovering the valid data item utilizes redundancy metadata stored on a third die of the plurality of dies.
- 8. A method, comprising: identifying, by a processing device, among a plurality of dies of a memory device, a first die experiencing a failure; identifying, by iterating over an address space associated with the first die, a valid data item stored on the first die; responsive to determining that a data recovery workload condition is satisfied, recovering the valid data item; and storing the valid data item on a second die of the plurality of dies.
- 9. The method of claim 8, wherein the valid data item is a block storing valid data.
- 10. The method of claim 8, wherein the data recovery workload condition is represented by a system error count comprising a count of read requests issued for data recovery.
- 11. The method of claim 10, wherein the system error count further comprises a count of system errors.
- 12. The method of claim 8, wherein the address space is a logical address space of the memory device.
- 13. The method of claim 8, wherein the address space is a physical address space of the first die.
- 14. The method of claim 8, wherein recovering the valid data item utilizes redundancy metadata stored on a third die of the plurality of dies.
- 15. A non-transitory computer-readable storage medium comprising instructions that, when executed by a processing device, cause the processing device to: identify, among a plurality of dies of a memory device, a first die experiencing a failure; identify, by iterating over an address space associated with the first die, a valid data item stored on the first die; responsive to determining that a data recovery workload condition is satisfied, recover the valid data item; and store the valid data item on a second die of the plurality of dies.
- 16. The non-transitory computer-readable storage medium of claim 15, wherein the valid data item is a block storing valid data.
- 17. The non-transitory computer-readable storage medium of claim 15, wherein the data recovery workload condition is represented by a system error count comprising a count of read requests issued for data recovery.
- 18. The non-transitory computer-readable storage medium of claim 15, wherein the address space is a logical address space of the memory device.
- 19. The non-transitory computer-readable storage medium of claim 15, wherein the address space is a physical address space of the first die.
- 20. The non-transitory computer-readable storage medium of claim 15, wherein recovering the valid data item utilizes redundancy metadata stored on a third die of the plurality of dies.
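Several dependent claims add two details: a workload condition represented by a system error count (a count of read requests issued for data recovery, optionally plus a count of system errors, per claims 3, 4, 10, 11 and 17), and recovery using redundancy metadata stored on a third die (claims 7 and 14). The claims specify neither the redundancy scheme nor how the count is evaluated. The sketch below therefore assumes XOR parity across a stripe with one chunk per die, as in RAID-5-style schemes, and a hypothetical `threshold` below which recovery may proceed. `SystemErrorCount`, `workload_condition_satisfied`, and `recover_with_parity` are illustrative names that do not come from the source.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass
class SystemErrorCount:
    """Hypothetical counter mirroring the 'system error count' of the claims."""
    recovery_reads: int = 0  # read requests issued for data recovery
    system_errors: int = 0   # other system errors

    @property
    def total(self) -> int:
        return self.recovery_reads + self.system_errors


def workload_condition_satisfied(count: SystemErrorCount, threshold: int) -> bool:
    # Assumed rule: recovery proceeds only while the count is below a threshold.
    return count.total < threshold


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("stripe chunks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def recover_with_parity(surviving: List[bytes], parity: bytes,
                        count: SystemErrorCount) -> bytes:
    """Rebuild the chunk lost with the failed die from XOR parity (assumed scheme)."""
    count.recovery_reads += len(surviving) + 1  # surviving chunks plus the parity chunk
    return reduce(xor_bytes, surviving, parity)


# Usage: a stripe of three data chunks on three dies; chunk d1 is lost.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = reduce(xor_bytes, [d1, d2], d0)  # parity written when the stripe was programmed
count = SystemErrorCount(system_errors=1)
if workload_condition_satisfied(count, threshold=8):
    rebuilt = recover_with_parity([d0, d2], parity, count)
    print(rebuilt == d1, count.total)  # True 4
```

Because every recovery read increments the same counter that gates recovery, a busy rebuild gradually throttles itself under this assumed rule.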
Description
TECHNICAL FIELD

Implementations of the disclosure relate generally to memory sub-systems and, more specifically, to performing data recovery after a die failure in memory devices.

BACKGROUND

A memory sub-system may include one or more memory devices that store data. The memory devices may be, for example, non-volatile memory devices and volatile memory devices. In general, a host system may utilize a memory sub-system to store data at the memory devices and to retrieve data from the memory devices.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various implementations of the disclosure. The drawings, however, should not be taken to limit the disclosure to the specific implementations, but are for explanation and understanding only.

FIG. 1A illustrates an example computing system that includes a memory sub-system in accordance with aspects of the present disclosure.

FIG. 1B illustrates a block diagram of a memory device in communication with a memory sub-system in accordance with aspects of the present disclosure.

FIG. 2 schematically illustrates an example layout of a memory device in accordance with aspects of the present disclosure.

FIG. 3A is a high-level flow diagram of an example method of identifying valid data blocks by iterating over the physical address space of a chosen die, performed by a memory sub-system controller operating in accordance with aspects of the present disclosure.

FIG. 3B is a high-level flow diagram of an example method of identifying valid data blocks by iterating over the logical address space of the memory device, performed by a memory sub-system controller operating in accordance with aspects of the present disclosure.

FIG. 4 is a high-level flow diagram of an example method 400 of decoding encoded codewords by a memory sub-system controller operating in accordance with aspects of the present disclosure.

FIG. 5 is a block diagram of an example computer system in which implementations of the present disclosure may operate.

DETAILED DESCRIPTION

Aspects of the present disclosure are related to data recovery after a die failure in memory devices. A memory sub-system may be a storage device, a memory module, or a combination of a storage device and a memory module. Examples of storage devices and memory modules are described below in conjunction with FIG. 1A. In general, a host system may utilize a memory sub-system that includes one or more components, such as memory devices that store data. The host system may provide data to be stored at the memory sub-system and may request data to be retrieved from the memory sub-system.

A memory sub-system may utilize one or more memory devices, including any combination of the different types of non-volatile memory devices and/or volatile memory devices, to store the data provided by the host system. In some implementations, a memory sub-system may be represented by a solid-state drive (SSD), which may include one or more non-volatile memory devices. In some implementations, the non-volatile memory devices may be provided by negative-and (NAND) type flash memory devices. Other examples of non-volatile memory devices are described below in conjunction with FIG. 1A.

A non-volatile memory device is a package of one or more dies. Each die may include one or more planes. A plane is an independently accessible portion of a die, such that several planes of a die may be accessed concurrently. A “block” is the minimal erasable unit of memory and includes multiple memory pages. A “page” is the minimal writable unit of memory and includes multiple memory cells. A memory cell is an electronic circuit that stores information. A memory device may include multiple memory cells arranged in a two-dimensional grid. The memory cells are formed onto a silicon wafer in an array of columns and rows.
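The die, plane, block, and page hierarchy described above also frames the two ways that FIG. 3A and FIG. 3B identify valid data blocks: iterating over the physical address space of a chosen die, or iterating over the logical address space of the memory device. The following is a minimal Python sketch of both approaches. It assumes a hypothetical `PhysAddr` record, a logical-to-physical (L2P) dictionary, and a set of valid physical pages standing in for a controller's validity tracking. Neither function is taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class PhysAddr:
    """Hypothetical physical address following the die -> plane -> block -> page hierarchy."""
    die: int
    plane: int
    block: int
    page: int


BlockId = Tuple[int, int]  # (plane, block) within a single die


def scan_physical(die: int, planes: int, blocks: int, pages: int,
                  valid_pages: Set[PhysAddr]) -> Set[BlockId]:
    """Physical-space scan: visit every page of the chosen die."""
    found: Set[BlockId] = set()
    for plane in range(planes):
        for block in range(blocks):
            for page in range(pages):
                if PhysAddr(die, plane, block, page) in valid_pages:
                    found.add((plane, block))
                    break  # one valid page is enough to mark the block
    return found


def scan_logical(die: int, l2p: Dict[int, PhysAddr]) -> Set[BlockId]:
    """Logical-space scan: visit every L2P entry and keep those mapped to the die."""
    return {(pa.plane, pa.block) for pa in l2p.values() if pa.die == die}


# Usage: logical addresses 0 and 2 map to die 0; logical address 1 maps to die 1.
l2p = {0: PhysAddr(0, 0, 3, 1), 1: PhysAddr(1, 0, 0, 0), 2: PhysAddr(0, 1, 7, 0)}
valid = set(l2p.values())  # assumption: a page is valid iff some logical address maps to it
print(sorted(scan_logical(0, l2p)))  # [(0, 3), (1, 7)]
print(scan_physical(0, 2, 8, 4, valid) == scan_logical(0, l2p))  # True
```

Under these assumptions both scans return the same blocks, but their costs differ. The physical scan visits every page of the failed die, while the logical scan visits every L2P entry. Which one is cheaper on a real device would depend on die capacity and on how much of the logical space maps to the failed die.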
A memory cell includes a capacitor that holds an electric charge and a transistor that acts as a switch controlling access to the capacitor. Accordingly, the memory cell may be programmed (written to) by applying a certain voltage, which results in an electric charge being held by the capacitor. The memory cells are joined by wordlines, which are conducting lines electrically connected to the control gates of the memory cells, and bitlines, which are conducting lines electrically connected to the drain electrodes of the memory cells. Depending on the cell type, each memory cell may store one or more bits of binary information and have various logic states that correlate to the number of bits being stored. The logic states may be represented by binary values, such as “0” and “1”, or combinations of such values. The charge held by a programmed memory cell allows modulation of the voltage distributions produced by the memory cell. A set of memory cells referre