US-20260128115-A1 - READ RETRY WITH PHYSICAL DEFECT JUDGEMENT
Abstract
Technology is disclosed for retrying failed reads of non-volatile memory such as NAND. The memory system retries a failed read using one or more read techniques that differ from the technique used in the failed read, until a read succeeds. The read retries may use different read reference voltages than those used in the failed read. After a successful read retry, the memory system determines whether the original read failed due to an inherent reliability issue or a physical defect. If there is a physical defect, the memory cells may be retired. In an embodiment, the entire block of memory cells having the physical defect is retired. However, if the read failed due to an inherent reliability issue, the memory cells may continue to be used.
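The flow summarized in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the disclosed implementation: the names `read_with_retry`, `default_read`, `retry_techniques` and `has_physical_defect` are hypothetical, a read is modeled as a callable that returns the recovered bits or `None` on failure, and the `"unrecovered"` outcome for the case where every retry fails is an assumption, since the abstract does not cover that case.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

# Hypothetical model: a read returns the recovered bits, or None when it fails.
Read = Callable[[], Optional[Sequence[int]]]


def read_with_retry(
    default_read: Read,
    retry_techniques: Iterable[Read],
    has_physical_defect: Callable[[Sequence[int]], bool],
) -> Tuple[Optional[Sequence[int]], str]:
    """Return (data, action), where action says what to do with the block."""
    data = default_read()
    if data is not None:
        return data, "none"  # original read succeeded, nothing to judge

    for technique in retry_techniques:
        data = technique()
        if data is not None:
            # The cause of the original failure is judged only after a
            # retry has succeeded, as the abstract describes.
            action = "retire" if has_physical_defect(data) else "keep"
            return data, action

    # Assumption: the abstract does not say what happens if no retry succeeds.
    return None, "unrecovered"
```

For example, `read_with_retry(lambda: None, [lambda: None, lambda: [1, 0, 1]], lambda bits: True)` fails the original read and the first retry, succeeds on the second retry, and reports that the block should be retired.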
Inventors
- Xuan Tian
- Liang Li
- Ming Wang
Assignees
- SanDisk Technologies, Inc.
Dates
- Publication Date
- 20260507
- Application Date
- 20241101
Claims (20)
- 1. An apparatus comprising: one or more control circuits configured to control a three-dimensional memory structure having blocks containing non-volatile memory cells, the one or more control circuits configured to: perform a read retry of a group of the memory cells responsive to a failure of a read of the group of memory cells, wherein the read retry uses a different read technique than the read; and determine whether a physical defect exists in a block containing the group of memory cells responsive to the read retry succeeding in reading data from the group of memory cells.
- 2. The apparatus of claim 1, wherein determining whether a physical defect exists in the block comprises the one or more control circuits: determining that a physical defect exists in the block responsive to a determination that errors in the failure of the read are clustered in a region of the block containing a subgroup of the group of the memory cells.
- 3. The apparatus of claim 1, wherein determining whether a physical defect exists in the block comprises the one or more control circuits: comparing first read results using first read reference levels that were used in the read that failed with second read results from a successful read retry of the group of memory cells with second read reference levels.
- 4. The apparatus of claim 3, wherein determining whether a physical defect exists in the block comprises the one or more control circuits: determining which memory cells in the group have a different result in the second read results than the first read results, the different result being a bit flip; and determining that a physical defect exists in the block responsive to a determination that a first number of bit flips for memory cells in a first physical region of the block is more than a threshold amount greater than a second number of bit flips for memory cells in a second physical region of the block.
- 5. The apparatus of claim 4, wherein the first physical region includes those memory cells connected to a first segment of a word line in the block and the second physical region includes those memory cells connected to a second segment of the word line.
- 6. The apparatus of claim 5, further comprising: a first word line driver configured to drive the first segment of the word line; and a second word line driver configured to drive the second segment of the word line.
- 7. The apparatus of claim 3, wherein determining whether a physical defect exists in the block comprises the one or more control circuits: determining which memory cells in the group have a different result in the second read results than the first read results, the different result being a bit flip; and determining that a physical defect exists in the block responsive to a determination that a number of bit flips for a physically contiguous subgroup of the group of memory cells is greater than a threshold.
- 8. The apparatus of claim 1, wherein the one or more control circuits are configured to: retire the block containing the group of the memory cells responsive to a determination that a physical defect exists in the block; and continue to use the block responsive to a determination that the failure of the read was due to an intrinsic reliability issue as opposed to a physical defect in the block.
- 9. The apparatus of claim 1, wherein the memory cells comprise NAND memory cells.
- 10. A method for operating NAND memory, the method comprising: determining that a read error has occurred when reading a group of NAND memory cells in a block at a first set of read reference levels; determining a second set of read reference levels that are able to successfully read the group of NAND memory cells in response to the read error at the first set of read reference levels; and determining whether the read error is due to a physical defect in the block or intrinsic memory cell reliability based on differences at a memory cell level between first results of reading the group at the first set of read reference levels and second results of reading the group at the second set of read reference levels.
- 11. The method of claim 10, wherein determining whether the read error is due to a physical defect in the block or intrinsic memory cell reliability based on differences at a NAND memory cell level between first results of reading the group at the first set of read reference levels and second results of reading the group at the second set of read reference levels comprises: determining whether differences in reading the group of the NAND memory cells at the first set of read reference levels and second set of read reference levels are: i) localized to a region of the block containing a subgroup of the group of the NAND memory cells; or ii) randomly distributed across the group of the NAND memory cells.
- 12. The method of claim 11, further comprising: determining that the read error is due to a physical defect responsive to the differences being localized to the region containing the subgroup of the group of the NAND memory cells.
- 13. The method of claim 11, further comprising: determining that the read error is due to intrinsic cell reliability responsive to the differences being randomly distributed across the group of the NAND memory cells.
- 14. The method of claim 10, wherein determining whether the read error is due to intrinsic cell reliability or a physical defect based on differences at a memory cell level between first results of reading the group at the first set of read reference levels and second results of reading the group at the second set of read reference levels comprises: determining whether a first bit flip rate in a first physical region of the group is more than a threshold amount greater than a second bit flip rate in a second physical region of the group.
- 15. The method of claim 10, wherein determining whether the read error is due to a physical defect in the block or intrinsic memory cell reliability based on differences at a memory cell level between first results of reading the group at the first set of read reference levels and second results of reading the group at the second set of read reference levels comprises: determining whether a bit flip rate of a physically contiguous subgroup of the group of memory cells is greater than a threshold.
- 16. A non-volatile storage system comprising: a three-dimensional memory structure having blocks containing word lines and NAND memory cells; and one or more control circuits in communication with the three-dimensional memory structure, the one or more control circuits configured to: perform one or more read retries of a group of memory cells connected to a selected word line in a block in the three-dimensional memory structure responsive to a read error using a first set of read reference levels to read the group until a second set of read reference levels are found that succeed in reading the group of memory cells; determine whether there is a physical defect in the block based on physical locations of memory cells having a different result when reading the group at the second set of the read reference levels than reading the group at the first set of the read reference levels; and retire the block responsive to a determination that there is the physical defect in the block.
- 17. The non-volatile storage system of claim 16, wherein the one or more control circuits are configured to: determine which memory cells in the group have a different result when reading the group at the second set of the read reference levels than reading the group at the first set of the read reference levels, the different result being a bit flip; and determine that a physical defect exists in the block responsive to a determination that a first number of bit flips for memory cells in a first physical region of the block is more than a threshold amount greater than a second number of bit flips for memory cells in a second physical region of the block.
- 18. The non-volatile storage system of claim 17, wherein the first physical region consists of those memory cells connected to a first segment of the selected word line in the block and the second physical region consists of those memory cells connected to a second segment of the selected word line.
- 19. The non-volatile storage system of claim 18, further comprising: a first word line driver connected to and driving the first segment of the selected word line; and a second word line driver connected to and driving the second segment of the selected word line.
- 20. The non-volatile storage system of claim 16, wherein the one or more control circuits are configured to: determine which memory cells in the group have a different result when reading the group at the second set of the read reference levels than reading the group at the first set of the read reference levels, the different result being a bit flip; determine a number of bit flips for each of a plurality of physical segments along the selected word line in the block to which the group of memory cells are connected; and determine that a physical defect exists in the block responsive to a determination that at least one of the plurality of physical segments has a number of bit flips greater than a threshold.
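The location-based judgement recited in the claims (bit flips between the failed read and the successful retry, counted per physical region and compared against a threshold) can be illustrated with a short Python sketch. It is a simplified model under several assumptions that do not come from the claims: reads are lists of 0/1 values, list position stands in for physical position along the word line, segments are equal contiguous slices, and the function names, segment count and threshold values are all hypothetical. The `region_gap_over` check generalizes the two-region comparison to the largest and smallest segment counts, which is also an assumption.

```python
from typing import List, Sequence


def bit_flips(first_read: Sequence[int], second_read: Sequence[int]) -> List[int]:
    """Mark each cell whose result differs between the failed read and the retry."""
    if len(first_read) != len(second_read):
        raise ValueError("both reads must cover the same group of cells")
    return [a ^ b for a, b in zip(first_read, second_read)]


def flips_per_segment(flips: Sequence[int], num_segments: int) -> List[int]:
    """Count bit flips in each of num_segments contiguous slices of the group."""
    if num_segments < 1 or len(flips) < num_segments:
        raise ValueError("need at least one cell per segment")
    counts = [0] * num_segments
    for i, flip in enumerate(flips):
        # Integer scaling maps cell index i to its segment index.
        counts[i * num_segments // len(flips)] += flip
    return counts


def any_segment_over(counts: Sequence[int], threshold: int) -> bool:
    """True if at least one segment has more bit flips than the threshold."""
    return any(count > threshold for count in counts)


def region_gap_over(counts: Sequence[int], gap: int) -> bool:
    """True if one region has more than `gap` more bit flips than another."""
    return max(counts) - min(counts) > gap


# Illustrative data only: 16 cells, two segments of 8 cells each.
failed = [0] * 16
clustered_retry = [1, 1, 1, 1, 1, 1, 0, 0] + [0] * 8   # flips bunched in segment 0
scattered_retry = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

clustered = flips_per_segment(bit_flips(failed, clustered_retry), 2)
scattered = flips_per_segment(bit_flips(failed, scattered_retry), 2)
```

With these inputs, `clustered` is `[6, 0]` and `scattered` is `[1, 1]`, so a threshold of 4 flags only the clustered case as a suspected physical defect, while the scattered case would be attributed to intrinsic reliability.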
Description
BACKGROUND
The present disclosure relates to non-volatile storage.
Semiconductor memory is widely used in various electronic devices such as cellular telephones, digital cameras, personal digital assistants, medical electronics, mobile computing devices, servers, solid state drives, non-mobile computing devices and other devices. Semiconductor memory may comprise non-volatile memory or volatile memory. Non-volatile memory allows information to be stored and retained even when the non-volatile memory is not connected to a source of power (e.g., a battery).
Modern storage systems such as, for example, solid state drives typically contain a number of semiconductor dies with each die containing memory cells such as NAND strings. Each die may contain one or more planes with each plane containing a large number of blocks. Each block contains a large number of memory cells such as NAND strings. A NAND string contains memory cell transistors connected in series, a drain side select gate at one end, and a source side select gate at the other end. Each NAND string is associated with a bit line. The block typically has many word lines that provide voltages to the control gates of the memory cell transistors. In some architectures, each word line connects to the control gate of one memory cell on each respective NAND string in the block. The block is associated with a source line. The source side select gates are used to connect or disconnect the NAND channels from the source line.
The memory cells are programmed one group at a time. The unit of programming is typically referred to as a page. Typically, the memory cells are programmed to a number of data states. Using a greater number of data states allows for more bits to be stored per memory cell. For example, four data states may be used to store two bits per memory cell, eight data states may be used to store three bits per memory cell, 16 data states may be used to store four bits per memory cell, etc.
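The examples above follow a power-of-two relationship between bits per cell and data states. A minimal Python sketch of that arithmetic follows; the function name `data_states` is hypothetical and used only for illustration.

```python
def data_states(bits_per_cell: int) -> int:
    """Number of data states needed to store bits_per_cell bits in one cell."""
    if bits_per_cell < 1:
        raise ValueError("bits_per_cell must be at least 1")
    return 2 ** bits_per_cell


for bits in (2, 3, 4):
    print(f"{bits} bits per cell -> {data_states(bits)} data states")
# prints:
# 2 bits per cell -> 4 data states
# 3 bits per cell -> 8 data states
# 4 bits per cell -> 16 data states
```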
Some memory cells may be programmed to a data state by storing charge in the memory cell. For example, the threshold voltage (Vt) of a NAND memory cell can be set to a target Vt by programming charge into a charge storage region such as a charge trapping layer. The amount of charge stored in the charge trapping layer establishes the Vt of the memory cell. At the end of a successful programming process, each memory cell's Vt should be within one of a number of Vt distributions.
Once the memory cells in the memory device have been programmed, data may be read from the memory cells by sensing the programmed states of the memory cells. However, sensed states can sometimes differ from the written states due to one or more factors. Error detection and correction decoding can be used to detect and correct data errors resulting from sensed states that do not match written states. Typically, the user data is encoded as ECC (error correction code) codewords prior to programming, so an ECC engine may be used to correct errors in the encoded user data. However, there is a limit on how many bits in the data read from the memory cells can be in error if the ECC algorithm is to correct all of the errors. Therefore, storage systems typically do not rely only on correcting errors in ECC codewords.
Many storage systems employ one or more techniques to recover from read failures, such as a failure to decode ECC codewords programmed into a block. One technique to recover from a read failure is to dynamically adjust (i.e., calibrate) the read reference voltages and then retry the read of the memory cells with the calibrated read reference voltages. The read reference voltages can be calibrated using a number of different techniques.
One cause of read failures is intrinsic reliability issues.
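The calibrate-and-retry technique described above can be illustrated with a toy Python model. Everything specific here is an assumption made for illustration: the four-state cells, the voltage values, the uniform downward Vt shift, and the names `sense_state` and `count_errors` are not taken from the disclosure, and errors are counted per cell rather than per bit of an ECC codeword.

```python
from bisect import bisect_right
from typing import Sequence


def sense_state(vt: float, read_refs: Sequence[float]) -> int:
    """Sensed state index = number of read reference voltages at or below vt."""
    return bisect_right(read_refs, vt)


def count_errors(vts: Sequence[float], written: Sequence[int],
                 read_refs: Sequence[float]) -> int:
    """Count cells whose sensed state differs from the written state."""
    return sum(sense_state(vt, read_refs) != state
               for vt, state in zip(vts, written))


# Illustrative values only: states 0..3 were written, then every Vt drifted lower.
written = [0, 1, 2, 3, 1, 2]
shifted_vts = [-0.2, 0.9, 1.9, 2.9, 0.95, 1.8]

default_refs = [1.0, 2.0, 3.0]      # read reference voltages of the failed read
calibrated_refs = [0.4, 1.4, 2.4]   # hypothetical references after calibration

errors_default = count_errors(shifted_vts, written, default_refs)
errors_calibrated = count_errors(shifted_vts, written, calibrated_refs)
```

In this model the default references misread five of the six cells, which would exceed any small correction limit, while the calibrated references read every cell correctly, so the retry succeeds.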
Over time, the amount of charge stored in the charge storage region of the memory cell may change, leading to a change in the Vt of the memory cell. The amount of charge could change due to program disturb, read disturb, or simply charge leakage over time. Program disturb refers to the unintended change to the Vt of a memory cell when programming a different cell. Read disturb refers to the unintended change to the Vt of a memory cell when reading that cell or a different cell. Charge leakage over time is referred to as a data retention issue or, more briefly, “data retention” (DR).
Another cause of read failures is a physical defect associated with the block containing the memory cells. Example physical defects include, but are not limited to, shorts such as word line to word line shorts, word line to memory cell shorts, etc. If such physical defects are detected prior to shipping the device to the customer, the block having the physical defect can be retired. However, physical defects can develop (e.g., grow) over time. It is important to detect a growing physical defect early prior to the physical defect leading to a read error tha