US-12625764-B2 - Memory device bad column identification and compensation
Abstract
A method for managing error correction in a memory device includes identifying, by a memory controller, codewords in a memory array of the memory device that have bad columns. The method includes calculating, by the memory controller, an increased Error Correction Code (ECC) layer for each identified codeword based on a number of errors introduced by the bad columns. The method also includes redistributing, by the memory controller, ECC layers from one or more other codewords that have extra ECC layers to increase the ECC layers for the identified codewords with bad columns to enable implementation of the redistributed ECC layers on each codeword in the memory array.
Inventors
- Jun Wan
- Ying Tai
- Wei Wang
- Zhenming Zhou
Assignees
- MICRON TECHNOLOGY, INC.
Dates
- Publication Date
- 20260512
- Application Date
- 20240827
Claims (20)
- 1. A method for managing error correction in a memory device, the method comprising: identifying, by a memory controller having an Error Correction Code (ECC) selector, codewords in a memory array of the memory device that are impacted by bad columns detected through testing or monitoring operations; calculating, by the memory controller, a specific number of additional ECC layers for each identified codeword to correct errors introduced by the bad columns, wherein the specific number is determined by assessing an impact of the bad columns on bits of each identified codeword; and redistributing, by the memory controller, ECC layers from one or more other codewords within a same memory page and plane that have extra ECC layers to increase the ECC layers for the identified codewords with bad columns to enable implementation of the redistributed ECC layers on each codeword in the memory array, wherein the redistributing preserves physical locations of data within the memory array.
- 2. The method of claim 1, wherein identifying codewords impacted by bad columns further comprises: executing testing and regular monitoring to detect columns that consistently exhibit errors; and assessing an impact of the bad columns on bits of each codeword to determine a number of bits in each identified codeword that have a low reliability.
- 3. The method of claim 2, wherein a given codeword of the identified codewords impacted by bad columns has at least 90 bits with a low reliability.
- 4. The method of claim 1, wherein calculating the specific number of additional ECC layers further comprises determining a number of additional ECC layers needed to correct the errors introduced by the bad columns.
- 5. The method of claim 1, wherein the redistributing ECC layers further comprises adjusting ECC levels across the memory array to allocate additional ECC resources to codewords impacted by bad columns and maintaining an ECC resource balance over the memory array.
- 6. The method of claim 1, wherein the memory device is a Not-AND (NAND) memory device.
- 7. The method of claim 1, wherein the ECC layers are redistributed such that a physical location of data within the memory array remains constant.
- 8. The method of claim 1, wherein each of the other codewords that have extra ECC layers are in a same memory page and plane as a corresponding one of the identified codewords impacted by bad columns.
- 9. The method of claim 1, wherein each codeword in the memory array is a low-density parity check (LDPC) codeword.
- 10. The method of claim 1, wherein a read window budget (RWB) for the memory array meets a threshold performance responsive to the redistributing.
- 11. A memory sub-system comprising: a memory controller having an Error Correction Code (ECC) selector and storing data identifying codewords in a memory array of a memory device that are impacted by bad columns detected through testing or monitoring operations and storing data identifying a specific number of additional ECC layers for each identified codeword based on errors introduced by the bad columns, wherein the specific number is determined by assessing an impact of the bad columns on bits of each identified codeword; and the ECC selector configured to redistribute ECC layers from one or more other codewords of the memory array within a same memory page and plane that have extra ECC layers to increase the number of ECC layers for the identified codewords impacted by bad columns, wherein the redistributing preserves physical locations of data within the memory array.
- 12. The memory sub-system of claim 11, wherein the memory controller is configured to perform testing and monitoring to detect columns that consistently exhibit errors and determines a number of bits with a low reliability for each identified codeword impacted by bad columns.
- 13. The memory sub-system of claim 11, wherein the memory controller determines a number of additional parity levels needed to correct the errors introduced by the bad columns for each identified codeword impacted by bad columns.
- 14. The memory sub-system of claim 11, wherein each of the other codewords that have extra ECC layers are in a same memory page and plane as a corresponding one of the identified codewords impacted by bad columns.
- 15. The memory sub-system of claim 11, wherein the memory controller further comprises an error-handling module configured to apply low density parity check (LDPC) decoding on codewords stored in the memory array based on the redistributed ECC layers.
- 16. The memory sub-system of claim 11, wherein the memory device is a Not-AND (NAND) memory device.
- 17. The memory sub-system of claim 11, wherein the ECC selector redistributes ECC layers such that a physical location of data within the memory array remains constant.
- 18. A non-transitory computer-readable medium storing instructions that, when executed by a processor of a memory sub-system, cause the memory sub-system to perform operations comprising: identifying codewords in a memory array of a memory device that are impacted by bad columns detected through testing or monitoring operations; calculating a specific number of additional Error Correction Code (ECC) layers for each identified codeword based on errors introduced by the bad columns, wherein the specific number is determined by assessing an impact of the bad columns on bits of each identified codeword; and redistributing ECC layers of the memory device from one or more other codewords within a same memory page and plane that have extra ECC layers to increase the number of ECC layers for the identified codewords impacted by bad columns to enable implementation of the redistributed ECC layers on each codeword in the memory array, wherein the redistributing preserves physical locations of data within the memory array.
- 19. The non-transitory computer-readable medium of claim 18, wherein the operations further comprise performing testing and monitoring the memory array of the memory device to detect columns that consistently exhibit errors and determining a number of bits with low reliability for each identified codeword impacted by bad columns.
- 20. The non-transitory computer-readable medium of claim 18, wherein the operations further comprise determining the number of ECC layers needed to correct the errors introduced by the bad columns for each identified codeword that is impacted by bad columns.
Description
TECHNICAL FIELD
This disclosure relates to identifying bad columns on a memory device and compensating for the bad columns by redistributing error correction codes (ECCs).
BACKGROUND
A memory sub-system includes a memory device designed for data storage. These memory devices are implemented as non-volatile and volatile memory devices in various examples. In some such examples, a host system employs a memory sub-system for the purposes of storing data on the memory devices and for retrieving data from the memory devices.
Bad columns in a non-volatile memory device, such as Not-AND (NAND) memory, are defects that occur during the manufacturing process and can negatively impact the functionality and reliability of the memory device. These defects can arise from physical imperfections in the semiconductor material, errors in the photolithography process, or electrical failures such as faulty gate oxides and charge leakage. Bad columns compromise data integrity, reduce the overall yield of production batches, and can degrade the performance of the memory device.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A illustrates a system for redistributing error correction codes (ECCs) on a memory sub-system.
FIG. 1B illustrates a simplified block diagram of an example memory device in communication with a memory sub-system controller.
FIG. 2 illustrates a graph that plots read window budget (RWB) gain and a ratio of the raw bit error rate (RBER) and ECC capability as a function of ECC layers.
FIG. 3 illustrates spreadsheets that represent portions of a memory array.
FIG. 4 illustrates an example of a table that provides an example of two different deployable ECC.
FIG. 5 illustrates a flowchart of an example method for redistributing ECC levels to compensate for bad columns of a memory device.
FIG. 6 illustrates an example of a computer system (a machine) in which examples of the present description may operate.
DETAILED DESCRIPTION
This description relates to an innovative approach to enhancing the yield and reliability of a non-volatile memory device (e.g., a memory integrated circuit (IC) chip), such as Not-AND (NAND) memory, by dynamically redistributing Error Correction Code (ECC) capabilities across a die of the memory device. This dynamic redistribution includes adjusting the number of ECC layers allocated to different sectors or codewords based on defect densities, thereby increasing ECC protection in areas with higher concentrations of defects while reducing ECC in less affected areas. This redistribution can be managed logically by a memory controller, such that the redistribution does not require physical relocation of ECC data but rather tunes ECC application to improve error correction efficiency where most needed. This strategy not only helps in recovering more usable memory from each production batch but also enhances the overall performance and reliability of the memory device.
More generally, some examples of a memory sub-system include high density non-volatile memory devices where retention of data is desired during intervals of time where no power is supplied to the memory device. One example of non-volatile memory devices is a NAND memory device. A non-volatile memory device is a package that includes one or more dies. Each such die can include one or more planes. For some types of non-volatile memory devices (e.g., NAND memory devices), each plane includes a set of physical blocks, and each physical block includes a set of pages that are organized in wordlines. Each page includes a set of memory cells, which are commonly referred to as cells.
A cell is an electronic circuit that stores information. A cell stores at least one bit of binary information and has various logic states that correlate to the number of bits being stored. The logic states can be represented by binary values, such as ‘0’ and ‘1’, or as combinations of such values, such as ‘00’, ‘01’, ‘10’ and ‘11’.
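The logical redistribution of ECC layers described at the start of this section can be illustrated with a minimal Python sketch. It is not the controller implementation from this description. The `Codeword` fields, the `required_layers` estimate, the greedy donor order, and the function name `redistribute_ecc_layers` are all assumptions introduced for illustration. The sketch only moves layer counts between codewords that share a page and plane, so the total number of layers is conserved and no data changes physical location.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Codeword:
    page: int
    plane: int
    index: int
    ecc_layers: int       # layers currently assigned (hypothetical baseline)
    required_layers: int  # layers assumed necessary given bad-column impact


def redistribute_ecc_layers(codewords):
    """Return {(page, plane, index): layers} after moving spare layers
    from donors to deficient codewords in the same page and plane."""
    alloc = {(cw.page, cw.plane, cw.index): cw.ecc_layers for cw in codewords}

    # Group codewords so that layers only move within one page and plane.
    groups = defaultdict(list)
    for cw in codewords:
        groups[(cw.page, cw.plane)].append(cw)

    for group in groups.values():
        # Donors hold more layers than they are assumed to need.
        donors = [cw for cw in group if cw.ecc_layers > cw.required_layers]
        for cw in group:
            key = (cw.page, cw.plane, cw.index)
            deficit = cw.required_layers - alloc[key]
            for donor in donors:
                if deficit <= 0:
                    break
                dkey = (donor.page, donor.plane, donor.index)
                spare = alloc[dkey] - donor.required_layers
                moved = min(spare, deficit)
                alloc[dkey] -= moved
                alloc[key] += moved
                deficit -= moved
    return alloc


# Hypothetical example: codeword 0 on page 0, plane 0 is hit by bad columns.
example = [
    Codeword(0, 0, 0, ecc_layers=4, required_layers=6),
    Codeword(0, 0, 1, ecc_layers=4, required_layers=3),
    Codeword(0, 0, 2, ecc_layers=4, required_layers=3),
    Codeword(0, 0, 3, ecc_layers=4, required_layers=4),
    Codeword(0, 1, 0, ecc_layers=4, required_layers=5),
    Codeword(0, 1, 1, ecc_layers=4, required_layers=4),
]
result = redistribute_ecc_layers(example)
```

In this example the deficient codeword on plane 0 receives one layer from each of two donors, while the deficient codeword on plane 1 is left unchanged because no donor exists in its page and plane. A real controller would need a policy for that shortfall case, which this sketch does not attempt to model.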
The memory sub-system controller is configured/programmed to encode the host and other data, as part of a write operation, into a format for storage at the memory device(s). Encoding refers to a process of generating parity bits from embedded data (e.g., a sequence of binary bits) using an error correction code (ECC) and combining the parity bits with the embedded data to generate a codeword. Low-density parity check (LDPC) encoding refers to an encoding method that utilizes an LDPC code to generate the parity bits. Additionally, the memory sub-system controller can decode codewords, as part of a read operation, stored at the memory device(s) of the memory sub-system. Decoding refers to a process of reconstructing the original embedded data (e.g., sequence of binary bits) from the codeword (e.g., the encoded original embedded data) received from storage at the memory device(s). LDPC decoding refers to a decoding method that utilizes the LDPC code to reconstruct the original embedded data. An ECC layer refers to a specific implementation level of error correction c