Search

US-12625765-B2 - Data encoding method for 3D NAND flash memory

US12625765B2US 12625765 B2US12625765 B2US 12625765B2US-12625765-B2

Abstract

A computer-implemented method for encoding data to be stored in a solid-state storage device. The method includes the steps of, providing a source data; calculating an entropy of the source data; applying a first encoding scheme to encode the source data into an encoded data, if the entropy is below a threshold; and applying a second data encoding scheme to encode the source data into the encoded data, if the entropy is equal to or higher than the threshold. The first encoding scheme is adapted to encode most frequently combined characters in the source data into predetermined characters to increase ratio of the predetermined characters in the encoded data. The second data encoding scheme is adapted to encode evenly distributed combined characters into a plurality of patterns with the predetermined characters, in order to increase the ratio of the predetermined characters in the encoded data.

Inventors

  • Chun Xue
  • Qiao Li
  • Tei-Wei Kuo
  • Min Ye
  • Shangyu WU
  • Yufei Cui

Assignees

  • CITY UNIVERSITY OF HONG KONG

Dates

Publication Date
20260512
Application Date
20240402

Claims (16)

  1. 1 . A computer-implemented method for encoding data to be stored in a solid-state storage device, comprising the steps of: a) providing a source data; b) calculating an entropy of the source data; c) applying a first encoding scheme to encode the source data into an encoded data, if the entropy is below a threshold; the first encoding scheme adapted to encode most frequently combined characters in the source data into predetermined characters to increase ratio of the predetermined characters in the encoded data; wherein the step of applying a first encoding scheme to encode the source data into an encoded data further comprises: i. counting frequencies of all combinations of characters, wherein each said combination contains a first number of the characters; ii. sorting the combinations according to their frequencies; iii. identifying one said combination that has a highest frequency as the most frequently combined characters; iv. determining a second number of said combinations, of which said frequencies immediately follow that of the most frequently combined characters; wherein the second number equals to or is smaller than a square of the first number minus one; and v. encoding the second number of said combinations and the most frequently combined characters into different combinations of the predetermined characters; d) applying a second data encoding scheme to encode the source data into the encoded data, if the entropy is equal to or higher than the threshold; the second data encoding scheme adapted to encode evenly distributed combined characters into a plurality of patterns with the predetermined characters, in order to increase the ratio of the predetermined characters in the encoded data.
  2. 2 . The computer-implemented data encoding method of claim 1 , wherein the first number is two or four.
  3. 3 . A computer-implemented data encoding method for encoding data to be stored in a solid-state storage device, comprising the steps of: a) providing a source data; b) calculating an entropy of the source data; c) applying a first encoding scheme to encode the source data into an encoded data, if the entropy is below a threshold; the first encoding scheme adapted to encode most frequently combined characters in the source data into predetermined characters to increase ratio of the predetermined characters in the encoded data; wherein the step of applying a first encoding scheme to encode the source data into an encoded data further comprises: i. counting frequencies of all combinations of characters, wherein each said combination contains a first number of the characters; ii. sorting the combinations according to their frequencies; iii. identifying one said combination that has a highest frequency as the most frequently combined characters; iv. determining a second number of said combinations, of which said frequencies immediately follow that of the most frequently combined characters; and v. encoding the second number of said combinations and the most frequently combined characters into different combinations of the predetermined characters; d) applying a second data encoding scheme to encode the source data into the encoded data, if the entropy is equal to or higher than the threshold; the second data encoding scheme adapted to encode evenly distributed combined characters into a plurality of patterns with said predetermined characters, in order to increase the ratio of the predetermined characters in the encoded data; wherein Step v) is conducted based on a mapping table, the mapping table created upon a first write request into the solid-state storage drive.
  4. 4 . The computer-implemented data encoding method of claim 3 , wherein the mapping table is stored in an out-of-band (OOB) area of a page within a flash memory of the solid-state storage drive.
  5. 5 . A computer-implemented data encoding method for encoding data to be stored in a solid-state storage device, comprising the steps of: a) providing a source data; b) calculating an entropy of the source data; c) applying a first encoding scheme to encode the source data into an encoded data, if the entropy is below a threshold; the first encoding scheme adapted to encode most frequently combined characters in the source data into predetermined characters to increase ratio of the predetermined characters in the encoded data; and d) applying a second data encoding scheme to encode the source data into the encoded data, if the entropy is equal to or higher than the threshold; the second data encoding scheme adapted to encode evenly distributed combined characters into a plurality of patterns with the predetermined characters, in order to increase the ratio of the predetermined characters in the encoded data; wherein the step of applying the second data encoding scheme to encode the source data into the encoded data further comprises: i) identifying a pair of combinations of characters including a first combination and a second combination, the first combination having more the predetermined characters than the second combination; ii) encoding the first combination into a first one of the plurality of patterns; and iii) encoding the second combination into a second one of the plurality of patterns; wherein the first combination and the second combination each contain a first number of the characters; the first and second ones of the plurality of patterns each containing a second number of the predetermined characters; the first number being smaller than the second number; and wherein a first number of most significant characters in the first one of the plurality of patterns are identical to a first number of most significant characters in the second one of the plurality of patterns.
  6. 6 . The computer-implemented data encoding method of claim 5 , wherein a difference between the first number and the second number is one.
  7. 7 . The computer-implemented data encoding method of claim 4 , wherein Step ii) and Step iii) are conducted based on a mapping table, the mapping table being pre-defined prior to Step d).
  8. 8 . The computer-implemented data encoding method of claim 5 , wherein the second combination contains none of the predetermined characters, and the first combination consists entirely of the predetermined characters.
  9. 9 . The computer-implemented data encoding method of claim 1 , wherein Step b) further comprises conducting integer operations on the source data based on a log lookup table.
  10. 10 . The computer-implemented data encoding method of claim 1 , wherein Step b) is performed by an embedded processor in a controller of the solid-state storage device.
  11. 11 . The computer-implemented data encoding method of claim 1 , wherein the predetermined characters represent voltage states in the solid-state storage device.
  12. 12 . The computer-implemented data encoding method of claim 2 , wherein a number of the predetermined characters is two.
  13. 13 . The computer-implemented data encoding method of claim 1 , wherein the first encoding scheme is skewed coding scheme, and the second encoding scheme is reversed Huffman coding.
  14. 14 . A non-transitory computer-readable memory recording medium having computer instructions recorded thereon, the computer instructions, when executed on one or more processors, causing the one or more processors to perform operations according to the method according to claim 1 .
  15. 15 . A computing system comprising: a) one or more processors; and b) memory containing instructions that, when executed by the one or more processors, cause the computing system to perform operations according to the method of claim 1 .
  16. 16 . The computing system of claim 15 , further comprising a solid-state storage device controller; said one or more processors being one or more embedded processors in the solid-state storage device controller.

Description

BACKGROUND This disclosure relates to the field of data storage and archiving, and in particular to encoding schemes for cold data storage. 3D NAND (Not-AND) flash memories are now prevailing in storage systems, including data centers and mobile devices, because of the high-performance and high-density characteristics [26, 34]. To reduce the cost per bit, 3D flash memories stack a great number of layers in a flash block and many cells in a horizontal word line for large capacities [12, 30]. The complex connection of flash cells in 3D flash blocks introduces new leakage paths of charges stored in flash cells, where the stored data suffer from high raw bit error rates (RBERs) [14, 20, 40]. RBER variations among pages and among layers raise further challenges for the reliability concerns in modern high-density 3D NAND flash memory [21, 25, 45]. A randomizer is unanimously applied in today's 3D NAND flash products to avoid extreme data patterns with extra-high RBERs. Through randomization, the original data is encoded into evenly distributed data over all voltage states, such as 8 voltage states in 3-bits-per-cell triple-level cell (TLC) flash. Data randomization is a simple but effective method to avoid programming lots of cells to voltage states that are prone to voltage shifting. Existing studies have presented that high voltage states tend to suffer from more voltage shifting [2, 11, 22]. This phenomenon has been well studied in planar flash memories, with different strategies proposed for reliability enhancement. For example, some conventional art observed that more than 90% bit errors occur due to the voltage shifting of the two high voltage states in planar 2-bits-per-cell multi-level cell (MLC) flash memories. Based on this observation, several approaches are proposed to reduce the RBER of flash memories, including flipping the data bits to generate more low voltage states [9, 10, 18, 36, 46], and eliminating one or several high voltage states [11, 35, 43]. The variations among voltage states also exist on 3D TLC NAND flash memories [22], and several studies adopt the same mapping principle to generate more low voltage states on 3D TLC flash memories [35, 42]. However, it is found that reducing high voltage states or increasing low voltage states does not work well on high-density 3DNAND flash memories. The following references are referred to throughout this specification, as indicated by the numbered brackets: [1] Yu Cai, Ning Chen, Yunxiang Wu, Erich F Haratsch, Earl T Cohen, and Timothy L Canepa. Method to distribute user data and error correction data over different page types by leveraging error rate variations, Feb. 16, 2016. U.S. Pat. No. 9,262,268.[2] Yu Cai, Saugata Ghose, Erich F Haratsch, Yixin Luo, and Onur Mutlu. Error characterization, mitigation, and recovery in flash-memory-based solid-state drives. Proceedings of the IEEE, 105(9): 1666-1704, 2017.[3] Yu Cai, Saugata Ghose, Yixin Luo, Ken Mai, Onur Mutlu, and Erich FHaratsch. Vulnerabilities in mlc nand flash memory programming: Experimental analysis, exploits, and mitigation techniques. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 49-60, 2017.[4] Yu Cai, Erich F Haratsch, Onur Mutlu, and Ken Mai. Error patterns in mlc nand flash memory: Measurement, characterization, and analysis. In 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 521-526, 2012.[5] Criteo Dataset. https://ailab.criteo.com/ressources/., 2014.[6] Kaggle Datasets. Amazon reviews for sentiment analysis. https://www.kaggle.com/datasets/bittlingmayer/amazonreviews, 2017.[7] Kaggle Datasets. Nips papers. https://www.kaggle.com/datasets/benhamner/nips-papers, 2017.[8] Kaggle Datasets. Wikipedia sentences. https://www.kaggle.com/datasets/mikeortman/wikipedia-sentences, 2018.[9] Masafumi Doi, Shuhei Tanakamaru, and Ken Takeuchi. A scaling scenario of asymmetric coding to reduce both data retention and program disturbance of nand flash memories. Solid-State Electronics, 92:63-69, 2014.[10] Jie Guo, Danghui Wang, Zili Shao, and Yiran Chen. Data-pattern-aware error prevention technique to improve system reliability. IEEE Trans-actions on Very Large Scale Integration (VLSI) Systems, 25(4): 1433-1443, 2017.[11] Jie Guo, Wujie Wen, Jingtong Hu, Danghui Wang, Hai Li, and Yiran Chen. Flex level: A novel nand flash storage system design for ldpc latency reduction. In 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1-6, 2015.[12] Tsutomu Higuchi, Takuyo Kodama, Koji Kato, Ryo Fukuda, Naoya Tokiwa, Mitsuhiro Abe, Teruo Takagiwa, et al. 30.4 a 1tb 3b/cell 3d-flash memory in a 170+ word-line-layer technology. In Proceedings of the 68th IEEE International Solid-State Circuits Conference (ISSCC), volume 64, 2021.[13] Duwon Hong, Myungsuk Kim, Geonhee Cho, Dusol Lee, and Jihong Kim. Guardederase: Extending ssd lifetimes by protecting weak word-lines. In 20th USENIX Conference on File and Storage Tec