CN-122019461-A - High-throughput RRAM content-addressable memory and genome analysis search method


Abstract

The invention discloses a high-throughput RRAM content-addressable memory and a genome analysis search method. Through the co-design of circuit-level structure optimization and the search algorithm, the invention achieves higher storage density, lower storage cost, and higher parallel search throughput, and can be widely applied to gene database retrieval, pattern recognition, edge intelligent computing, and other highly parallel pattern-matching systems.

Inventors

HA YAJUN
JIANG CHENXIN
ZHANG SHEN

Assignees

ShanghaiTech University

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-13

Claims (5)

1. A high-throughput RRAM content-addressable memory and genome analysis search method, characterized by comprising the following steps: designing an RRAM CAM cell structure with a time-domain comparison function, in which the comparison circuit is optimized using RRAM one-hot encoding and precharge-discharge logic, so that a single cell can perform multi-cycle comparison and fuzzy matching is realized through shifted input; and combining CAM arrays into a high-throughput DSHD-CAM architecture that serves as the high-throughput RRAM content-addressable memory, co-optimized with a dynamic SHD search algorithm, so as to realize genome analysis search.
2. The high-throughput RRAM content-addressable memory and genome analysis search method of claim 1, wherein CAM array storage redundancy is reduced by storing multiple reference k-mers in a single row and compressing the overlapping segments between adjacent k-mers.
3. The high-throughput RRAM content-addressable memory and genome analysis search method of claim 1, wherein the dynamic SHD search algorithm operates by rapidly steering inputs directly to locations with a high matching probability.
4. The high-throughput RRAM content-addressable memory and genome analysis search method of claim 1, wherein the single cell, through precharging and charge retention at the Mid node, performs multi-cycle comparison of the current reference base against the current and adjacent input bases.
5. The high-throughput RRAM content-addressable memory and genome analysis search method of claim 1, wherein multi-level thresholds on the number of matching bases are set using the statistical properties of DNA sequences; sequences with low matching potential and sequences with high matching potential are distinguished according to the multi-level thresholds; the sequences with low matching potential are filtered out early; and the sequences with high matching potential trigger the complete, time-consuming SHD calculation.
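As an illustration only, the storage compression of claim 2 and the shifted comparison with early filtering of claims 4 and 5 can be modelled in software. The sketch below is a simplified behavioural model and not the circuit or the algorithm of the invention: the function names, the (prefix length, minimum matches) checkpoint format, the threshold values, and the example value k = 50 are all assumptions introduced for the example. It shows that a row holding m overlapping k-mers needs only k + m - 1 bases, that storing every k-mer of a long reference separately with k = 50 duplicates roughly 98% of the stored bases, and that a candidate falling below a threshold at an early checkpoint can be rejected before the full comparison.

```python
from typing import List, Sequence, Tuple


def pack_rows(reference: str, k: int, kmers_per_row: int) -> List[str]:
    """Store `kmers_per_row` consecutive k-mers per row, keeping shared bases once.

    A row holding m overlapping k-mers needs k + m - 1 bases instead of m * k.
    """
    n_kmers = len(reference) - k + 1
    row_len = k + kmers_per_row - 1
    return [reference[s:s + row_len] for s in range(0, n_kmers, kmers_per_row)]


def naive_redundancy(n_bases: int, k: int) -> float:
    """Fraction of stored bases that are duplicates when each k-mer has its own row."""
    stored = (n_bases - k + 1) * k
    return 1.0 - n_bases / stored


def shifted_matches(ref: str, query: str, max_shift: int = 1) -> int:
    """Count reference bases equal to the query base at the same or an adjacent position.

    Simplified stand-in for an SHD-style comparison (assumption, not the patented circuit).
    """
    count = 0
    for i, base in enumerate(ref):
        lo = max(0, i - max_shift)
        hi = min(len(query), i + max_shift + 1)
        if base in query[lo:hi]:
            count += 1
    return count


def staged_search(ref: str, query: str,
                  checkpoints: Sequence[Tuple[int, int]],
                  max_shift: int = 1) -> bool:
    """Check hypothetical (prefix_length, min_matches) thresholds in order.

    A candidate that misses any threshold is dropped at that point, so
    low-potential candidates never reach the longer, more expensive comparisons.
    """
    for prefix_len, min_matches in checkpoints:
        window = query[:prefix_len + max_shift]
        if shifted_matches(ref[:prefix_len], window, max_shift) < min_matches:
            return False  # low matching potential: filtered early
    return True


if __name__ == "__main__":
    print(pack_rows("ACGTACGT", k=4, kmers_per_row=2))
    print(round(naive_redundancy(1_000_000, 50), 3))
    # The query has one deleted base; adjacent-position comparison still finds 7 of 8.
    print(shifted_matches("ACGTACGT", "AGTACGTA"))  # 7
    print(staged_search("ACGTACGT", "TTTTTTTT", [(4, 3), (8, 6)]))  # False
```

In this model each checkpoint recomputes the prefix comparison for clarity; a hardware realization would be expected to accumulate results across cycles rather than recompute them.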

Description

High-throughput RRAM content-addressable memory and genome analysis search method

Technical Field

The invention belongs to the technical field of integrated circuit design and in-memory computing, and particularly relates to a high-throughput RRAM content-addressable memory and a genome analysis search method.

Background

Genome analysis has become a key supporting technology in applications such as personalized medicine, virus surveillance, and infectious disease prevention and control. One of its core steps is read mapping, which requires comparing a large number of sequencing reads against a large-scale reference genome database. Because the reference sequence can reach billions of base pairs, this step is usually the most computationally intensive stage of the whole genome analysis pipeline. [1]

Content-addressable memory (CAM) is widely used to accelerate read mapping thanks to its massively parallel comparison capability, but the throughput of existing CAM architectures for genome analysis is still limited by several factors. [2-6]

The related art currently has three main problems:

1. Large cell area. To support approximate search functions such as Hamming distance and edit distance, conventional genome analysis CAM cells need complex circuit structures. This limits integration density, leaves on-chip storage capacity insufficient, and in turn limits overall throughput. For example, existing SHD CAMs (e.g., ShiftCAM) employ cell structures of about 13T4R1C, which carry a relatively large area overhead. [6]
2. High array storage redundancy. To fit the k-mer comparison flow, the reference genome is usually preprocessed into fragments of length k and stored with heavy overlap between adjacent fragments. Adjacent k-mers overlap by k-1 bases, so the storage redundancy in the array can approach about 98%, which severely reduces effective storage utilization and the size of the reference database that can be supported. [2-6]
3. Low search algorithm efficiency. High-precision genome analysis generally depends on complex comparison flows such as shifted Hamming distance (SHD), in which the similarity must be computed repeatedly at multiple shift positions, causing high search latency. In addition, because there is no effective early filtering mechanism for candidates with low matching potential, a large number of useless comparisons are performed and overall throughput drops significantly. [6]

Against this background, it is necessary to propose a novel CAM architecture that is co-optimized in three respects, namely cell structure, array storage organization, and search algorithm, so as to significantly improve on-chip storage efficiency and the read mapping throughput of genome analysis while preserving alignment accuracy.

[1] A. F. Laguna, H. Gamaarachchi, X. Yin, M. Niemier, S. Parameswaran, and X. S. Hu, "Seed-and-vote based in-memory accelerator for DNA read mapping," in 2020 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 2020, pp. 1–9.
[2] I. Merlin, E. Garzón, A. Fish, and L. Yavits, "DIPER: Detection and identification of pathogens using edit distance-tolerant resistive CAM," IEEE Transactions on Computers, vol. 73, no. 10, pp. 2463–2473, 2024.
[3] Z. Jahshan, I. Merlin, E. Garzón, and L. Yavits, "DASH-CAM: Dynamic approximate SearcH content addressable memory for genome classification," in 2023 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023, pp. 1453–1465.
[4] R. Hanhan, E. Garzón, Z. Jahshan, A. Teman, M. Lanuzza, and L. Yavits, "EDAM: Edit distance tolerant approximate matching content addressable memory," in Proceedings of the 49th Annual International Symposium on Computer Architecture, ser. ISCA '22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 495–507.
[5] H. Zhong, Z. Chen, W. Huangfu, C. Wang, Y. Xu, T. Wang, Y. Yu, Y. Liu, V. Narayanan, H. Yang, and X. Li, "ASMCap: An approximate string matching accelerator for genome sequence analysis based on capacitive content addressable memory," in 2023 60th ACM/IEEE Design Automation Conference (DAC), 2023, pp. 1–6.
[6] P. He, R. Mao, K. Shan, Y. Tong, Z. Xu, M. Peng, R. Luo, and C. Li, "ShiftCAM: A time-domain content addressable memory utilizing shifted hamming distance for robust genome analysis," in Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design. New York, NY, USA: Association for Computing Machinery, 2025, no. 87, pp. 1–9.

Disclosure of Invention

The technical scheme of the invention aims to solve the problems of large cell area, high data storage redundancy, low pattern matching efficiency, limited throughput, and the like of the traditional RRAM-CAM architecture based on Static Random-Access Memory (SRAM) and embedded