CN-122019526-A - R tree space index design method for DRAM-NVM hybrid storage architecture

CN122019526ACN 122019526 ACN122019526 ACN 122019526ACN-122019526-A

Abstract

The invention provides an R Tree space index design method for a DRAM-NVM hybrid storage architecture, which relates to the technical field of computer data storage and has the core design thought that internal nodes are deployed on a DRAM, leaf nodes are placed on the NVM, a targeted data read-write operation flow is designed, a minimum expansion strategy is adopted around a scene of less read and write in an index structure and an operation mechanism, and a leaf node data structure and an in-situ update algorithm on the NVM are designed in a targeted manner, so that the write request processing efficiency can be improved on the premise of ensuring the data consistency. The proposal of multi-word data expansion and efficient fault recovery better meets the actual demands of users by allowing more flexible personalized configuration and shortening service suspension time. The invention is suitable for the application fields of computer technology, database technology, data storage and the like.

Inventors

HAN BING
Shen chengyou
ZHANG KAIQI

Assignees

哈尔滨工业大学
哈尔滨工业大学重庆研究院

Dates

Publication Date: 20260512
Application Date: 20241112

Claims (10)

1. The R tree spatial index design method for the DRAM-NVM hybrid storage architecture is characterized by comprising the following steps: Step 1, building an index engine internal module of an R tree; Step 2, storing the internal nodes and leaf nodes in DRAM and NVM of the index engine internal module respectively; step 3, through structural design of leaf nodes, all the leaf nodes form a persistent linked list; And 4, finishing data updating and data query according to the design.
2. The method for designing the R tree spatial index for the DRAM-NVM hybrid storage architecture according to claim 1, wherein the index engine internal module of the R tree comprises a DRAM & NVM memory management component, a basic configuration and data structure layer, an index internal operation layer and an external interface layer; The DRAM & NVM memory management component is configured to provide a canonical memory allocation and release API for DRAM and NVM; the configuration and data structure layer is used for loading the personalized configuration of the engine during deployment, maintaining all data structures in the running process of the engine, and calling the memory allocation and release API at any time; the index internal operation layer is used for completing insertion, deletion and inquiry, and executing node splitting, merging and redistribution; the external interface layer is used for providing insertion, deletion and inquiry for the user by downwards calling the index internal operation layer and analyzing and presenting the returned result.
3. The method for designing the R-tree spatial index for the DRAM-NVM hybrid memory architecture according to claim 2, wherein step 3 specifically comprises: compared with the internal nodes, the leaf nodes are added with two bit zone bits and four 8-byte pointers, wherein the four 8-byte pointers are respectively candidate pointers next 0 、next 1 , precursor pointers prev and dptr; the metadata of each leaf node is made up of two marker bits alt, del and a bitmap of the remaining bit organization.
4. The method for designing an R-tree spatial index for a DRAM-NVM hybrid memory architecture of claim 3, wherein the flag bit alt is used to indicate a valid successor pointer; the flag bit del is used to indicate whether the current dptr is valid; A valid dptr points to a leaf node that is to be released by the end of merging but not yet released at the present time; The precursor pointer prev always points to the precursor node of the current node.
5. The method for designing the R-tree spatial index for the DRAM-NVM hybrid memory architecture of claim 4, wherein the data update includes insert and delete; the insertion operation is specifically as follows: step 1, determining an insertion path from top to bottom based on a minimum expansion strategy, and expanding corresponding MBRs in each internal node on the path; step 2, writing the new entry into a corresponding leaf node in the NVM, and ensuring the complete writing of the new entry through a persistence operation; step 3, setting corresponding bit in the leaf node bitmap after confirming that the new entry is written in error, so that the new entry is effective, and ensuring that the bitmap is updated again through a persistence operation; the deleting operation specifically includes: step 1, finding out the node where the deleted item is located from top to bottom; Step 2, invalidating the entry to be deleted by the corresponding bit position 0 in the node bitmap, and ensuring that the bitmap is updated by using a persistence operation; and 3, shrinking and deleting MBRs corresponding to each internal node on the path from bottom to top.
6. The method for designing the R tree spatial index for the DRAM-NVM hybrid memory architecture according to claim 5, wherein the leaf node splitting in the NVM is realized by a leaf node splitting method with consistency assurance; and combining the leaf nodes in the NVM through the del flag bit and the dptr in the leaf nodes.
7. The method for designing the R-tree spatial index for the DRAM-NVM hybrid memory architecture according to claim 5, wherein the leaf nodes are designed by sharing the multi-word metadata of the byte.
8. The efficient fault recovery method implemented by the R tree spatial index design method for DRAM-NVM hybrid storage architecture according to any one of claims 1-7, wherein the method is: The mapping and analyzing stage of the NVM memory pool file comprises the steps of permanently storing an NVM layer data structure of an index engine on an NVM hardware medium in the form of the memory pool file, finding and opening a corresponding memory pool file through a path which is pre-configured and hard coded into a program when fault recovery starts, and directly mapping the corresponding memory pool file into a program virtual address space without cache; Traversing leaf node linked list forward, checking each leaf node and modifying to ensure linked list is consistent; the leaf node arrangement stage comprises traversing the linked list again, calculating MBR of each node, and combining the MBR with the offset of the node in the pool to obtain an entry to be inserted into the bottom-layer internal node; And in the reconstruction stage of the DRAM layer index structure, an R tree index structure is constructed in the DRAM by using the entries, and the structure reconstruction can be completed by creating an empty internal node in the DRAM as a new index tree and inserting the entries one by one.
9. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the computer program performs a method for designing an R tree spatial index for a DRAM-NVM hybrid storage architecture according to any one of claims 1 to 7.
10. A computer device comprising a memory and a processor, wherein the memory stores a computer program, and wherein the processor performs an R-tree spatial index design method for a DRAM-NVM hybrid memory architecture according to any one of claims 1-7 when the processor runs the computer program stored in the memory.

Description

R tree space index design method for DRAM-NVM hybrid storage architecture Technical Field The invention relates to the technical field of computer data storage, in particular to an R tree spatial index oriented to a DRAM-NVM hybrid storage architecture. Background As a large basic component of the space-time database system, the R tree space index engine greatly improves the efficiency of inquiring and acquiring specific data under the background of massive multidimensional data. The provided functions such as range query, nearest neighbor query, astronomical line query, top-k dominant query and the like are widely applied to map navigation application, geographic information system, electronic prototype (digital mock-up, responsible for carrying out real computer simulation on products) and machine learning multidimensional feature vector access. However, the classical R-tree index and its variants are mainly considered in the proposal to be constructed and run in the traditional memory (mainly referred to as dynamic random access memory, simply referred to as DRAM), and are intended to provide a query acceleration service for large-scale multidimensional data stored on the high-capacity external memory such as SSD, HDD, and the like. However, the method has to be limited by the defects of high price, poor expandability, power-off data loss and the like of the DRAM while utilizing the high read-write performance of the DRAM, and the limitation gradually becomes the development bottleneck of the current R tree space index engine. With the development of computer storage technology, a new type of memory, i.e., non-volatile memory (NVM or PM) or PERSISTENT MEMORY, also called persistent memory, has appeared and is in business use in recent years. NVM incorporates the byte addressing and memory non-volatile features of DRAM. The current NVM technology mainly includes PCM, STT-RAM, reRAM, 3D XPoint, etc., and 3DXPoint technology is adopted as a first commercial and unique non-volatile Memory widely deployed and applied in industry, namely, intel Optane DC PERSISTENT Memory Modules (DCPMM). Although NVM has certain advantages in terms of non-volatility, addressing mode, storage capacity, delay time, energy consumption, manufacturing cost, etc., it is also clear that the read delay of the currently practically available NVM hardware devices is generally several times that of DRAM, the write endurance is also very limited, and the gap between the two is still not negligible. Thus the DRAM-NVM hybrid architecture will take part in the computer memory architecture as the primary form of NVM over a relatively long period of time, as shown in fig. 1. R trees were proposed by Guttman in 1984. As a mainstream form of the development of B-trees into multidimensional space, it uses a minimum bounding rectangle (minimumbounding rectangle, abbreviated as MBR) to represent multidimensional data objects. As shown in FIG. 2, the non-leaf nodes of the R tree store (R, ptr) entries, where ptr is a pointer to one of its child nodes, R is the MBR (also called directory rectangle or directory rectangle, D1, D2 and D3 in the figure) containing all rectangles in this child node, and similarly the leaf nodes store (R, obj_id) entries, where R is the data rectangle (DATA RECTANGLE, R1-R7 in the figure), obj_id points to a specific description of the data. Many queries on the R-tree index are top-down. For example, a range query retrieves rectangles overlapping the query rectangle from root to leaf nodes to find all records enclosed by the query rectangle, while a nearest neighbor query queries k nearest records to the query point by calculating the shortest distance from the point to the rectangle and the distance from the point to each object within the rectangle. The insert operation may cause leaf nodes to overflow, resulting in node splitting on the insert path. The most classical two node splitting methods proposed by Guttman are the linear method (LINEAR SPLIT method) and the quadratic method (quadratic split method), which are both heuristic methods, and both attempt to minimize the MBR obtained after splitting. However, classical R-trees do not adequately consider the splitting strategy, which makes it possible that when one node is about to split, other nodes of the same level are not yet filled, thereby reducing space utilization. On the other hand, minimizing MBR does not directly avoid overlapping. When a query rectangle overlaps multiple MBRs, a query has to access multiple subtrees, resulting in reduced query efficiency. Therefore, after the R tree, a series of R tree variants are designed and realized in academia and industry to further improve indexes such as query efficiency, space utilization rate and the like, and the development sequence according to time is summarized as follows: The R+ tree ensures that one point is covered by one sub-node at most during searching by splitting one object and respectively storing t