CN-117472959-B - GSKIPLIST-based efficient block chain query system and dynamic construction method

CN117472959BCN 117472959 BCN117472959 BCN 117472959BCN-117472959-B

Abstract

The invention discloses a GSKIPLIST-based efficient block chain query system and a dynamic construction method, and belongs to the field of computers. The GSKIPLIST-based blockchain efficient query system comprises a blockchain subsystem, a 'group jump table' subsystem and a database, wherein the blockchain subsystem comprises all transactions and blocks, the 'group jump table' subsystem comprises a multi-level Node tree, a plurality of MetaD Node sets and a parameter file, and the database is used for storing all data in a durable manner. A dynamic construction method of a high-efficiency block chain query system based on GSKIPLIST comprises the steps of reading a parameter file, reading block chain data and constructing an initial preset 'group jump table' subsystem, starting a query service to the outside, monitoring a new block and a user query request, responding to the user query request and dynamically maintaining the structure of the 'group jump table' subsystem. The invention can effectively accelerate the block chain inquiry efficiency.

Inventors

LI XUDONG
LI ZELIN
REN WEN

Assignees

南开大学

Dates

Publication Date: 20260505
Application Date: 20231102

Claims (10)

1. A GSKIPLIST-based blockchain efficient query system is characterized by comprising a blockchain subsystem, a group jump table subsystem and a database, wherein the blockchain subsystem comprises all transactions and blocks, each transaction at least comprises a transaction ID (transactionID), transaction time, transaction participators, transaction header hashes and transaction content information, each block comprises transactions in a given time, the blocks are related together in a chained mode according to the time generated by the blocks, and each block at least comprises a block ID (BlockID), a block number, a block start-stop time, a block start time and a block stop time, the block chain subsystem is called GSKIPLIST subsystem for short, which comprises a multi-stage Node tree composed of Node nodes, the GSKIPLIST subsystem also comprises a plurality of MetaD Node sets composed of MetaD nodes, each Node at least comprises Node ID, namely NodeID, left key, namely KeyLeft, right key, namely KeyRight, ID of the first MetaD Node, namely MetaDID, Search times is SCount, the previous Node pointer array Prev and the Next Node pointer array Next information, each Node contains MetaD Node sets corresponding to the range which is larger than or equal to the left Key and smaller than the right Key, namely [ KeyLeft, keyRight ], each Node also constructs a plurality of levels of chain association with other Node nodes through a pointer array Prev and a pointer array Next, and each MetaD Node at least comprises an ID of MetaD Node, i.e. MetaDID, a Key, i.e. Key, a corresponding Node NodeID, a pointer array, The corresponding transaction ID in the blockchain, namely the transactionID, the corresponding blockchain blockID, namely BlockID, the MetaDID of the previous MetaD Node, the Prev of the previous MetaD Node, the MetaDID of the Next MetaD Node, the GSKIPLIST subsystem also comprises a GSKIPLIST parameter file to give the definition of the multi-level Node tree, and the GSKIPLIST parameter file at least comprises the ID of GSKIPLIST, namely GSL_ID, and the search field set of Gskip list, namely GSL_ SEARCHFIELDS, The method comprises the steps that the maximum level N, node of the Node tree is the initial preset rule type R, node Node upgrading threshold U, node Node degradation threshold D, metaD Node fission threshold S, metaD Node merging threshold M, the maximum memory space value G occupied and the persistent storage decision threshold P information of an initial preset level Q, node Node tree of the Node tree, and all Node data and MetaD Node data in a GSKIPLIST subsystem are stored in a database; the database is used to store transaction data and block data for the blockchain subsystem, and the database is also used to store Node and MetaD Node data for the GSKIPLIST subsystem.
2. A method for dynamically constructing a GSKIPLIST-based blockchain efficient query system as in claim 1, the method comprising the specific steps of: Step 1, starting a system, and setting GSKIPLIST parameter files to be read; Step 2, reading GSKIPLIST parameter files to obtain an ID of GSKIPLIST, namely gsl_id, a search field set of Gskiplist, namely an initial preset rule type R, node node upgrading threshold U, node node degradation threshold D, metaD node fission threshold S, metaD node merging threshold M of a maximum level N, node node tree of the gsl_ SEARCHFIELDS, NODE node tree, a maximum memory space value G occupied and persistent storage decision threshold P information; Step3, reading block chain data and constructing an initial preset group jump table subsystem, wherein the specific steps are as follows: Step 3.1, reading the blockchain data to construct a level 0 Node tree and a plurality of MetaD Node sets according to the search field set GSKIPLIST in the step 2, namely GSL_ SEARCHFIELDS; step 3.2, constructing Node trees from 1 st to Q th level of initial preset on the basis of the Node tree of the 0 th level according to the Node tree of the initial preset level Q in the step 2; step 4, the system starts to provide the block chain inquiry service to the outside; step 5, monitoring new area block generation of the block chain subsystem, and simultaneously monitoring and receiving a query request of a user; Step 6, judging the monitoring response type, if the new block of the block chain subsystem is generated, executing step 17, and if the new block of the block chain subsystem is a query request of the user side, executing step 7; step 7, analyzing the inquiry request of the user side; Step 8, converting the query request in the step 7 into a key value range [ Vsb, vse ] to be queried; Step 9, gradually searching from the highest level of the GSKIPLIST subsystem to the 0th level, searching all Node nodes corresponding to the key value range [ Vsb, vse ] given in the 8 th step, wherein the highest level of the GSKIPLIST subsystem starts to be K, but the highest level dynamically changes along with the structural change of the GSKIPLIST subsystem; step 10, further searching all MetaD nodes corresponding to the Key value range [ Vsb, vse ] given in step 8, among all Node nodes searched in step 9, namely, searching MetaD nodes whose keys, namely keys, need to be in the Key value range [ Vsb, vse) given in step 8; Step 11, all MetaD nodes retrieved in the step 10 further find related blockchain data in the blockchain according to transaction ID, i.e. TransactionID, blockid, i.e. BlockID information in each MetaD node; Step 12, further calculating a final query result of the query request of the user side according to all the blockchain data searched in the step 11; Step 13, adding 1 to the search times of each Node, namely SCount, for all the Node nodes searched in the step 9; Step 14, dynamically adjusting the structure of GSKIPLIST subsystems by adopting a synchronous or asynchronous mode, wherein the specific steps are as follows: Step 14.1, dynamically adjusting the structure of the GSKIPLIST subsystem in a synchronous or asynchronous mode, namely dynamically adjusting the upgrading or degradation of Node nodes and dynamically adjusting the splitting or merging of MetaD Node sets; step 14.2, adopting a synchronous or asynchronous mode to carry out persistent storage decision and implementation on the GSKIPLIST subsystem, and storing all memory data of the GSKIPLIST subsystem into a database; step 15, returning the final query result calculated in the step 12 to the user side; Step 16, judging whether the system continues to operate, if so, jumping to step 5 to continue to operate, and if not, jumping to step 18; step 17, reading a new block of the block chain, and updating contents of the GSKIPLIST subsystem, namely updating a plurality of MetaD Node sets and Node trees according to transaction information in the new block; Step 18, implementing persistent storage on the GSKIPLIST subsystem in a synchronous mode, and storing all memory data of the GSKIPLIST subsystem into a database; and step 19, ending the system execution.
3. The method of claim 2, wherein the constructing the level 0 Node tree and the plurality of MetaD Node sets for the GSKIPLIST subsystem in step 3.1 is sequentially reading blockchain blocks and transaction data, and the method steps of constructing the level 0 Node tree and the plurality of MetaD Node sets according to the search field set of GSKIPLIST in step 2, namely gsl_ SEARCHFIELDS, can be adopted as follows: Step 3.1.1, sequentially reading block data from the 0 th block to the maximum ID block of the blockchain, wherein each read block is abbreviated as Bi, wherein 0< = i < the maximum ID block, and performing the following treatment on each block Bi; Step 3.1.2, traversing each transaction in the block Bi, wherein each transaction is abbreviated as Tj, wherein 0< = j < the maximum transaction number of the block Bi, and processing each transaction Tj as follows; Step 3.1.3, calculating a key and a value (key, value) for the transaction Tj according to gsl_ SEARCHFIELDS, which is the search field set GSKIPLIST in step 2, and recording the result as (Kj, vj); Step 3.1.4, creating a MetaD node MDj for the transaction Tj, wherein the Key of the MetaD node MDj has a value of Kj, the transaction ID of the TransactionID of the MetaD node MDj is Tj, and the block ID of BlockID of the MetaD node MDj is Tj; step 3.1.5, in GSKIPLIST subsystem, searching the 0 th level Node tree in turn, finding a Node Nm whose key value range contains Kj, the key value range of the Node Nm is marked as [ x, y ], wherein x < = y; Step 3.1.6, inserting MetaD Node MDj information in step 3.1.4 into the MetaD Node set of Node Nm in step 3.1.5; Step 3.1.7, calculating the number of MetaD nodes owned by the Node Nm in the step 3.1.5, if the number of MetaD nodes owned by the Node Nm reaches the MetaD Node fission threshold S read in the step2, calculating a t value so that the Node Nm can be split into two new Node nodes by fission, wherein the key value ranges of the two new Node nodes are [ x, t) and [ t, y ] respectively, and the principle of selecting the t value is MetaD nodes which are approximately the same in number as much as possible; step 3.1.8, judging whether all transactions of the block Bi are traversed, if all transactions of the block Bi are traversed, jumping to the 3.1.9 th step to continue execution, otherwise jumping to the 3.1.2 th step to continue processing the next transaction; Step 3.1.9, judging whether all blocks of the block chain are traversed, if so, jumping to 3.1.10 to continue execution, otherwise, jumping to 3.1.1 to continue processing the next block; step 3.1.10, the program process ends.
4. The method of claim 2, wherein the step 3.2 is to construct a Node tree of 1 st to K-level initial presets for the GSKIPLIST subsystem, and upgrade a part of Node nodes of the Node tree of 0 th level according to an initial preset rule type R of Q, node Node tree of the Node tree of GSKIPLIST parameter files, so as to build Node tree of 1 st to K-level step by step, where the Node tree initial preset rule type R includes rule types including a rule of two-half, a rule of three-half, a rule of N-half and a custom rule, where the rule of two-half refers to upgrade operation of one Node in every two Node nodes, where the rule of three-half refers to upgrade operation of one Node in every three Node, where the rule of N-half refers to upgrade operation of one Node, where the Node is selected by a designer in the Node tree of the same level.
5. The method of claim 2 wherein the step 14.1 of dynamically adjusting Node upgrade for dynamically adjusting the GSKIPLIST subsystem is based on whether the Node upgrade threshold U condition in the GSKIPLIST parameter file is satisfied by SCount, which is the search number of each Node, and if the upgrade condition is satisfied, the Node is upgraded in the Node tree, wherein the Node upgrade method comprises the steps of: Step 14.1.U.1, obtaining the highest level of the Node to be upgraded and marking as g, and upgrading the Node to be upgraded into g+1 level; step 14.1.U.2, in the g level of Node tree, obtaining the left side link pointer array Prev [ g ] of Node to be upgraded to obtain left side adjacent Node, if the left side adjacent Node has no g+1 level, continuing to obtain left side adjacent Node of the left side adjacent Node until obtaining a left side adjacent Node with g+1 level; Step 14.1.U.3, in the g level of Node tree, obtaining the right side link pointer array Next [ g ] of Node to be upgraded to obtain the right side adjacent Node; if the right adjacent Node has no g+1 level, continuing to acquire the right adjacent Node of the right adjacent Node until acquiring a right adjacent Node with g+1 level; Setting the right side linked pointer array Next [ g+1] of the left side adjacent Node obtained in the 14.1.u.2 step as the Node to be upgraded, and further setting the left side linked pointer array Prev [ g+1] of the right side adjacent Node obtained in the 14.1.u.3 step as the Node to be upgraded; step 14.1.U.5, further setting the left side linked pointer array Prev [ g+1] of the Node to be upgraded as the left side adjacent Node obtained in the step 14.1.U.2, and setting the right side linked pointer array Next [ g+1] of the Node to be upgraded as the right side adjacent Node obtained in the step 14.1. U.3; And step 14.1.U.6, finishing upgrading the Node to be upgraded to g+1 level, and finishing the upgrading treatment.
6. The method of claim 2 wherein the step 14.1 of dynamically adjusting the degradation of Node nodes for dynamically adjusting the structure of the GSKIPLIST subsystem is based on whether the number of searches of each Node, SCount, satisfies a Node degradation threshold D condition in a GSKIPLIST parameter file, and if the degradation condition is satisfied, the Node is degraded in a Node tree, wherein the step of degrading the Node is as follows: Step 14.1.D.1, obtaining the highest level of the Node to be demoted and marking as g, and demoting the Node to be demoted to g-1 level; Step 14.1.D.2, in the g level of Node tree, obtaining the left side link pointer array Prev [ g ] of Node to be demoted to obtain the left side adjacent Node; Step 14.1.D.3, in the g level of Node tree, obtaining the right side link pointer array Next [ g ] of Node to be demoted to obtain the right side adjacent Node; Step 14.1.D.4, setting the right side link pointer array Next [ g ] of the left side adjacent Node obtained in the step 14.1.D.2 as the right side adjacent Node obtained in the step 14.1.D.3, and further setting the left side link pointer array Prev [ g ] of the right side adjacent Node obtained in the step 14.1.D.3 as the left side adjacent Node obtained in the step 14.1. D.2; step 14.1.D.5, setting the Prev [ g ] and Next [ g ] of Node nodes to be demoted as null; And 14.1.D.6, the Node to be upgraded is demoted to g-1 level, and the demotion process is finished.
7. The method of claim 2, wherein the splitting of the GSKIPLIST subsystem dynamically adjusting MetaD Node set in step 14.1 is for a given Node, if the number of MetaD nodes of the MetaD Node set pointed by MetaDID in the Node is greater than or equal to MetaD Node fission threshold S in the GSKIPLIST parameter file, the given Node becomes a Node to be split, which can be split into two new Node nodes with disjoint key value ranges, and the original MetaD Node set is further split into two new MetaD Node sets according to the key value ranges of the two new Node nodes, each new Node points to a new MetaD Node set, and the new Node with the key value range on the left inherits the link relation of the Node in the Node tree, and the splitting of the MetaD Node set is further as follows: Step 14.1.s.1, obtaining a key value range [ KeyLeft, keyRight ] of Node nodes to be split, and obtaining a MetaD Node set pointed by MetaDID in the Node nodes to be split; Step 14.1.s.2, sorting Key keys of MetaD nodes for the MetaD node set in the step 14.1.s.1, and selecting a median value Key M of the Key keys; Step 14.1.s.3, dividing the Node nodes to be split into two new Node nodes according to the Key M in the step 14.1.s.2, wherein the key value ranges of the two new Node nodes are [ Key left, key M ], [ Key M, key Right ], the new Node on the left inherits the link relation of the Node given originally in the Node tree, and the new Node on the right is the Node which is completely newly built; step 14.1.s.4, traversing the MetaD Node set pointed by the new Node on the left side in the step 14.1.s.3, and migrating all MetaD nodes of the Key value range [ Key left, key M) of the new Node with the Key Key not on the left side to the MetaD Node set pointed by the new Node on the right side in the step 14.1.s.3; And 14.1.s.5, finishing splitting of the MetaD node set, and finishing splitting treatment.
8. The method of claim 2 wherein the step 14.1 of dynamically adjusting MetaD Node sets of dynamically adjusting the GSKIPLIST Node subsystem is performed with respect to two adjacent Node nodes in the Node tree of level 0, if the number of MetaD nodes of the MetaD Node set pointed by MetaDID in the two Node nodes is smaller than MetaD Node merging threshold M in the GSKIPLIST parameter file, the two Node nodes are merged into a new Node, and the corresponding two MetaD Node sets are also merged together, further, the link relationship of the two Node nodes in the Node tree is inherited by the new Node, and the specific method is that the highest level of each of the two Node nodes in the Node tree is obtained first, then the Node with the highest level is selected as the new Node, if the level is the same, one Node is selected as the new Node, and the link relationship of the other original Node in the Node tree is merged into the new Node.
9. The method according to claim 2, wherein the specific method steps of performing the persistent storage decision and implementing the persistent storage decision for GSKIPLIST subsystem in step 14.2 are as follows: step 14.2.1, firstly, acquiring GSKIPLIST subsystem actual occupied memory space value; Step 14.2.2, calculating the ratio r of the actual occupied memory space value of the GSKIPLIST subsystem to the occupied maximum memory space value G in the GSKIPLIST parameter file; 14.2.3, judging whether the ratio r in 14.2.2 is larger than or equal to a persistent storage decision threshold P in a GSKIPLIST parameter file, if so, implementing persistent storage, otherwise, implementing no persistent storage; and step 14.2.4, finishing the persistent storage decision process.
10. The method of claim 2 wherein the implementing of persistent storage decision and implementation of the GSKIPLIST subsystem in step 14.2 uses a database as a carrier for persistent storage, the database may use a key-value database or a relational database, the database storage format may use a key-value format (key, value) where key is a key and value is a value corresponding to the key, the key in a Node is a Node id, value is a recursive length prefix code of other fields, i.e., RLP code, and value is a result of RLP code processing on a set of all other fields other than Node id in a Node, and key in a MetaD Node is a MetaDID, value value which is a result of RLP code processing on a set of all other fields other than MetaDID in a MetaD Node.

Description

GSKIPLIST-based efficient block chain query system and dynamic construction method Technical Field The invention belongs to the field of computers, and mainly aims to improve the search efficiency of block chain inquiry. Background The blockchain is a decentralized and non-tamperable distributed database, and has wide prospects in finance, internet of things, public service, notarization, digital rights and traceability. As the number of users and data stored in a blockchain increases, the throughput requirements of blockchain systems continue to increase, but the following problems exist: 1. with the rapid growth of the data volume of the blockchain, the structure and algorithm of the existing blockchain system are difficult to support the data efficient query operation in a large-scale data scene. Under the existing block chain structure and algorithm, one query involves accessing a plurality of disk files at the bottom layer, so that a large number of disk IO are generated, and the performance of query operation is greatly influenced; 2. With the rapid growth of blockchain data volume, existing blockchain systems have difficulty in efficiently implementing complex queries, such as Top-k queries, range queries, equivalence queries, and the like. However, there is a great deal of real life demand for blockchain complex queries. For example, a blockchain investor wants to know the data of the last quarter transaction total, the maximum transaction total, the total transaction total, the average transaction amount and the like, and hopes to estimate the current running state of the blockchain system according to the data and judge whether the blockchain system is suitable for buying and selling the investment, so that the user hopes to provide an interface for querying the ethernet data in a certain range by the blockchain system, but the current blockchain system can only search the range data one by one from the underlying database such as LevelDB, and the query efficiency is low and the computing resource is wasted. To solve the above problem, the main solution is to store the blockchain data by constructing an external database, for example EtherQL synchronizes and parses the blockchain data from the blockchain system, and the process of parsing the blockchain data redesigns the storage format of the data and stores the parsing result in the mongo db. MongoDB is an efficient and extensible NoSQL database, and the efficiency of block chain data query in MongoDB can be obviously improved, and various complex queries for data are supported. Although the above two problems can be solved. However, the use of the external database has the limitations that 1) the non-tamper-resistant and decentralizing properties of the data cannot be guaranteed, and meanwhile, the additional overhead of the system is increased, and 2) the query efficiency is determined by the external database, so that the problems of single-point failure, data loss, data tampering and the like can occur, and therefore, a huge security hole exists in the out-of-chain storage. Other typical schemes are as follows: Inquiry optimization of blockchain systems for hybrid indexing [ J ]. Computer science, 2020,47 (10): 301-308, see Zheng Haohan, shen Derong, nie Tiezheng, et al. A new solution is proposed in this document, firstly dividing the blockchain data into different attributes, then according to the different data attributes, combining the Merkle tree of the blockchain itself with various index structures, providing a new index-MHerkle tree, namely, when each block is generated, all transactions in the block are ordered according to time or gas values, and the ordered transactions are reconstructed into corresponding merck prefix trees and stored in the block body, so that the transactions in the block can be ensured to be ordered according to a certain order, and a bloom filter is added to rapidly screen whether the transactions are in the block body of the region, the structure enhances the query performance of the blockchain under the condition of fully ensuring the non-tamper modification of the blockchain, and finally, an index construction algorithm of MHerkle tree is designed, and a query algorithm based on different attributes and a range query algorithm are provided according to indexes. The method has the advantages that local range searching and top-k searching can be achieved, the query efficiency can be improved by using the bloom filter, the defect that global range searching and top-k searching cannot be achieved, the bloom filter has the problem of false positive, and the efficiency improvement is limited. Document 2: zhang Hong, wei Zhongqi A blockchain system query model based on a two-level indexing mechanism [ J ]. Computer application 2022,42 (S2): 129-134. A query model based on account public keys and smart contracts is presented herein. The model establishes a secondary index on hash values of the block and the transa