
CN-121984968-A - Hierarchical PBFT efficient consensus method based on stochastic modeling and reinforcement learning technology

CN 121984968 A

Abstract

The invention relates to a hierarchical PBFT efficient consensus method based on stochastic modeling and reinforcement learning, and aims to solve the poor scalability of the traditional PBFT consensus algorithm and its difficulty adapting to dynamic environment changes in large-scale consensus-node networks. The method comprises a hierarchical PBFT architecture building module, a global consensus time calculation module based on stochastic process modeling, and an adaptive hierarchical grouping module based on proximal policy optimization (PPO) reinforcement learning. First, the hierarchical PBFT architecture building module converts single-layer large-scale consensus into hierarchical consensus: PBFT consensus runs independently inside each layer, and global consensus is reached in a layer-by-layer iterative manner. Then, the global consensus time calculation module models this architecture as a stochastic process and establishes a quantitative relation between the global consensus time and the hierarchical PBFT consensus parameters. Finally, guided by this quantitative relation, the hierarchical grouping module based on proximal policy optimization performs adaptive hierarchical grouping, so that the global consensus time is reduced.

Inventors

  • Chang Xiaolin
  • Liu Zhilai
  • Jiang Yueqi
  • Fan Junchao
  • Yang Zhao
  • Ju Bocheng
  • Cai Tingsen

Assignees

  • Beijing Jiaotong University (北京交通大学)

Dates

Publication Date
2026-05-05
Application Date
2025-12-31

Claims (5)

  1. A hierarchical PBFT efficient consensus method based on stochastic modeling and reinforcement learning technology, characterized by comprising the following steps: step 1, constructing a hierarchical PBFT consensus architecture; step 2, running the consensus flow of the hierarchical PBFT consensus architecture; step 3, semi-Markov process modeling and derivation of the global consensus time calculation formula; and step 4, adaptive hierarchical grouping based on proximal policy optimization reinforcement learning.
  2. The hierarchical PBFT efficient consensus method based on stochastic modeling and reinforcement learning techniques according to claim 1, wherein said step 1 comprises the following steps: step 11, configuring the PBFT parameters of the consensus network, including the total node scale N and the number of faulty nodes f, with N ≥ 3f + 1; step 12, initially dividing the N nodes of a layer at random into groups of a chosen scale, each group independently executing a complete standard PBFT consensus process; step 13, judging each group according to the result of step 12: a group is legal if its master node receives commit messages from more than 2/3 of the backup nodes, and is otherwise regarded as an illegal group; step 14, according to the result of step 12, when the proportion of legal groups in a layer stays below the 2/3 threshold for a period of time, triggering a reorganization mechanism that reshuffles the nodes of the illegal groups to increase the chance of forming legal groups, the reorganization process continuing until the system advances to the next layer.
  3. The hierarchical PBFT efficient consensus method based on stochastic modeling and reinforcement learning techniques according to claim 2, wherein said step 2 comprises the following steps: step 21, each group electing its master node as a representative to enter the next layer; step 22, the representative nodes selected in step 21 re-participating in grouping at the next layer and performing PBFT consensus there, the process iterating until the number of nodes in the final layer is small enough (e.g., fewer than 16 nodes), at which point the iteration stops; step 23, iterating steps 21 through 22 until the final layer is no longer grouped and directly executes PBFT to generate the global consensus result.
  4. The hierarchical PBFT efficient consensus method based on stochastic modeling and reinforcement learning techniques according to claim 3, wherein said step 3 comprises the following steps: step 31, building a semi-Markov sub-model for each layer, the states being defined as triples (l, g, q), where l denotes the number of legal groups of the layer, g denotes the number of illegal groups of the layer, and q denotes the number of pending task requests of the layer, subject to the constraint that l + g equals the total number of groups of the layer; based on the initial fault distribution, determining the bounds on the number of legal groups (taking layer 1 as an example, with the higher layers treated analogously): the maximum and minimum numbers of legal groups, from which the corresponding maximum and minimum numbers of illegal groups follow; step 32, based on the system state definition of step 31, the state transitions being triggered by three kinds of events: a task request arrival event, obeying a general distribution with average rate λ and triggering the transition (l, g, q) → (l, g, q+1); a consensus completion event, where legal-group completion and illegal-group completion each obey a general distribution, triggering the transition (l, g, q) → (l, g, q−1), the nodes of the layer being regrouped after each completed task before the consensus flow of the next task starts; and a layer reorganization event, whose reorganization time obeys a general distribution and which is triggered when the layer consensus times out, the triggered transition being (l, g, q) → (l+1, g−1, q) with probability p, wherein p denotes the probability that an illegal group is converted into a legal one after recombination; step 33, constructing the semi-Markov process and its kernel matrix: the conditional probability kernel matrix K(t), whose entry K_ij(t) gives the probability that, starting from state i, the next transition is to state j within time t; through the limit P = lim_{t→∞} K(t), obtaining the one-step transition probability matrix of the embedded discrete-time Markov chain; step 34, solving the semi-Markov process model obtained in step 33 by a two-step method, namely first setting the probability of invalid states to 0 and, through the constraints vP = v and Σ_i v_i = 1, calculating the steady-state probability vector v of the embedded discrete-time Markov chain, then calculating the average sojourn time h_i of each state and obtaining the steady-state probabilities of the SMP model as π_i = v_i h_i / Σ_j v_j h_j; step 35, deriving single-layer key performance indices from the SMP solution of step 34, namely the average number of legal groups, the average number of illegal groups, the task blocking probability, the average response delay, and the single-layer consensus success rate, each obtained by weighting the corresponding state quantities with the steady-state probabilities π; step 36, deducing the overall performance of the multi-layer system from the single-layer indices of step 35 through the inter-layer parameter transfer relations, the legal-group representatives output by each layer serving as the input node set of the next layer.
  5. The efficient hierarchical PBFT consensus method based on stochastic modeling and reinforcement learning techniques according to claim 4, wherein said step 4 comprises the following steps: step 41, modeling the hierarchical grouping strategy problem as a Markov decision process, wherein the state space is the total number of nodes remaining at the current layer, the action space is the selectable group size, the reward function is a weighted combination of the consensus success rate and the average delay, and the state transitions follow the grouping dynamics; when the termination condition is satisfied, i.e., the number of remaining nodes is less than 16 or the maximum layer depth is reached, the grouping process switches into a terminal state; step 42, adopting the proximal policy optimization algorithm to solve the Markov decision process constructed in step 41: first, the actor network learns the mapping policy from state to action, taking the current state as input and outputting an action probability distribution, while the critic network evaluates the state value, taking the state as input and outputting a state-value estimate, with a masking mechanism masking out infeasible actions; then, policy execution and experience collection begin: the current state is observed, an action mask identifying feasible actions is generated, an action is sampled from the masked policy and executed to obtain the reward and the next state, and the transition is stored in the experience replay buffer; this process is repeated until enough batches of experience data are collected, and the actor and critic networks are updated alternately until the convergence condition is satisfied, i.e., the policy performance is stable and the value function has converged, finally outputting the learned adaptive grouping policy; step 43, deploying the policy network trained in step 42 and adjusting the grouping scheme in real time according to the system state, so as to optimize the hierarchical PBFT consensus time.
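The grouping, legality check, and reorganization flow of claim 2 (steps 12 through 14) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the reshuffle cap, and testing group legality via the Byzantine bound f < n/3 (a group whose faulty members stay within the PBFT bound can gather commits from more than 2/3 of its backups) are assumptions made for the sketch.

```python
import random

def partition(nodes, group_size):
    """Randomly split node ids into groups of (at most) group_size."""
    shuffled = list(nodes)
    random.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]

def group_is_legal(group, faulty):
    """A group is legal when its faulty members respect the PBFT bound
    f < len(group)/3, so the master can collect commits from > 2/3 of backups."""
    f = sum(1 for n in group if n in faulty)
    return 3 * f < len(group)

def run_layer(nodes, faulty, group_size, max_reshuffles=10):
    """One layer: reshuffle the nodes of illegal groups until at least
    2/3 of the groups are legal (the claim's reorganization mechanism)."""
    groups = partition(nodes, group_size)
    for _ in range(max_reshuffles):
        legal = [g for g in groups if group_is_legal(g, faulty)]
        if 3 * len(legal) >= 2 * len(groups):
            break
        # Reorganization: only nodes of illegal groups are reshuffled.
        pool = [n for g in groups if not group_is_legal(g, faulty) for n in g]
        groups = legal + partition(pool, group_size)
    return [g for g in groups if group_is_legal(g, faulty)]
```

With no faulty nodes every group is legal on the first try; injecting faults concentrates the reshuffling on the illegal groups only, as the claim describes.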
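The layer-by-layer flow of claim 3 (steps 21 through 23) amounts to repeatedly promoting one representative per group until few enough nodes remain for a single plain PBFT round. The sketch below assumes a trivial stand-in election rule (first member as master); the patent's master election follows the PBFT view mechanism, and the threshold of 16 comes from claim 5's termination condition.

```python
def elect_master(group):
    # Stand-in election rule for illustration; real PBFT picks the
    # master from the view number.
    return group[0]

def hierarchical_rounds(nodes, group_size, stop_threshold=16):
    """Return the node sets of each layer, bottom to top. The final
    layer is left ungrouped and would run plain PBFT directly."""
    layers = [list(nodes)]
    while len(nodes) > stop_threshold:
        groups = [nodes[i:i + group_size] for i in range(0, len(nodes), group_size)]
        nodes = [elect_master(g) for g in groups]  # representatives ascend
        layers.append(nodes)
    return layers
```

For example, 256 nodes with groups of 4 collapse to 64 representatives, then to 16, at which point the iteration stops and the top layer reaches global consensus directly.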
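The two-step semi-Markov solution described in claim 4 (steps 33 and 34) can be sketched numerically: first the steady state v of the embedded discrete-time Markov chain (vP = v, Σv = 1), then time-weighting by the mean sojourn times h to get the SMP probabilities π_i = v_i h_i / Σ_j v_j h_j. The 3-state matrix and sojourn times below are toy numbers, not values from the patent, and the invalid-state pruning mentioned in the claim is omitted.

```python
import numpy as np

def smp_steady_state(P, h):
    """Two-step SMP solution: embedded-DTMC steady state, then
    sojourn-time weighting pi_i = v_i h_i / sum_j v_j h_j."""
    n = P.shape[0]
    # Solve the overdetermined system {v (P - I) = 0, sum(v) = 1}
    # by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    pi = v * h
    return pi / pi.sum()

# Toy 3-state example (illustrative numbers only):
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])   # embedded one-step transition matrix
h = np.array([1.0, 2.0, 4.0])     # mean sojourn time per state
pi = smp_steady_state(P, h)
```

Once π is available, the single-layer indices of step 35 (average legal/illegal group counts, blocking probability, response delay, success rate) are simply π-weighted averages of the corresponding state quantities.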
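Two pieces of claim 5 are easy to make concrete: the reward of step 41 (a weighted combination of success rate and average delay) and the action masking of step 42 (infeasible group sizes forced to probability zero before sampling). The weights, the candidate group sizes, and the feasibility rule (a size must not exceed the remaining node count) are illustrative assumptions; the actor logits would come from the trained network rather than the zeros used here.

```python
import numpy as np

def masked_policy(logits, mask):
    """Softmax over actor logits with infeasible actions forced to
    probability 0 (the claim's masking mechanism)."""
    masked = np.where(mask, logits, -np.inf)
    z = np.exp(masked - masked.max())
    return z / z.sum()

def reward(success_rate, avg_delay, w1=1.0, w2=0.1):
    """Weighted combination of consensus success rate and average delay;
    weights w1, w2 are assumptions for illustration."""
    return w1 * success_rate - w2 * avg_delay

# Toy example: 6 nodes remain; candidate group sizes 4..8 form the
# action space (the claim's selectable group size).
group_sizes = np.array([4, 5, 6, 7, 8])
remaining = 6
mask = group_sizes <= remaining
probs = masked_policy(np.zeros(len(group_sizes)), mask)
```

With uniform logits the feasible sizes 4, 5, 6 share the probability mass equally and sizes 7 and 8 are never sampled, which is exactly the effect the masking mechanism is meant to guarantee during experience collection.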

Description

Hierarchical PBFT efficient consensus method based on stochastic modeling and reinforcement learning technology

Technical Field

The invention belongs to the fields of blockchain and artificial intelligence, and particularly relates to a hierarchical PBFT efficient consensus method based on stochastic modeling and reinforcement learning technology.

Background

Blockchain technology serves as the core of distributed systems, and the consensus mechanism is the key technology for guaranteeing data consistency and system safety, forming a cornerstone of application fields such as financial technology and the Internet of Things. In blockchain-based financial systems, the consensus mechanism ensures fairness, reliability, and robustness of decentralized transaction verification. In Internet of Things environments, the consensus mechanism is crucial for challenges such as device heterogeneity, limited computing resources, and trust management, and is an important means of realizing safe and efficient coordination among distributed devices. Among consensus protocols, the PBFT algorithm stands out for its deterministic consensus property and has become the consensus mechanism of choice for consortium chains and private chains, guaranteeing safety and liveness under the Byzantine fault assumption. A standard PBFT system requires at least 3f + 1 nodes, of which at most f may be faulty. The protocol proceeds sequentially through three stages, pre-prepare, prepare, and commit, ensuring agreement among the fault-free replicas. However, as the network scale expands, the delay and overhead incurred by conventional PBFT increase significantly, severely limiting its throughput and scalability in practical deployments such as Hyperledger Fabric.
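The fault-tolerance bound stated above, at least 3f + 1 nodes to tolerate f Byzantine faults, is standard PBFT arithmetic and can be captured in two small helpers (the helper names are mine, not the patent's):

```python
def max_faulty(n):
    """Largest number of Byzantine faults f tolerable with n nodes,
    i.e. the largest f satisfying n >= 3f + 1."""
    return (n - 1) // 3

def min_nodes(f):
    """Smallest network size that tolerates f Byzantine faults."""
    return 3 * f + 1
```

For instance, the minimal PBFT network of 4 nodes tolerates a single fault, while a 100-node network tolerates 33; the quadratic message complexity over such node counts is precisely what motivates the hierarchical decomposition in this patent.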
Furthermore, in realistic distributed environments the network topology and fault distribution typically change dynamically, and static optimization strategies, while simple to implement, cannot be adjusted based on performance feedback. Thus, there is a need for an efficient, scalable, and adaptive PBFT consensus that allows the system to dynamically adjust its optimization strategy based on performance, achieving an optimal balance among scalability, delay, and fault tolerance. Although the prior art reduces communication overhead, it lacks an accurate quantitative analysis model, so system design and strategy tuning lack a theoretical basis and it is difficult to optimize performance while ensuring fault tolerance. Most layering schemes adopt a static grouping strategy and cannot be adjusted dynamically according to the real-time state of the system. When the system environment changes (e.g., the node failure rate or network load changes), a static grouping policy may cause performance degradation or even consensus failure. Moreover, static schemes are usually based on simplifying assumptions or heuristics and cannot accurately describe the stochastic characteristics of real-world systems, which makes performance analysis results deviate from actual system behavior and difficult to use as guidance for practical system design. Therefore, it is important to propose a more accurate quantitative analysis model as a theoretical basis and, building on it, to achieve adaptive PBFT grouping strategy optimization.

Disclosure of Invention

The invention aims to provide a hierarchical PBFT efficient consensus method based on stochastic modeling and reinforcement learning technology that overcomes the defects of the prior art, realizing efficient, scalable, and adaptive PBFT consensus in large-scale distributed systems.
To achieve the above object, the present invention mainly comprises the following steps: step 1, constructing a hierarchical PBFT consensus architecture; step 2, running the consensus flow of the hierarchical PBFT consensus architecture; step 3, semi-Markov process modeling and derivation of the global consensus time calculation formula; and step 4, adaptive hierarchical grouping based on proximal policy optimization reinforcement learning. The advantages of the invention are: (1) deriving expressions for the performance indices based on semi-Markov process modeling, providing a solid and reliable theoretical basis for system design and strategy optimization; (2) introducing reinforcement learning on this analytical-modeling foundation to realize adaptive grouping strategy optimization, dynamically adjusting the group scale according to the real-time state and maximizing performance while ensuring fault tolerance; and (3) providing a complete framework from theoretical analysis to optimization and verification, comprising multi-layer semi-Markov process modeling, numerical analysis, simulation experiments, and optimization, comprehensively evaluating system performance and guiding practical application.