CN-116909475-B - Cross-frame perceived erasure code storage system equalization redundancy conversion method and device

CN116909475BCN 116909475 BCN116909475 BCN 116909475BCN-116909475-B

Abstract

The invention discloses a method and a device for balanced redundancy conversion of an erasure code storage system perceived by a cross-frame, which are used for solving the problems of large cross-frame transmission flow overhead and unbalanced load among frames in the storage redundancy conversion of the erasure code storage system; A heuristic algorithm is provided for selecting a stripe group to perform redundancy conversion in parallel so as to balance the traffic load of each rack. The invention balances the load between the racks while inhibiting the transmission flow of the cross-rack caused by the redundancy conversion, and can shorten the time for completing the redundancy conversion.

Inventors

SHEN ZHIRONG
ZHANG FENG
SHU JIWU

Assignees

厦门大学

Dates

Publication Date: 20260512
Application Date: 20230602

Claims (8)

1. A cross-frame perceived erasure code storage system equalization redundancy conversion method is characterized by comprising the following steps: The method comprises the steps of establishing a block layout, namely aggregating check blocks of all strips in a redundancy conversion group into the same check rack, and uniformly distributing data blocks of each strip in the redundancy conversion group into each data rack, wherein the strips comprise stretching strips and decomposing strips; A check block updating step, namely updating the check block of the stretching strip by directly reading the data block and decomposing the decoupling mode of the check block of the strip; selecting redundant conversion groups with different checking racks to form an execution group, iteratively replacing the execution group by using a heuristic algorithm, selecting the execution group with the minimum uploading load balancing ratio, and executing the redundant conversion groups in the execution group in parallel; The block layout making step specifically comprises the following steps: Step 1.1 for a process comprising Individual stretch strips The redundant conversion groups of the decomposed stripes are used for aggregating the check blocks of all the stripes in the groups into the same checking rack; step 1.2, uniformly distributing the data blocks of each strip in the redundant conversion group in each data rack, wherein the data blocks are uniformly distributed in each data rack, and the data racks are racks except for the verification rack; Step 1.3, after the redundancy operation is initiated, assigning priorities to the data frames according to the number of data blocks in the data frames for each stripe, establishing a priority queue, and sequentially allocating Of separate strips The data blocks are distributed to the front according to the data frame sequence in the priority queue A plurality of stretch strips, the remaining stretch strips being referred to as remaining stretch strips; step 1.4, establishing a network flow diagram for the rest data blocks and the rest stretching strips, wherein in the network flow diagram, each stretching strip, each decomposing strip and each data frame are respectively represented by one vertex, the vertexes of the decomposing strips are connected with directed edges pointing to the vertexes of the data frames, and the vertexes of each decomposing strip are connected with The vertex of each frame, the edge capacity is determined by the number of data blocks of the decomposition band in the corresponding frame, and the decomposition band is represented in the frame The method comprises the steps of providing a data frame, establishing a sink, wherein the vertex of the frame is connected with a directional edge pointing to the vertex of a stretching strip, and the edge capacity is determined by the number of data blocks which can be received by the stretching strip in the corresponding frame; step 1.5, running Dinic algorithm to find the maximum flow, and distributing the rest data blocks and the rest stretching strips in the decomposed strips according to the found maximum flow; the load balancing step specifically comprises the following steps: step 3.1, grouping all redundant conversion groups according to the machine frame where the check block is located, and locating the check block in different machine frames The combination of the redundant switch groups is referred to as an execution group, wherein, Total number of racks; Step 3.2, traversing the redundant conversion groups which are not selected, selecting the redundant conversion groups which are provided with different checking racks from the current execution group, and adding the redundant conversion groups into the execution group; Step 3.3, recording the load condition of the execution group when the redundancy conversion group is newly added each time, and newly adding an uploading load for the execution group according to the number of data blocks uploaded by each data rack of the redundancy conversion group; Step 3.4, calculating the load balancing ratio of the execution group, namely the ratio of the maximum uploading load of the execution group to the average uploading load of the execution group; step 3.5, selecting an execution group with the minimum load balancing ratio, and executing redundancy conversion in parallel by all redundancy conversion groups in the execution group; and 3.6, repeating the steps 3.2-3.5 until all the redundant conversion groups are selected.
2. The across chassis aware erasure code storage system equalization redundancy switching method of claim 1, wherein the data chassis containing the smaller number of data blocks has a higher priority.
3. The across-rack aware erasure code storage system equalization redundancy switching method of claim 1, wherein for the remaining tensile stripes, the allocation of data blocks is performed according to a priority queue until a fully equalized state, i.e., the number of data blocks in each data rack is equal.
4. The across chassis aware erasure code storage system equalization redundancy switching method of claim 1, wherein the number of directed edges comprises A strip.
5. The across-chassis aware erasure code storage system equalization redundancy conversion method of claim 1, wherein the check block updating step specifically comprises: Step 2.1, assigning to the remaining stretched tapes The data blocks are sent to a first check node of a corresponding decomposition strip of the check rack; step 2.2, calculating a check increment block required by the check blocks of the residual tensile strip according to the first check node of the decomposed strip, wherein the check increment block comprises And, putting this The verification increment blocks are sent to the verification nodes of the residual stretching strips; Step 2.3, the first check node of the decomposition stripe forwards the read data block to the rest check nodes, the check nodes perform decoupling operation on the received data block and the local check block, and the first check node calculates A number of check increment blocks, which are added to the data The number of the verification increment blocks is distributed to the verification nodes of the previous stretching strip; and 2.4, for each stretching strip, the check node uses the received check increment block to update the check block stored in the check node.
6. An equalization redundancy conversion device of a cross-rack perceived erasure code storage system, which is characterized by being used for realizing the method as claimed in any one of claims 1-5, comprising: The block layout making module is used for aggregating the check blocks of all the strips in the redundant conversion group into the same check rack and uniformly distributing the data blocks of each strip in the redundant conversion group into each data rack; The check block updating module is used for updating the check blocks of the stretching strips in a mode of directly reading and decomposing the strip check blocks by adopting the data blocks; And the load balancing module is used for selecting the redundant conversion groups with different checking racks to form an execution group, iteratively replacing the execution group by using a heuristic algorithm, selecting the execution group with the minimum uploading load balancing ratio, and executing the redundant conversion groups in the execution group in parallel.
7. A computer device comprising a program or instructions which, when executed, performs the method of any of claims 1 to 5.
8. A storage medium comprising a program or instructions which, when executed, perform the method of any one of claims 1 to 5.

Description

Cross-frame perceived erasure code storage system equalization redundancy conversion method and device Technical Field The invention relates to the technical field of storage, in particular to a method and a device for equalizing redundancy conversion of an erasure code storage system perceived by a cross-frame. Background With the increasing popularity of failures in large-scale storage systems today, to protect the reliability of data from unexpected failures, most existing storage systems often maintain additional data redundancy in advance through "backups" and "erasure codes" in order to recover the original data using pre-stored redundancy in the event of a failure. The backup copies the data in N copies and stores it in N different nodes, respectively, so its storage overhead is N times that of the original data. Erasure codes are lightweight calculations, and data of a fixed size (referred to as a data block) is input, and a small amount of redundant data of the same size (referred to as a check block) is obtained by calculation. Specifically, the erasure code is configured by two parameters k and m, and m check blocks are generated by calculating k data blocks during encoding, and the (k+m) blocks form a stripe and are stored in (k+m) different nodes (each node needs to be guaranteed to have only one block of the stripe), meanwhile, any k blocks in the stripe can decode the rest m blocks, so that the erasure code can ensure that original data can be recovered when faults occur. Compared to backups storing multiple identical copies, erasure codes are more space efficient and can maintain the same fault tolerance as backups with less data redundancy. At present, erasure codes have been widely used in various storage systems, mainly for persistent storage of cold data and for mitigating delays in frequently accessed data. In most cases, storage systems typically employ a series of different levels of redundancy (i.e., storage overhead, calculated as) To gracefully adjust access performance in the face of varying access characteristics and varying reliability requirements. Therefore, conversion between erasure codes of different redundancy levels (referred to as redundancy conversion) is critical to modern storage systems, which process is prone to generate large amounts of conversion traffic (i.e., conversion of data transmitted over a network). Typically, redundancy conversion represents the operation of converting (k, m) erasure codes into (k ', m) erasure codes, and existing methods consider redundancy conversion operations that make k' > k. In particular, redundant switching requires the disassembly (referred to as de-striping) of some old stripes of (k, m) codes, and the allocation of data blocks therein to other (k, m) coded stripes (referred to as stretched stripes) for re-encoding under the new (k', m) codes. Assuming that { D 1,D2,...,Dk } and { P 1,P2,...,Pm } are k data blocks and m check blocks of a stripe, each check block is a linear combination of k data blocks, and can pass through the formula under Galois field [1]Wherein α i,j (1≤i≤k, 1≤j≤m) is the coding coefficient used in the calculation of the check block P j from the data block D i. For the split stripe, the old linear relation needs to be decoupled, i.e. the check blocks are invalidated and the data blocks are regarded as new data blocks to be added into the stretched stripe, for the stretched stripe, the new data blocks need to be received and the check blocks are updated, and the updated check blocks P j are represented by P j', and can be calculated as follows in terms of formulaWherein j is equal to or greater than 1 and is equal to or less than m, and DeltaP j is called a check increment block. The traffic overhead of the redundancy switch consists of two parts, namely data block migration and check block update. In the conversion process, a part of data blocks need to be migrated to other nodes so as to ensure that (k ' +m) blocks of a new stripe coded by (k ', m) are still stored in (k ' +m) different nodes, and the generated traffic is the data block migration traffic. Meanwhile, the (k, m) codes of the stretching strips are converted into (k', m) codes, and as the data blocks are newly added, m check blocks of each stretching strip need to be recalculated to ensure that the fault tolerance is still effective after conversion, and the part of traffic is the update traffic of the check blocks. The existing work mainly accelerates the redundant conversion process by reducing the traffic, such as SRS 2 eliminates the data block migration traffic by designing the data layout, ERS 3 designs the special data layout like SRS, which eliminates the data block migration traffic and further reduces the check block update traffic, STRIPEMERGE 4 achieves the k-to-2 k conversion by directly merging the two stripes. The existing methods suffer from (i) lack of flexibility in the choice of parameters for redundant conversions (SRS and