Search

US-12619372-B2 - Data migration method, system and device for storage server, and non-transitory readable storage medium

US12619372B2US 12619372 B2US12619372 B2US 12619372B2US-12619372-B2

Abstract

Disclosed are a data migration method, system and device for a storage server, and a non-transitory readable storage medium. The method includes: according to storage information of a storage server, determining the number n of strips to be migrated; selecting n stripes to be migrated from among c full stripes according to a preset migration rule, and selecting, from the n stripes to be migrated, data blocks to be migrated and migrating the data blocks to the newly inserted disks, so that after the migration is completed, the difference in stripe usage between any two disks in the storage server does not exceed 1; and determining data content of check data blocks of the strips so as to complete data migration of the storage server, and determining data content of check data blocks of the c-n strips by means of incremental update of an erasure code.

Inventors

  • Ruizhen WU
  • Xiaowei Wang
  • Lin Wang
  • Jingjing Chen
  • Yongxing Zhang
  • Xu Zhang

Assignees

  • SUZHOU METABRAIN INTELLIGENT TECHNOLOGY CO., LTD.

Dates

Publication Date
20260505
Application Date
20221223
Priority Date
20220630

Claims (14)

  1. 1 . A data migration method for a storage server, comprising: determining storage information of a storage server, wherein, the storage information is storage information of the storage server before inserting m disks; determining, according to the storage information, the number n of strips to be migrated, wherein, the strips to be migrated are used to migrate to the m disks which are inserted into the storage server; selecting n strips from among c strips according to a preset migration rule to serve as the strips to be migrated, and selecting, from the n strips to be migrated, data blocks to be migrated and migrating the data blocks from an original disk where the data blocks to be migrated is located to the disks which are newly inserted, so that after migration of the data blocks is completed, difference in strip usage between any two disks of the m disks in the storage server does not exceed 1, wherein the preset migration rule comprises: for any one of the data blocks to be migrated, determining a strip in which a check data block with the lowest strip serial number is located in a disk where the data block to be migrated is located before migration, and when the strip has a storage space that is not occupied by any data block to be migrated in the m disks which are newly inserted, migrating the data block to be migrated to the storage space, thereby a position limitation is imposed on placement of the data blocks to be migrated into the m newly inserted disks, so as to satisfy the requirements of load balancing; and determining data content of check data blocks of the strips so as to complete data migration of the storage server, and for c-n strips among the c strips that are not selected as the strips to be migrated, determining data content of check data blocks of the c-n strips by means of incremental update of an erasure code, user data blocks of the c-n strips which are remaining in original disks are not moved, so as to perform incremental update of the erasure code by combining data migrated to the m newly inserted disks, a type of the erasure code is Reed-Solomon Code (RS code) used for encoding k data blocks into r additional check blocks, parameters of the RS code, when implementing a storage erasure coding function, are determined on the basis of location information, a general algorithm relation of the erasure codes is summarized as: P ⁢ 1 = a 1 , 1 ⁢ D ⁢ 1 ⊕ a 1 , 2 ⁢ D ⁢ 2 ⊕ … ⊕ a 1 , k ⁢ D k … P ⁢ r = a r , 1 ⁢ D ⁢ 1 ⊕ a r , 2 ⁢ D ⁢ 2 ⊕ … ⊕ a r , k ⁢ D k , wherein, a 1,1 to a r,k are selected from an encoding matrix of a Vandermonde matrix or an encoding matrix of a Cauchy matrix, P 1 to Pr are check data blocks, D 1 to Dk are user data blocks, ⊕ is addition in the Galois field, determining whether a total number of user data blocks before the m disks are newly inserted into the storage server is consistent with a total number of user data blocks after the m disks are inserted and data migration is performed, and in a case that the total number of user data blocks before the m disks are newly inserted into the storage server is inconsistent with the total number of user data blocks after the m disks are inserted and data migration is performed, determining a part of the user data blocks is not successfully migrated to the newly inserted disks, wherein n<c, m, n and c are all positive integers, and c represents the number of full strips in use by of the storage server before the m disks are newly inserted into the storage server, wherein selecting the n stripes from among the c stripes according to the preset migration rule to serve as the stripes to be migrated comprises: using the n strips with highest strip serial numbers among the c strips to serve as the strips to be migrated, wherein the preset migration rule further comprises: according to a rule that a stripe with a high stripe serial number takes precedence over a stripe with a low stripe serial number and when stripe serial numbers of two stripes are the same, a disk with a high disk serial number to which a data block to be migrated belongs takes precedence over a disk with a low disk serial number to which a data block to be migrated belongs, successively selecting each data block to be migrated from the n stripes to be migrated, and migrating same to the disks which are newly inserted; and upon completion of migration of any one of the data blocks to be migrated, in a case that the difference in the stripe usage between the any two disks in the storage server does not exceed 1, determining the migration being completed.
  2. 2 . The data migration method for the storage server according to claim 1 , wherein determining the storage information of the storage server comprises: determining a number r of disks, the number c of the full strips in use by the storage server, and a number p of check data blocks of a single strip before the m disks are newly inserted into the storage server.
  3. 3 . The data migration method for the storage server according to claim 1 , wherein determining the storage information of the storage server comprises: determining a current disk space usage of the storage server and whether storage degradation occurs in the storage server.
  4. 4 . The data migration method for the storage server according to claim 1 , wherein determining the storage information of the storage server further comprises: updating the storage information of the storage server in real time or periodically.
  5. 5 . The data migration method for the storage server according to claim 2 , wherein determining, according to the storage information, the number n of the strips to be migrated comprises: determining whether the following formula holds or not: ( c * m r - p + m - ⌊ c * m r - p + m ⌋ ) * ( r - p ) > m ; in a case that the following formula holds, the number of the strips to be migrated which is determined being: n = ⌊ c * m r - p + m ⌋ + 1 ; in a case that the following formula not holds, the number of the strips to be migrated which is determined being: n = ⌊ c * m r - p + m ⌋ .
  6. 6 . The data migration method for the storage server according to claim 1 , wherein after determining the storage information of the storage server, the method further comprises: determining whether a strip that is not fully written exists before the m disks are newly inserted into the storage server; in a case that the strip that is not fully written does not exist, executing the operation of determining, according to the storage information, the number n of the strips to be migrated; and in a case that the strip that is not fully written exists, considering the strip that is not fully written to be an empty strip, and executing the operation of determining, according to the storage information, the number n of the strips to be migrated.
  7. 7 . The data migration method for the storage server according to claim 1 , wherein after determining the data content of the check data blocks of the stripes, the method further comprises: verifying whether a data migration process of the storage server is anomalous, and in a case that the data migration process of the storage server is anomalous, outputting prompt information.
  8. 8 . The data migration method for the storage server according to claim 7 , wherein after determining that the data migration process of the storage server is anomalous, the method further comprises: recording an event log of this data migration process.
  9. 9 . The data migration method for the storage server according to claim 1 , wherein the storage server is a storage server in a distributed storage system.
  10. 10 . The data migration method for the storage server according to claim 1 , wherein the storage server is a storage server in a unified storage system.
  11. 11 . The data migration method for the storage server according to claim 1 , wherein the check data blocks are arranged in a left-handed misalignment manner.
  12. 12 . The data migration method for the storage server according to claim 1 , wherein the same encoding matrix is used to calculate erasure codes of the strips.
  13. 13 . A data migration device of a storage server, comprising: a memory, configured to store a computer program; and a processor, configured to execute the computer program to: determine storage information of a storage server, wherein, the storage information is storage information of the storage server before inserting m disks; determine, according to the storage information, the number n of strips to be migrated, wherein, the strips to be migrated are used to migrate to the m disks which are inserted into the storage server; select n strips from among c strips according to a preset migration rule to serve as the strips to be migrated, and selecting, from the n strips to be migrated, data blocks to be migrated and migrating the data blocks from an original disk where the data blocks to be migrated is located to the disks which are newly inserted, so that after migration of the data blocks is completed, difference in strip usage between any two disks of the m disks in the storage server does not exceed 1, wherein the preset migration rule comprises: for any one of the data blocks to be migrated, determining a strip in which a check data block with the lowest strip serial number is located in a disk where the data block to be migrated is located before migration, and when the strip has a storage space that is not occupied by any data block to be migrated in the m disks which are newly inserted, migrating the data block to be migrated to the storage space, thereby a position limitation is imposed on placement of the data blocks to be migrated into the m newly inserted disks, so as to satisfy the requirements of load balancing; and determine data content of check data blocks of the strips so as to complete data migration of the storage server, and for c-n strips among the c strips that are not selected as the strips to be migrated, determine data content of check data blocks of the c-n strips by means of incremental update of an erasure code, user data blocks of the c-n strips which are remaining in original disks are not moved, so as to perform incremental update of the erasure code by combining data migrated to the m newly inserted disks, a type of the erasure code is Reed-Solomon Code (RS code) used for encoding k data blocks into r additional check blocks, parameters of the RS code, when implementing a storage erasure coding function, are determined on the basis of location information, a general algorithm relation of the erasure codes is summarized as: P ⁢ 1 = a 1 , 1 ⁢ D ⁢ 1 ⊕ a 1 , 2 ⁢ D ⁢ 2 ⊕ … ⊕ a 1 , k ⁢ D k … P ⁢ r = a r , 1 ⁢ D ⁢ 1 ⊕ a r , 2 ⁢ D ⁢ 2 ⊕ … ⊕ a r , k ⁢ D k , wherein, a 1,1 to a r,k are selected from an encoding matrix of a Vandermonde matrix or an encoding matrix of a Cauchy matrix, P 1 to Pr are check data blocks, D 1 to Dk are user data blocks, ⊕ is addition in the Galois field, determine whether a total number of user data blocks before the m disks are newly inserted into the storage server is consistent with a total number of user data blocks after the m disks are inserted and data migration is performed, and in a case that the total number of user data blocks before the m disks are newly inserted into the storage server is inconsistent with the total number of user data blocks after the m disks are inserted and data migration is performed, determine a part of the user data blocks is not successfully migrated to the newly inserted disks, wherein n<c, m, n and c are all positive integers, and c represents the number of full strips in use by of the storage server before the m disks are newly inserted into the storage server, wherein select the n stripes from among the c stripes according to the preset migration rule to serve as the stripes to be migrated comprises: use the n strips with highest strip serial numbers among the c strips to serve as the strips to be migrated, wherein the preset migration rule further comprises: according to a rule that a stripe with a high stripe serial number takes precedence over a stripe with a low stripe serial number and when stripe serial numbers of two stripes are the same, a disk with a high disk serial number to which a data block to be migrated belongs takes precedence over a disk with a low disk serial number to which a data block to be migrated belongs, successively select each data block to be migrated from the n stripes to be migrated, and migrate same to the disks which are newly inserted; and upon completion of migration of any one of the data blocks to be migrated, in a case that the difference in the stripe usage between the any two disks in the storage server does not exceed 1, determine the migration being completed.
  14. 14 . A non-transitory readable storage medium, wherein the non-transitory readable storage medium stores a computer program which, when being executed by a processor, cause the processor to: determine storage information of a storage server, wherein, the storage information is storage information of the storage server before inserting m disks; determine, according to the storage information, the number n of strips to be migrated, wherein, the strips to be migrated are used to migrate to the m disks which are inserted into the storage server; select n strips from among c strips according to a preset migration rule to serve as the strips to be migrated, and selecting, from the n strips to be migrated, data blocks to be migrated and migrating the data blocks from an original disk where the data blocks to be migrated is located to the disks which are newly inserted, so that after migration of the data blocks is completed, difference in strip usage between any two disks of the m disks in the storage server does not exceed 1, wherein the preset migration rule comprises: for any one of the data blocks to be migrated, determining a strip in which a check data block with the lowest strip serial number is located in a disk where the data block to be migrated is located before migration, and when the strip has a storage space that is not occupied by any data block to be migrated in the m disks which are newly inserted, migrating the data block to be migrated to the storage space, thereby a position limitation is imposed on placement of the data blocks to be migrated into the m newly inserted disks, so as to satisfy the requirements of load balancing; and determine data content of check data blocks of the strips so as to complete data migration of the storage server, and for c-n strips among the c strips that are not selected as the strips to be migrated, determine data content of check data blocks of the c-n strips by means of incremental update of an erasure code, user data blocks of the c-n strips which are remaining in original disks are not moved, so as to perform incremental update of the erasure code by combining data migrated to the m newly inserted disks, a type of the erasure code is Reed-Solomon Code (RS code) used for encoding k data blocks into r additional check blocks, parameters of the RS code, when implementing a storage erasure coding function, are determined on the basis of location information, a general algorithm relation of the erasure codes is summarized as: P ⁢ 1 = a 1 , 1 ⁢ D ⁢ 1 ⊕ a 1 , 2 ⁢ D ⁢ 2 ⊕ … ⊕ a 1 , k ⁢ D k … P ⁢ r = a r , 1 ⁢ D ⁢ 1 ⊕ a r , 2 ⁢ D ⁢ 2 ⊕ … ⊕ a r , k ⁢ D k , wherein, a 1,1 to a r,k are selected from an encoding matrix of a Vandermonde matrix or an encoding matrix of a Cauchy matrix, P 1 to Pr are check data blocks, D 1 to Dk are user data blocks, ⊕ is addition in the Galois field, determine whether a total number of user data blocks before the m disks are newly inserted into the storage server is consistent with a total number of user data blocks after the m disks are inserted and data migration is performed, and in a case that the total number of user data blocks before the m disks are newly inserted into the storage server is inconsistent with the total number of user data blocks after the m disks are inserted and data migration is performed, determine a part of the user data blocks is not successfully migrated to the newly inserted disks, wherein n<c, m, n and c are all positive integers, and c represents the number of full strips in use by of the storage server before the m disks are newly inserted into the storage server, wherein select the n stripes from among the c stripes according to the preset migration rule to serve as the stripes to be migrated comprises: use the n strips with highest strip serial numbers among the c strips to serve as the strips to be migrated, wherein the preset migration rule further comprises: according to a rule that a stripe with a high stripe serial number takes precedence over a stripe with a low stripe serial number and when stripe serial numbers of two stripes are the same, a disk with a high disk serial number to which a data block to be migrated belongs takes precedence over a disk with a low disk serial number to which a data block to be migrated belongs, successively select each data block to be migrated from the n stripes to be migrated, and migrate same to the disks which are newly inserted; and upon completion of migration of any one of the data blocks to be migrated, in a case that the difference in the stripe usage between the any two disks in the storage server does not exceed 1, determine the migration being completed.

Description

CROSS-REFERENCE TO RELATED APPLICATION The present application is a National Stage Application of PCT International Application No.: PCT/CN2022/135156 filed on Nov. 29, 2022, which claims priority to Chinese Patent Application 202210462562.2, filed in the China National Intellectual Property Administration on Apr. 29, 2022, the disclosure of which is incorporated herein by reference in its entirety. TECHNICAL FIELD The present disclosure relates to the field of coding technologies, and in particular, to a data migration method, system and device for a storage server, and a non-transitory readable storage medium. BACKGROUND At present, distributed storage has been widely used in the face of storage requirements of mass data. There are a large number of nodes in a distributed storage system, the reliability of a single node is usually not very high, and node failures often occur in the system due to reasons such as software and hardware failures and human errors. Therefore, in order to improve data reliability of a distributed storage system and realize reconstruction of an original file, in addition to storing original data, a certain amount of data redundancy is also stored, so that in cases where some nodes fail, the original file can also be decoded and recovered, thereby guaranteeing normal operation of the system. An EC (Erasure Code) is a forward error correction technology in coding theory, can effectively reduce storage overheads while ensuring the same reliability, and is widely applied to various storage systems and data centers. There are many types of erasure codes, and an RS code (Reed-Solomon Code) is a relatively common type in actual storage systems. The RS code encodes k data blocks into r additional check blocks. The method for obtaining the r check blocks by encoding on the basis of the Vandermonde matrix or the Cauchy matrix is referred to as an RS erasure code encoded by using the Vandermonde matrix or the Cauchy matrix, which can be expressed as: [10…001…0⋮⋮⋮⋮00…1x10x20…xk0⋮⋮⋮⋮x1r-1x2r-1…xkr-1]×[D1D2⋮Dk]=[D1D2⋮DkP1⋮Pr]Formula⁢ (1)[10…001…0⋮⋮⋮⋮00…11x1+y11x1+y2…1x1+yk⋮⋮⋮⋮1xr+y11xr+y2…1xr+yk]×[D1D2⋮Dk]=[D1D2⋮DkP1⋮Pr]Formula⁢ (2) Formula (1) represents performing RS encoding by using the Vandermonde matrix, and formula (2) represents performing RS encoding by using the Cauchy matrix. Taking formula (1) as an example, the upper part of the Vandermonde matrix represents a k×k unit matrix. The unit matrix is multiplied by original data D1 to Dk, and the obtained result is still the original data D1 to Dk. The lower part is an r×k encoding matrix, which is multiplied by the original data D1 to Dk, and P1 to Pr obtained are r pieces of encoded data obtained by means of encoding, or referred to as check data. When at most r pieces of data among D1 to Pr are erroneous or lost, the original data D1 to Dk can be obtained by multiplying remaining data by an inverse matrix of a matrix corresponding to the remaining data. For example, when D1 to Dr are lost, the decoding process may be expressed as formula (3). […1…0⋮⋮⋮⋮…0…1…xr0…xk0⋮⋮⋮⋮…xrr-1…xkr-1]4×[Dr+1Dr+2⋮Pr]=[D1D2⋮Dk]Formula⁢ (3) It can be determined that the core of the erasure code is to construct an invertible encoding matrix, and the original encoded data is recovered by using the encoded inverse matrix after encoding. A common RS erasure code uses the Cauchy matrix or the Vandermonde matrix. An obtained matrix is completely invertible, and the expansion of the size of the matrix is simple. Regardless of distributed storage or unified storage, basic hardware devices are general-purpose storage chips, and the general-purpose storage chips perform storage function control. After construction of storage data is completed and the storage data is protected by means of a redundancy check code, a user needs to perform an operation of newly inserting a disk after the storage server is constructed. In this case, data needs to be migrated without changing a check amount. SUMMARY An object of some embodiments of the present disclosure is to provide a data migration method, system and device for a storage server, and a non-transitory readable storage medium. embodiments of the present disclosure provide the following technical solutions:a data migration method for a storage server, including:storage information of a storage server is determined, wherein, the storage information is storage information of the storage server before inserting m disks;the number n of strips to be migrated is determined according to the storage information, wherein, the strips to be migrated are used to migrate to the m disks which are inserted into the storage server;n strips are selected from among c strips according to a preset migration rule to serve as the strips to be migrated, and data blocks to be migrated are selected from the n strips to be migrated and are migrated to the disks which are newly inserted, so that after the migration of the data blocks is complet