CN-122019474-A - Mixed storage method and system for unified metadata layer

CN 122019474 A

Abstract

The application provides a hybrid storage method and system with a unified metadata layer. The method comprises: receiving a storage access request from each node and parsing the protocol type and target data path of the request; executing the metadata operation corresponding to the protocol type based on a preset directory table, inode table, and object table; and packaging the metadata operations into an atomic transaction commit, synchronously updating the corresponding records of the directory table, inode table, and object table, and completing the read/write operation on the target data.

Inventors

  • SUN FANGCHEN
  • ZOU YU
  • CHEN JI

Assignees

  • 联通数字科技有限公司 (Unicom Digital Technology Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2026-04-03

Claims (10)

  1. A hybrid storage method with a unified metadata layer, characterized by comprising the following steps: receiving a storage access request from each node and parsing the protocol type and target data path of the request; executing the metadata operation corresponding to the protocol type based on a preset directory table, inode table, and object table; and packaging the metadata operations into an atomic transaction commit, synchronously updating the corresponding records of the directory table, inode table, and object table, and completing the read/write operation on the target data.
  2. The method of claim 1, wherein the preset directory table, inode table, and object table are constructed as follows: the directory table uses the file system identifier, parent directory inode number, and file name as its row key, and stores the inode number of the child file; the inode table uses the file system identifier and inode number as its row key, and stores the distribution information of the file's data blocks; and the object table uses the bucket name and object path as its row key, and stores the corresponding inode number and extended object information.
  3. The method of claim 1, wherein packaging the metadata operations into an atomic transaction commit and synchronously updating the corresponding records of the directory table, inode table, and object table comprises: packaging the metadata write, update, and delete operations on the directory table, inode table, and object table into a single distributed transaction; executing all metadata operations in the transaction and checking the execution result for each table; if all operations succeed, committing the transaction to complete the synchronous update of the directory table, inode table, and object table; and if any operation fails, rolling back all operations in the transaction as a whole.
  4. The method of claim 1, wherein the nodes are AI training nodes, and the method further comprises: allocating resource tokens to each AI training node, determining the initial token count from the node's storage credit limit, and placing all AI training nodes and their resource tokens under the management of a temporary intermediary, wherein the higher a node's storage credit, the larger its initial token count in the temporary intermediary; performing competitive selection among all AI training nodes in the temporary intermediary that meet the competition conditions, based on a preset random competition function under which a node with more resource tokens has a higher probability of being selected; adding each selected AI training node to a scheduler and executing task scheduling according to a preset order; when a selected AI training node is added to the scheduler, deducting the node's current resource tokens in the temporary intermediary by a preset proportion; and collecting the training-task convergence of every AI training node and updating the node's token count in the temporary intermediary according to the change in convergence, wherein a node's resource tokens are increased when its training convergence improves and decreased when its convergence declines, until all AI training tasks have been scheduled and executed.
  5. The method of claim 4, wherein the random competition function is implemented as a weighted random roulette: the probability that a single AI training node is selected in one round of competition equals the ratio of its current effective resource tokens in the temporary intermediary to the sum of the effective resource tokens of all competing nodes in the temporary intermediary; each node is assigned a contiguous random-number interval whose length corresponds to its computed selection probability; and the node selected in the round is determined by the interval into which a generated globally unique random number falls.
  6. The method of claim 4, wherein deducting the node's current resource tokens in the temporary intermediary by a preset proportion comprises: the preset token deduction ratio is configurable in the range of 10%-30%; at the moment a node is selected and its enqueue operation into the scheduler completes, a number of tokens equal to the product of the node's current balance and the preset deduction ratio is deducted from its resource token balance in the temporary intermediary; the deducted balance is never lower than 0; and a node whose balance reaches 0 is marked as suspended from competition until improved training convergence replenishes its tokens above 0, at which point it regains competition eligibility.
  7. The method of claim 4, wherein updating the node's token count in the temporary intermediary according to the change in training convergence comprises: the training convergence is at least one quantifiable indicator among the loss-function decline rate and the model validation accuracy improvement rate of the training task; taking a preset training iteration period as the unit, the convergence value of each node's current iteration period is collected and compared with the convergence reference value of the previous iteration period; when the current period's convergence value is higher than the previous reference value, a corresponding number of resource tokens is added to the node in the temporary intermediary; and when it is lower, a corresponding number of resource tokens is deducted from the node's balance in the temporary intermediary, wherein the magnitude of the convergence change and the number of tokens added or removed are linearly and positively correlated.
  8. The method of claim 4, wherein adding the selected AI training nodes to the scheduler and executing task scheduling according to a preset order specifically comprises: the scheduler adopts a first-in-first-out ordered queue; selected AI training nodes are appended to the tail of the ordered queue in the order they are selected, and the node order in the queue is tamper-proof; the scheduler takes AI training nodes from the head of the ordered queue in turn, allocates matching training storage resources and computing resources to them, and executes the corresponding training tasks; after a node's training task finishes scheduled execution, if the node still holds effective resource tokens in the temporary intermediary and the task has not reached the preset convergence termination threshold, the node rejoins the competition pool to participate in the next round of random competition; and if the task has reached the preset convergence termination threshold, the node's remaining resource tokens are cleared and the node is removed from the competition pool.
  9. The method of claim 4, wherein the temporary intermediary is a token management container with quota-limit management, used to maintain the unique identifier, resource token balance, storage credit, real-time training convergence data, and task execution state of every online AI training node, and the scheduler is an ordered scheduling queue with a sequence lock, used to hold AI training nodes that have won the competition and are waiting for resource scheduling.
  10. A hybrid storage system with a unified metadata layer, comprising: a receiving module for receiving a storage access request from each node and parsing the protocol type and target data path of the request; an execution module for executing the metadata operation corresponding to the protocol type based on a preset directory table, inode table, and object table; and a synchronization module for packaging the metadata operations into an atomic transaction commit, synchronously updating the corresponding records of the directory table, inode table, and object table, and completing the read/write operation on the target data.
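The weighted random roulette of claim 5 and the enqueue-time token deduction of claim 6 can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation; the function names, the dict-based token pool, and the use of Python's `random` module are assumptions for illustration.

```python
import random

def select_node(tokens: dict, rng=None) -> str:
    """Weighted roulette selection (claim 5 pattern): a node's chance of
    winning equals its share of all effective resource tokens in the pool."""
    rng = rng or random.Random()
    # Nodes with a zero balance are suspended from competition (claim 6).
    eligible = {n: t for n, t in tokens.items() if t > 0}
    if not eligible:
        raise ValueError("no node is eligible to compete")
    total = sum(eligible.values())
    pick = rng.uniform(0, total)      # one random number over [0, total)
    cum = 0.0
    for node, t in eligible.items():  # each node owns an interval of length t
        cum += t
        if pick < cum:
            return node
    return node                       # floating-point edge case fallback

def deduct_on_enqueue(tokens: dict, node: str, ratio: float = 0.2) -> None:
    """On enqueueing into the scheduler, deduct balance * ratio; the ratio
    is configurable within 10%-30% and the balance never drops below 0."""
    assert 0.10 <= ratio <= 0.30
    tokens[node] = max(0, tokens[node] - int(tokens[node] * ratio))
```

For example, a node holding 100 of 101 pool tokens is almost always selected, and a 20% deduction on enqueue leaves it with 80 tokens, gradually equalizing the competition.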

Description

Mixed storage method and system for unified metadata layer

Technical Field

The application belongs to the field of operating systems and in particular relates to a hybrid storage method and system with a unified metadata layer.

Background

With the rapid growth of AI large-model training, a training system must simultaneously process massive small-file sample datasets and large model weight and checkpoint files, placing simultaneous demands on the underlying storage for POSIX-semantic high-frequency random access and S3-semantic high-throughput sequential reads and writes. The current industry mainstream deploys file storage and object storage in separate tiers whose metadata layers are mutually independent, maintaining two separate systems: inode/directory entries and object-path indexes. Existing fusion schemes achieve cross-system access through protocol bridging and middleware synchronization, but the underlying layer still does not break down the barrier between the two sets of metadata management logic; problems such as semantic splitting, complex synchronization, performance loss, and hard-to-guarantee consistency remain, and these schemes cannot meet the demanding requirements of frequently updated TB-scale sample sets and multi-node concurrent training in large-model training scenarios. Meanwhile, the existing storage architecture lacks resource management and control capabilities for scenarios where multiple large-model training tasks run in parallel, so production-task resources are easily preempted and training performance fluctuates severely.

Disclosure of Invention

The embodiments of the application provide a hybrid storage method and system with a unified metadata layer, aiming to solve the above problems in the storage layer underlying current AI large-model training.
The application provides a hybrid storage method with a unified metadata layer, comprising the following steps: receiving a storage access request from each node and parsing the protocol type and target data path of the request; executing the metadata operation corresponding to the protocol type based on a preset directory table, inode table, and object table; and packaging the metadata operations into an atomic transaction commit, synchronously updating the corresponding records of the directory table, inode table, and object table, and completing the read/write operation on the target data. Optionally, the preset directory table, inode table, and object table are constructed as follows: the directory table uses the file system identifier, parent directory inode number, and file name as its row key, and stores the inode number of the child file; the inode table uses the file system identifier and inode number as its row key, and stores the distribution information of the file's data blocks; and the object table uses the bucket name and object path as its row key, and stores the corresponding inode number and extended object information.
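The three-table row-key layout above can be sketched as an in-memory model. This is an illustrative sketch only; the variable names, the tuple row keys, and the `create_file` helper are assumptions, standing in for whatever key-value store the actual system uses.

```python
# Minimal in-memory sketch of the three metadata tables.
directory_table = {}  # (fs_id, parent_inode, file_name) -> child inode number
inode_table = {}      # (fs_id, inode_number) -> file data-block layout
object_table = {}     # (bucket_name, object_path) -> (inode_number, extended info)

def create_file(fs_id, parent_inode, name, inode_no, bucket, path, blocks, ext=None):
    """Register one file in all three tables so that the POSIX view
    (directory + inode) and the S3 view (bucket + path) share one inode."""
    directory_table[(fs_id, parent_inode, name)] = inode_no
    inode_table[(fs_id, inode_no)] = {"blocks": blocks}
    object_table[(bucket, path)] = (inode_no, ext or {})
```

The key point of the design is that both access paths resolve to the same inode number, so a file written over POSIX is immediately addressable as an object and vice versa.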
Optionally, packaging the metadata operations into an atomic transaction commit and synchronously updating the corresponding records of the directory table, inode table, and object table comprises: packaging the metadata write, update, and delete operations on the directory table, inode table, and object table into a single distributed transaction; executing all metadata operations in the transaction and checking the execution result for each table; if all operations succeed, committing the transaction to complete the synchronous update of the directory table, inode table, and object table; and if any operation fails, rolling back all operations in the transaction as a whole. Optionally, the nodes are AI training nodes and the method further comprises: allocating resource tokens to each AI training node, determining the initial token count from the node's storage credit limit, and placing all AI training nodes and their resource tokens under the management of a temporary intermediary, wherein the higher a node's storage credit, the larger its initial token count in the temporary intermediary; performing competitive selection among all AI training nodes in the temporary intermediary that meet the competition conditions, based on a preset random competition function under which a node with more resource tokens has a higher probability of being selected; and adding the selected AI training nodes to a scheduler and executing task scheduling according to a preset order
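The all-or-nothing transaction pattern described above (execute every table operation, commit only if all succeed, otherwise roll back the whole set) can be sketched as follows. This is a simplified single-process illustration using snapshot-and-restore; the real system would use a distributed transaction protocol, and the function name and operation-callable interface are assumptions.

```python
import copy

def commit_atomically(tables: dict, operations) -> bool:
    """Apply every metadata operation or none of them: snapshot the tables,
    attempt all operations, and restore the snapshot on any failure."""
    snapshot = copy.deepcopy(tables)
    try:
        for op in operations:
            op(tables)            # each op mutates one or more tables
        return True               # all succeeded: the updates stand
    except Exception:
        for name in tables:       # any failure: restore every table in place
            tables[name].clear()
            tables[name].update(snapshot[name])
        return False
```

Even a partially applied failing operation leaves no trace, which is what keeps the directory, inode, and object views mutually consistent.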