CN-121979723-A - File storage method and device, electronic equipment and storage medium
Abstract
The disclosure provides a file storage method, a file storage device, electronic equipment and a storage medium, and relates to the technical field of the Internet, in particular to technical fields such as cloud computing and big data. The method comprises: obtaining a plurality of replication domains of a distributed storage cluster, wherein each replication domain indicates a fault isolation unit and comprises a plurality of logical data buckets that are assigned to different cache nodes of the distributed storage cluster; dividing a file to be stored into a plurality of data blocks according to a set block size, and generating a plurality of copies of any data block; and mapping the copies of any data block to different logical data buckets in the same replication domain, so that the copies are stored on different cache nodes corresponding to the same replication domain.
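The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the names `Bucket`, `split_into_blocks` and `place_replicas` are hypothetical, and choosing the replication domain by hashing a block key is an assumption, since the text does not state how a domain is selected for a block.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """A logical data bucket; `node` is the cache node it is assigned to."""
    bucket_id: str
    node: str


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Divide a file's bytes into blocks of at most `block_size` bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(block_key: str,
                   domains: dict[str, list[Bucket]],
                   replica_count: int) -> list[Bucket]:
    """Map all copies of one block to distinct buckets of a single domain."""
    # Assumption: the domain is picked by hashing the block key.
    digest = int.from_bytes(
        hashlib.sha256(block_key.encode()).digest()[:8], "big")
    domain_ids = sorted(domains)
    buckets = domains[domain_ids[digest % len(domain_ids)]]
    if replica_count > len(buckets):
        raise ValueError("not enough buckets in the replication domain")
    # When there are fewer copies than buckets, rotate the starting bucket
    # so that different blocks use different subsets of the domain.
    start = digest % len(buckets)
    chosen = [buckets[(start + i) % len(buckets)]
              for i in range(replica_count)]
    # The copies must end up on different cache nodes.
    if len({b.node for b in chosen}) != replica_count:
        raise ValueError("buckets of a domain must sit on distinct nodes")
    return chosen
```

A caller would first split the file, then call `place_replicas` once per block with a key such as the file name plus the block index (also an assumption of this sketch).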
Inventors
- YE ZIHUI
- CHEN ZHIPENG
- Niu Xiangxiang
- Han Yunchang
- Tan Tengfei
Assignees
- 北京百度网讯科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251218
Claims (12)
- 1. A method of storing a file, comprising: obtaining a plurality of replication domains of a distributed storage cluster, wherein each replication domain indicates a fault isolation unit and comprises a plurality of logical data buckets that are assigned to different cache nodes of the distributed storage cluster; dividing a file to be stored into a plurality of data blocks according to a set block size, and generating a plurality of copies of any data block; and mapping the plurality of copies of any data block to different logical data buckets in the same replication domain, so that the plurality of copies are stored on a plurality of different cache nodes corresponding to the same replication domain.
- 2. The method of claim 1, wherein mapping the plurality of copies of any data block to different logical data buckets in the same replication domain, so that the plurality of copies are stored on a plurality of different cache nodes corresponding to the same replication domain, comprises: in response to a first number of the plurality of copies of any data block being equal to a second number of the plurality of logical data buckets in the same replication domain, mapping the plurality of copies to the plurality of logical data buckets in the same replication domain respectively, wherein the plurality of copies are in one-to-one correspondence with the plurality of logical data buckets in the same replication domain; and in response to the first number being smaller than the second number, selecting, from the plurality of logical data buckets in the same replication domain, target logical data buckets equal in number to the first number, and mapping the plurality of copies to the target logical data buckets respectively.
- 3. The method of claim 1, wherein the plurality of logical data buckets in the plurality of replication domains are determined by: dividing a plurality of logical data buckets in the distributed storage cluster into a plurality of fault isolation units, wherein each fault isolation unit corresponds to one replication domain; and assigning the logical data buckets in each replication domain to different cache nodes according to capacity information of each cache node in the distributed storage cluster.
- 4. The method of claim 1, wherein the generating multiple copies of any one data block comprises: acquiring copy configuration information associated with the file to be stored; determining the number of copies of any one data block according to the copy configuration information; and copying any data block based on the copy number to obtain a plurality of copies of any data block.
- 5. The method of claim 1, wherein the method further comprises: in response to detecting a topology change of the cache nodes of the distributed storage cluster, reassigning the logical data buckets in each replication domain based on updated topology information; and/or, in response to detecting a change in the capacity of the cache nodes of the distributed storage cluster, reassigning the logical data buckets in each replication domain based on updated capacity information.
- 6. The method of claim 1, wherein the method further comprises: in response to detecting that a target cache node has failed and has not recovered within a set period of time, determining the logical data buckets carried on the target cache node, and determining a first target replication domain to which any carried logical data bucket belongs; for any carried logical data bucket, acquiring valid copy data from logical data buckets carried by cache nodes in the first target replication domain other than the target cache node; and reconstructing the missing data of the carried logical data bucket based on the valid copy data, and writing the reconstructed data into idle logical data buckets of other cache nodes.
- 7. The method of any of claims 1-6, wherein the method further comprises: in response to a read of any data block from the distributed storage cluster, acquiring a list of the cache nodes on which the copies of the data block are located in a second target replication domain to which the data block is mapped; initiating, based on the cache node list, read requests to the cache nodes according to a preset multi-copy access strategy; and in response to detecting that any cache node returns valid data, terminating the remaining read requests and using the valid data.
- 8. The method of claim 7, wherein the method further comprises: in response to detecting that any cache node times out or returns an error, marking that cache node as abnormal; and skipping the cache node marked as abnormal in subsequent read operations until the abnormal cache node recovers.
- 9. A file storage device, comprising: an acquisition module configured to obtain a plurality of replication domains of a distributed storage cluster, wherein each replication domain indicates a fault isolation unit and comprises a plurality of logical data buckets that are assigned to different cache nodes of the distributed storage cluster; a generating module configured to divide a file to be stored into a plurality of data blocks according to a set block size and to generate a plurality of copies of any data block; and a mapping module configured to map the plurality of copies of any data block to different logical data buckets in the same replication domain, so that the plurality of copies are stored on a plurality of different cache nodes corresponding to the same replication domain.
- 10. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
- 11. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
- 12. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to any of claims 1-8.
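The read path of claims 7 and 8 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the "preset multi-copy access strategy" is taken here to be sequential failover (one possible strategy; the claims do not fix it), and `ReplicaReader`, `fetch` and `mark_recovered` are hypothetical names. The `fetch` callable stands in for whatever transport the cluster uses and is assumed to raise `OSError` (of which `TimeoutError` is a subclass) on a timeout or error.

```python
from typing import Callable


class ReplicaReader:
    """Reads one block from the cache nodes that hold its copies."""

    def __init__(self, fetch: Callable[[str, str], bytes]):
        # Hypothetical transport: fetch(node, block_key) -> block bytes.
        self._fetch = fetch
        # Nodes marked abnormal are skipped until they are marked recovered.
        self.abnormal: set[str] = set()

    def read(self, block_key: str, nodes: list[str]) -> bytes:
        """Try the nodes in order and return the first valid reply."""
        for node in nodes:
            if node in self.abnormal:
                continue
            try:
                data = self._fetch(node, block_key)
            except OSError:
                # Timeout or error reply: mark the node and try the next one.
                self.abnormal.add(node)
                continue
            # First valid reply wins; the remaining nodes are not contacted.
            return data
        raise LookupError("no healthy copy available for " + block_key)

    def mark_recovered(self, node: str) -> None:
        """Make a previously abnormal node eligible for reads again."""
        self.abnormal.discard(node)
```

A concurrent strategy that queries several nodes at once and cancels the outstanding requests on the first valid reply would satisfy the same claim language; the sequential form is used here only because it is the simplest to follow.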
Description
File storage method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of the Internet, in particular to technical fields such as cloud computing and big data, and more particularly to a file storage method, a file storage device, electronic equipment and a storage medium.
Background
As service scale keeps expanding and system complexity keeps increasing, file storage architectures built on distributed cache nodes have gradually become an efficient implementation approach. Distributing data across a plurality of cache nodes improves the throughput and scalability of the system and makes high availability and disaster recovery possible. How to store files efficiently and reliably has therefore become a technical problem to be solved in such distributed cache storage systems.
Disclosure of Invention
The disclosure provides a file storage method, a file storage device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a file storage method including: obtaining a plurality of replication domains of a distributed storage cluster, wherein each replication domain indicates a fault isolation unit and comprises a plurality of logical data buckets that are assigned to different cache nodes of the distributed storage cluster; dividing a file to be stored into a plurality of data blocks according to a set block size, and generating a plurality of copies of any data block; and mapping the plurality of copies of any data block to different logical data buckets in the same replication domain, so that the plurality of copies are stored on a plurality of different cache nodes corresponding to the same replication domain.
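The method relies on each replication domain's logical data buckets being assigned to different cache nodes, and claim 3 states that this assignment follows node capacity. One way that could be done is sketched below in Python. The greedy rule (prefer the node with the lowest load relative to capacity) and the name `assign_buckets` are assumptions of this sketch; the disclosure does not specify the assignment algorithm.

```python
def assign_buckets(domains: dict[str, list[str]],
                   capacity: dict[str, float]) -> dict[str, str]:
    """Return a bucket_id -> node map with distinct nodes inside each domain.

    `domains` maps a replication domain id to its bucket ids, and `capacity`
    maps a cache node to its capacity (any consistent unit).
    """
    if any(c <= 0 for c in capacity.values()):
        raise ValueError("node capacities must be positive")
    load = {node: 0 for node in capacity}  # buckets assigned so far
    assignment: dict[str, str] = {}
    for domain_id in sorted(domains):
        bucket_ids = domains[domain_id]
        if len(bucket_ids) > len(capacity):
            raise ValueError("domain has more buckets than there are nodes")
        # Rank nodes by the load they would have relative to capacity after
        # taking one more bucket; ties are broken by node name.
        ranked = sorted(capacity,
                        key=lambda n: ((load[n] + 1) / capacity[n], n))
        # zip pairs each bucket with a different node, so one domain never
        # places two of its buckets on the same node.
        for bucket_id, node in zip(bucket_ids, ranked):
            assignment[bucket_id] = node
            load[node] += 1
    return assignment
```

Re-running the same function with updated node lists or capacities is one simple way to model the reassignment described in claim 5, although a production system would presumably also try to minimize data movement.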
According to another aspect of the present disclosure, there is provided a file storage device including: an acquisition module configured to obtain a plurality of replication domains of a distributed storage cluster, wherein each replication domain indicates a fault isolation unit and comprises a plurality of logical data buckets that are assigned to different cache nodes of the distributed storage cluster; a generating module configured to divide a file to be stored into a plurality of data blocks according to a set block size and to generate a plurality of copies of any data block; and a mapping module configured to map the plurality of copies of any data block to different logical data buckets in the same replication domain, so that the plurality of copies are stored on a plurality of different cache nodes corresponding to the same replication domain.
According to still another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method set forth in the above aspect of the disclosure.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method set forth in the above aspect of the present disclosure.
According to a further aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method set forth in the above aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are provided for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is a schematic diagram of file storage in the related art;
FIG. 2 is a flowchart of a file storage method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a file storage method according to a second embodiment of the disclosure;
FIG. 4 is a flowchart of a file storage method according to a third embodiment of the disclosure;
FIG. 5 is a flowchart of a file storage method according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a file storage method according to an embodiment of the disclosure;
FIG. 7 is a schematic structural diagram of a file storage device according to a fifth embodiment of the disclosure;
FIG. 8 illustrates a sche