US-12625844-B2 - File management method and apparatus, device, and system

Abstract

A file management method includes obtaining a storage path of a file; obtaining an attribute of the file through calculation based on the storage path and an attribute prediction model; and determining a storage policy for the file based on the attribute.

Inventors

  • Yibin Wang
  • Lingchuan Sun
  • Xinxiang Lin

Assignees

  • HUAWEI TECHNOLOGIES CO., LTD.

Dates

Publication Date
2026-05-12
Application Date
2023-12-08
Priority Date
2021-06-10

Claims (20)

  1. A method implemented by a storage system, comprising: obtaining, from a client, a first storage path of a file to be stored; obtaining attribute data and a second storage path that are of a stored file; training, using the attribute data and the second storage path, an initial model to obtain a trained attribute prediction model; predicting an attribute of the file through calculation based on the first storage path and the trained attribute prediction model, wherein the attribute indicates a life cycle value of the file or a size of the file; and determining a storage policy for the file based on the attribute.
  2. The method of claim 1, wherein the attribute prediction model is based on a convolutional neural network (CNN).
  3. The method of claim 1, wherein the file is a newly created file, wherein the attribute is the size of the file, and wherein the storage policy is a quantity of strips used when storing data of the file.
  4. The method of claim 1, wherein the file is a client file, wherein the attribute is the life cycle value of the file, and wherein determining the storage policy comprises determining a first storage medium used when storing data of the file.
  5. The method of claim 4, wherein determining the first storage medium comprises: obtaining a second storage medium in which the file is currently stored; obtaining a storage duration of the file in the second storage medium; identifying that the storage duration has not exceeded a duration threshold; and determining, in response to identifying that the storage duration has not exceeded the duration threshold, that the first storage medium is the second storage medium.
  6. The method of claim 4, further comprising: obtaining a second storage medium in which the file is currently stored; obtaining a storage duration of the file in the second storage medium; identifying that the storage duration has exceeded a duration threshold; and determining, in response to identifying that the storage duration has exceeded the duration threshold, that the first storage medium is a third storage medium, wherein a first read speed of the third storage medium is less than a second read speed of the second storage medium.
  7. A device comprising: a memory configured to store an attribute prediction model and computer-executable instructions; and a processor coupled to the memory and configured to execute the computer-executable instructions to cause the device to: obtain, from a client, a first storage path of a file; obtain attribute data and a second storage path that are of a stored file; train, using the attribute data and the second storage path, an initial model to obtain a trained attribute prediction model; predict an attribute of the file through calculation based on the first storage path and the trained attribute prediction model, wherein the attribute indicates a life cycle value of the file or a size of the file; and determine a storage policy for the file based on the attribute.
  8. The device of claim 7, wherein the attribute prediction model is based on a convolutional neural network (CNN).
  9. The device of claim 7, wherein the file is a newly created file, wherein the attribute is the size of the file, and wherein the storage policy is a quantity of strips used when storing data of the file.
  10. The device of claim 7, wherein the file is a client file, wherein the attribute is the life cycle value of the file, and wherein the processor is further configured to execute the computer-executable instructions to cause the device to determine a first storage medium used when storing data of the file.
  11. The device of claim 10, wherein the processor is further configured to execute the computer-executable instructions to cause the device to: obtain a second storage medium in which the file is currently stored; obtain a storage duration of the file in the second storage medium; identify that the storage duration has not exceeded a duration threshold; and determine, in response to identifying that the storage duration has not exceeded the duration threshold, the first storage medium is the second storage medium.
  12. The device of claim 10, wherein the processor is further configured to execute the computer-executable instructions to cause the device to: obtain a second storage medium in which the file is currently stored; obtain a storage duration of the file in the second storage medium; identify that the storage duration has exceeded a duration threshold; and determine, in response to identifying that the storage duration has exceeded the duration threshold, that the first storage medium is a third storage medium, wherein a first read speed of the third storage medium is less than a second read speed of the second storage medium.
  13. A computer program product comprising computer-executable instructions that are stored on a non-transitory computer-readable storage medium and that, when executed by a processor, cause a device to: obtain, from a client, a first storage path of a file; obtain attribute data and a second storage path that are of a stored file; train, using the attribute data and the second storage path, an initial model to obtain a trained attribute prediction model; predict an attribute of the file through calculation based on the first storage path and the trained attribute prediction model, wherein the attribute indicates a life cycle value of the file or a size of the file; and determine a storage policy for the file based on the attribute.
  14. The computer program product of claim 13, wherein the first storage medium comprises a dynamic random-access memory (DRAM).
  15. The computer program product of claim 14, wherein the computer-executable instructions, when executed by the processor, further cause the device to: build a sample set; and initialize the initial model to obtain the attribute prediction model.
  16. The computer program product of claim 15, wherein the attribute prediction model is based on a convolutional neural network (CNN).
  17. The computer program product of claim 13, wherein the file is a newly created file, wherein the attribute is the size of the file, and wherein the storage policy is a quantity of strips used when storing data of the file.
  18. The computer program product of claim 13, wherein the file is a client file, wherein the attribute is the life cycle value of the file, and wherein the file management apparatus is further configured to determine a first storage medium used when storing data of the file.
  19. The computer program product of claim 18, wherein the computer-executable instructions, when executed by the processor, further cause the device to: obtain a second storage medium in which the file is currently stored; obtain a storage duration of the file in the second storage medium; identify that the storage duration has not exceeded a duration threshold; and determine, in response to identifying that the storage duration has not exceeded the duration threshold, the first storage medium is the second storage medium.
  20. The computer program product of claim 18, wherein the computer-executable instructions, when executed by the processor, further cause the device to: obtain a second storage medium in which the file is currently stored; obtain a storage duration of the file in the second storage medium; identify that the storage duration has exceeded a duration threshold; and determine, in response to identifying that the storage duration has exceeded the duration threshold, that the first storage medium is a third storage medium, wherein a first read speed of the third storage medium is less than a second read speed of the second storage medium.
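
Illustrative sketch (not part of the claims): the medium-selection branch recited in claims 5 and 6 can be pictured as a single comparison against the duration threshold. The Python below is a minimal sketch under stated assumptions. The `Medium` class, the `choose_medium` function, the speed figures, and the rule of picking the fastest of the slower media are all hypothetical; the claims only require that the replacement medium read more slowly than the current one, and they do not say how the threshold is derived from the predicted life cycle value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Medium:
    """Hypothetical description of a storage medium."""
    name: str
    read_speed_mb_s: float


def choose_medium(
    current: Medium,
    storage_duration_s: float,
    duration_threshold_s: float,
    candidates: list[Medium],
) -> Medium:
    """Pick the medium for a file from how long it has been stored."""
    # Duration has not exceeded the threshold: keep the current medium.
    if storage_duration_s <= duration_threshold_s:
        return current

    # Duration has exceeded the threshold: move to a medium whose read
    # speed is lower than that of the current medium.
    slower = [m for m in candidates if m.read_speed_mb_s < current.read_speed_mb_s]
    if not slower:
        # Assumption: with no slower medium available, the file stays put.
        return current

    # Assumption: step down one level by taking the fastest slower medium.
    return max(slower, key=lambda m: m.read_speed_mb_s)
```

For example, with hypothetical media `ssd` (500 MB/s), `hdd` (150 MB/s), and `tape` (20 MB/s), a file that has sat on `ssd` past its threshold would be assigned to `hdd` under this sketch.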

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a continuation of International Patent Application No. PCT/CN2022/097925 filed on Jun. 9, 2022, which claims priority to Chinese Patent Application No. 202110649924.4 filed on Jun. 10, 2021, both of which are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

This disclosure relates to the field of computer technologies, and in particular, to a file management method, a device, and a system.

BACKGROUND

With the explosive growth of data in recent years, intelligent file management is becoming a mainstream approach. The importance of a file can be measured by the value of its information, and important files require faster access and a stricter storage policy. Current mainstream distributed file systems rely mainly on querying the metadata of a file to measure the value of the file and, from that value, to determine the storage method and storage area for the file. However, during distributed storage of large-scale files, accessing the metadata of each file is time-consuming. In addition, because the quantity of accesses and the access requirements for a same file differ across nodes over a period of time, the value of the file also varies with time and node. Consequently, uploading an original file to a distributed system, or migrating it from one distributed system to another, takes considerable time, which severely degrades storage performance of the distributed file system.

Therefore, how to efficiently perceive the value of a file and improve file storage speed and performance in a distributed file system is a technical problem that urgently needs to be resolved.
SUMMARY

This disclosure provides a file management method, a device, and a system that efficiently detect the size and life cycle of a file and recommend a storage policy for the file, which improves file storage speed and performance in a distributed file system.

According to a first aspect, a file management method is provided. The method includes obtaining a storage path of a file, obtaining an attribute of the file through calculation based on the storage path and an attribute prediction model, and determining a storage policy for the file based on the attribute.

According to the foregoing method, the storage path of the file may be used for predicting the attribute of the file, to obtain the storage policy for the file. This reduces the quantity of accesses to metadata of the file, and improves the speed of storing and reading the file.

In a possible implementation, before obtaining the attribute of the file through calculation, the method further includes obtaining attribute data and a file path of a stored file, and using the attribute data and the file path to train an initial model to obtain the attribute prediction model.

According to the foregoing method, the trained attribute prediction model may be used for predicting the attribute of the file, which improves prediction precision.

In another possible implementation, the managed file is a newly created file, the attribute is a size of the file, and the storage policy for the file is a quantity of strips used when data of the file is stored.

According to the foregoing method, the storage path of the file may be used for predicting the size of the file and obtaining a layout policy for file storage. This reduces the quantity of accesses to the metadata of the file, and improves the speed of storing the file.
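
As a rough illustration of the flow described above (train on the paths and sizes of stored files, predict the size of a new file from its path alone, then choose a quantity of strips), the following Python sketch may help. It rests on several assumptions: the `TokenMeanSizeModel` class is a deliberately simple stand-in that averages sizes by path token and is not the attribute prediction model of this disclosure (which the claims describe as possibly CNN-based); the tokenization in `path_tokens`, the 4 MiB strip unit, and the cap of 8 strips in `strip_count` are hypothetical values chosen only for the example.

```python
import math
from collections import defaultdict
from statistics import mean


def path_tokens(path: str) -> list[str]:
    """Split a storage path into directory names plus the file extension."""
    parts = [p for p in path.lower().split("/") if p]
    if not parts:
        return []
    *dirs, name = parts
    ext = name.rsplit(".", 1)[1] if "." in name else ""
    return dirs + ([f".{ext}"] if ext else [])


class TokenMeanSizeModel:
    """Stand-in predictor (an assumption, not the disclosed model)."""

    def __init__(self) -> None:
        self._sizes: dict[str, list[int]] = defaultdict(list)
        self._global_mean = 0.0

    def train(self, samples: list[tuple[str, int]]) -> None:
        """Learn from (file path, size in bytes) pairs of stored files."""
        if not samples:
            raise ValueError("at least one stored-file sample is required")
        for path, size in samples:
            for token in path_tokens(path):
                self._sizes[token].append(size)
        self._global_mean = mean(size for _, size in samples)

    def predict_size(self, path: str) -> float:
        """Predict a size from the path only; no file metadata is read."""
        known = [mean(self._sizes[t]) for t in path_tokens(path) if t in self._sizes]
        return mean(known) if known else self._global_mean


def strip_count(
    predicted_size: float,
    strip_unit: int = 4 * 1024 * 1024,  # hypothetical strip unit (4 MiB)
    max_strips: int = 8,                # hypothetical upper bound
) -> int:
    """Map a predicted size to a quantity of strips, clamped to [1, max_strips]."""
    needed = math.ceil(predicted_size / strip_unit)
    return max(1, min(max_strips, needed))
```

In this sketch a path that resembles previously stored large files yields more strips than a path that resembles small files, and neither decision requires a metadata lookup for the new file.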
In another possible implementation, the managed file is a client file, the attribute is a life cycle value of the file, and the storage policy for the file is a storage medium used when data of the file is stored.

According to the foregoing method, the storage path of the file may be used for predicting the life cycle value of the file and obtaining the storage medium for file storage. This reduces the quantity of accesses to the metadata of the file, and improves the speed and performance with which a client reads the file.

In another possible implementation, a method for determining the storage medium used when the data of the file is stored includes obtaining a first storage medium in which the file is currently stored, and obtaining a storage duration of the file in the first storage medium. When the storage duration does not exceed a first duration threshold, the first storage medium is still used for storage. When the storage duration exceeds the first duration threshold, a second storage medium is used for storage, where a read speed of the second storage medium is less than a read speed of the first storage medium.

According to the foregoing method, a file whose storage duration exceeds a predicted life cycle value of the file may be stored in a storage medium with a low read speed, so that limited resources are preferentially used for satisfying a fr