CN-118035185-B - Method, apparatus, electronic device and program product for caching data
Abstract
Embodiments of the present disclosure relate to methods, apparatuses, electronic devices, and program products for caching data. The method includes monitoring requests in a distributed file system, wherein the distributed file system is configured with a storage node and a plurality of computing nodes, each of the plurality of computing nodes being configured with accelerator resources and a set of storage containers. The method further includes adding a dynamic computing node to the distributed file system in response to a request meeting a predetermined condition, and caching data using the set of storage containers in the added dynamic computing node. By configuring computing nodes, storage nodes, and dynamic computing nodes in the system, embodiments of the disclosure ensure the stability of the storage system, improve its flexibility and scalability, and provide users with efficient file storage and access.
Inventors
- YANG ZIYE
Assignees
- Douyin Vision Co., Ltd. (抖音视界有限公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20240227
Claims (10)
- 1. A method for caching data, comprising: monitoring requests in a distributed file system, the distributed file system being configured with a storage node and a plurality of computing nodes, each of the plurality of computing nodes being configured with accelerator resources and a set of storage containers, the computing nodes having a set of working containers running thereon, the set of working containers being for executing the requests; adding a dynamic computing node to the distributed file system in response to a request meeting a predetermined condition; caching data using the set of storage containers in the added dynamic computing node, wherein the cached data includes data generated by a set of working containers in the dynamic computing node; and removing the dynamic computing node in response to completion of the request; wherein removing the dynamic computing node comprises: removing the dynamic computing node in response to at least one of the storage node and the computing node storing the data of the dynamic computing node; wherein the method further comprises: determining data associated with the request; accessing, by the computing node, the data associated with the request in response to the computing node storing the data; and transmitting the data in the storage node to the dynamic computing node in response to the computing node not storing the data associated with the request.
- 2. The method of claim 1, further comprising: running the set of working containers on the dynamic computing node; and loading the data into the set of working containers of the dynamic computing node, the data comprising at least a portion of a machine learning model file.
- 3. The method of claim 1, wherein caching data using the set of storage containers in the added dynamic computing node comprises: caching data associated with the request using the set of storage containers in the dynamic computing node.
- 4. The method of claim 1, wherein removing the dynamic computing node further comprises: in response to neither the storage node nor the computing node storing the data of the dynamic computing node: transmitting the data of the dynamic computing node to the computing node, or transmitting the data of the dynamic computing node to the storage node; and removing the dynamic computing node.
- 5. The method of claim 4, wherein transmitting the data of the dynamic computing node to the computing node comprises: loading the data from the dynamic computing node to the computing node through remote direct memory access, without passing through the processors of the dynamic computing node and the computing node.
- 6. The method of claim 4, wherein transmitting the data of the dynamic computing node to the storage node comprises: loading the data from the dynamic computing node to the storage node via a Transmission Control Protocol (TCP) based storage protocol.
- 7. The method of claim 1, further comprising: synchronizing data of the storage node to a remote object storage service; and synchronizing data of the computing nodes to the remote object storage service.
- 8. An apparatus for caching data, comprising: a request monitoring module configured to monitor requests in a distributed file system, the distributed file system being configured with a storage node and a plurality of computing nodes, each of the plurality of computing nodes being configured with accelerator resources and a set of storage containers, the computing nodes having a set of working containers running thereon, the set of working containers being for executing the requests; a dynamic computing node addition module configured to add a dynamic computing node to the distributed file system in response to a request meeting a predetermined condition; a data caching module configured to cache data using the set of storage containers in the added dynamic computing node, wherein the cached data includes data generated by a set of working containers in the dynamic computing node; and a removal module configured to remove the dynamic computing node in response to completion of the request; wherein the removal module comprises: a dynamic computing node removal module configured to remove the dynamic computing node in response to at least one of the storage node and the computing node storing the data of the dynamic computing node; wherein the apparatus further comprises: a data determination module configured to determine data associated with the request; an access module configured to access, by the computing node, the data associated with the request in response to the computing node storing the data; and a transmission module configured to transmit the data in the storage node to the dynamic computing node in response to the computing node not storing the data associated with the request.
- 9. An electronic device, comprising: a processor; and a memory coupled to the processor, the memory having instructions stored therein which, when executed by the processor, cause the electronic device to perform the method of any of claims 1-7.
- 10. A computer program product comprising computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, implement the method of any one of claims 1 to 7.
Description
Method, apparatus, electronic device and program product for caching data

Technical Field

The present disclosure relates generally to the field of computer technology, and more particularly to a method, apparatus, electronic device, and program product for caching data.

Background

When a computer executes tasks, the storage system plays a vital role in maintaining and managing all the data required for application operation. Such data include not only user-generated files, pictures, videos, and the like, but also key system files such as operating systems, applications, and hardware drivers. For example, when a model service is executed using container technology, the model service needs to load huge model files and also needs to persist checkpoint files continuously during training. The influence of the storage system on task execution is also reflected in data access speed, data consistency, data reliability, and the like. High data access speed ensures the efficient execution of a task, while data consistency and reliability ensure the accuracy and stability of the task result. For example, during model inference, rapid data access is required so that the model can generate high-quality responses in a short time.

Disclosure of Invention

Embodiments of the present disclosure provide a method, apparatus, electronic device, and program product for caching data.

According to a first aspect of the disclosure, a method for caching data is provided. The method includes monitoring requests in a distributed file system, wherein the distributed file system is configured with a storage node and a plurality of computing nodes, each of the plurality of computing nodes being configured with accelerator resources and a set of storage containers. The method further includes adding a dynamic computing node to the distributed file system in response to a request meeting a predetermined condition. Furthermore, the method includes caching data using the set of storage containers in the added dynamic computing node.

In a second aspect of the disclosure, an apparatus for caching data is provided. The apparatus includes a request monitoring module configured to monitor requests in a distributed file system, wherein the distributed file system is configured with a storage node and a plurality of computing nodes, each of the plurality of computing nodes being configured with accelerator resources and a set of storage containers. The apparatus also includes a dynamic computing node addition module configured to add a dynamic computing node to the distributed file system in response to a request meeting a predetermined condition. The apparatus further includes a data caching module configured to cache data using the set of storage containers in the added dynamic computing node.

In a third aspect of the present disclosure, an electronic device is provided. The electronic device comprises a processor and a memory coupled to the processor, the memory having instructions stored therein which, when executed by the processor, cause the electronic device to perform the method according to the first aspect.

In a fourth aspect of the present disclosure, a computer program product is provided. The computer program product comprises computer-executable instructions which, when executed by a processor, implement the method according to the first aspect.

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which like or similar reference numerals denote like or similar elements:

FIG. 1 illustrates a schematic diagram of an example environment in which some embodiments of the present disclosure may be implemented;

FIG. 2 illustrates a flow chart of a method for caching data according to some embodiments of the present disclosure;

FIG. 3 illustrates a schematic diagram of an architecture for caching data in accordance with some embodiments of the present disclosure;

FIG. 4 illustrates a schematic diagram of remote direct memory access (Remote Direct Memory Access, RDMA) between computing nodes according to some embodiments of the present disclosure;

FIG. 5 illustrates a schematic diagram of a flow for caching data according to some embodiments of the present disclosure;

FIG. 6A illustrates a schematic diagram for adding dynamic computing nodes in accordance with some embodiments of the present disclosure