CN-115221130-B - File processing system, file processing method and device


Abstract

The application discloses a file processing system, a file processing method and a file processing device. The system comprises a client in which an application program is deployed. The application program initiates file data processing requests to a key-value pair system and/or a persistent file system through the client so as to provide services to the outside. The key-value pair system serves as a cache database between the client and the persistent file system, and the persistent file system stores file data persistently. The application program exchanges file data with the client through a portable operating system interface (POSIX), the client exchanges file data with the key-value pair system through an interface adapted to the key-value pair system, and the client exchanges file data with the persistent file system through the portable operating system interface. The application solves the technical problem in the related art that adapting the source code of an application program to use a key-value pair system results in a relatively high error rate in data processing.

Inventors

ZHU LINGYU
LIU XIAOLI
DU YUNFEI

Assignees

Alibaba (China) Co., Ltd. (阿里巴巴（中国）有限公司)

Dates

Publication Date: 20260512
Application Date: 20220606

Claims (13)

1. A file processing system, comprising: a client, wherein an application program is deployed in the client, and the application program initiates file data processing requests to a key-value pair system and/or a persistent file system through the client so as to provide services to the outside; the key-value pair system, which serves as a cache database between the client and the persistent file system; and the persistent file system, which stores file data persistently; wherein the application program exchanges file data with the client through a portable operating system interface, the client exchanges file data with the key-value pair system through an interface adapted to the key-value pair system, and the client exchanges file data with the persistent file system through the portable operating system interface.
2. The system according to claim 1, wherein files in the key-value pair system are stored as a plurality of blocks of a preset length; a first key-value pair is constructed by taking the file name and the block number of the block content as the key and the block content as the value, and the first key-value pair is stored in the key-value pair system.
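A minimal Python sketch of the block layout in claim 2, given for illustration only. It assumes a hypothetical 4-byte block length (chosen so the example is easy to check; the claim says only "preset length") and a hypothetical string key of the form `name#block_no`. The claim fixes only that the key combines the file name and the block number; the separator, the helper names and the dict standing in for the key-value pair system are all assumptions.

```python
from typing import Dict

# Hypothetical preset block length; a real deployment would use a much larger value.
BLOCK_SIZE = 4


def block_key(file_name: str, block_no: int) -> str:
    """Key of the first key-value pair: file name plus block number (separator is hypothetical)."""
    return f"{file_name}#{block_no}"


def split_into_blocks(file_name: str, data: bytes, block_size: int = BLOCK_SIZE) -> Dict[str, bytes]:
    """Split file data into fixed-length blocks; the last block may be shorter."""
    return {
        block_key(file_name, i // block_size): data[i:i + block_size]
        for i in range(0, len(data), block_size)
    }


def join_blocks(file_name: str, kv: Dict[str, bytes]) -> bytes:
    """Reassemble a file by fetching blocks in block-number order until one is missing."""
    parts, n = [], 0
    while (blk := kv.get(block_key(file_name, n))) is not None:
        parts.append(blk)
        n += 1
    return b"".join(parts)
```

For example, `split_into_blocks("a.txt", b"hello world")` yields the three entries `a.txt#0`, `a.txt#1` and `a.txt#2`. Because each key can be computed from the file name and a byte offset alone, a client can address a block without first asking a metadata server where it lives.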
3. The system according to claim 1, wherein the key-value pair system is further configured to store attribute information and symbolic links of files; a second key-value pair is constructed by taking the name of a file and an attribute instruction as the key and the attribute information and/or symbolic link of the file as the value, and the second key-value pair is stored in the key-value pair system.
4. The system according to claim 1, wherein the key-value pair system is further configured to store the directory tree of files; a third key-value pair is constructed by taking the name of a parent node in the directory tree as the key and the operation log information for that parent node as the value, and the third key-value pair is stored in the key-value pair system.
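The metadata keys of claims 3 and 4 can be sketched the same way. This is an illustrative assumption, not the patented encoding: the attribute instructions `getattr` and `readlink`, the `#` separator, the `#dirlog` suffix used to keep directory keys apart from other keys, and JSON as the value format are all hypothetical. A dict stands in for the key-value pair system.

```python
import json
from typing import Dict, List

# Dict stand-in for the key-value pair system (values stored as strings).
kv: Dict[str, str] = {}


def attr_key(file_name: str, attr_instruction: str) -> str:
    """Second key-value pair: file name + attribute instruction (names are hypothetical)."""
    return f"{file_name}#{attr_instruction}"


def put_attributes(file_name: str, attrs: dict) -> None:
    """Store file attribute information as the value of the second key-value pair."""
    kv[attr_key(file_name, "getattr")] = json.dumps(attrs)


def put_symlink(link_name: str, target: str) -> None:
    """Store a symbolic link target as the value of the second key-value pair."""
    kv[attr_key(link_name, "readlink")] = target


def dir_key(parent: str) -> str:
    """Third key-value pair: keyed by the parent node name ('#dirlog' suffix is hypothetical)."""
    return f"{parent}#dirlog"


def append_dir_op(parent: str, op: str, child: str) -> None:
    """Append one operation to the parent's log instead of rewriting a full child list."""
    log: List[list] = json.loads(kv.get(dir_key(parent), "[]"))
    log.append([op, child])
    kv[dir_key(parent)] = json.dumps(log)
```

Storing the directory as an operation log means an add or delete only appends a record. Note that the read-modify-write in `append_dir_op` is not atomic; under concurrency, a key-value system offering an atomic append would avoid lost updates.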
5. A file processing method, applied to the file processing system according to any one of claims 1 to 4, comprising: acquiring a data processing request initiated by an application program on file data; determining the request type of the data processing request, wherein the request type is one of a file read request, a file write request and a directory processing request; and processing the file data according to the request type.
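A minimal sketch of the acquire, classify and dispatch flow of claim 5; the three functions loosely mirror the acquisition, determining and processing units of claim 12. The mapping from POSIX-style operation names to request types is a hypothetical assumption (for instance, treating `create` and `unlink` as directory requests because they add or remove a directory node), as are the class and function names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class RequestType(Enum):
    READ_FILE = "read file"
    WRITE_FILE = "write file"
    DIRECTORY = "directory"


@dataclass
class Request:
    op: str    # POSIX-style operation name issued by the application (hypothetical)
    path: str


# Hypothetical mapping from POSIX-style operations to the three request types of claim 5.
OP_TYPES: Dict[str, RequestType] = {
    "read": RequestType.READ_FILE,
    "write": RequestType.WRITE_FILE,
    "mkdir": RequestType.DIRECTORY,
    "create": RequestType.DIRECTORY,
    "unlink": RequestType.DIRECTORY,
    "rmdir": RequestType.DIRECTORY,
    "readdir": RequestType.DIRECTORY,
}


def determine_type(req: Request) -> RequestType:
    """Determining step: classify the acquired request."""
    if req.op not in OP_TYPES:
        raise ValueError(f"unsupported operation: {req.op}")
    return OP_TYPES[req.op]


def process(req: Request, handlers: Dict[RequestType, Callable[[Request], object]]) -> object:
    """Processing step: dispatch to the handler registered for the request type."""
    return handlers[determine_type(req)](req)
```

The application keeps issuing ordinary file operations; only the client needs to know that reads, writes and directory operations take different paths.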
6. The method according to claim 5, wherein if the data processing request is a file read request, processing the file data according to the request type comprises: determining a first block number according to a first file name in the file read request, a first starting byte and the length of the data to be read; and obtaining first block content according to the first file name and the first block number, and returning the first block content to the application program.
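With a preset block length B, a read of `length` bytes starting at byte `offset` touches blocks ⌊offset / B⌋ through ⌊(offset + length - 1) / B⌋. Claim 6 speaks of a single first block number; the sketch below generalizes to reads that span several blocks, which is an assumption for illustration. It also returns the byte range needed from each block.

```python
from typing import List, Tuple


def block_numbers(offset: int, length: int, block_size: int) -> List[int]:
    """Block numbers touched by a read of `length` bytes starting at byte `offset`."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def block_slices(offset: int, length: int, block_size: int) -> List[Tuple[int, int, int]]:
    """(block number, start within block, end within block) for each touched block."""
    end = offset + length
    out = []
    for b in block_numbers(offset, length, block_size):
        base = b * block_size
        out.append((b, max(offset, base) - base, min(end, base + block_size) - base))
    return out
```

For example, with B = 4, reading 6 bytes from byte 3 touches blocks 0, 1 and 2 and needs bytes 3..4 of block 0, all of block 1 and byte 0..1 of block 2 (end-exclusive).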
7. The method according to claim 6, wherein obtaining the first block content according to the first file name and the first block number and returning the first block content to the application program comprises: determining, according to the first file name and the first block number, whether the client stores the first block content; if the client stores the first block content, obtaining the first block content from the client and returning it to the application program; if the client does not store the first block content, forwarding the file read request to the key-value pair system through the client, and determining, according to the first file name and the first block number, whether the key-value pair system stores the first block content; if the key-value pair system stores the first block content, caching the first block content from the key-value pair system to the client and returning it to the application program through the client; and if the key-value pair system does not store the first block content, forwarding the file read request to the persistent file system through the client, acquiring the first block content from the persistent file system, and returning it to the application program through the client.
8. The method according to claim 7, wherein after the first block content is returned to the application program through the client, the method further comprises: storing, through the client, the first block content in the key-value pair system in the form of a key-value pair.
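The tiered lookup of claims 7 and 8 can be sketched as follows, under stated assumptions: plain dicts stand in for the client cache, the key-value pair system and the persistent file system, keys are `(file name, block number)` tuples, and the class name and `last_source` field are hypothetical additions for illustration.

```python
from typing import Dict, Optional, Tuple

BlockKey = Tuple[str, int]  # (file name, block number)


class TieredReader:
    """Read path of claims 7-8: client cache, then key-value system, then persistent FS."""

    def __init__(self, kv: Dict[BlockKey, bytes], pfs: Dict[str, bytes], block_size: int):
        self.local: Dict[BlockKey, bytes] = {}  # blocks cached in the client
        self.kv = kv                            # stand-in for the key-value pair system
        self.pfs = pfs                          # stand-in for the persistent file system
        self.block_size = block_size
        self.last_source: Optional[str] = None  # where the last block came from (illustration)

    def read_block(self, name: str, block_no: int) -> bytes:
        key = (name, block_no)
        # 1. Client hit: return the cached block directly.
        if key in self.local:
            self.last_source = "client"
            return self.local[key]
        # 2. Key-value system hit: cache the block at the client, then return it.
        data = self.kv.get(key)
        if data is not None:
            self.local[key] = data
            self.last_source = "kv"
            return data
        # 3. Miss in both tiers: read the block from the persistent file system.
        start = block_no * self.block_size
        data = self.pfs[name][start:start + self.block_size]
        self.last_source = "pfs"
        # Claim 8 stores the block in the key-value system after replying; a real client
        # might do this asynchronously, while this sketch does it inline.
        self.kv[key] = data
        return data
```

Reading the same block three times shows the promotion path: the first read comes from the persistent file system, the second from the key-value system, and the third from the client cache.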
9. The method according to claim 5, wherein if the data processing request is a file write request, processing the file data according to the request type comprises: forwarding the file write request to the persistent file system through the client, and completing the file write request in the persistent file system; and after the persistent file system completes the file write request, writing the target data information corresponding to the file write request into the key-value pair system.
10. The method according to claim 9, wherein writing the target data information corresponding to the file write request into the key-value pair system after the persistent file system completes the file write request comprises: determining a second block number according to a second file name in the file write request, a second starting byte and the length of the target data information; determining, according to the second file name and the second block number, whether the client stores second block content corresponding to the second block number; if the client stores the second block content, overwriting the second block content with the target data information to obtain processed second block content, and marking the processed second block content as a target dirty block, wherein if the target data information completely covers the second block content, a write request for the target dirty block is initiated to the key-value pair system through the client so that the target dirty block is written into the key-value pair system; and if the client does not store the second block content, determining whether the length of the target data information equals the preset length; if it does, writing the target data information into the key-value pair system; if it does not, caching the second block content corresponding to the target data information from the key-value pair system to the client, overwriting the second block content with the target data information at the client to obtain the processed second block content, and marking the processed second block content as the target dirty block.
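A minimal sketch of the write path of claims 9 and 10, given as an illustration under several assumptions. Dicts stand in for the three tiers; writes are assumed to fall inside a single block; gaps are zero-filled; when a partially covered block is missing from the key-value system, the sketch falls back to the persistent file system; and the excerpt does not say when a partially covered dirty block is flushed, so the sketch simply leaves it dirty. All names are hypothetical.

```python
from typing import Dict, Set, Tuple

BlockKey = Tuple[str, int]  # (file name, block number)


def overlay(block: bytes, data: bytes, start: int) -> bytes:
    """Cover `block` with `data` starting at `start`, zero-filling any gap (assumed semantics)."""
    if len(block) < start:
        block += b"\x00" * (start - len(block))
    return block[:start] + data + block[start + len(data):]


class WritePath:
    def __init__(self, kv: Dict[BlockKey, bytes], pfs: Dict[str, bytes], block_size: int):
        self.kv, self.pfs, self.block_size = kv, pfs, block_size
        self.local: Dict[BlockKey, bytes] = {}  # blocks cached in the client
        self.dirty: Set[BlockKey] = set()       # target dirty blocks not yet in the KV system

    def write(self, name: str, offset: int, data: bytes) -> None:
        # Claim 9: complete the write in the persistent file system first.
        buf = bytearray(self.pfs.get(name, b""))
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\x00" * (end - len(buf)))
        buf[offset:end] = data
        self.pfs[name] = bytes(buf)
        # Then propagate the target data to the cache tiers (claim 10).
        self._update_cache(name, offset, data)

    def _update_cache(self, name: str, offset: int, data: bytes) -> None:
        bs = self.block_size
        block_no = offset // bs          # second block number
        start = offset - block_no * bs   # position of the write inside that block
        if start + len(data) > bs:
            raise ValueError("this sketch only handles writes inside a single block")
        key = (name, block_no)
        if key not in self.local:
            if len(data) == bs:
                # Full-length write: put it straight into the key-value system.
                self.kv[key] = data
                return
            # Partial write: cache the block at the client first
            # (falling back to the persistent file system is an assumption).
            cached = self.kv.get(key)
            if cached is None:
                cached = self.pfs[name][block_no * bs:(block_no + 1) * bs]
            self.local[key] = cached
        old = self.local[key]
        self.local[key] = overlay(old, data, start)
        self.dirty.add(key)
        if start == 0 and len(data) >= len(old):
            # The write completely covers the cached block: flush the dirty block.
            self.kv[key] = self.local[key]
            self.dirty.discard(key)
```

Writing to the persistent file system before touching the cache means a crash between the two steps leaves, at worst, a stale cached block rather than lost data.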
11. The method according to claim 5, wherein if the data processing request is a directory processing request, processing the file data according to the request type comprises: if the directory processing request is detected to be a request to add a node to a first directory, appending the operation content of the request to the key-value pair corresponding to the first directory; if the directory processing request is detected to be a request to delete a node from a second directory, appending the operation content of the request to the key-value pair corresponding to the second directory; and if the directory processing request is detected to be a request to query a third directory, acquiring the operation log information corresponding to the third directory from the key-value pair system according to the name of the parent node of the third directory, filtering the operation log information to obtain the filtered third directory, and returning the filtered third directory to the application program through the client.
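One way to read the directory handling of claim 11 is sketched below. It assumes that the log for a directory is stored under the name of the node that contains the added or deleted entries, and that "filtering" the log means replaying it in order so that later operations override earlier ones. Both readings, the operation names and the sorted output are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str]  # (operation, child entry name)


class DirectoryLog:
    """Directory tree kept as per-directory operation logs in a key-value store (dict stand-in)."""

    def __init__(self) -> None:
        self.kv: Dict[str, List[Op]] = {}

    def add_node(self, directory: str, child: str) -> None:
        """Add request: append the operation to the directory's log."""
        self.kv.setdefault(directory, []).append(("add", child))

    def delete_node(self, directory: str, child: str) -> None:
        """Delete request: append the operation rather than editing a stored child list."""
        self.kv.setdefault(directory, []).append(("delete", child))

    def query(self, directory: str) -> List[str]:
        """Query request: filter the log by replaying it in order."""
        entries: Set[str] = set()
        for op, child in self.kv.get(directory, []):
            if op == "add":
                entries.add(child)
            else:
                entries.discard(child)
        return sorted(entries)
```

Appends keep add and delete requests cheap; the cost moves to queries, which must replay the log, so a practical system would presumably compact long logs.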
12. A file processing apparatus, applied to the file processing system according to any one of claims 1 to 4, comprising: an acquisition unit, configured to acquire a data processing request initiated by an application program on file data; a determining unit, configured to determine the request type of the data processing request, wherein the request type is one of a file read request, a file write request and a directory processing request; and a processing unit, configured to process the file data according to the request type.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the file processing method according to any one of claims 5 to 11.

Description

File processing system, file processing method and device

Technical Field

The application relates to the technical field of file data processing, and in particular to a file processing system, a file processing method and a file processing device.

Background

AI training workloads are characterized by high concurrency, frequent read operations and massive numbers of small files. The IO characteristics of conventional distributed file processing systems do not match such workloads well. Conventional key-value pair systems, by contrast, and especially memory-centric ones, offer simple interfaces, high performance and convenient operation and maintenance, and they match the IO load characteristics of AI training well. A key-value pair system can therefore serve as a cache between the application program and the persistent file system to increase the application program's file IO throughput and reduce IO latency. However, using a key-value pair system in this way requires modifying the application program's source code, that is, replacing its POSIX API calls with the get and put interfaces of the key-value pair system. This costs considerable labor and leads to a relatively high error rate in data processing.

Distributed data caching layers already exist, for example open-source distributed in-memory file systems. Such a system consists of a central master node, storage worker nodes, an underlying file system (UFS, i.e. a persistent file system) and other components. The master centrally manages the directory tree of cached files, which worker nodes are active or invalid, and the mapping of every file to its blocks and of every block to its workers.
Before reading a file, the client must communicate with the master to obtain the file's block information, and then send data read/write requests to the worker that holds the block content. Each worker manages its locally cached blocks, serves clients' block read requests, and interacts with the UFS to pull data or persist dirty data. The central master limits the scalability and availability of such a system. The client must make several RPC calls to the master and the workers to perform file IO, and under highly concurrent reads of a large number of files the master faces heavy query and metadata storage pressure. Because every small-file read incurs the latency of multiple RPCs, the throughput of an application accessing files is severely reduced. No effective solution has yet been proposed for the problem in the related art that adapting the source code of an application program to use a key-value pair system results in a relatively high error rate in data processing.

Disclosure of Invention

The embodiments of the application provide a file processing system, a file processing method and a file processing device, so as to at least solve the technical problem in the related art that adapting the source code of an application program to use a key-value pair system results in a relatively high error rate in data processing.
According to one aspect of the embodiments of the application, a file processing system is provided, comprising a client in which an application program is deployed. The application program initiates file data processing requests to a key-value pair system and/or a persistent file system through the client so as to provide services to the outside; the key-value pair system serves as a cache database between the client and the persistent file system; and the persistent file system stores file data persistently. The application program interacts with the client through a portable operating system interface, the client interacts with the key-value pair system through an interface adapted to the key-value pair system, and the client interacts with the persistent file system through the portable operating system interface. Further, files in the key-value pair system are stored as a plurality of blocks of a preset length, wherein a first key-value pair is constructed with the file name and the block number of the block content as the key and the block content as the value, and the first key-value pair is stored in the key-value pair system. Further, the key-value pair system is further configured to store attribute information and symbolic links of files, wherein the name of a file and an attribute instruction are used as the key, the attribute information of the file and/or the symbolic links of