CN-121996638-A - Data processing method, system and device

Abstract

Embodiments of this specification provide a data processing method, system, and device. The method comprises: receiving a plurality of data processing requests; determining, according to virtual file identifiers, the virtual files corresponding to the data processing requests and the target file sets corresponding to the virtual files, wherein each target file set is a set of physical files in a target data storage system, at least two virtual files correspond to the same target file set, and one target file set corresponds to one file handle; opening the target file set by using the file handle of the target file set; and executing the plurality of data processing requests according to the data processing content, the virtual files, and the target file sets corresponding to the virtual files. As a result, when a large number of files are processed, only one file (or a small number of files) is actually written to the file system, which reduces resource occupation and supports more concurrent processing.

Inventors

  • WANG XIANG
  • LI WENTAN
  • DAI MIN
  • ZHAO BAIQIANG
  • SHEN CHUNHUI
  • ZHANG WEI

Assignees

  • 阿里云计算有限公司

Dates

Publication Date
2026-05-08
Application Date
2024-11-01

Claims (20)

  1. A data processing method, comprising: receiving a plurality of data processing requests, wherein the plurality of data processing requests carry virtual file identifiers and data processing content corresponding to the data processing requests; determining, according to the virtual file identifier, a virtual file corresponding to the data processing request and a target file set corresponding to the virtual file, wherein the target file set is a set of physical files in a target data storage system, at least two virtual files among a plurality of virtual files correspond to the same target file set, and one target file set corresponds to one file handle; and opening the target file set by using the file handle of the target file set, and executing the plurality of data processing requests according to the data processing content, the virtual file, and the target file set corresponding to the virtual file.
  2. The data processing method according to claim 1, wherein the plurality of data processing requests are a plurality of data writing requests, and the data processing content is data to be written; correspondingly, the executing the plurality of data processing requests according to the data processing content, the virtual file, and the target file set corresponding to the virtual file comprises: obtaining the data to be written corresponding to the virtual file according to the correspondence between the virtual file and the data processing request and the association between the data processing request and the data to be written; encapsulating the data to be written corresponding to the virtual file to obtain a data encapsulation package corresponding to the virtual file; writing the data encapsulation package corresponding to the virtual file into a target buffer corresponding to the target file set corresponding to the virtual file; and, when a preset writing condition is met, writing the data encapsulation package in the target buffer into the target file set corresponding to the target buffer.
  3. The data processing method according to claim 1, wherein before determining, according to the virtual file identifier, the virtual file corresponding to the data processing request and the target file set corresponding to the virtual file, the method further comprises: receiving a creation request for the virtual file, wherein the creation request carries a file path of the virtual file; creating the virtual file according to the file path, and allocating a corresponding target file set to the virtual file; and writing the correspondence between the virtual file and the target file set corresponding to the virtual file into a target storage location, and determining a corresponding virtual file identifier for the virtual file.
  4. The data processing method according to claim 2, wherein the writing, when the preset writing condition is met, the data encapsulation package in the target buffer into the target file set corresponding to the target buffer comprises: determining a physical file in an open state in the target file set as a target file, wherein the open state indicates that the physical file has been opened and data is allowed to be written to it, and the file handle corresponding to the target file set is the file handle corresponding to the target file; and, when the preset writing condition is met, writing the data of the data encapsulation package in the target buffer into the target file in the target file set corresponding to the target buffer by using a writing thread corresponding to the target file set.
  5. The data processing method according to claim 2, wherein the writing, when the preset writing condition is met, the data encapsulation package in the target buffer into the target file set corresponding to the target buffer further comprises: when the data encapsulation package in the target buffer has been successfully written into the target file set corresponding to the target buffer, calling a flush interface by using a flush thread, and storing the data encapsulation package successfully written into the target file set in a persistent storage medium, wherein the flush thread is a thread corresponding to the target file set, and the flush interface is an interface, provided by the target data storage system, corresponding to the target file set.
  6. The data processing method according to claim 5, wherein the data encapsulation package in the target buffer being successfully written into the target file set corresponding to the target buffer comprises: determining the data amount of the data to be written in the data encapsulation package in the target buffer; updating, according to the data amount of the data to be written in the data encapsulation package, a written position corresponding to the target file set corresponding to the target buffer; and, when the update is complete, determining that the data encapsulation package in the target buffer has been successfully written into the target file set corresponding to the target buffer, wherein the written position records the amount of data to be written that has been written into the target file set, and is incrementally updated according to the amount of data to be written that is successfully written into the target file set.
  7. The data processing method according to claim 6, wherein the calling the flush interface by using the flush thread and storing the data encapsulation package successfully written into the target file set in the persistent storage medium comprises: monitoring, by using the flush thread corresponding to the target file set, the written position and a triggered flush position, wherein the triggered flush position records the amount of data to be written that has been flushed to the persistent storage medium; and, when it is determined according to the written position and the triggered flush position that a data encapsulation package has been successfully written into the target file set but has not been stored in the persistent storage medium, calling the flush interface to store the data encapsulation package successfully written into the target file set in the persistent storage medium.
  8. The data processing method according to claim 5, wherein after storing the data encapsulation package successfully written into the target file set in the persistent storage medium, the method further comprises: deleting, according to a callback function corresponding to the data encapsulation package that has been persisted to the persistent storage medium, that data encapsulation package from the target buffer corresponding to the corresponding target file set.
  9. The data processing method according to claim 4, further comprising: when the target file meets a preset switching condition, creating a new target file in the target file set corresponding to the target file, and updating the open state of the target file to a closed state to obtain a physical file in the closed state, wherein the preset switching condition is that the data amount in the target file is greater than a preset threshold or the creation time of the target file exceeds a preset time, and the closed state indicates that data is no longer allowed to be written to the target file; and, when the physical file in the closed state meets a preset reclamation condition, reclaiming file data in the physical file in the closed state.
  10. The data processing method according to claim 9, wherein the reclaiming, when the physical file in the closed state meets the preset reclamation condition, file data in the physical file in the closed state comprises: when the physical file in the closed state meets the preset reclamation condition, creating a new target file in the target file set corresponding to the physical file in the closed state, and determining the physical file in the closed state as a file to be reclaimed; and writing the non-junk data in the file to be reclaimed into the new target file, and replacing the file to be reclaimed with the new target file, wherein the non-junk data is the file data in the file to be reclaimed other than junk data, and the junk data is deleted file data.
  11. The data processing method according to claim 4, wherein the writing the data of the data encapsulation package in the target buffer into the target file in the target file set corresponding to the target buffer by using the writing thread corresponding to the target file set comprises: writing the data encapsulation package in the target buffer into the target file in the target file set corresponding to the target buffer by using a writing thread corresponding to the target file in the target file set; and, when the writing thread corresponding to the target file in the target file set reports an error, creating a new target file in the target file set, and writing the data encapsulation package in the corresponding target buffer into the new target file by using a writing thread corresponding to the new target file in the target file set.
  12. The data processing method according to any one of claims 2 to 11, further comprising: writing the data to be written corresponding to the virtual file into a memory buffer, and determining the data to be written as mirror data; when the mirror data exceeds a preset threshold, rearranging the mirror data in the memory buffer to obtain rearranged data; and, when the target file is updated to a closed state, replacing the data to be written of the data encapsulation package written in the target file with the rearranged data.
  13. The data processing method according to claim 1, wherein the plurality of data processing requests are a plurality of data reading requests, and the data processing content is data information of data to be read; correspondingly, the executing the plurality of data processing requests according to the data processing content, the virtual file, and the target file set corresponding to the virtual file comprises: determining, according to the data information of the data to be read, target data package information corresponding to the data to be read in the virtual file; determining the target file set corresponding to the virtual file, and determining a physical file corresponding to the virtual file from the target file set corresponding to the virtual file; reading a target data package corresponding to the target data package information from the physical file corresponding to the virtual file; and reading the data to be read from the target data package according to the data information of the data to be read.
  14. The data processing method according to claim 13, wherein the target data package information includes the virtual file corresponding to the data package, a data offset corresponding to the virtual file, and a data offset of a target file corresponding to the virtual file.
  15. The data processing method according to claim 14, wherein the reading, from the target file corresponding to the virtual file, the target data package corresponding to the target data package information comprises: determining, according to the target data package information, the target file corresponding to the target data package and the data offset of the target file; and reading, according to the data offset of the target file, the target data package at that data offset from the target file corresponding to the virtual file.
  16. A data processing method applied to a log file scenario, comprising: receiving a plurality of log data processing requests, wherein the plurality of log data processing requests carry virtual log file identifiers and log data processing content corresponding to the plurality of log data processing requests; determining, according to the virtual log file identifier, a virtual log file corresponding to the log data processing request and a target log file set corresponding to the virtual log file, wherein the target log file set is a set of physical log files in a target data storage system, at least two virtual log files among a plurality of virtual log files correspond to the same target log file set, and one target log file set corresponds to one file handle; and opening the target log file set by using the file handle of the target log file set, and executing the plurality of log data processing requests according to the log data processing content, the virtual log file, and the target log file set corresponding to the virtual log file.
  17. A data storage system, comprising: a receiving unit, configured to receive a plurality of data processing requests, wherein the plurality of data processing requests carry virtual file identifiers and data processing content corresponding to the plurality of data processing requests; a determining unit, configured to determine, according to the virtual file identifier, a virtual file corresponding to the data processing request and a target file set corresponding to the virtual file, wherein the target file set is a set of physical files in a target data storage system, at least two virtual files among a plurality of virtual files correspond to the same target file set, and one target file set corresponds to one file handle; and an execution unit, configured to open the target file set by using the file handle of the target file set, and to execute the plurality of data processing requests according to the data processing content, the virtual file, and the target file set corresponding to the virtual file.
  18. A data processing apparatus, comprising: a receiving module, configured to receive a plurality of data processing requests, wherein the plurality of data processing requests all carry virtual file identifiers and data processing content corresponding to the plurality of data processing requests; a determining module, configured to determine, according to the virtual file identifier, a virtual file corresponding to the data processing request and a target file set corresponding to the virtual file, wherein the target file set comprises at least one physical file in a target data storage system, at least two virtual files among a plurality of virtual files correspond to the same target file set, and one target file set corresponds to one file handle; and an execution module, configured to open the target file set by using the file handle of the target file set, and to execute the plurality of data processing requests according to the data processing content, the virtual file, and the target file set corresponding to the virtual file.
  19. A computing device, comprising: a memory and a processor; wherein the memory is configured to store a computer program/instruction, and the processor is configured to execute the computer program/instruction, which, when executed by the processor, implements the steps of the data processing method of any one of claims 1 to 15.
  20. A computer-readable storage medium storing a computer program/instruction which, when executed by a processor, implements the steps of the data processing method of any one of claims 1 to 15.
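
The write path of claims 2 and 5 to 7 and the read path of claims 13 to 15 can be illustrated with a minimal, single-threaded Python sketch. It is not the patented implementation. The class name `TargetFileSetWriter`, the 8-byte package header layout, the byte-count writing condition, and the in-memory `bytearray` standing in for the open physical file are all assumptions made for illustration. The flush thread is replaced by an explicit `flush_once` call, and the buffer is cleared at write time rather than through the post-flush callback of claim 8.

```python
import struct
from typing import Dict, List, Tuple

# Assumed package header: (virtual file id, payload length), big-endian.
HEADER = struct.Struct(">II")


class TargetFileSetWriter:
    """Hypothetical sketch of one target file set shared by many virtual files."""

    def __init__(self, batch_bytes: int = 64) -> None:
        self.buffer: List[Tuple[int, bytes]] = []  # target buffer of packages
        self.buffered_bytes = 0
        self.batch_bytes = batch_bytes             # assumed preset writing condition
        self.physical = bytearray()                # stands in for the open target file
        self.written_pos = 0                       # payload bytes written to the file set
        self.flushed_pos = 0                       # payload bytes already made durable
        # Per virtual file: (offset in virtual file, offset in physical file, length).
        self.index: Dict[int, List[Tuple[int, int, int]]] = {}
        self.virtual_len: Dict[int, int] = {}

    def append(self, vfile: int, data: bytes) -> None:
        """Encapsulate the data and place the package in the target buffer."""
        package = HEADER.pack(vfile, len(data)) + data
        self.buffer.append((vfile, package))
        self.buffered_bytes += len(package)
        if self.buffered_bytes >= self.batch_bytes:
            self.write_buffer()

    def write_buffer(self) -> None:
        """Write buffered packages to the physical file and advance written_pos."""
        for vfile, package in self.buffer:
            payload_len = len(package) - HEADER.size
            v_off = self.virtual_len.get(vfile, 0)
            self.index.setdefault(vfile, []).append(
                (v_off, len(self.physical), payload_len)
            )
            self.virtual_len[vfile] = v_off + payload_len
            self.physical += package
            self.written_pos += payload_len        # incremental update per package
        self.buffer.clear()
        self.buffered_bytes = 0

    def flush_once(self) -> bool:
        """One iteration of the flush loop: persist only if something is unflushed."""
        if self.flushed_pos == self.written_pos:
            return False
        # A real system would call the storage system's flush interface here.
        self.flushed_pos = self.written_pos
        return True

    def read(self, vfile: int) -> bytes:
        """Reassemble a virtual file from its packages using the recorded offsets."""
        out = bytearray()
        for _v_off, p_off, length in self.index.get(vfile, []):
            start = p_off + HEADER.size
            out += self.physical[start:start + length]
        return bytes(out)
```

Comparing `written_pos` with `flushed_pos` is what lets a flush loop decide cheaply whether any successfully written package is still waiting to be persisted, and the per-package index corresponds to the two offsets named in claim 14.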

Description

Data processing method, system and device

Technical Field

Embodiments of the present specification relate to the field of file systems, and in particular to a data processing method, system, and device.

Background

In a distributed database, a table is often divided into fragments (regions), and different fragments are placed on different machines so that multiple machines serve the table cooperatively; a single machine hosts multiple fragments. Concurrent processing of a large number of files (including concurrent writing and concurrent reading, where every file write or file read requires opening a file) may exhaust file handle resources. For example, HBase (an open-source, distributed storage system) uses HDFS (Hadoop Distributed File System, a distributed file system) for logging, and each opened file occupies several TCP (Transmission Control Protocol, a connection-oriented, reliable, byte-stream-based transport layer communication protocol) thread resources.

To work around this problem, a single database machine often writes the WAL logs of multiple fragments into the same log file. (WAL, Write-Ahead Logging, is the transaction log of a database: a logging mechanism that a database management system uses to guarantee the consistency and durability of data.) Although this approach addresses the waste of file handle resources caused by concurrent file writing, error recovery then requires the log file to be split by fragment (Split) before data can be recovered, which increases data recovery time and adds complexity to the database.

Disclosure of Invention

In view of this, the embodiments of the present specification provide two data processing methods.
One or more embodiments of the present specification further relate to a data processing system, a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program product, to address the technical deficiency in the prior art that concurrent processing of multiple files may lead to insufficient file handle resources.

According to a first aspect of embodiments of the present specification, there is provided a data processing method, including: receiving a plurality of data processing requests, wherein the plurality of data processing requests carry virtual file identifiers and data processing content corresponding to the data processing requests; determining, according to the virtual file identifier, a virtual file corresponding to the data processing request and a target file set corresponding to the virtual file, wherein the target file set is a set of physical files in a target data storage system, at least two virtual files among a plurality of virtual files correspond to the same target file set, and one target file set corresponds to one file handle; and opening the target file set by using the file handle of the target file set, and executing the plurality of data processing requests according to the data processing content, the virtual file, and the target file set corresponding to the virtual file.
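
The relationship between virtual files, target file sets, and file handles in the first aspect can be sketched in Python as follows. This is an illustrative sketch only: `TargetFileSet`, `VirtualFileRouter`, the round-robin allocation policy, and the `handle_opens` counter are hypothetical names and choices that do not come from the specification, and physical file I/O is replaced by an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class TargetFileSet:
    """A set of physical files reached through a single file handle."""
    set_id: int
    handle_opens: int = 0  # how many handles this set has opened (0 or 1)
    records: List[Tuple[str, bytes]] = field(default_factory=list)

    def open(self) -> None:
        # One handle per target file set: open lazily, and only once.
        if self.handle_opens == 0:
            self.handle_opens = 1


class VirtualFileRouter:
    """Maps virtual file identifiers onto a small number of target file sets."""

    def __init__(self, num_sets: int) -> None:
        self.sets = [TargetFileSet(i) for i in range(num_sets)]
        self.mapping: Dict[str, TargetFileSet] = {}

    def create(self, virtual_id: str) -> None:
        # Assumed allocation policy: round-robin over the target file sets.
        self.mapping[virtual_id] = self.sets[len(self.mapping) % len(self.sets)]

    def execute(self, requests: Iterable[Tuple[str, bytes]]) -> None:
        # Each request carries a virtual file identifier and its content.
        for virtual_id, content in requests:
            target = self.mapping[virtual_id]
            target.open()
            target.records.append((virtual_id, content))

    def open_handles(self) -> int:
        return sum(s.handle_opens for s in self.sets)
```

Under these assumptions, creating six virtual files on a router with two target file sets and executing one request per virtual file leaves only two handles open, because the number of handles follows the number of target file sets rather than the number of virtual files.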
According to a second aspect of embodiments of the present specification, there is provided a data processing method applied to a log file scenario, including: receiving a plurality of log data processing requests, wherein the plurality of log data processing requests carry virtual log file identifiers and log data processing content corresponding to the plurality of log data processing requests; determining, according to the virtual log file identifier, a virtual log file corresponding to the log data processing request and a target log file set corresponding to the virtual log file, wherein the target log file set is a set of physical log files in a target data storage system, at least two virtual log files among a plurality of virtual log files correspond to the same target log file set, and one target log file set corresponds to one file handle; and opening the target log file set by using the file handle of the target log file set, and executing the plurality of log data processing requests according to the log data processing content, the virtual log file, and the target log file set corresponding to the virtual log file.

According to a third aspect of embodiments of the present specification, there is provided a data storage system comprising: a receiving unit, configured to receive a plurality of data processing requests, wherein the plurality of data processing requests carry virtual file identifiers and data processing content corresponding to the plurality of data processing requests; a determining unit, configured to determine, according to the virtual file identifier, a virtual file corresponding to the data processing request and a target file set corresponding to the virtual file, wherein the target file set is a set of physical files in a target data storage system, at least two virtual files in a plurality of virtual files corr