CN-116028445-B - File processing method and device, storage medium and electronic device
Abstract
An embodiment of the invention provides a file processing method and device, a storage medium and an electronic device. The method comprises: scanning first file index information in a first merged file to obtain a first scanning result, wherein the first merged file is a merged file obtained by merging a first group of files, and the first file index information is used for indicating whether the first group of files in the first merged file are deleted; and, when the first scanning result indicates that a first part of files in the first group of files is deleted and the deleted first part of files meets a preset first compaction condition, performing a merging operation according to first remaining files to obtain a second merged file, wherein the first remaining files are the files in the first group of files other than the first part of files, and the storage positions of the first remaining files in the second merged file are contiguous. The embodiment of the invention solves the problem of low capacity utilization of storage systems in the related art.
Inventors
- YU JUNGAN
- ZHOU WENKAI
- ZHENG YANTAO
- ZHENG JI
Assignees
- 浙江大华技术股份有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20230116
Claims (10)
- 1. A file processing method, comprising: scanning first file index information in a first merged file to obtain a first scanning result, wherein the first merged file is a merged file obtained by merging a first group of files; when none of the first group of files in the first merged file is deleted, the storage positions of the first group of files in the first merged file are contiguous; the first file index information is used for indicating whether the first group of files in the first merged file are deleted, and is also used for indicating sequence numbers of the first group of files in the first merged file and/or storage positions of the first group of files; and performing a merging operation according to first remaining files to obtain a second merged file when the first scanning result indicates that a first part of files in the first group of files is deleted and the deleted first part of files meets a preset first compaction condition, wherein the first remaining files are the files in the first group of files other than the first part of files, and the storage positions of the first remaining files in the second merged file are contiguous; wherein performing the merging operation according to the first remaining files to obtain the second merged file comprises: obtaining the second merged file according to the first remaining files when the first scanning result indicates that the first part of files in the first group of files is deleted and the deleted first part of files meets a third compaction condition, and a second scanning result indicates that a second part of files in a second group of files is deleted and the deleted second part of files meets a preset second compaction condition, wherein the second scanning result is obtained by scanning third file index information in a third merged file, the third merged file is a merged file obtained by merging the second group of files, and the third file index information is used for indicating whether the second group of files in the third merged file are deleted.
- 2. The method of claim 1, wherein before performing the merging operation according to the first remaining files to obtain the second merged file, the method further comprises: determining that the deleted first part of files meets the first compaction condition when a ratio of the number of files in the first part of files to the number of files in the first group of files is greater than or equal to a first ratio threshold, and/or when a ratio of the file capacity of the first part of files to the file capacity of the first group of files is greater than or equal to a second ratio threshold.
- 3. The method according to claim 2, wherein the method further comprises: when the first file index information includes the sequence numbers of the first group of files in the first merged file, determining the number of files in the first part of files to be equal to the count of files of the first part recorded in the first file index information, and determining the number of files in the first group of files to be equal to the count of files of the first group recorded in the first file index information; and/or, when the first file index information also indicates the storage positions of the first group of files in the first merged file, determining the file capacity of the first part of files to be equal to the sum of the capacities corresponding to the storage positions of the files in the first part of files indicated by the first file index information, and determining the file capacity of the first group of files to be equal to the capacity corresponding to the storage positions of the first group of files indicated by the first file index information.
- 4. The method of claim 1, wherein performing the merging operation according to the first remaining files to obtain the second merged file comprises: adjusting the storage positions of the first remaining files in the first merged file to be contiguous storage positions, releasing the storage positions in the first merged file that no longer store files, and modifying the first file index information into second file index information to obtain the second merged file, wherein the second file index information is used for indicating the storage positions of the first remaining files in the second merged file and for indicating whether the first remaining files in the second merged file are deleted; or merging the first remaining files into the second merged file and deleting the first merged file, wherein the second merged file comprises the second file index information.
- 5. The method of claim 1, wherein, when the first scanning result indicates that the first part of files in the first group of files is deleted and the deleted first part of files meets the third compaction condition, and the second scanning result indicates that the second part of files in the second group of files is deleted and the deleted second part of files meets the preset second compaction condition, performing the merging operation according to the first remaining files to obtain the second merged file comprises: writing second remaining files into the first merged file, adjusting the storage positions of the first remaining files and the second remaining files in the first merged file to be contiguous storage positions, releasing the storage positions in the first merged file that no longer store files, and modifying the first file index information into third file index information to obtain the second merged file, wherein the third file index information is used for indicating the storage positions of the first remaining files and the second remaining files in the second merged file and for indicating whether the first remaining files and the second remaining files in the second merged file are deleted, and the second remaining files are the files in the second group of files other than the second part of files; or, when the file capacity of the second remaining files is equal to the file capacity of the first part of files, writing the second remaining files into the first merged file, adjusting the storage positions of the first remaining files and the second remaining files in the first merged file to be contiguous storage positions, and modifying the first file index information into the third file index information to obtain the second merged file; or merging the first remaining files and the second remaining files into the second merged file, and deleting the first merged file and the third merged file, wherein the second merged file comprises the third file index information.
- 6. The method of claim 5, wherein the method further comprises: determining that the deleted second part of files meets the second compaction condition when a ratio of the number of files in the second part of files to the number of files in the second group of files is greater than or equal to a third ratio threshold, and/or when a ratio of the file capacity of the second part of files to the file capacity of the second group of files is greater than or equal to a fourth ratio threshold.
- 7. The method of claim 1, wherein performing the merging operation according to the first remaining files to obtain the second merged file comprises: performing the merging operation according to the first remaining files and N remaining files to obtain the second merged file, wherein N is equal to 1 or is a positive integer greater than or equal to 2, and the N remaining files represent the files remaining in each of N merged files after part of its files are deleted.
- 8. A file processing device, comprising: a first scanning module, used for scanning first file index information in a first merged file to obtain a first scanning result, wherein the first merged file is a merged file obtained by merging a first group of files; when none of the first group of files in the first merged file is deleted, the storage positions of the first group of files in the first merged file are contiguous; the first file index information is used for indicating whether the first group of files in the first merged file are deleted, and is also used for indicating sequence numbers of the first group of files in the first merged file and/or storage positions of the first group of files; and a processing module, used for performing a merging operation according to first remaining files to obtain a second merged file when the first scanning result indicates that a first part of files in the first group of files is deleted and the deleted first part of files meets a preset first compaction condition, wherein the first remaining files are the files in the first group of files other than the first part of files, and the storage positions of the first remaining files in the second merged file are contiguous; wherein the processing module is used for performing the merging operation according to the first remaining files to obtain the second merged file in the following manner: obtaining the second merged file according to the first remaining files when the first scanning result indicates that the first part of files in the first group of files is deleted and the deleted first part of files meets a third compaction condition, and a second scanning result indicates that a second part of files in a second group of files is deleted and the deleted second part of files meets a preset second compaction condition, wherein the second scanning result is obtained by scanning third file index information in a third merged file, the third merged file is a merged file obtained by merging the second group of files, and the third file index information is used for indicating whether the second group of files in the third merged file are deleted.
- 9. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the method according to any of the claims 1 to 7.
- 10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when the computer program is executed.
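The merging operation recited in claims 4, 5 and 7 can be illustrated with a minimal Python sketch. It shows only the variant in which the remaining files of one or more merged files are copied into a new merged file at contiguous storage positions, after which the source merged files could be deleted. The `IndexEntry` layout, the `compact` function, the use of in-memory `bytes` instead of on-disk files, and the renumbering of sequence numbers are all assumptions made for illustration; the claims do not prescribe them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IndexEntry:
    """One record of file index information (hypothetical layout)."""
    seq: int        # sequence number of the small file in the merged file
    offset: int     # storage position (byte offset) in the merged file
    size: int       # capacity occupied by the small file, in bytes
    deleted: bool   # True once the small file has been deleted


def compact(
    sources: List[Tuple[bytes, List[IndexEntry]]],
) -> Tuple[bytes, List[IndexEntry]]:
    """Copy the remaining (non-deleted) files of every source merged file
    into one new merged file, packed back to back, and build its index."""
    out = bytearray()
    new_index: List[IndexEntry] = []
    for data, index in sources:
        # Walk each source in storage order so relative order is preserved.
        for entry in sorted(index, key=lambda e: e.offset):
            if entry.deleted:
                continue  # deleted files are the "data holes" to drop
            # Assumption: sequence numbers are reassigned in the new file.
            new_index.append(
                IndexEntry(seq=len(new_index), offset=len(out),
                           size=entry.size, deleted=False)
            )
            out += data[entry.offset:entry.offset + entry.size]
    return bytes(out), new_index
```

Passing a single `(data, index)` pair corresponds to compacting one merged file; passing several pairs corresponds to merging the remaining files of several merged files into one, in the spirit of claims 5 and 7.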
Description
File processing method and device, storage medium and electronic device
Technical Field
The embodiments of the invention relate to the technical field of storage, and in particular to a file processing method and device, a storage medium and an electronic device.
Background
In the era of exploding media information, hundreds of billions of media pictures or more need to be stored, while the average picture size is only 15 KB; this is known as the massive small-file storage problem. Existing storage systems, such as HDFS and Ceph, are designed mainly for large files: their implementation strategies for metadata management, data layout, stripe design, cache management and so on focus on large files, so for massive small-file applications their performance and storage efficiency drop sharply, and they may even fail to work.
In the related art, there is a method of merging small files into a large file. However, when a user deletes a small file, other undeleted small files remain in the merged large file, so the merged large file as a whole cannot be deleted. The deleted small files therefore become data holes in the merged large file, their capacity cannot be released, and the capacity utilization of the storage system is reduced.
No effective solution has yet been proposed for the problem of low capacity utilization of storage systems in the related art.
Disclosure of Invention
The embodiments of the invention provide a file processing method and device, a storage medium and an electronic device, so as to at least solve the problem of low capacity utilization of storage systems in the related art.
According to one embodiment of the invention, a file processing method is provided, comprising: scanning first file index information in a first merged file to obtain a first scanning result, wherein the first merged file is a merged file obtained by merging a first group of files, the storage positions of the first group of files in the first merged file are contiguous when none of the first group of files in the first merged file is deleted, and the first file index information is used for indicating whether the first group of files in the first merged file are deleted; and, when the first scanning result indicates that a first part of files in the first group of files is deleted and the deleted first part of files meets a preset first compaction condition, performing a merging operation according to first remaining files to obtain a second merged file, wherein the first remaining files are the files in the first group of files other than the first part of files, and the storage positions of the first remaining files in the second merged file are contiguous.
In an exemplary embodiment, before performing the merging operation according to the first remaining files to obtain the second merged file, the method further includes: determining that the deleted first part of files meets the first compaction condition when a ratio of the number of files in the first part of files to the number of files in the first group of files is greater than or equal to a first ratio threshold, and/or when a ratio of the file capacity of the first part of files to the file capacity of the first group of files is greater than or equal to a second ratio threshold.
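The scanning step and the compaction condition described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `IndexEntry` record layout, the function names, and the default threshold values of 0.5 are hypothetical and do not come from the embodiment, and the sketch implements the "or" combination of the two ratio tests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexEntry:
    """One record of the file index information (hypothetical layout)."""
    seq: int        # sequence number of the small file in the merged file
    offset: int     # storage position (byte offset) in the merged file
    size: int       # capacity occupied by the small file, in bytes
    deleted: bool   # True once the small file has been deleted


def scan_index(index: List[IndexEntry]) -> dict:
    """Scan the index and summarize how much of the group is deleted."""
    deleted = [e for e in index if e.deleted]
    return {
        "total_count": len(index),
        "deleted_count": len(deleted),
        "total_bytes": sum(e.size for e in index),
        "deleted_bytes": sum(e.size for e in deleted),
    }


def meets_compaction_condition(scan: dict,
                               count_threshold: float = 0.5,
                               capacity_threshold: float = 0.5) -> bool:
    """Return True when the deleted share reaches either threshold.

    The threshold defaults are illustrative assumptions only.
    """
    if scan["total_count"] == 0 or scan["deleted_count"] == 0:
        return False  # nothing stored or nothing deleted: no compaction
    count_ratio = scan["deleted_count"] / scan["total_count"]
    capacity_ratio = (scan["deleted_bytes"] / scan["total_bytes"]
                      if scan["total_bytes"] else 0.0)
    return (count_ratio >= count_threshold
            or capacity_ratio >= capacity_threshold)
```

For example, an index of four files in which two files holding half of the total bytes are deleted gives a count ratio and a capacity ratio of 0.5 each, so the condition holds with the assumed defaults. An implementation requiring both ratios to reach their thresholds would replace `or` with `and`.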
In an exemplary embodiment, the method further comprises: when the first file index information includes the sequence numbers of the first group of files in the first merged file, determining the number of files in the first part of files to be equal to the count of files of the first part recorded in the first file index information, and determining the number of files in the first group of files to be equal to the count of files of the first group recorded in the first file index information; and/or, when the first file index information also indicates the storage positions of the first group of files in the first merged file, determining the file capacity of the first part of files to be equal to the sum of the capacities corresponding to the storage positions of the files in the first part of files indicated by the first file index information, and determining the file capacity of the first group of files to be equal to the capacity corresponding to the storage positions of the files in the first group indicated by the first file index information.
In an exemplary embodiment, performing the merging operation according to the first remaining files to obtain the second merged file includes: adjusting the storage positions of the first remaining files in the first merged file to be contiguous storage positions, releasing the storage positions in the first merged file that no longer store files, and modifying the first file index information into second file index information to obtain the second merged file, wherein the second file index information is used for indicating the storage positions of the first remaining files in the second merged file and for indicating whether the first remaining files in the second merged file are deleted; or merging the first remaining files into the second merged fi