CN-117056542-B - Method, device and medium for archiving and intelligent scheduling of mass files
Abstract
The invention discloses a method, device and medium for archiving and intelligently scheduling mass files, belonging to the technical field of file management. The method comprises the following steps: in the T0 stage, storing unstructured data as single images, storing the structured data paired with each image, storing the structured data in a memory index library, and establishing a main index; for data entering the T1 stage, losslessly compressing and archiving the image set corresponding to the main index, and converting the structured data set corresponding to the main index into a readable and writable configuration file; for data entering the T2 stage, storing the structured data set corresponding to the main index separately, while clearing the corresponding structured data from the memory index library and clearing the image set corresponding to the main index; for data that is past the T2 stage but still required by the service, regenerating temporary unstructured data for the single images and the memory index library data paired with them; and, after the T3 stage is entered, deleting the temporary memory index library data and unstructured data.
Inventors
- ZHANG XUETAO
- ZHOU YANG
Assignees
- 中电通商数字技术(上海)有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20230714
Claims (9)
- 1. A method for archiving and intelligently scheduling mass files, characterized by comprising the following steps: Step S10, managing data with the same attribute across four time periods T0, T1, T2 and T3; Step S20, in the T0 stage, storing unstructured data as single images, storing the structured data paired with each image, storing the structured data in a memory index library, and establishing a main index; Step S30, for data entering the T1 stage, losslessly compressing and archiving the image set corresponding to the main index, and converting the structured data set corresponding to the main index into a readable and writable configuration file; Step S40, for data entering the T2 stage, storing the structured data set corresponding to the main index separately, while clearing the corresponding structured data from the memory index library and clearing the image set corresponding to the main index; Step S50, for data that is past the T2 stage but still required by the service, regenerating temporary unstructured data for the single images and the memory index library data paired with them, and deleting the temporary memory index library data and unstructured data after the T3 stage is entered; wherein, in step S10, data with the same attribute has the following characteristics: the images in an image set belong to the same examination and are used together; in step S10, the time at which the first unstructured image file of an examination is stored in the service system is defined as T0; the timeliness period of the medical image data is defined as T1, the retrieval frequency and access frequency of the image data reaching a low point after T1; the time at which the original data is processed after re-archiving is defined as T2; and the time reached a certain period after the data is queried and used following T2 is defined as T3.
- 2. The mass file archiving and intelligent scheduling method as claimed in claim 1, wherein: in step S20, structured data with the same attribute generates only one main index record, and the time of the main index is updated each time a piece of structured data is stored.
- 3. The mass file archiving and intelligent scheduling method as claimed in claim 1 or 2, wherein: in the T0, T1 and T2 stages, the application system directly accesses the memory index library to obtain a download path, and downloads the data through that path.
- 4. The mass file archiving and intelligent scheduling method as claimed in claim 1, wherein step S30 comprises: Step S31, accessing non-archived data whose main index time exceeds T1, and retrieving all unstructured data and paired structured data belonging to the examination represented by the main index; Step S32, compressing the retrieved unstructured data together into a single compressed file for storage, and writing its path into the main index; Step S33, converting the retrieved structured data into a custom readable and writable configuration file built around relative paths for storage, and writing its path into the main index.
- 5. The mass file archiving and intelligent scheduling method as claimed in claim 4, wherein: after steps S32 and S33 are completed, the readable and writable configuration file and the compressed file are downloaded through the main index, the compressed file is decompressed, the corresponding files are generated according to the readable and writable configuration file, and the generated files are verified against the original data to ensure that the data has not been altered; the readable and writable configuration file is a JSON configuration file or an XML configuration file.
- 6. The mass file archiving and intelligent scheduling method as claimed in claim 1, wherein: in step S40, after the data in the T1 stage has been archived, the processing of the original data comprises: deleting the original image set corresponding to the main index; backing up the structured data corresponding to the main index to a database, for verification under special future data requirements; deleting the original structured data after the backup is completed; and deleting the related structured data from the memory index library.
- 7. The mass file archiving and intelligent scheduling method as claimed in claim 1, wherein: in step S50, when the application system queries the examination list, it is checked synchronously whether corresponding data exists in the memory index library, and if not, a main index sensing event is generated directly; after the sensing event is received, the compressed file and the readable and writable configuration file are obtained based on the main index data, and temporary memory index library data and unstructured data are generated according to the readable and writable configuration file for use by the corresponding applications.
- 8. A mass file archiving and intelligent scheduling device is characterized by being used for running the mass file archiving and intelligent scheduling method according to any one of claims 1-7.
- 9. A mass file archiving and intelligent scheduling medium comprising a computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to implement the method of any one of claims 1 to 7.
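The archive round trip described in claims 4, 5 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the dictionary-based main index and memory index, the file names `images.zip` and `manifest.json`, and the manifest layout are all assumptions made for illustration. ZIP with DEFLATE stands in for the unspecified lossless compression, and JSON is one of the two configuration formats named in claim 5.

```python
import json
import zipfile
from pathlib import Path


def archive_examination(exam_dir: Path, records: dict, out_dir: Path, main_index: dict) -> dict:
    """S32/S33 sketch. `records` maps a relative image path to its structured data (assumed layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive_path = out_dir / "images.zip"        # hypothetical file name
    manifest_path = out_dir / "manifest.json"    # hypothetical file name

    # S32: compress all images of one examination into a single lossless archive.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rel_path in sorted(records):
            zf.write(exam_dir / rel_path, arcname=rel_path)

    # S33: write the structured data to a configuration file keyed by relative path.
    manifest = {
        "exam_id": main_index["exam_id"],
        "files": [{"path": p, "structured": records[p]} for p in sorted(records)],
    }
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")

    # Both paths are written back into the main index.
    main_index.update(archive_path=str(archive_path), manifest_path=str(manifest_path))
    return main_index


def verify_archive(main_index: dict, exam_dir: Path, records: dict) -> bool:
    """Claim 5 sketch: compare the archived content with the original data."""
    manifest = json.loads(Path(main_index["manifest_path"]).read_text(encoding="utf-8"))
    with zipfile.ZipFile(main_index["archive_path"]) as zf:
        for entry in manifest["files"]:
            if zf.read(entry["path"]) != (exam_dir / entry["path"]).read_bytes():
                return False
            if entry["structured"] != records.get(entry["path"]):
                return False
    return len(manifest["files"]) == len(records)


def restore_temporary(main_index: dict, temp_dir: Path, memory_index: dict) -> None:
    """Claim 7 sketch: rebuild temporary images and memory index entries from the archive."""
    manifest = json.loads(Path(main_index["manifest_path"]).read_text(encoding="utf-8"))
    with zipfile.ZipFile(main_index["archive_path"]) as zf:
        zf.extractall(temp_dir)
    for entry in manifest["files"]:
        memory_index[entry["path"]] = {
            "structured": entry["structured"],
            "download_path": str(temp_dir / entry["path"]),
        }
```

In this sketch the originals would be deleted only after `verify_archive` returns `True`, which mirrors the ordering in the claims: verification follows S32 and S33, and the clearing of original data belongs to the later T2 stage.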
Description
Method, device and medium for archiving and intelligent scheduling of mass files
Technical Field
The invention relates to the technical field of file management, and in particular to a method, device and medium for archiving and intelligent scheduling of mass files.
Background
In the medical imaging industry, the storage occupied by unstructured image files is measured in terabytes. In particular, the annual storage volume of digitized pathology images at leading hospitals can reach the petabyte level, and the structured data records paired with the unstructured data number in the tens of millions to billions. At the same time, the data is time-sensitive: the likelihood that it is used again decreases steadily after a certain time, and apart from special cases it may never be used again. The usual technical approach is to improve query and read speed with non-relational databases, object storage and similar means, and to save hardware cost by using hot and cold storage tiering to place data on low-cost storage media. The most effective way to reduce storage consumption is compression; in practical measurements, compressing medical image files reduced storage consumption by about 50%-60%. However, compressed data can no longer be used by the service system in the conventional way.
Disclosure of Invention
The present invention aims to solve the above problems. An object of the invention is to provide a method for archiving and intelligently scheduling mass files by which unstructured image file data is re-archived, storage consumption is reduced, the volume of paired structured data is reduced, and the timeliness of data access and downloading is not affected.
The scheme adopted by the embodiment is a method for archiving and intelligently scheduling mass files, comprising the following steps:
Step S10, managing data with the same attribute across four time periods T0, T1, T2 and T3;
Step S20, in the T0 stage, storing unstructured data as single images, storing the structured data paired with each image, storing the structured data in a memory index library, and establishing a main index;
Step S30, for data entering the T1 stage, losslessly compressing and archiving the image set corresponding to the main index, and converting the structured data set corresponding to the main index into a readable and writable configuration file;
Step S40, for data entering the T2 stage, storing the structured data set corresponding to the main index separately, while clearing the corresponding structured data from the memory index library and clearing the image set corresponding to the main index;
Step S50, for data that is past the T2 stage but still required by the service, regenerating temporary unstructured data for the single images and the memory index library data paired with them, and deleting the temporary memory index library data and unstructured data after the T3 stage is entered.
In step S10, data with the same attribute has the following characteristics: the images in an image set belong to the same examination and are used together.
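The staged scheduling in steps S10 to S50 can be read as a small state machine driven by timestamps on the main index. The following Python sketch shows one way to decide which step is due for an examination. The `MainIndex` fields, the `Action` names and all three durations are hypothetical; the source defines T1, T2 and T3 only qualitatively and gives no concrete values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP = "keep"                          # nothing is due yet
    ARCHIVE = "archive"                    # S30: compress images, write configuration file
    PURGE_ORIGINALS = "purge_originals"    # S40: clear originals and memory index entries
    DELETE_TEMPORARY = "delete_temporary"  # S50: drop regenerated temporary data at T3


@dataclass
class MainIndex:
    exam_id: str
    updated_at: datetime                   # refreshed each time structured data is stored (S20)
    archived_at: Optional[datetime] = None
    purged_at: Optional[datetime] = None
    restored_at: Optional[datetime] = None  # set when temporary data is regenerated (S50)


# Assumed durations, chosen only for illustration.
T1_PERIOD = timedelta(days=90)    # timeliness period before archiving
T2_DELAY = timedelta(days=7)      # delay between archiving and purging originals
T3_RETENTION = timedelta(days=3)  # lifetime of regenerated temporary data


def plan_action(idx: MainIndex, now: datetime) -> Action:
    """Return the step that is due for one examination at time `now`."""
    if idx.restored_at is not None:
        # Temporary data exists; delete it once the T3 retention has passed.
        return Action.DELETE_TEMPORARY if now - idx.restored_at >= T3_RETENTION else Action.KEEP
    if idx.purged_at is not None:
        # Only the archive remains; data is regenerated on demand, not on a timer.
        return Action.KEEP
    if idx.archived_at is not None:
        return Action.PURGE_ORIGINALS if now - idx.archived_at >= T2_DELAY else Action.KEEP
    return Action.ARCHIVE if now - idx.updated_at >= T1_PERIOD else Action.KEEP
```

A periodic job could call `plan_action` for each main index record and dispatch on the result. After deleting temporary data, the caller would reset `restored_at` to `None` so that a later query can trigger regeneration again.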
For the time periods, in step S10, the time at which the first unstructured image file of an examination is stored in the service system is defined as T0; the timeliness period of the medical image data is defined as T1, the retrieval frequency and access frequency of the image data reaching a low point after T1; the time at which the original data is processed after re-archiving is defined as T2; and the time reached a certain period after the data is queried and used following T2 is defined as T3.
In a preferred technical scheme of the invention, in step S20, structured data with the same attribute generates only one main index record, and the time of the main index is updated each time a piece of structured data is stored.
In a preferred technical scheme, in the T0, T1 and T2 stages, the application system directly accesses the memory index library to obtain a download path, and downloads the data through that path.
The specific processing of the data in step S30 comprises: Step S31, accessing non-archived data whose main index time exceeds T1, and retrieving all unstructured data and paired structured data belonging to the examination represented by the main index; Step S32, compressing the retrieved unstructured data together into a single compressed file for storage, and writing its path into the main index; Step S33, converting the retrieved structured data into a custom readable and writable configuration file built around relative paths for storage, and writing its path into the main index.
Further, after the steps S32 and S33 are completed, the readable and writable configuration file and the compressed file are downloade