CN-120687629-B - Electronic archive storage management method, system and storage medium
Abstract
The invention discloses a method, a system and a storage medium for electronic archive storage management, belonging to the technical field of storage management. The invention extracts the pictures in an electronic archive, converts the archive features into a first feature matrix that serves as a first mark for archive matching marking of the pictures, and converts the various picture features into a plurality of corresponding second feature matrices that serve as second marks for position matching marking of the closed contour lines left at the pictures' positions. Once the archive matching marks and position marks are in place, the archive and the pictures are stored separately. This effectively improves storage efficiency and avoids the picture damage that can result from compressing pictures and archive content with a single uniform method.
Inventors
- Zhong Yongming
- Luo Jianyou
- Yi Dongxian
Assignees
- 广东云垒数字科技有限公司 (Guangdong Yunlei Digital Technology Co., Ltd.)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-06-23
Claims (8)
- 1. An electronic archive storage management method, characterized by comprising the following steps: acquiring archive features of an electronic archive, and matching an archive repository for the electronic archive according to the archive features; extracting the pictures in the electronic archive, retaining a closed contour line of each picture at its original position, and acquiring various picture features of the pictures; converting the archive features into a first feature matrix that serves as a first mark for archive matching marking of the pictures, converting the various picture features into a plurality of corresponding second feature matrices that serve as a plurality of second marks for position matching marking of the closed contour lines corresponding to the pictures, and taking the electronic archive bearing the position matching marks as a second archive; compressing the second archive and the pictures separately, and storing them respectively in a first space and a second space of the matched archive repository; wherein the archive features comprise archive basic information and a security level, the archive basic information comprises the archive format and the archive size, and the types of picture features comprise picture size, resolution, color features, texture features and picture format; and the security level is obtained through the following steps: acquiring the content information of the electronic archive through semantic recognition, performing word segmentation on the content information, and screening out first sensitive words from the content information through a first preset sensitive word bank; screening second sensitive words from the first sensitive words through a second preset sensitive word bank, and taking the sentence content linked to the second sensitive words as secret information; calculating a security index SL from the first sensitive words and the secret information [formula not reproduced], wherein k1 and k2 are a first preset weight and a second preset weight respectively, B1 and B2 are the number of first sensitive words and the number of items of secret information respectively, B is the number of sensitive words in the first preset sensitive word bank, and a preset secret coefficient is assigned to the second sensitive word corresponding to the i-th item of secret information; and acquiring a preset index range for each security level, and taking the security level whose preset index range contains the security index as the security level of the electronic archive.
- 2. The electronic archive storage management method according to claim 1, wherein screening the first sensitive words through the first preset sensitive word bank further comprises: if the content information contains a word that is not identical to any sensitive word in the first preset sensitive word bank, comparing the word one by one with each sensitive word in that bank through synonym analysis, and taking the word as a first sensitive word if it is found to be a synonym of one of them; and screening the second sensitive words from the first sensitive words through the second preset sensitive word bank further comprises obtaining second sensitive words by synonym analysis and comparison in the same manner as for the first sensitive words.
- 3. The electronic archive storage management method according to claim 1, wherein matching an archive repository for the electronic archive according to the archive features specifically comprises: screening first candidate repositories whose security level is identical to that of the electronic archive; screening, from the first candidate repositories and according to the archive format of the electronic archive, second candidate repositories whose format compatibility score is greater than or equal to a preset score threshold; screening, from the second candidate repositories, third candidate repositories whose remaining storage space is larger than the archive size of the electronic archive and whose space difference (remaining storage space minus archive size) is greater than or equal to a preset space threshold; and, if there are a plurality of third candidate repositories, calculating a comprehensive matching value for each of them from its format compatibility score and space difference, and taking the third candidate repository with the largest comprehensive matching value as the repository matched with the electronic archive.
- 4. The electronic archive storage management method according to claim 3, wherein the format compatibility score is calculated as follows [formula not reproduced], where GS is the format compatibility score of a first candidate repository for the archive format of the electronic archive; k3 and k4 are a third preset weight and a fourth preset weight respectively, with k3 + k4 = 1 and k3 > k4; M is a complete matching coefficient, with M = 1 when the format compatibility vector of the first candidate repository contains the archive format of the electronic archive and M = 0 otherwise; C is a compatible conversion coefficient, with C = 1 when the first candidate repository can accommodate the archive format of the electronic archive through a preset conversion tool and C = 0 otherwise; and T is the reliability coefficient of the conversion tool preset in the first candidate repository.
- 5. The electronic archive storage management method according to claim 4, wherein the comprehensive matching value is calculated as follows [formula not reproduced], where PS is the comprehensive matching value of a third candidate repository, k5 and k6 are a fifth preset weight and a sixth preset weight respectively, with k5 + k6 = 1, and kc is the space difference between the third candidate repository and the electronic archive.
- 6. The electronic archive storage management method according to claim 1, wherein extracting the pictures in the electronic archive specifically comprises: performing structural analysis of the electronic archive with the parsing library corresponding to its archive format; extracting a picture directly if it is an independent picture; and, if it is an embedded picture, identifying the outline of the embedded picture with an edge detection algorithm and dividing the picture area from the non-picture area along that outline to complete the extraction.
- 7. An electronic archive storage management system employing the electronic archive storage management method of any one of claims 1 to 6, comprising: a matching module, configured to acquire the archive features of an electronic archive and match an archive repository for the electronic archive according to the archive features; a picture extraction module, configured to extract the pictures in the electronic archive, retain the closed contour lines of the pictures at their corresponding positions, and acquire various picture features of the pictures; a marking module, configured to convert the archive features into a first feature matrix serving as a first mark for archive matching marking of the pictures, convert the various picture features into a plurality of corresponding second feature matrices serving as a plurality of second marks for position matching marking of the closed contour lines corresponding to the pictures, and take the electronic archive bearing the position matching marks as a second archive; and a storage module, configured to compress the second archive and the pictures with their respective compression methods and then store them in a first space and a second space of the matched archive repository, respectively.
- 8. A computer readable storage medium storing instructions which, when executed, perform the electronic archive storage management method of any one of claims 1 to 6.
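Claims 3 to 5 describe a three-stage filter followed by a weighted ranking, but the formulas for GS and PS are not reproduced in the text. The Python sketch below therefore assumes GS = k3·M + k4·C·T and PS = k5·GS + k6·kc′, where kc′ is the space difference divided by the largest space difference among the third candidates so that both terms sit on comparable scales. Both formulas, the `Repository` fields, the default weights and the example thresholds are hypothetical, not the patent's own definitions.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    security_level: str
    formats: frozenset        # format compatibility vector
    convertible: frozenset    # formats the preset conversion tool can convert
    tool_reliability: float   # T, reliability of the preset conversion tool
    free_space: int           # remaining storage space in bytes


def format_score(repo, fmt, k3=0.7, k4=0.3):
    m = 1 if fmt in repo.formats else 0        # complete matching coefficient M
    c = 1 if fmt in repo.convertible else 0    # compatible conversion coefficient C
    return k3 * m + k4 * c * repo.tool_reliability  # ASSUMED form of GS


def match_repository(repos, level, fmt, size, score_threshold, space_threshold,
                     k5=0.7, k6=0.3):
    # Stage 1: same security level as the archive.
    first = [r for r in repos if r.security_level == level]
    # Stage 2: format compatibility score at or above the threshold.
    second = [(r, format_score(r, fmt)) for r in first]
    second = [(r, gs) for r, gs in second if gs >= score_threshold]
    # Stage 3: enough free space, with a space difference kc above the threshold.
    third = [(r, gs, r.free_space - size) for r, gs in second
             if r.free_space > size and r.free_space - size >= space_threshold]
    if not third:
        return None
    if len(third) == 1:
        return third[0][0]
    max_kc = max(kc for _, _, kc in third)     # > 0 because free_space > size
    # ASSUMED form of PS, with kc normalized by the largest space difference.
    best = max(third, key=lambda t: k5 * t[1] + k6 * t[2] / max_kc)
    return best[0]


repos = [
    Repository("A", "internal", frozenset({"pdf", "docx"}), frozenset(), 0.0, 10_000),
    Repository("B", "internal", frozenset({"docx"}), frozenset({"pdf"}), 0.9, 50_000),
    Repository("C", "confidential", frozenset({"pdf"}), frozenset(), 0.0, 100_000),
    Repository("D", "internal", frozenset({"pdf"}), frozenset(), 0.0, 1_500),
]
chosen = match_repository(repos, "internal", "pdf", size=1_000,
                          score_threshold=0.25, space_threshold=1_000)
print(chosen.name if chosen else None)
```

With these illustrative values, C is dropped at the security-level stage and D at the space stage; A's exact format match then outweighs B's larger free space. Raising k6 relative to k5 shifts the choice toward repositories with more headroom.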
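Claim 6 leaves the edge detection algorithm open. As one possible instance, the sketch below applies a 3×3 Sobel operator to a grayscale page held as a list of rows. It takes the bounding box of the edge pixels as the picture outline and splits the page into a picture area and a non-picture area that keeps only the closed outline. It assumes a single rectangular embedded picture on a uniform background that does not touch the page border; the function names, the threshold and the outline value are hypothetical.

```python
def sobel_edges(img, threshold):
    """Mark interior pixels whose Sobel gradient magnitude exceeds `threshold`."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges


def split_embedded_picture(page, threshold=100.0, background=255, contour=0):
    """Return (outline box, picture area, non-picture area with closed outline)."""
    edges = sobel_edges(page, threshold)
    ys = [y for y, row in enumerate(edges) if any(row)]
    xs = [x for x in range(len(page[0])) if any(row[x] for row in edges)]
    if not ys:
        return None  # no embedded picture detected
    top, bottom, left, right = ys[0], ys[-1], xs[0], xs[-1]
    # The Sobel band straddles the step, so the picture starts one pixel inside the box.
    picture = [row[left + 1:right] for row in page[top + 1:bottom]]
    rest = [row[:] for row in page]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            on_outline = y in (top, bottom) or x in (left, right)
            rest[y][x] = contour if on_outline else background
    return (top, left, bottom, right), picture, rest


page = [[255] * 10 for _ in range(10)]
for y in range(3, 7):
    for x in range(3, 7):
        page[y][x] = 10 * (y - 3) + (x - 3)  # a small textured 4x4 picture
box, picture, rest = split_embedded_picture(page)
print(box, len(picture), len(picture[0]))
```

Because a 3×3 Sobel response fires on both sides of a step edge, the outermost edge pixels lie one pixel outside the picture; the sketch crops one pixel inside the bounding box and draws the retained contour on the box itself. A production system would more likely use a library edge detector and contour tracing.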
Description
Electronic archive storage management method, system and storage medium
Technical Field
The present invention relates to the technical field of storage management, and in particular to an electronic archive storage management method, system and storage medium.
Background
With the development of information technology and the long-term growth of enterprises and other organizations, the number of archives keeps increasing. Paper archives are difficult to store and inconvenient to retrieve, whereas storing electronic archives greatly improves both storage convenience and retrieval efficiency, so the storage management of electronic archives is very important to enterprises and other organizations. However, an electronic archive may contain a large number of pictures, which can reduce the storage efficiency of the whole archive. Moreover, compressing and storing the entire archive directly with a single uniform compression method greatly increases the risk of compression failure or picture damage, because the non-picture content and the pictures in the archive have different formats.
Disclosure of Invention
To solve the technical problems in the prior art, the invention provides an electronic archive storage management method comprising the following steps: acquiring archive features of an electronic archive, and matching an archive repository for the electronic archive according to the archive features; extracting the pictures in the electronic archive, retaining a closed contour line of each picture at its original position, and acquiring various picture features of the pictures; converting the archive features into a first feature matrix that serves as a first mark for archive matching marking of the pictures, converting the various picture features into a plurality of corresponding second feature matrices that serve as second marks for position matching marking of the closed contour lines corresponding to the pictures, and taking the electronic archive bearing the position matching marks as a second archive; and compressing the second archive and the pictures separately, and storing them respectively in a first space and a second space of the matched archive repository.
Further, the archive features comprise archive basic information and a security level, the archive basic information comprises the archive format and the archive size, and the types of picture features comprise picture size, resolution, color features, texture features and picture format.
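The disclosure does not say how the feature matrices are encoded or which compression methods are used, so the following is only one hypothetical reading of the marking and separate-storage steps. The first mark is a 1×3 matrix of encoded archive features attached to each stored picture; the second marks are one small matrix per picture feature attached to the contour left in the archive; the text-only second archive and the pictures are then compressed and stored separately. The block layout, the code tables, the storage keys and the choice to keep PNG and JPEG data uncompressed are all assumptions.

```python
import json
import zlib

# Hypothetical code tables that turn categorical features into numbers.
FORMAT_CODES = {"pdf": 1, "docx": 2, "png": 10, "jpeg": 11, "bmp": 12}
LEVEL_CODES = {"public": 0, "internal": 1, "confidential": 2}


def first_mark(archive):
    """Archive features -> 1 x 3 first feature matrix (format, size, security level)."""
    return [[FORMAT_CODES[archive["format"]], archive["size"], LEVEL_CODES[archive["level"]]]]


def second_marks(pic):
    """One small feature matrix per picture feature type."""
    return {
        "size": [[pic["width"], pic["height"]]],
        "resolution": [[pic["dpi"]]],
        "color": [pic["color_hist"]],
        "texture": [pic["texture"]],
        "format": [[FORMAT_CODES[pic["format"]]]],
    }


def split_and_store(archive, repository):
    """Replace pictures with marked contours, then compress and store both parts separately."""
    mark1 = first_mark(archive)
    blocks = []
    for i, block in enumerate(archive["blocks"]):
        if block["type"] != "picture":
            blocks.append(block)
            continue
        # The closed contour stays at the picture's position and carries the second marks.
        blocks.append({"type": "contour", "bbox": block["bbox"],
                       "second_marks": second_marks(block)})
        data = block["data"]
        if block["format"] not in ("png", "jpeg"):  # assumption: already-compressed formats kept as-is
            data = zlib.compress(data, 9)
        key = f"{archive['id']}/{i}"                # hypothetical storage key
        repository["second_space"][key] = {"first_mark": mark1, "data": data}
    second_archive = json.dumps({**archive, "blocks": blocks}).encode("utf-8")
    repository["first_space"][archive["id"]] = zlib.compress(second_archive, 9)


repository = {"first_space": {}, "second_space": {}}
archive = {"id": "A1", "format": "pdf", "size": 2048, "level": "internal",
           "blocks": [{"type": "text", "text": "Quarterly report"},
                      {"type": "picture", "format": "bmp", "bbox": [10, 20, 110, 80],
                       "width": 100, "height": 60, "dpi": 300,
                       "color_hist": [0.5, 0.3, 0.2], "texture": [0.1, 0.9],
                       "data": b"\x00" * 600}]}
split_and_store(archive, repository)
stored = json.loads(zlib.decompress(repository["first_space"]["A1"]))
print(stored["blocks"][1]["type"], repository["second_space"]["A1/1"]["first_mark"])
```

Keeping the two stores apart is the point of the design: the archive body can use a text-friendly compressor, while each picture gets a format-appropriate treatment instead of one uniform method.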
Further, the security level of the electronic archive is obtained through the following steps: acquiring the content information of the electronic archive through semantic recognition, performing word segmentation on the content information, and screening out first sensitive words from the content information through a first preset sensitive word bank; screening second sensitive words from the first sensitive words through a second preset sensitive word bank, and taking the sentence content linked to the second sensitive words as secret information; calculating a security index SL from the first sensitive words and the secret information [formula not reproduced], wherein k1 and k2 are a first preset weight and a second preset weight respectively, B1 and B2 are the number of first sensitive words and the number of items of secret information respectively, B is the number of sensitive words in the first preset sensitive word bank, and a preset secret coefficient is assigned to the second sensitive word corresponding to the i-th item of secret information; and acquiring a preset index range for each security level, and taking the security level whose preset index range contains the security index as the security level of the electronic archive.
Further, screening the first sensitive words through the first preset sensitive word bank further comprises: if the content information contains a word that is not identical to any sensitive word in the first preset sensitive word bank, comparing the word one by one with each sensitive word in that bank through synonym analysis, and taking the word as a first sensitive word if it is found to be a synonym of one of them; screening the second sensitive words from the first sensitive words through the second preset sensitive word bank likewise obtains second sensitive words by synonym analysis and comparison in the same manner.
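The sensitive-word screening and the security-level lookup can be sketched as follows. Because the SL formula is not reproduced in the text, the sketch assumes SL = k1·B1/B + k2·Σβi, summed over the B2 items of secret information, where βi is the preset secret coefficient. It further assumes a regex tokenizer in place of a real word segmenter, a hand-written synonym table, distinct-word counting for B1, and that a sentence containing several second sensitive words takes the largest coefficient. All names, word banks and numbers are illustrative.

```python
import re
from bisect import bisect_right


def tokenize(text):
    # Assumption: a regex split stands in for a real word segmenter.
    return re.findall(r"\w+", text.lower())


def screen(words, bank, synonyms):
    """Return distinct words that are in `bank` directly or through a synonym."""
    found = []
    for word in words:
        if word in found:
            continue
        if word in bank or synonyms.get(word, set()) & bank:
            found.append(word)
    return found


def secret_information(text, second_words, coefficients):
    """Pair each sentence containing a second sensitive word with its coefficient."""
    items = []
    for sentence in re.split(r"[.!?。！？]", text):
        tokens = set(tokenize(sentence))
        hits = [w for w in second_words if w in tokens]
        if hits:
            # Assumption: several hits in one sentence use the largest coefficient.
            items.append((sentence.strip(), max(coefficients[w] for w in hits)))
    return items


def security_index(b1, b, betas, k1, k2):
    # ASSUMED form of the missing formula: SL = k1 * B1 / B + k2 * sum(beta_i).
    return k1 * b1 / b + k2 * sum(betas)


def security_level(sl, lower_bounds, levels):
    """Return the level whose preset range [lower_bounds[i], lower_bounds[i+1]) holds SL."""
    if sl < lower_bounds[0]:
        raise ValueError("security index below every preset range")
    return levels[bisect_right(lower_bounds, sl) - 1]


text = "The budget is final. The merger plan is secret. Lunch at noon."
first_bank = {"budget", "merger", "confidential", "salary"}
second_bank = {"merger", "confidential"}
synonyms = {"secret": {"confidential"}}  # hypothetical synonym table
coefficients = {"merger": 0.8, "secret": 0.5}

first = screen(tokenize(text), first_bank, synonyms)
second = screen(first, second_bank, synonyms)
secrets = secret_information(text, second, coefficients)
sl = security_index(len(first), len(first_bank), [beta for _, beta in secrets], k1=0.6, k2=0.4)
print(first, second, round(sl, 2))
print(security_level(sl, [0.0, 0.3, 0.6], ["public", "internal", "confidential"]))
```

Here "secret" is not in either bank but is caught through its synonym "confidential", which mirrors the synonym comparison described above; the second screening reuses the same `screen` function on the first sensitive words.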
Further, matching an archive repository for the electronic archive according to the archive features specifically includes: screening first candidate repositories whose security level is identical to that of the electronic archive; screening, from the first candidate repositories, second candidate repositories whose format compatibility score is greater than or equal to a preset score threshold according to the archive format of the electro