CN-121979711-A - Method, equipment and medium for detecting damaged data
Abstract
The application relates to the technical field of data storage, and in particular to a method, equipment and medium for detecting damaged data. The method divides stored data into a plurality of data blocks, calculates a characteristic value of each data block based on the byte values of that data block, and maps the byte values of each data block to pixel values of a grayscale image based on the characteristic value of that data block. A pre-trained target detection model then scans the grayscale image corresponding to the stored data to obtain a damaged area in the grayscale image. The target detection model is trained on damaged data and normal data corresponding to multiple file types, and the damaged data corresponding to each file type has multiple pre-labeled damage types. The damaged area in the grayscale image can therefore be accurately identified, and the damaged data in the stored data can be obtained from that area, so that damaged data is accurately identified and located.
Inventors
- Xing Cipai
- YIN YUE
- PENG JING
- XIONG LINBO
Assignees
- 深圳前海微众银行股份有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260104
Claims (10)
- 1. A method of detecting corrupted data, the method comprising: dividing stored data into a plurality of data blocks, calculating a characteristic value of each data block based on the byte values of that data block, and mapping the byte values of each data block to pixel values of a grayscale image based on the characteristic value of that data block; scanning the grayscale image corresponding to the stored data by using a pre-trained target detection model to obtain a damaged area in the grayscale image, wherein the target detection model is trained on damaged data corresponding to a plurality of file types and normal data corresponding to the plurality of file types, and the damaged data corresponding to each file type has a plurality of pre-labeled damage types; and obtaining damaged data in the stored data based on the damaged area in the grayscale image.
- 2. The method according to claim 1, wherein the method further comprises: based on the position of the damaged data, determining healthy data corresponding to the damaged data in mirror data of the stored data, and recovering the damaged data based on the healthy data.
- 3. The method of claim 1, wherein calculating the characteristic value of each data block based on the byte values of the data block comprises: performing multi-dimensional feature extraction on the byte values of any data block to obtain a sub-characteristic value of the data block for each dimension, wherein the sub-characteristic value of any dimension is obtained by computing statistics over the byte values of the data block; and obtaining the characteristic value of the data block based on each sub-characteristic value and its corresponding weight.
- 4. The method of claim 3, wherein the multi-dimensional features include at least one of information entropy, variance, and maximum consecutive bits; and performing the multi-dimensional feature extraction on the byte values of any data block to obtain the sub-characteristic value of the data block for each dimension comprises: for the byte values of any data block, calculating at least one of an information entropy value, a variance value, and a maximum consecutive bit value of the byte values.
- 5. The method of claim 1, wherein the training process of the target detection model comprises: dividing healthy data of different file types into a plurality of data blocks, and mapping the byte values of each data block to pixel values of a grayscale image to obtain grayscale images corresponding to the healthy data as positive samples, wherein the file types comprise at least one of a structured file, an executable file, a compressed file, a multimedia file, and a text file; injecting simulated damage of a plurality of damage types into the healthy data of the different file types to obtain damaged data of the different file types under the different damage types, dividing each piece of damaged data into a plurality of data blocks, mapping the byte values of each data block to pixel values of a grayscale image to obtain grayscale images corresponding to the damaged data as negative samples, and taking the damaged areas of the damaged data as labels; and training an initial target detection model based on the positive samples, the negative samples, and their corresponding labels to obtain the trained target detection model.
- 6. The method of claim 1, wherein prior to dividing the stored data into the plurality of data blocks, the method further comprises: if a trigger condition is determined to be met, executing the step of dividing the stored data into the plurality of data blocks, wherein the trigger condition comprises at least one of the following: a write operation is performed on a file; a scheduled detection task is started.
- 7. The method of any one of claims 1 to 6, wherein the plurality of damage types includes at least one of: minimal-granularity corruption, overlay corruption, data integrity corruption, and hardware corruption.
- 8. The method according to any one of claims 1 to 6, wherein obtaining the damaged data in the stored data based on the damaged area in the grayscale image comprises: obtaining a pixel value corresponding to the damaged area based on the position of the damaged area in the grayscale image; and determining the damaged data in the stored data corresponding to the pixel value.
- 9. An electronic device, comprising: a memory for storing program instructions; and a processor for invoking the program instructions stored in the memory and performing, in accordance with the obtained program instructions, the steps of the method according to any one of claims 1 to 8.
- 10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 8.
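For illustration only, the weighted feature computation recited in claims 3 and 4 might be sketched as follows in Python. The normalization of each sub-feature to [0, 1], the example weights, the function names, and the reading of "maximum consecutive bits" as the longest run of identical bits are all assumptions made for this sketch; the claims do not specify them.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    # Each term is -(p * log2 p), which is non-negative.
    return sum(-(n / total) * math.log2(n / total) for n in Counter(block).values())


def byte_variance(block: bytes) -> float:
    """Population variance of the byte values (0.0 to 16256.25)."""
    if not block:
        return 0.0
    mean = sum(block) / len(block)
    return sum((b - mean) ** 2 for b in block) / len(block)


def max_consecutive_bits(block: bytes) -> int:
    """Longest run of identical bits (assumed meaning of the claim term)."""
    longest = run = 0
    previous = None
    for byte in block:
        for shift in range(7, -1, -1):  # most significant bit first
            bit = (byte >> shift) & 1
            run = run + 1 if bit == previous else 1
            previous = bit
            longest = max(longest, run)
    return longest


def feature_value(block: bytes, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the three sub-features; the weights are hypothetical."""
    if not block:
        return 0.0
    w_entropy, w_variance, w_run = weights
    entropy = shannon_entropy(block) / 8.0          # normalize to [0, 1]
    variance = byte_variance(block) / 16256.25      # (255 / 2) ** 2 is the maximum
    run = max_consecutive_bits(block) / (8 * len(block))
    return w_entropy * entropy + w_variance * variance + w_run * run
```

With these choices, a block of identical bytes scores low on entropy and variance but high on run length, whereas compressed or encrypted content scores high on entropy.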
Description
Method, equipment and medium for detecting damaged data
Technical Field
The present application relates to the field of data storage technologies, and in particular to a method, equipment and medium for detecting damaged data.
Background
Current data corruption detection schemes mainly verify data integrity through a hash function or a cyclic redundancy check. This approach can detect whether stored data has changed, but locating the specific damaged position requires manual intervention. The common data backup and recovery approach is to back up data to an external storage medium periodically and to restore the data when it is damaged. The recovery point objective (RPO) and recovery time objective (RTO) of this approach are limited by the backup frequency and by the efficiency of the recovery procedure. The recovery process often overwrites the damaged file with the entire backup file, so recovery is slow when the file is large. Therefore, how to improve the efficiency and accuracy of locating and recovering damaged data is a technical problem to be solved.
Disclosure of Invention
The application provides a method, equipment and medium for detecting damaged data, which are used to improve the efficiency and accuracy of damaged data detection.
In a first aspect, the present application provides a method for detecting corrupted data, the method comprising: dividing stored data into a plurality of data blocks, calculating a characteristic value of each data block based on the byte values of that data block, and mapping the byte values of each data block to pixel values of a grayscale image based on the characteristic value of that data block; scanning the grayscale image corresponding to the stored data by using a pre-trained target detection model to obtain a damaged area in the grayscale image, wherein the target detection model is trained on damaged data corresponding to a plurality of file types and normal data corresponding to the plurality of file types, and the damaged data corresponding to each file type has a plurality of pre-labeled damage types; and obtaining damaged data in the stored data based on the damaged area in the grayscale image.
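A minimal sketch of the block-to-pixel mapping in the first aspect is given below. It assumes one pixel per data block, a hypothetical block size and image width, and a simplified characteristic value that uses byte entropy alone; the application does not fix any of these choices, and the function names are illustrative.

```python
import math
from collections import Counter

BLOCK_SIZE = 4096   # hypothetical block size in bytes
IMAGE_WIDTH = 64    # hypothetical image width in pixels (blocks per row)


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Divide the stored data into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_entropy(block: bytes) -> float:
    """Shannon entropy of the block's byte histogram, in bits per byte."""
    total = len(block)
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in Counter(block).values())


def to_gray_image(data: bytes, block_size: int = BLOCK_SIZE,
                  width: int = IMAGE_WIDTH) -> list:
    """Map each block to one 8-bit gray pixel and lay the pixels out row by row."""
    pixels = [round(block_entropy(b) / 8.0 * 255) for b in split_blocks(data, block_size)]
    rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]
    # Pad the last row with zeros so the image is rectangular (assumed convention).
    if rows and len(rows[-1]) < width:
        rows[-1] += [0] * (width - len(rows[-1]))
    return rows
```

Because block `k` lands at row `k // width` and column `k % width`, a region found in the image can later be traced back to byte offsets in the stored data.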
In the above method, stored data is divided into a plurality of data blocks, a characteristic value of each data block is calculated based on the byte values of that data block, and the byte values of each data block are mapped to pixel values of a grayscale image based on the characteristic value of that data block. A pre-trained target detection model scans the grayscale image corresponding to the stored data to obtain a damaged area in the grayscale image. Because the target detection model is trained on damaged data and normal data corresponding to a plurality of file types, and the damaged data corresponding to each file type has a plurality of pre-labeled damage types, the damaged area in the grayscale image can be accurately identified, and the damaged data in the stored data can be obtained from it. The method thus accurately identifies and locates damaged data and is applicable to damaged data of a plurality of file types.
In one possible embodiment, the method further comprises: based on the position of the damaged data, determining healthy data corresponding to the damaged data in mirror data of the stored data, and recovering the damaged data based on the healthy data.
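The recovery step can be sketched as follows, under the assumptions that each pixel corresponds to one data block, that the detected damaged area is an axis-aligned box in pixel coordinates, and that the mirror data is byte-for-byte aligned with the stored data. The function names and the box format are hypothetical.

```python
def region_to_byte_ranges(box, width, block_size, data_len):
    """Translate a damaged box (row0, col0, row1, col1), inclusive pixel
    coordinates, into merged (start, end) byte ranges of the stored data."""
    row0, col0, row1, col1 = box
    ranges = []
    for row in range(row0, row1 + 1):
        for col in range(col0, col1 + 1):
            start = (row * width + col) * block_size
            if start >= data_len:
                continue  # padding pixel with no underlying data
            end = min(start + block_size, data_len)
            if ranges and ranges[-1][1] == start:
                ranges[-1] = (ranges[-1][0], end)  # merge adjacent blocks
            else:
                ranges.append((start, end))
    return ranges


def repair_from_mirror(primary: bytearray, mirror: bytes, ranges) -> None:
    """Overwrite only the damaged byte ranges with the mirror's healthy bytes."""
    for start, end in ranges:
        primary[start:end] = mirror[start:end]
```

Only the blocks inside the detected area are rewritten, which is what avoids restoring the whole file.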
Based on the position of the damaged data, the healthy data corresponding to the damaged data can be accurately and efficiently determined in the mirror data of the stored data, and the damaged data is recovered from that healthy data, which enables proactive, near-real-time monitoring and repair. Because the target detection model can scan the data continuously or at high frequency, damaged data is discovered early and repaired quickly and automatically. This markedly reduces the lag of most related recovery techniques, in which a time window between the occurrence of damage and its discovery increases the risk of data loss, and it improves both the efficiency of data recovery and the safety of the stored data.
In one possible implementation manner, calculating the characteristic value of each data block based on the byte values of the data block comprises: performing multi-dimensional feature extraction on the byte values of any data block to obtain a sub-characteristic value of the data block for each dimension, wherein the sub-characteristic values of any dimension a