CN-121996489-A - Hard disk fault prediction method based on multivariate data


Abstract

The invention discloses a hard disk fault prediction method based on multivariate data. The method constructs a multi-level cache structure from a high-frequency access data sequence. It derives cache priority scores from changes in access temperature and from read/write error counts, and uses these scores to form an optimized data distribution configuration, migrating historical parameter sequences to a solid state disk (SSD) cache layer. It encrypts sensitive information in the optimized data distribution configuration and records audit logs to generate an access behavior tracking sequence. Based on that sequence, it aggregates gradient updates from multiple data centers to obtain global fault prediction parameters. It then adjusts dynamic cache rules based on the global fault prediction parameters to determine an optimized feature data acquisition path, and triggers an encryption processing cycle based on that path to obtain an updated sensitive data protection configuration. Finally, it fuses the updated sensitive data protection configuration into the distributed learning aggregation process to output hard disk fault prediction results.

Inventors

LING CHEN
ZHEN ZHICHAO
ZHANG HONGBIN
LV YANFEI
LI LU
FAN MINGJIE

Assignees

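Military Science Information Research Center, Academy of Military Sciences, Chinese People's Liberation Army (中国人民解放军军事科学院军事科学信息研究中心)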

Dates

Publication Date: 2026-05-08
Application Date: 2026-02-25

Claims (8)

1. A hard disk fault prediction method based on multivariate data, characterized by comprising the following steps: performing cluster analysis on acquisition records collected from storage devices to obtain a high-frequency access data sequence; constructing a multi-level cache structure based on the high-frequency access data sequence, and determining a cache priority score from changes in data access temperature and from read/write error counts; migrating the historical parameter sequence to a solid state disk cache layer based on the cache priority score to obtain an optimized data distribution configuration; encrypting the sensitive information in the optimized data distribution configuration to obtain a desensitized multivariate data set; setting hierarchical access rights for the desensitized multivariate data set and recording an audit log to obtain an access behavior tracking sequence; aggregating gradient updates by a distributed learning method based on the access behavior tracking sequence to obtain global fault prediction parameters; adjusting a dynamic cache rule based on the global fault prediction parameters to determine an optimized feature data acquisition path; triggering an encryption processing cycle based on the feature data acquisition path to obtain an updated sensitive data protection configuration; and fusing the updated sensitive data protection configuration into a distributed learning aggregation process to obtain a hard disk failure prediction result.
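The migration step of claim 1 turns per-item cache priority scores into a placement across cache tiers. The claims do not specify the placement policy. The sketch below assumes a greedy policy that fills the fastest tier first, in descending score order. The tier names, capacities, the `min_score` cut-off and the `Item`/`plan_distribution` names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tier names, fastest first; the patent names only a solid
# state disk cache layer within a multi-level cache structure.
TIERS = ("memory", "ssd_cache", "hdd")


@dataclass
class Item:
    name: str      # e.g. a historical parameter sequence identifier
    score: float   # cache priority score (higher = hotter / more fault-relevant)
    size: int      # size in arbitrary units


def plan_distribution(items, capacity, min_score=0.3):
    """Greedy tier placement in descending score order.

    capacity maps each bounded tier to its free units; the last tier in
    TIERS is treated as unbounded. Items scoring below min_score skip the
    fast tiers entirely. Returns {item name: tier name}.
    """
    remaining = dict(capacity)
    plan = {}
    for item in sorted(items, key=lambda it: it.score, reverse=True):
        if item.score < min_score:
            plan[item.name] = TIERS[-1]
            continue
        for tier in TIERS[:-1]:
            if remaining.get(tier, 0) >= item.size:
                remaining[tier] -= item.size
                plan[item.name] = tier
                break
        else:
            # No fast tier had room: fall back to the bulk tier.
            plan[item.name] = TIERS[-1]
    return plan


items = [Item("smart_log", 0.9, 4), Item("io_trace", 0.7, 6), Item("old_cfg", 0.1, 2)]
print(plan_distribution(items, {"memory": 4, "ssd_cache": 8}))
# {'smart_log': 'memory', 'io_trace': 'ssd_cache', 'old_cfg': 'hdd'}
```

A greedy pass is easy to audit. A knapsack-style optimizer could pack tiers more tightly, at the cost of recomputation whenever the scores change.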
2. The hard disk fault prediction method based on multivariate data according to claim 1, wherein obtaining the high-frequency access data sequence from the acquisition records comprises: grouping the acquisition records by K-means clustering to obtain data distribution characteristics; constructing an overhead model from the data distribution characteristics by access pattern clustering to obtain a computational overhead distribution; generating a high-frequency sequence from the computational overhead distribution by a sequence extraction method; associating the high-frequency sequence with system attributes to obtain a fault early-warning sequence; and determining the high-frequency access data sequence based on the fault early-warning sequence.
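The K-means grouping step of claim 2 can be sketched with plain Lloyd iterations. The sketch assumes each acquisition record is reduced to two numeric features, an access rate and a read/write error count. It also assumes the "high-frequency" group is the cluster with the highest mean access rate. The record layout and function names are hypothetical, and the overhead model and system-attribute association steps are not shown.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on tuples of floats, with a deterministic init."""
    # Initialise with k points spread across the list sorted by first feature.
    ordered = sorted(points)
    step = max(1, (len(ordered) - 1) // max(1, k - 1))
    centroids = [ordered[min(i * step, len(ordered) - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster empties out.
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def high_frequency_records(records, k=2):
    """records: list of (record_id, accesses_per_hour, read_write_errors).

    Returns ids in the cluster with the highest mean access rate, ordered by
    access rate. Features are unscaled here; normalise them in practice.
    """
    feats = [(float(r[1]), float(r[2])) for r in records]
    labels, centroids = kmeans(feats, k)
    hot = max(range(k), key=lambda c: centroids[c][0])
    chosen = [r for r, lab in zip(records, labels) if lab == hot]
    return [r[0] for r in sorted(chosen, key=lambda r: r[1], reverse=True)]


records = [("a", 120, 3), ("b", 5, 0), ("c", 110, 1), ("d", 8, 0), ("e", 4, 1)]
print(high_frequency_records(records))  # ['a', 'c']
```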
3. The method according to claim 1, wherein constructing the multi-level cache structure based on the high-frequency access data sequence and determining the cache priority score comprises: acquiring memory-layer data based on the high-frequency access data sequence to construct the multi-level cache structure; determining a dynamic adjustment rule based on the multi-level cache structure, and allocating access temperature changes to the most recent time window to obtain a temperature change allocation result; acquiring read/write error counts based on the temperature change allocation result, and allocating error counts that exceed a count threshold to the nearest time window to obtain an error count allocation result; determining a preliminary priority score from the error count allocation result combined with access patterns in the access log; and adjusting the scoring weights of the preliminary priority score in combination with a preset error recovery mechanism to obtain the cache priority score.
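Claim 3 does not give a formula for the cache priority score. The sketch below is one possible reading under these assumptions:

- "Access temperature change" is the latest window's access count minus the mean of the earlier windows.
- Only the error count above the threshold in the latest window contributes.
- The two terms are linearly weighted.
- The "error recovery mechanism" halves the error weight once errors have been corrected.

The weights, threshold, halving rule and all names are hypothetical. The access-pattern term is omitted.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    accesses: list  # access counts per time window, oldest first
    errors: list    # read/write error counts per time window, oldest first


def temperature_change(accesses):
    """Change attributed to the latest window: latest count minus mean of earlier windows."""
    if len(accesses) < 2:
        return 0.0
    history = accesses[:-1]
    return accesses[-1] - sum(history) / len(history)


def error_term(errors, threshold):
    """Errors in the most recent window beyond the count threshold (0 otherwise)."""
    latest = errors[-1] if errors else 0
    return max(0, latest - threshold)


def priority_score(stats, threshold=2, weights=(0.6, 0.4), recovered=False):
    """Weighted cache priority score under assumed weights."""
    w_temp, w_err = weights
    if recovered:
        # Assumed recovery rule: already-corrected errors count half as much.
        w_err *= 0.5
    temp = max(0.0, temperature_change(stats.accesses))  # cooling data earns no credit
    return w_temp * temp + w_err * error_term(stats.errors, threshold)


hot = WindowStats(accesses=[10, 10, 40], errors=[0, 1, 5])
print(priority_score(hot), priority_score(hot, recovered=True))  # about 19.2 and 18.6
```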
4. The hard disk fault prediction method based on multivariate data according to claim 1, wherein encrypting the sensitive information in the optimized data distribution configuration to obtain the desensitized multivariate data set comprises: identifying sensitive information in the optimized data distribution configuration by a preset sensitivity quantification analysis to obtain a sensitive data list; encrypting the sensitive data list using the AES encryption algorithm with dynamic key generation to obtain an encrypted output; desensitizing the encrypted output by mask substitution, and setting role-based access control to obtain a preliminary desensitized set; and applying compliance standard verification and data flow tracking to the preliminary desensitized set to obtain the final desensitized multivariate data set.
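The desensitization and role-based access steps of claim 4 can be sketched with the Python standard library alone. The AES step itself is not reproduced here. A real implementation would use a vetted library (for example AES-GCM from the `cryptography` package). The HMAC-SHA256 tokens below are keyed pseudonyms, not encryption, and cannot be decrypted. The field lists, role table and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical output of the sensitivity quantification step.
SENSITIVE_FIELDS = {"serial_number", "host_ip", "user_id"}
# Hypothetical role -> visible fields mapping; None means "all fields".
ROLE_FIELDS = {"analyst": {"temp_c", "smart_5"}, "auditor": None}


def derive_key(master: bytes, epoch: int) -> bytes:
    """Dynamic key: a fresh 32-byte key per epoch, derived with HMAC-SHA256."""
    return hmac.new(master, f"epoch:{epoch}".encode(), hashlib.sha256).digest()


def mask(value: str, keep: int = 2) -> str:
    """Mask substitution: replace all but the last `keep` characters with '*'."""
    keep = min(keep, len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def pseudonym(value: str, key: bytes) -> str:
    """Keyed, non-reversible token so records can still be joined across sites."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def desensitize(record: dict, key: bytes) -> dict:
    """Mask sensitive fields and attach a keyed token; pass other fields through."""
    out = {}
    for name, value in record.items():
        if name in SENSITIVE_FIELDS:
            out[name] = mask(str(value))
            out[name + "_token"] = pseudonym(str(value), key)
        else:
            out[name] = value
    return out


def view_for(role: str, record: dict) -> dict:
    """Role-based access control; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, set())
    if allowed is None:
        return dict(record)
    return {k: v for k, v in record.items() if k in allowed}


key = derive_key(b"master-secret", epoch=1)
row = desensitize({"serial_number": "ZA1234XY", "temp_c": 41, "smart_5": 0}, key)
print(row["serial_number"], view_for("analyst", row))  # ******XY {'temp_c': 41, 'smart_5': 0}
```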
5. The hard disk fault prediction method based on multivariate data according to claim 1, wherein aggregating gradient updates by a distributed learning method based on the access behavior tracking sequence to obtain the global fault prediction parameters comprises: extracting collaborative training requirements from the access behavior tracking sequence; transmitting local gradient updates among multiple data centers according to the collaborative training requirements and aggregating them to obtain preliminary global model parameters; processing the access behavior tracking sequence with a convolutional neural network based on the preliminary global model parameters to obtain updated fault prediction parameters; acquiring collaborative training requirement information based on the updated fault prediction parameters; fusing the collaborative training requirement information with a long short-term memory (LSTM) network to obtain a global parameter adjustment value; aggregating across data centers based on the global parameter adjustment value, and judging aggregation stability by computing a gradient deviation value, to obtain an enhanced fault prediction parameter set; and extracting abnormal event prediction details from the enhanced fault prediction parameter set to obtain the global fault prediction parameters.
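Claim 5 covers cross-data-center aggregation with a gradient-deviation stability check. The sketch below assumes that aggregation is a FedAvg-style sample-weighted mean of local gradients. It also assumes that "stability" means no data center's gradient lies farther than `max_dev` from the aggregate. Both choices, the learning rate and all names are hypothetical, and the CNN and LSTM stages are not shown.

```python
import math


def weighted_average(updates):
    """FedAvg-style aggregation: updates is a list of (num_samples, gradient)."""
    total = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    return [sum(n * g[i] for n, g in updates) / total for i in range(dim)]


def gradient_deviation(updates, aggregate):
    """Largest Euclidean distance between any data center's gradient and the aggregate."""
    return max(math.dist(g, aggregate) for _, g in updates)


def aggregate_round(params, updates, lr=0.1, max_dev=1.0):
    """Apply one aggregated gradient step; returns (new_params, stable).

    Assumed stability rule: if any data center deviates from the aggregate
    by more than max_dev, the round is rejected and params are unchanged.
    """
    agg = weighted_average(updates)
    if gradient_deviation(updates, agg) > max_dev:
        return list(params), False
    return [p - lr * g for p, g in zip(params, agg)], True


# Two data centers with 100 and 300 local samples.
params, stable = aggregate_round([1.0, 1.0], [(100, [0.2, -0.4]), (300, [0.6, 0.0])])
print(stable, params)  # stable round; params move to roughly [0.95, 1.01]
```

Rejecting an unstable round outright is the simplest choice. Alternatives include clipping outlying updates or down-weighting them.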
6. The method according to claim 1, wherein adjusting the dynamic cache rule based on the global fault prediction parameters to determine the optimized feature data acquisition path comprises: acquiring a parameter adjustment mechanism based on the global fault prediction parameters; updating the dynamic adjustment rule in the multi-level cache structure according to the parameter adjustment mechanism, and monitoring cached data under the adjusted rule to obtain a cache data consistency verification result; executing a real-time optimization process based on the cache data consistency verification result and analyzing fault parameters to obtain an optimized calculation basis; determining a feature data extraction path based on the optimized calculation basis, and extracting values that conform to the optimized calculation basis to generate extracted data; and determining the optimized feature data acquisition path from the extracted data using the multi-level cache structure.
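Claim 6 relies on a cache data consistency verification result to decide where features are read from. The sketch below assumes verification compares SHA-256 digests of cached entries against the backing store. It also assumes the acquisition path routes only verified entries through the cache. The status labels and function names are hypothetical.

```python
import hashlib


def digest(value: bytes) -> str:
    return hashlib.sha256(value).hexdigest()


def verify_cache(cache: dict, store: dict) -> dict:
    """Consistency check: label each cached key 'ok', 'stale' or 'orphan'.

    Digests are recomputed here for brevity; a real cache would store the
    digest alongside each entry when it is written.
    """
    status = {}
    for key, value in cache.items():
        if key not in store:
            status[key] = "orphan"      # no longer present in the backing store
        elif digest(value) != digest(store[key]):
            status[key] = "stale"       # cached copy diverged from the source
        else:
            status[key] = "ok"
    return status


def acquisition_path(keys, status):
    """Route each feature read: verified entries from cache, everything else from the store."""
    return {k: ("cache" if status.get(k) == "ok" else "store") for k in keys}


cache = {"smart_5": b"12", "temp": b"40", "old": b"x"}
store = {"smart_5": b"12", "temp": b"41"}
status = verify_cache(cache, store)
print(acquisition_path(["smart_5", "temp", "io_lat"], status))
# {'smart_5': 'cache', 'temp': 'store', 'io_lat': 'store'}
```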
7. The method according to claim 1, wherein triggering the encryption processing cycle based on the feature data acquisition path to obtain the updated sensitive data protection configuration comprises: monitoring load changes in a cloud scenario to obtain a change index; adjusting the feature data acquisition path according to the change index to obtain an adjusted path; triggering, based on the adjusted path, an encryption processing loop within a controlled framework, and processing sensitive data by an iterative encryption method to obtain an encryption result; verifying the encryption result and dynamically updating the configuration to obtain updated sensitive data; and adjusting the protection configuration based on the updated sensitive data to obtain the updated sensitive data protection configuration.
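In claim 7, a load-change index drives a new protection cycle. The sketch below assumes the change index is the relative load change between two monitoring samples. It also assumes that a "cycle" means rotating to a new key epoch, after which sensitive values are re-protected. As in the desensitization sketch, HMAC tokens stand in for encryption. The threshold, class and method names are hypothetical, and the patent's iterative encryption method is not reproduced.

```python
import hashlib
import hmac


def change_index(prev_load: float, curr_load: float) -> float:
    """Relative load change between monitoring intervals (assumed definition)."""
    if prev_load == 0:
        return float("inf") if curr_load else 0.0
    return abs(curr_load - prev_load) / prev_load


class ProtectionConfig:
    """Tracks the active key epoch; rotates when the change index crosses a threshold."""

    def __init__(self, master: bytes, threshold: float = 0.3):
        self.master = master
        self.threshold = threshold
        self.epoch = 0

    def key(self) -> bytes:
        # Per-epoch key derived from the master secret.
        return hmac.new(self.master, f"epoch:{self.epoch}".encode(), hashlib.sha256).digest()

    def protect(self, value: str) -> str:
        # Keyed token under the current epoch key (a stand-in for encryption).
        return hmac.new(self.key(), value.encode(), hashlib.sha256).hexdigest()[:16]

    def on_load_sample(self, prev_load: float, curr_load: float) -> bool:
        """Returns True if a new protection cycle (key rotation) was triggered."""
        if change_index(prev_load, curr_load) > self.threshold:
            self.epoch += 1
            return True
        return False


cfg = ProtectionConfig(b"master-secret")
before = cfg.protect("ZA1234XY")
cfg.on_load_sample(100, 120)   # 20% change: below threshold, no rotation
cfg.on_load_sample(100, 150)   # 50% change: rotates to epoch 1
print(cfg.epoch, before != cfg.protect("ZA1234XY"))  # 1 True
```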
8. The hard disk fault prediction method based on multivariate data according to claim 1, wherein fusing the updated sensitive data protection configuration into the distributed learning aggregation process to obtain the hard disk failure prediction result comprises: extracting encryption parameters from the updated sensitive data protection configuration, and constructing a sensitive-data fusion aggregation framework from the encryption parameters; processing data sharing among nodes within this framework by a federated learning algorithm to obtain distributed training parameters; extracting sensitive data features of hard disk vibration and temperature based on the distributed training parameters; identifying fault modes by abnormal pattern matching on the sensitive data features to obtain a preliminary fault index; fusing the preliminary fault index with inter-node sensitive audit services to optimize the real-time calculation process and obtain an enhanced prediction model; and integrating data flow monitoring attributes into the enhanced prediction model to calculate the hard disk failure prediction result.
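The abnormal pattern matching step of claim 8 operates on vibration and temperature features. The sketch below assumes that each feature is scored as a z-score against a healthy-disk baseline. It also assumes a pattern is the combination of features exceeding a threshold, and that the preliminary fault index is mapped to a probability with a logistic function. The pattern names, threshold, and `scale`/`bias` values are hypothetical placeholders for parameters that would come from the federated training step.

```python
import math
from statistics import mean, pstdev

# Hypothetical anomaly signatures: (vibration above threshold, temperature above threshold).
PATTERNS = {
    (1, 0): "vibration_anomaly",
    (0, 1): "thermal_anomaly",
    (1, 1): "combined_anomaly",
}


def zscore(value, baseline):
    """Standard score of value against a healthy-disk baseline sample."""
    sigma = pstdev(baseline)
    return (value - mean(baseline)) / (sigma or 1e-9)


def match_pattern(vib_z, temp_z, z_thresh=3.0):
    """Return the matching anomaly pattern name, or None if both features are normal."""
    return PATTERNS.get((int(vib_z > z_thresh), int(temp_z > z_thresh)))


def fault_index(vib_z, temp_z, z_thresh=3.0):
    """Preliminary fault index: total excess of each z-score over the threshold."""
    return max(0.0, vib_z - z_thresh) + max(0.0, temp_z - z_thresh)


def failure_probability(index, scale=1.0, bias=-2.0):
    """Logistic mapping of the fault index; scale and bias are placeholders."""
    return 1.0 / (1.0 + math.exp(-(scale * index + bias)))


vib_z = zscore(2.0, [1.0, 1.2, 0.8, 1.0])
temp_z = zscore(40.5, [40, 41, 39, 40])
print(match_pattern(vib_z, temp_z), failure_probability(fault_index(vib_z, temp_z)))
```

With these sample readings the vibration feature lies far outside its baseline and the temperature does not, so the sketch reports a vibration anomaly with a high probability.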

Description

Hard disk fault prediction method based on multivariate data

Technical Field

The invention belongs to the technical field of hard disk fault prediction, and particularly relates to a hard disk fault prediction method based on multivariate data.

Background

Hard disk failure prediction is a key technology for ensuring data security and system reliability, and it is widely built into the routine maintenance of large-scale storage systems. Data volumes are growing rapidly, and complex environments such as cloud platforms and data centers increasingly demand real-time, high-precision prediction from their storage systems. As a result, hard disk fault prediction has gradually shifted from a traditional approach relying on static historical data to intelligent analysis of dynamic multivariate data (such as real-time monitoring indicators, access patterns and hardware logs). However, existing prediction schemes focus on a single data source or a single technique. Although they can be effective in some scenarios, they commonly suffer from low data access efficiency, poor resource allocation and insufficient protection of sensitive data when processing multi-source, heterogeneous and massive data. They therefore struggle to meet the combined performance, security and efficiency requirements of modern distributed storage systems. When processing complex multivariate data, current hard disk fault prediction methods face two main technical bottlenecks. First, data access efficiency is mismatched with storage resource allocation. Existing systems lack a mechanism for dynamically assessing the importance of multi-source data and managing it by tier. High-value, frequently accessed data therefore cannot be quickly scheduled to high-speed storage tiers, while low-value data occupies scarce memory or fast storage resources.
This mismatch reduces the real-time responsiveness of the prediction model and also causes data access delays under high load, so the optimal window for fault early warning may be missed. Second, the sensitive data protection mechanism is disconnected from the prediction process. Most existing schemes give insufficient protection to sensitive information (such as user behavior logs and system configuration parameters) during data acquisition, transmission and aggregation. In particular, they lack a mechanism that integrates encryption and desensitization, access control and distributed learning in cross-data-center collaborative training. The prediction process therefore carries a risk of data leakage and has difficulty meeting increasingly strict data security compliance requirements.

Disclosure of Invention

To solve the above technical problems, the invention provides a hard disk fault prediction method based on multivariate data.
To achieve the above object, the present invention provides a hard disk failure prediction method based on multivariate data, comprising: performing cluster analysis on acquisition records collected from storage devices to obtain a high-frequency access data sequence; constructing a multi-level cache structure based on the high-frequency access data sequence, and determining a cache priority score from changes in data access temperature and from read/write error counts; migrating the historical parameter sequence to a solid state disk cache layer based on the cache priority score to obtain an optimized data distribution configuration; encrypting the sensitive information in the optimized data distribution configuration to obtain a desensitized multivariate data set; setting hierarchical access rights for the desensitized multivariate data set and recording an audit log to obtain an access behavior tracking sequence; aggregating gradient updates by a distributed learning method based on the access behavior tracking sequence to obtain global fault prediction parameters; adjusting a dynamic cache rule based on the global fault prediction parameters to determine an optimized feature data acquisition path; triggering an encryption processing cycle based on the feature data acquisition path to obtain an updated sensitive data protection configuration; and fusing the updated sensitive data protection configuration into a distributed learning aggregation process to obtain a hard disk failure prediction result. Optionally, obtaining the high-frequency access data sequence from the acquisition records comprises: grouping the acquisition records by K-means clustering to obtain data distribution characteristics; constructing an overhead model from the data distribution characteristics by access pattern clustering to obtain calcula