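Disk failure prediction: approaches and papers. I went through the currently available papers on disk failure prediction and summarize the prediction approaches below.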

Disk failure prediction approaches

The disk failure prediction papers collected here cover the following scenarios:

  1. disk anomaly detection
  2. disk failure prediction
  3. disk RUL (remaining useful life) prediction

Disk failure prediction methods can be classified as follows:

By scenario:

  1. disk anomaly detection
  2. disk failure prediction
  3. disk RUL prediction (remaining useful life)

By technique:

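  1. Statistical rules, rules distilled from expert experience, SMART
  2. Machine learning, manual feature engineering, multiple-instance learning
  3. Deep learning (automatic feature engineering), manual feature engineering, sampling strategies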
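
As a rough illustration of the first category (rules over SMART), here is a minimal sketch of a threshold rule. The attribute IDs are the five that the Backblaze article linked below reports watching; the dictionary names, the function name, and the default threshold of 0 are assumptions made for this example, not a rule taken from any of the papers.

```python
# Hypothetical watch list: SMART attribute ID -> readable name.
WATCHED_ATTRIBUTES = {
    5: "reallocated_sectors_count",
    187: "reported_uncorrectable_errors",
    188: "command_timeout",
    197: "current_pending_sector_count",
    198: "uncorrectable_sector_count",
}


def rule_based_flags(raw_values, threshold=0):
    """Return the names of watched attributes whose raw value exceeds threshold.

    raw_values maps SMART attribute ID -> raw value; missing attributes
    are treated as 0 (an assumption of this sketch).
    """
    return [
        WATCHED_ATTRIBUTES[attr_id]
        for attr_id in sorted(WATCHED_ATTRIBUTES)
        if raw_values.get(attr_id, 0) > threshold
    ]


if __name__ == "__main__":
    snapshot = {5: 0, 187: 3, 197: 1}
    print(rule_based_flags(snapshot))
```

A rule like this needs no training data, which is why it is a common baseline; the machine learning and deep learning categories replace the hand-picked threshold with a learned decision function.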

By data labeling:

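  1. Unlabeled (unsupervised anomaly detection)
  2. Binary classification (healthy, failed)
  3. Multi-class classification (health levels)
  4. Sequence labeling (one state per time step, subject to constraints on transitions between failure levels; for example, a failed disk cannot return to the healthy state)
  5. Regression (remaining useful life)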
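
The supervised schemes above can all be derived from one piece of ground truth, the day a disk failed. The sketch below assumes a disk that is known to have failed; the function names, the 7-day horizon, and the level boundaries are hypothetical values chosen for illustration.

```python
def make_labels(n_days, fail_day, horizon=7, level_edges=(30, 7)):
    """Build per-day labels for a disk observed on days 0..n_days-1.

    fail_day is the day index on which the disk failed (assumed known).
    Returns (rul, binary, levels):
      rul    - regression target: days remaining until failure
      binary - 1 if the disk fails within `horizon` days, else 0
      levels - health level: number of edges in level_edges that the
               RUL has dropped to or below (0 = healthiest)
    """
    rul = [fail_day - day for day in range(n_days)]
    binary = [1 if r <= horizon else 0 for r in rul]
    levels = [sum(r <= edge for edge in level_edges) for r in rul]
    return rul, binary, levels


def is_valid_sequence(levels):
    """Check the sequence-labeling constraint: health never improves."""
    return all(a <= b for a, b in zip(levels, levels[1:]))


if __name__ == "__main__":
    rul, binary, levels = make_labels(5, 9, horizon=7, level_edges=(8, 6))
    print(rul, binary, levels, is_valid_sequence(levels))
```

Because the levels are computed from a RUL that only decreases, the resulting sequence satisfies the monotonic constraint by construction; `is_valid_sequence` is mainly useful for checking a model's predicted sequence.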

By data used:

  1. I/O-related data, such as I/O latency
  2. SMART
  3. I/O latency + SMART
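
For the combined option, one simple way to merge the two sources is to summarize the latency samples of a time window and append them to the SMART snapshot for the same window. This is only a sketch: the feature names, the choice of mean and max as summaries, and the function name are assumptions for illustration.

```python
from statistics import mean


def build_features(smart_raw, latencies_ms):
    """Merge a SMART snapshot and I/O latency samples into one feature dict.

    smart_raw    maps SMART attribute ID -> raw value.
    latencies_ms is a list of I/O latency samples (milliseconds) taken in
                 the same time window; it may be empty, in which case only
                 the SMART features are returned.
    """
    features = {f"smart_{attr_id}_raw": float(value)
                for attr_id, value in smart_raw.items()}
    if latencies_ms:
        features["io_latency_mean_ms"] = mean(latencies_ms)
        features["io_latency_max_ms"] = max(latencies_ms)
    return features


if __name__ == "__main__":
    print(build_features({5: 2, 197: 0}, [1.0, 3.0]))
```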

Links

Research on SMART monitoring attributes and a storage health grading mechanism (SMART监控项研究以及存储健康分级机制)

Server hard disk failure prediction in practice (服务器硬盘故障预测实践)

What SMART Stats Tell Us About Hard Drives

Disk failure prediction papers

[1] AIOps_Real-World_Challenges_and_Research_Innovations.pdf

[2] A_Two-Step_Parametric_Method_for_Failure_Prediction_in_Hard_Disk_Drives.pdf

[3] A_combined_Bayesian_network_method_for_predicting_drive_failure_times_from_SMART_attributes.pdf

[4] An_approach_to_failure_prediction_in_a_cloud_based_environment.pdf

[5] Anomaly_detection_using_SMART_indicators_for_hard_disk_drive_failure_prediction.pdf

[6] BaNHFaP_A_Bayesian_Network_based_Failure_Prediction_Approach_for_Hard_Disk_Drives.pdf

[7] Bayesian_Network_based_Failure_Prediction_for_Hard_Drives.pdf

[8] Bayesian_approaches_to_failure_prediction_for_disk_drives.pdf

[9] Big_Data_Preventive_Maintenance_for_Hard_Disk_Failure_Detection.pdf

[10] Characterizing_Disk_Failures_with_Quantified_Disk_Degradation_Signatures_An_Early_Experience.pdf

[11] DFPE_Explaining_Predictive_Models_for_Disk_Failure_Prediction.pdf

[12] Disk_Failure_Prediction_in_Data_Centers_via_Online_Learning.pdf

[13] Failure_Trends_in_a_Large_Disk_Drive_Population.pdf

[14] Fast_Predictive_Repair_in_Erasure-Coded_Storage.pdf

[15] Finding_soon-to-fail_disks_in_a_haystack.pdf

[16] Hard_Drive_Failure_Prediction_Using_Classification_and_Regression_Trees.pdf

[17] Hard_Drive_Failure_Prediction_for_Large_Scale_Storage_System.pdf

[18] Hidden_semi-Markov_Models_for_Predictive_Maintenance_of_Rotating_Elements.pdf

[19] Improving_Storage_System_Reliability_with_Proactive_Error_Prediction.pdf

[20] LSTM-based_Encoder-Decoder_for_Multi-sensor_Anomaly_Detection.pdf

[21] Large_Scale_Predictive_Analytics_for_Hard_Disk_Remaining_Useful_Life_Estimation.pdf

[22] Machine_Learning_Methods_for_Predicting_Failures_in_Hard_Drives_A_Multiple-Instance_Application.pdf

[23] Machine_Learning_and_Failure_Prediction_in_Hard_Disk_Drives.pdf

[24] Mechanisms_for_Integrated_Feature_Normalization_and_Remaining_Useful_Life_Estimation_Using_LSTMs_Applied_to_Hard-Disks.pdf

[25] Online_Anomaly_Detection_for_Hard_Disk_Drives_Based_on_Mahalanobis_Distance.pdf

[26] Online_Failure_Prediction_in_Cloud_Datacenters.pdf

[27] Overview_of_Remaining_Useful_Life_Prediction_Techniques_in_Through-Life_Engineering_Service.pdf

[28] Predicting_Disk_Replacement_towards_Reliable_Data.pdf

[29] Predicting_disk_failures_with_HMM-_and_HSMM-based_approaches.pdf

[30] Proactive_Drive_Failure_Prediction_for_Large_Scale_Storage_Systems.pdf

[31] Significance_of_Disk_Failure_Prediction_in_Datacenters.pdf

[32] Transfer_Learning_based_Failure_Prediction_for_Minority_Disks_in_Large_Data_Centers_of_Heterogeneous_Disk_Systems.pdf

[33] Understanding_Latency_and_Response_Time_Behavior.pdf

[34] Predictive_Models_of_Hard_Drive_Failures_based_on_Operational_Data.pdf

[35] 故障预测技术研究综述.pdf (A survey of failure prediction techniques)

GitHub repo for downloading the papers: disk-prediction-papers