Large scale data storage is susceptible to failure. As disks are damaged and replaced, traditional machine learning models, which rely on historical data to make predictions, struggle to accurately predict disk failures. This paper presents a novel method for predicting disk failures by leveraging multi-layer domain adaptive learning techniques. First, disk data with numerous faults is selected as the source domain, and disk data with fewer faults is selected as the target domain. A training of the feature extraction network is performed with the selected origin and destination domains. The contrast between the two domains facilitates the transfer of diagnostic knowledge from the domain of source and target. According to the experimental findings, it has been demonstrated that the proposed technique can generate a reliable prediction model and improve the ability to predict failures on disk data with few failure samples.
To address the problem of insufficient failure data generated by disks and the imbalance between the number of normal and failure data. The existing Conditional Tabular Generative Adversarial Networks (CTGAN) deep learning methods have been proven to be effective in solving imbalance disk failure data. But CTGAN cannot learn the internal information of disk failure data very well. In this paper, a fault diagnosis method based on improved CTGAN, a classifier for specific category discrimination is added and a discriminator generate adversarial network based on residual network is proposed. We named it Residual Conditional Tabular Generative Adversarial Networks (RCTGAN). Firstly, to enhance the stability of system a residual network is utilized. RCTGAN uses a small amount of real failure data to synthesize fake fault data; Then, the synthesized data is mixed with the real data to balance the amount of normal and failure data; Finally, four classifier (multilayer perceptron, support vector machine, decision tree, random forest) models are trained using the balanced data set, and the performance of the models is evaluated using G-mean. The experimental results show that the data synthesized by the RCTGAN can further improve the fault diagnosis accuracy of the classifier.