CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

Nov 05, 2020
Kiwan Maeng, Shivam Bharuka, Isabel Gao, Mark C. Jeffrey, Vikram Saraph, Bor-Yiing Su, Caroline Trippel, Jiyan Yang, Mike Rabbat, Brandon Lucia, Carole-Jean Wu

Figures 1-4 for CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

View paper on arXiv