Bhargava Gopireddy

Nonuniform-Tensor-Parallelism: Mitigating GPU failure impact for Scaled-up LLM Training

Apr 08, 2025
Via arXiv