Samyam Rajbhandari

ZeRO-Prefill: Zero Redundancy Overheads in MoE Prefill Serving

May 03, 2026

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Dec 16, 2025

Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI

Jul 16, 2025

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation

Oct 04, 2024

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

Jan 09, 2024

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention

Sep 29, 2023

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Sep 25, 2023

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Aug 02, 2023

ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

Jun 16, 2023

A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training

Mar 11, 2023