Yuxiong He

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Aug 02, 2023

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats

Jul 20, 2023

ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

Jun 16, 2023

Selective Guidance: Are All the Denoising Steps of Guided Diffusion Important?

May 16, 2023

HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs

May 03, 2023

A Comprehensive Study on Post-Training Quantization for Large Language Models

Mar 16, 2023

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

Mar 15, 2023

Scaling Vision-Language Models with Sparse Mixture of Experts

Mar 13, 2023

A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training

Mar 11, 2023

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases

Jan 27, 2023