Yuxiong He

ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

Jun 16, 2023
Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He

Selective Guidance: Are All the Denoising Steps of Guided Diffusion Important?

May 16, 2023
Pareesa Ameneh Golnari, Zhewei Yao, Yuxiong He

HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs

May 03, 2023
Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao

A Comprehensive Study on Post-Training Quantization for Large Language Models

Mar 16, 2023
Zhewei Yao, Cheng Li, Xiaoxia Wu, Stephen Youn, Yuxiong He

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

Mar 15, 2023
Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda

Scaling Vision-Language Models with Sparse Mixture of Experts

Mar 13, 2023
Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He

A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training

Mar 11, 2023
Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases

Jan 27, 2023
Xiaoxia Wu, Cheng Li, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He