Minjia Zhang

ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise

Jan 29, 2022
Minjia Zhang, Niranjan Uma Naresh, Yuxiong He

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

Jan 14, 2022
Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He

A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities

Nov 28, 2021
Fuxun Yu, Di Wang, Longfei Shangguan, Minjia Zhang, Xulong Tang, Chenchen Liu, Xiang Chen

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Oct 28, 2021
Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Carousel Memory: Rethinking the Design of Episodic Memory for Continual Learning

Oct 15, 2021
Soobee Lee, Minindu Weerakoon, Jonghyun Choi, Minjia Zhang, Di Wang, Myeongjae Jeon

Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training

Aug 13, 2021
Conglong Li, Minjia Zhang, Yuxiong He

Understanding and Generalizing Monotonic Proximity Graphs for Approximate Nearest Neighbor Search

Jul 27, 2021
Dantong Zhu, Minjia Zhang

ZeRO-Offload: Democratizing Billion-Scale Model Training

Jan 18, 2021
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping

Oct 26, 2020
Minjia Zhang, Yuxiong He

LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory

Nov 04, 2019
Reza Yazdani, Olatunji Ruwase, Minjia Zhang, Yuxiong He, Jose-Maria Arnau, Antonio Gonzalez
