Yuanzhong Xu

Vector-quantized Image Modeling with Improved VQGAN

Oct 09, 2021
Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

Via arXiv

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

Oct 01, 2021
Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu

Via arXiv

GSPMD: General and Scalable Parallelization for ML Computation Graphs

May 10, 2021
Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake Hechtman, Yanping Huang, Rahul Joshi, Maxim Krikun, Dmitry Lepikhin, Andy Ly, Marcello Maggioni, Ruoming Pang, Noam Shazeer, Shibo Wang, Tao Wang, Yonghui Wu, Zhifeng Chen

Via arXiv

Exploring the limits of Concurrency in ML Training on Google TPUs

Nov 07, 2020
Sameer Kumar, James Bradbury, Cliff Young, Yu Emma Wang, Anselm Levskaya, Blake Hechtman, Dehao Chen, HyoukJoong Lee, Mehmet Deveci, Naveen Kumar, Pankaj Kanwar, Shibo Wang, Skye Wanderman-Milne, Steve Lacy, Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing

Via arXiv

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Jun 30, 2020
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Via arXiv

Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training

Apr 28, 2020
Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Hongjun Choi, Blake Hechtman, Shibo Wang

Via arXiv

Scale MLPerf-0.6 models on Google TPU-v3 Pods

Oct 02, 2019
Sameer Kumar, Victor Bitorff, Dehao Chen, Chiachen Chou, Blake Hechtman, HyoukJoong Lee, Naveen Kumar, Peter Mattson, Shibo Wang, Tao Wang, Yuanzhong Xu, Zongwei Zhou

Via arXiv