Ruoming Pang

Co-training Transformer with Videos and Images Improves Action Recognition

Dec 14, 2021
Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha

Vector-quantized Image Modeling with Improved VQGAN

Oct 09, 2021
Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

Oct 01, 2021
Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

Aug 07, 2021
Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu

GSPMD: General and Scalable Parallelization for ML Computation Graphs

May 10, 2021
Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake Hechtman, Yanping Huang, Rahul Joshi, Maxim Krikun, Dmitry Lepikhin, Andy Ly, Marcello Maggioni, Ruoming Pang, Noam Shazeer, Shibo Wang, Tao Wang, Yonghui Wu, Zhifeng Chen

Scaling End-to-End Models for Large-Scale Multilingual ASR

Apr 30, 2021
Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

Apr 25, 2021
Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao

Searching for Fast Model Families on Datacenter Accelerators

Feb 10, 2021
Sheng Li, Mingxing Tan, Ruoming Pang, Andrew Li, Liqun Cheng, Quoc Le, Norman P. Jouppi

Transformer Based Deliberation for Two-Pass Speech Recognition

Jan 27, 2021
Ke Hu, Ruoming Pang, Tara N. Sainath, Trevor Strohman

Cascaded encoders for unifying streaming and non-streaming ASR

Oct 27, 2020
Arun Narayanan, Tara N. Sainath, Ruoming Pang, Jiahui Yu, Chung-Cheng Chiu, Rohit Prabhavalkar, Ehsan Variani, Trevor Strohman
