Ruoming Pang

Instruction-Following Speech Recognition

Sep 18, 2023
Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

Sep 08, 2023
Erik Daxberger, Floris Weers, Bowen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

Mar 31, 2023
Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

Feb 08, 2023
Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang

A Language Agnostic Multilingual Streaming On-Device ASR System

Aug 29, 2022
Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani

Pathways: Asynchronous Distributed Dataflow for ML

Mar 23, 2022
Paul Barham, Aakanksha Chowdhery, Jeff Dean, Sanjay Ghemawat, Steven Hand, Dan Hurt, Michael Isard, Hyeontaek Lim, Ruoming Pang, Sudip Roy, Brennan Saeta, Parker Schuh, Ryan Sepassi, Laurent El Shafey, Chandramohan A. Thekkath, Yonghui Wu

Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition

Mar 09, 2022
W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor Strohman, Shankar Kumar

Co-training Transformer with Videos and Images Improves Action Recognition

Dec 14, 2021
Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha

Vector-quantized Image Modeling with Improved VQGAN

Oct 09, 2021
Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
