Zihang Dai

Transformer Quality in Linear Time

Feb 21, 2022
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le

Combined Scaling for Zero-shot Transfer Learning

Nov 19, 2021
Hieu Pham, Zihang Dai, Golnaz Ghiasi, Hanxiao Liu, Adams Wei Yu, Minh-Thang Luong, Mingxing Tan, Quoc V. Le

Primer: Searching for Efficient Transformers for Language Modeling

Sep 17, 2021
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

Aug 24, 2021
Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

Combiner: Full Attention Transformer with Sparse Computation Cost

Jul 12, 2021
Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai

CoAtNet: Marrying Convolution and Attention for All Data Sizes

Jun 09, 2021
Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

Pay Attention to MLPs

Jun 01, 2021
Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le

Unsupervised Parallel Corpus Mining on Web Data

Sep 18, 2020
Guokun Lai, Zihang Dai, Yiming Yang

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Jun 05, 2020
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le
