
Ashish Vaswani

The Efficiency Misnomer
Oct 25, 2021

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Sep 22, 2021

Simple and Efficient ways to Improve REALM
Apr 18, 2021

Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Mar 30, 2021

Bottleneck Transformers for Visual Recognition
Jan 27, 2021

Efficient Content-Based Sparse Attention with Routing Transformers
Mar 12, 2020

Stand-Alone Self-Attention in Vision Models
Jun 13, 2019

Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Jun 04, 2019

Attention Augmented Convolutional Networks
Apr 22, 2019

Mesh-TensorFlow: Deep Learning for Supercomputers
Nov 05, 2018