
Yi Tay

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization

Jul 02, 2021

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption

Jun 29, 2021

How Reliable are Model Diagnostics?

May 12, 2021

Are Pre-trained Convolutions Better than Pre-trained Transformers?

May 07, 2021

Rethinking Search: Making Experts out of Dilettantes

May 05, 2021

OmniNet: Omnidirectional Representations from Transformers

Mar 01, 2021

Do Transformer Modifications Transfer Across Implementations and Applications?

Feb 23, 2021

Switch Spaces: Learning Product Spaces with Sparse Gating

Feb 17, 2021

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters

Feb 17, 2021

Label Smoothed Embedding Hypothesis for Out-of-Distribution Detection

Feb 09, 2021