Noam Shazeer

Dima

Talking-Heads Attention

Mar 05, 2020
Via arXiv

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Feb 24, 2020
Via arXiv

GLU Variants Improve Transformer

Feb 12, 2020
Via arXiv

Faster Transformer Decoding: N-gram Masked Self-Attention

Jan 14, 2020
Via arXiv

Fast Transformer Decoding: One Write-Head is All You Need

Nov 06, 2019
Via arXiv

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Oct 24, 2019
Via arXiv

High Resolution Medical Image Analysis with Spatial Partitioning

Sep 12, 2019
Via arXiv

Corpora Generation for Grammatical Error Correction

Apr 10, 2019
Via arXiv

Blockwise Parallel Decoding for Deep Autoregressive Models

Nov 07, 2018
Via arXiv

Mesh-TensorFlow: Deep Learning for Supercomputers

Nov 05, 2018
Via arXiv