Noam Shazeer

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Jun 30, 2020
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Talking-Heads Attention

Mar 05, 2020
Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Feb 24, 2020
Adam Roberts, Colin Raffel, Noam Shazeer

GLU Variants Improve Transformer

Feb 12, 2020
Noam Shazeer

Faster Transformer Decoding: N-gram Masked Self-Attention

Jan 14, 2020
Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer

Fast Transformer Decoding: One Write-Head is All You Need

Nov 06, 2019
Noam Shazeer

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Oct 24, 2019
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

High Resolution Medical Image Analysis with Spatial Partitioning

Sep 12, 2019
Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song
