Noam Shazeer

Primer: Searching for Efficient Transformers for Language Modeling

Sep 17, 2021
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

* "Primer: Searching for Efficient Transformers for Language Modeling" initial preprint. 35 pages 

GSPMD: General and Scalable Parallelization for ML Computation Graphs

May 10, 2021
Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake Hechtman, Yanping Huang, Rahul Joshi, Maxim Krikun, Dmitry Lepikhin, Andy Ly, Marcello Maggioni, Ruoming Pang, Noam Shazeer, Shibo Wang, Tao Wang, Yonghui Wu, Zhifeng Chen

Do Transformer Modifications Transfer Across Implementations and Applications?

Feb 23, 2021
Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Jan 11, 2021
William Fedus, Barret Zoph, Noam Shazeer

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Jun 30, 2020
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Talking-Heads Attention

Mar 05, 2020
Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Feb 24, 2020
Adam Roberts, Colin Raffel, Noam Shazeer

GLU Variants Improve Transformer

Feb 12, 2020
Noam Shazeer

Faster Transformer Decoding: N-gram Masked Self-Attention

Jan 14, 2020
Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer

Fast Transformer Decoding: One Write-Head is All You Need

Nov 06, 2019
Noam Shazeer

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Oct 24, 2019
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

High Resolution Medical Image Analysis with Spatial Partitioning

Sep 12, 2019
Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song

Corpora Generation for Grammatical Error Correction

Apr 10, 2019
Jared Lichtarge, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, Simon Tong

* Accepted at NAACL 2019. arXiv admin note: text overlap with arXiv:1811.01710 

Blockwise Parallel Decoding for Deep Autoregressive Models

Nov 07, 2018
Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

* NIPS 2018 

Mesh-TensorFlow: Deep Learning for Supercomputers

Nov 05, 2018
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman

Weakly Supervised Grammatical Error Correction using Iterative Decoding

Oct 31, 2018
Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar

Music Transformer

Oct 10, 2018
Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

* Rewrote many sections to clarify the work, and extended relative attention to the local case. Previous title is "An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation" 

Image Transformer

Jun 15, 2018
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

* Appears in International Conference on Machine Learning, 2018. Code available at 

Fast Decoding in Sequence Models using Discrete Latent Variables

Jun 07, 2018
Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

* ICML 2018 

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Apr 11, 2018
Noam Shazeer, Mitchell Stern

Tensor2Tensor for Neural Machine Translation

Mar 16, 2018
Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

* arXiv admin note: text overlap with arXiv:1706.03762 

Generating Wikipedia by Summarizing Long Sequences

Jan 30, 2018
Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer

* Published as a conference paper at ICLR 2018 

Attention Is All You Need

Dec 06, 2017
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

* 15 pages, 5 figures 

One Model To Learn Them All

Jun 16, 2017
Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Jan 23, 2017
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean

NN-grams: Unifying neural network and n-gram language models for Speech Recognition

Jun 23, 2016
Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier

* To be published in the proceedings of INTERSPEECH 2016 

Exploring the Limits of Language Modeling

Feb 11, 2016
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu

Swivel: Improving Embeddings by Noticing What's Missing

Feb 06, 2016
Noam Shazeer, Ryan Doherty, Colin Evans, Chris Waterson

* 9 pages, 4 figures 
