Noam Shazeer

Primer: Searching for Efficient Transformers for Language Modeling


Sep 17, 2021
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

* "Primer: Searching for Efficient Transformers for Language Modeling" initial preprint. 35 pages 


GSPMD: General and Scalable Parallelization for ML Computation Graphs


May 10, 2021
Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake Hechtman, Yanping Huang, Rahul Joshi, Maxim Krikun, Dmitry Lepikhin, Andy Ly, Marcello Maggioni, Ruoming Pang, Noam Shazeer, Shibo Wang, Tao Wang, Yonghui Wu, Zhifeng Chen



Do Transformer Modifications Transfer Across Implementations and Applications?


Feb 23, 2021
Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel



Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity


Jan 11, 2021
William Fedus, Barret Zoph, Noam Shazeer



GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding


Jun 30, 2020
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen



Talking-Heads Attention


Mar 05, 2020
Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou



How Much Knowledge Can You Pack Into the Parameters of a Language Model?


Feb 24, 2020
Adam Roberts, Colin Raffel, Noam Shazeer



GLU Variants Improve Transformer


Feb 12, 2020
Noam Shazeer



Faster Transformer Decoding: N-gram Masked Self-Attention


Jan 14, 2020
Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer



Fast Transformer Decoding: One Write-Head is All You Need


Nov 06, 2019
Noam Shazeer



Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer


Oct 24, 2019
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu



High Resolution Medical Image Analysis with Spatial Partitioning


Sep 12, 2019
Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song



Corpora Generation for Grammatical Error Correction


Apr 10, 2019
Jared Lichtarge, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, Simon Tong

* Accepted at NAACL 2019. arXiv admin note: text overlap with arXiv:1811.01710 


Blockwise Parallel Decoding for Deep Autoregressive Models


Nov 07, 2018
Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

* NIPS 2018 


Mesh-TensorFlow: Deep Learning for Supercomputers


Nov 05, 2018
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman



Weakly Supervised Grammatical Error Correction using Iterative Decoding


Oct 31, 2018
Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar



Music Transformer


Oct 10, 2018
Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

* Rewrote many sections to clarify the work, and extended relative attention to the local case. The previous title was "An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation"


Image Transformer


Jun 15, 2018
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

* Appears in International Conference on Machine Learning, 2018. Code available at https://github.com/tensorflow/tensor2tensor 


Fast Decoding in Sequence Models using Discrete Latent Variables


Jun 07, 2018
Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

* ICML 2018 


Adafactor: Adaptive Learning Rates with Sublinear Memory Cost


Apr 11, 2018
Noam Shazeer, Mitchell Stern



Tensor2Tensor for Neural Machine Translation


Mar 16, 2018
Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

* arXiv admin note: text overlap with arXiv:1706.03762 


Generating Wikipedia by Summarizing Long Sequences


Jan 30, 2018
Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer

* Published as a conference paper at ICLR 2018 


Attention Is All You Need


Dec 06, 2017
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

* 15 pages, 5 figures 


One Model To Learn Them All


Jun 16, 2017
Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit



Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer


Jan 23, 2017
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean



NN-grams: Unifying neural network and n-gram language models for Speech Recognition


Jun 23, 2016
Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier

* To be published in the proceedings of INTERSPEECH 2016 


Exploring the Limits of Language Modeling


Feb 11, 2016
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu



Swivel: Improving Embeddings by Noticing What's Missing


Feb 06, 2016
Noam Shazeer, Ryan Doherty, Colin Evans, Chris Waterson

* 9 pages, 4 figures 
