GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Jun 30, 2020
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen


Talking-Heads Attention

Mar 05, 2020
Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou


How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Feb 24, 2020
Adam Roberts, Colin Raffel, Noam Shazeer


GLU Variants Improve Transformer

Feb 12, 2020
Noam Shazeer


Faster Transformer Decoding: N-gram Masked Self-Attention

Jan 14, 2020
Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer


Fast Transformer Decoding: One Write-Head is All You Need

Nov 06, 2019
Noam Shazeer


Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Oct 24, 2019
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu


High Resolution Medical Image Analysis with Spatial Partitioning

Sep 12, 2019
Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song


Corpora Generation for Grammatical Error Correction

Apr 10, 2019
Jared Lichtarge, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, Simon Tong

* Accepted at NAACL 2019. arXiv admin note: text overlap with arXiv:1811.01710 

Blockwise Parallel Decoding for Deep Autoregressive Models

Nov 07, 2018
Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

* NIPS 2018 

Mesh-TensorFlow: Deep Learning for Supercomputers

Nov 05, 2018
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman


Weakly Supervised Grammatical Error Correction using Iterative Decoding

Oct 31, 2018
Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar


Music Transformer

Oct 10, 2018
Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

* Rewrote many sections to clarify the work, and extended relative attention to the local case. Previous title is "An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation" 

Image Transformer

Jun 15, 2018
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

* Appears in International Conference on Machine Learning, 2018. Code available at https://github.com/tensorflow/tensor2tensor 

Fast Decoding in Sequence Models using Discrete Latent Variables

Jun 07, 2018
Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

* ICML 2018 

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Apr 11, 2018
Noam Shazeer, Mitchell Stern


Tensor2Tensor for Neural Machine Translation

Mar 16, 2018
Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

* arXiv admin note: text overlap with arXiv:1706.03762 

Generating Wikipedia by Summarizing Long Sequences

Jan 30, 2018
Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer

* Published as a conference paper at ICLR 2018 

Attention Is All You Need

Dec 06, 2017
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

* 15 pages, 5 figures 

One Model To Learn Them All

Jun 16, 2017
Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit


Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Jan 23, 2017
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean


NN-grams: Unifying neural network and n-gram language models for Speech Recognition

Jun 23, 2016
Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier

* To be published in the proceedings of INTERSPEECH 2016 

Exploring the Limits of Language Modeling

Feb 11, 2016
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu


Swivel: Improving Embeddings by Noticing What's Missing

Feb 06, 2016
Noam Shazeer, Ryan Doherty, Colin Evans, Chris Waterson

* 9 pages, 4 figures 

End-to-End Text-Dependent Speaker Verification

Sep 27, 2015
Georg Heigold, Ignacio Moreno, Samy Bengio, Noam Shazeer

* submitted to ICASSP 2016 

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Sep 23, 2015
Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer


Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation

Jun 26, 2015
Noam Shazeer, Joris Pelemans, Ciprian Chelba


Variational Program Inference

Jun 04, 2010
Georges Harik, Noam Shazeer

