Depth Dependence of $\mu$P Learning Rates in ReLU MLPs


May 13, 2023
Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar


On student-teacher deviations in distillation: does it pay to disobey?


Jan 30, 2023
Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar


On the Adversarial Robustness of Mixture of Experts


Oct 19, 2022
Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli

* Accepted to NeurIPS 2022 


Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers


Oct 12, 2022
Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar


Treeformer: Dense Gradient Trees for Efficient Attention Computation


Aug 18, 2022
Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain

* Preprint. Under Review 


Robust Training of Neural Networks using Scale Invariant Architectures


Feb 02, 2022
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar

* 36 pages, 7 figures 


Leveraging redundancy in attention with Reuse Transformers


Oct 13, 2021
Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar


Teacher's pet: understanding and mitigating biases in distillation


Jul 08, 2021
Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

* 21 pages, 8 figures 


Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation


Jun 16, 2021
Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit

* 14 pages 

