Sashank J. Reddi

Depth Dependence of $\mu$P Learning Rates in ReLU MLPs

May 13, 2023
Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar

Differentially Private Adaptive Optimization with Delayed Preconditioners

Dec 01, 2022
Tian Li, Manzil Zaheer, Ken Ziyu Liu, Sashank J. Reddi, H. Brendan McMahan, Virginia Smith

On the Algorithmic Stability and Generalization of Adaptive Optimization Methods

Nov 08, 2022
Han Nguyen, Hai Pham, Sashank J. Reddi, Barnabás Póczos

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

Oct 12, 2022
Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar

Private Adaptive Optimization with Side Information

Feb 12, 2022
Tian Li, Manzil Zaheer, Sashank J. Reddi, Virginia Smith

Robust Training of Neural Networks using Scale Invariant Architectures

Feb 02, 2022
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar

A Field Guide to Federated Optimization

Jul 14, 2021
Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecny, Sanmi Koyejo, Tian Li, Luyang Liu, Mehryar Mohri, Hang Qi, Sashank J. Reddi, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Ananda Theertha Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu

Distilling Double Descent

Feb 13, 2021
Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

Aug 08, 2020
Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers

Jun 08, 2020
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
