
Sashank J. Reddi

Differentially Private Adaptive Optimization with Delayed Preconditioners

Dec 01, 2022

On the Algorithmic Stability and Generalization of Adaptive Optimization Methods

Nov 08, 2022

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

Oct 12, 2022

Private Adaptive Optimization with Side Information

Feb 12, 2022

Robust Training of Neural Networks using Scale Invariant Architectures

Feb 02, 2022

A Field Guide to Federated Optimization

Jul 14, 2021

Distilling Double Descent

Feb 13, 2021

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

Aug 08, 2020

$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers

Jun 08, 2020

Why distillation helps: a statistical perspective

May 21, 2020