
Srinadh Bhojanapalli

Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

May 31, 2024

Efficient Language Model Architectures for Differentially Private Federated Learning

Mar 12, 2024

HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference

Feb 14, 2024

Efficacy of Dual-Encoders for Extreme Multi-Label Classification

Oct 16, 2023

Functional Interpolation for Relative Positions Improves Long Context Transformers

Oct 06, 2023

Depth Dependence of $μ$P Learning Rates in ReLU MLPs

May 13, 2023

On student-teacher deviations in distillation: does it pay to disobey?

Jan 30, 2023

On the Adversarial Robustness of Mixture of Experts

Oct 19, 2022

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

Oct 12, 2022

Treeformer: Dense Gradient Trees for Efficient Attention Computation

Aug 18, 2022