
Srinadh Bhojanapalli

Spark Transformer: Reactivating Sparsity in FFN and Attention

Jun 07, 2025

Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count

Oct 21, 2024

Mimetic Initialization Helps State Space Models Learn to Recall

Oct 14, 2024

Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

May 31, 2024

Efficient Language Model Architectures for Differentially Private Federated Learning

Mar 12, 2024

HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference

Feb 14, 2024

Efficacy of Dual-Encoders for Extreme Multi-Label Classification

Oct 16, 2023

Functional Interpolation for Relative Positions Improves Long Context Transformers

Oct 06, 2023

Depth Dependence of $\mu$P Learning Rates in ReLU MLPs

May 13, 2023

On student-teacher deviations in distillation: does it pay to disobey?

Jan 30, 2023