Sanjiv Kumar

Google Research

What do larger image classifiers memorise?

Oct 09, 2023

Functional Interpolation for Relative Positions Improves Long Context Transformers

Oct 06, 2023

Think before you speak: Training Language Models With Pause Tokens

Oct 03, 2023

SPEGTI: Structured Prediction for Efficient Generative Text-to-Image Models

Aug 14, 2023

When Does Confidence-Based Cascade Deferral Suffice?

Jul 06, 2023

Depth Dependence of μP Learning Rates in ReLU MLPs

May 13, 2023

ResMem: Learn what you can and memorize the rest

Feb 03, 2023

Learning to reject meets OOD detection: Are all abstentions created equal?

Jan 31, 2023

On student-teacher deviations in distillation: does it pay to disobey?

Jan 30, 2023

Leveraging Importance Weights in Subset Selection

Jan 28, 2023