Aditya Krishna Menon

Data61/CSIRO and the Australian National University

Think before you speak: Training Language Models With Pause Tokens

Oct 03, 2023

The importance of feature preprocessing for differentially private linear optimization

Jul 19, 2023

When Does Confidence-Based Cascade Deferral Suffice?

Jul 06, 2023

ResMem: Learn what you can and memorize the rest

Feb 03, 2023

Learning to reject meets OOD detection: Are all abstentions created equal?

Jan 31, 2023

On student-teacher deviations in distillation: does it pay to disobey?

Jan 30, 2023

Supervision Complexity and its Role in Knowledge Distillation

Jan 28, 2023

EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

Jan 27, 2023

When does mixup promote local linearity in learned representations?

Oct 28, 2022

Robust Distillation for Worst-class Performance

Jun 13, 2022