Ankit Singh Rawat

Mechanics of Next Token Prediction with Self-Attention

Mar 12, 2024
Via arXiv

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

Feb 21, 2024
Via arXiv

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

Oct 12, 2023
Via arXiv

What do larger image classifiers memorise?

Oct 09, 2023
Via arXiv

Think before you speak: Training Language Models With Pause Tokens

Oct 03, 2023
Via arXiv

When Does Confidence-Based Cascade Deferral Suffice?

Jul 06, 2023
Via arXiv

On the Role of Attention in Prompt-tuning

Jun 06, 2023
Via arXiv

ResMem: Learn what you can and memorize the rest

Feb 03, 2023
Via arXiv

Supervision Complexity and its Role in Knowledge Distillation

Jan 28, 2023
Via arXiv

EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

Jan 27, 2023
Via arXiv