
Madhura Pande

On the Prunability of Attention Heads in Multilingual BERT

Sep 26, 2021
Aakriti Budhraja, Madhura Pande, Pratyush Kumar, Mitesh M. Khapra

Via arXiv

The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT

Jan 22, 2021
Madhura Pande, Aakriti Budhraja, Preksha Nema, Pratyush Kumar, Mitesh M. Khapra

Via arXiv

On the Importance of Local Information in Transformer Based Models

Aug 13, 2020
Madhura Pande, Aakriti Budhraja, Preksha Nema, Pratyush Kumar, Mitesh M. Khapra

Via arXiv