MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers

Dec 31, 2020
[Figures 1-4 for MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers]

View paper on arXiv
