Mitchell A. Gordon

Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

Mar 05, 2020

Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

Feb 19, 2020

Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation

Dec 06, 2019