Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Knowledge Distillation: Bad Models Can Be Good Role Models

Mar 28, 2022

Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz

Figure 1 for Knowledge Distillation: Bad Models Can Be Good Role Models

Figure 2 for Knowledge Distillation: Bad Models Can Be Good Role Models

Share this with someone who'll enjoy it:

Abstract:Large neural networks trained in the overparameterized regime are able to fit noise to zero train error. Recent work \citep{nakkiran2020distributional} has empirically observed that such networks behave as "conditional samplers" from the noisy distribution. That is, they replicate the noise in the train data to unseen examples. We give a theoretical framework for studying this conditional sampling behavior in the context of learning theory. We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data. We show that samplers, while being bad classifiers, can be good teachers. Concretely, we prove that distillation from samplers is guaranteed to produce a student which approximates the Bayes optimal classifier. Finally, we show that some common learning algorithms (e.g., Nearest-Neighbours and Kernel Machines) can generate samplers when applied in the overparameterized regime.

View paper on

OpenReview

Share this with someone who'll enjoy it:

Title:Knowledge Distillation: Bad Models Can Be Good Role Models

Paper and Code