Title: Solvable Model for Inheriting the Regularization through Knowledge Distillation

Dec 02, 2020

Luca Saglietti, Lenka Zdeborová

Abstract: In recent years, the empirical success of transfer learning with neural networks has stimulated increasing interest in a theoretical understanding of its core properties. Knowledge distillation (KD), where a smaller neural network is trained using the outputs of a larger neural network, is a particularly interesting case of transfer learning. In the present work, we introduce a statistical physics framework that allows an analytic characterization of the properties of KD in shallow neural networks. Focusing the analysis on a solvable model that exhibits a non-trivial generalization gap, we investigate the effectiveness of KD. We show that, through KD, the regularization properties of the larger teacher model can be inherited by the smaller student, and that the resulting generalization performance is closely linked to and limited by the optimality of the teacher. Finally, we analyze the double descent phenomenology that can arise in the considered KD setting.
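As a rough illustration of the KD setup the abstract describes (a smaller student trained on the outputs of a larger teacher), here is a minimal NumPy sketch. It is a toy linear-regression example and not the solvable model analyzed in the paper: the ridge-regularized teacher, the student restricted to the first `d_s` input features, the data sizes, and the names `fit_ridge` and `distill` are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def distill(X, teacher_w, student_dim):
    """Fit a smaller linear student to the teacher's outputs (not the labels).

    The student only sees the first `student_dim` features (an assumed,
    hypothetical way of making it "smaller") and is fit without any
    explicit regularization of its own.
    """
    teacher_out = X @ teacher_w
    X_student = X[:, :student_dim]
    student_w, *_ = np.linalg.lstsq(X_student, teacher_out, rcond=None)
    return student_w


# Synthetic data: noisy linear labels (illustrative values only).
rng = np.random.default_rng(0)
n, d, d_s = 200, 50, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.5 * rng.standard_normal(n)

# Teacher: larger, regularized model trained on the true labels.
teacher_w = fit_ridge(X, y, lam=1.0)

# Student: smaller model trained only on the teacher's outputs.
student_w = distill(X, teacher_w, d_s)

teacher_out = X @ teacher_w
student_out = X[:, :d_s] @ student_w
distill_mse = np.mean((student_out - teacher_out) ** 2)
print("student/teacher mismatch on training inputs:", distill_mse)
```

In this sketch the student never sees the labels `y`; any regularization it ends up with comes only through the teacher's outputs, which is the mechanism the abstract refers to as inheriting the regularization.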
