Title: A Theoretical Analysis of Fine-tuning with Linear Teachers

Jul 04, 2021

Gal Shachaf, Alon Brutzkus, Amir Globerson

Abstract: Fine-tuning is a common practice in deep learning, achieving excellent generalization results on downstream tasks using relatively little training data. Although widely used in practice, it lacks a strong theoretical understanding. We analyze the sample complexity of this scheme for regression with linear teachers in several architectures. Intuitively, the success of fine-tuning depends on the similarity between the source tasks and the target task; however, measuring this similarity is nontrivial. We show that a relevant measure considers the relation between the source task, the target task, and the covariance structure of the target data. In the setting of linear regression, we show that under realistic settings a substantial reduction in sample complexity is plausible when this measure is low. For deep linear regression, we present a novel result regarding the inductive bias of gradient-based training when the network is initialized with pretrained weights. Using this result, we show that the similarity measure for this setting is also affected by the depth of the network. We further present results on shallow ReLU models and analyze how the sample complexity there depends on the source and target tasks. We empirically demonstrate our results on both synthetic and realistic data.
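The linear regression setting described in the abstract can be illustrated with a small numerical sketch. It is not the paper's code or its exact similarity measure. It relies only on a standard property of gradient descent on underdetermined least squares: started from weights `w_init`, it converges to the interpolating solution closest to `w_init`. The dimensions, the `shift` parameter, the isotropic Gaussian inputs, the noiseless labels, and all function names below are assumptions made for illustration.

```python
import numpy as np


def fine_tune_linear(X, y, w_init, steps=5000):
    """Plain gradient descent on mean squared error, started from w_init."""
    n = X.shape[0]
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    lr = n / (np.linalg.norm(X, 2) ** 2)
    w = w_init.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return w


def closest_interpolator(X, y, w_init):
    """Closed form for the interpolating solution nearest to w_init."""
    return w_init + np.linalg.pinv(X) @ (y - X @ w_init)


def run_demo(seed=0, d=50, n=20, shift=0.1):
    """Compare fine-tuning from a source teacher with training from zero.

    All settings here are hypothetical: the target teacher is a small
    perturbation of the source teacher, and there are fewer samples than
    dimensions (n < d), so many interpolating solutions exist.
    """
    rng = np.random.default_rng(seed)
    w_source = rng.normal(size=d)                       # pretrained (source) teacher
    w_target = w_source + shift * rng.normal(size=d)    # nearby target teacher
    X = rng.normal(size=(n, d))                         # isotropic target inputs
    y = X @ w_target                                    # noiseless target labels

    w_ft = fine_tune_linear(X, y, w_source)
    w_closed = closest_interpolator(X, y, w_source)
    w_scratch = fine_tune_linear(X, y, np.zeros(d))

    # With identity covariance, excess risk is the squared parameter error.
    return {
        "w_ft": w_ft,
        "w_closed": w_closed,
        "err_ft": float(np.sum((w_ft - w_target) ** 2)),
        "err_scratch": float(np.sum((w_scratch - w_target) ** 2)),
    }


if __name__ == "__main__":
    res = run_demo()
    print("fine-tuned excess risk:  ", res["err_ft"])
    print("from-scratch excess risk:", res["err_scratch"])
```

In this sketch the fine-tuned error comes only from the part of `w_target - w_source` that lies outside the row space of `X`, which is why a source teacher close to the target can help when samples are scarce. With non-isotropic inputs, that residual would be weighted by the input covariance, consistent with the abstract's remark that the covariance structure of the target data matters.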

View paper on OpenReview
