Sanjeev Arora

Do Transformers Parse while Predicting the Masked Word?

Mar 14, 2023
Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

Via arXiv

Why (and When) does Local SGD Generalize Better than SGD?

Mar 09, 2023
Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora

Via arXiv

Task-Specific Skill Localization in Fine-tuned Language Models

Feb 13, 2023
Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora

Via arXiv

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound

Nov 05, 2022
Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

Via arXiv

A Kernel-Based View of Language Model Fine-Tuning

Oct 11, 2022
Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora

Via arXiv

Understanding Influence Functions and Datamodels via Harmonic Analysis

Oct 03, 2022
Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora

Via arXiv

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent

Jul 08, 2022
Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora

Via arXiv

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction

Jun 14, 2022
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora

Via arXiv

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

May 20, 2022
Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

Via arXiv