Abhishek Panigrahi

Efficient Stagewise Pretraining via Progressive Subnetworks

Feb 08, 2024
Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

Trainable Transformer in Transformer

Jul 03, 2023
Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora

Do Transformers Parse while Predicting the Masked Word?

Mar 14, 2023
Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

Task-Specific Skill Localization in Fine-tuned Language Models

Feb 13, 2023
Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

May 20, 2022
Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

Understanding Gradient Descent on Edge of Stability in Deep Learning

May 19, 2022
Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi

Learning and Generalization in RNNs

May 31, 2021
Abhishek Panigrahi, Navin Goyal

Non-Gaussianity of Stochastic Gradient Noise

Oct 25, 2019
Abhishek Panigrahi, Raghav Somani, Navin Goyal, Praneeth Netrapalli

Effect of Activation Functions on the Training of Overparametrized Neural Nets

Aug 16, 2019
Abhishek Panigrahi, Abhishek Shetty, Navin Goyal
