Srinadh Bhojanapalli

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

Oct 12, 2022

Treeformer: Dense Gradient Trees for Efficient Attention Computation

Aug 18, 2022

Robust Training of Neural Networks using Scale Invariant Architectures

Feb 02, 2022

Leveraging redundancy in attention with Reuse Transformers

Oct 13, 2021

Teacher's pet: understanding and mitigating biases in distillation

Jul 08, 2021

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation

Jun 16, 2021

Demystifying the Better Performance of Position Encoding Variants for Transformer

Apr 18, 2021

Understanding Robustness of Transformers for Image Classification

Mar 26, 2021

On the Reproducibility of Neural Network Predictions

Feb 05, 2021

Modifying Memories in Transformer Models

Dec 01, 2020