
Razvan Pascanu

Google DeepMind

Transformers need glasses! Information over-squashing in language tasks

Jun 06, 2024

Deep Grokking: Would Deep Neural Networks Generalize Better?

May 29, 2024

No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

May 01, 2024

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Apr 11, 2024

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

Mar 12, 2024

Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models

Mar 03, 2024

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Feb 29, 2024

Disentangling the Causes of Plasticity Loss in Neural Networks

Feb 29, 2024

Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

Feb 05, 2024

Improving fine-grained understanding in image-text pre-training

Jan 18, 2024