
Róbert Csordás

GIM: Improved Interpretability for Large Language Models

May 23, 2025

Do Language Models Use Their Depth Efficiently?

May 20, 2025

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing

May 01, 2025

Measuring In-Context Computation Complexity via Hidden State Prediction

Mar 17, 2025

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Oct 28, 2024

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Aug 20, 2024

MoEUT: Mixture-of-Experts Universal Transformers

May 25, 2024

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Dec 14, 2023

Automating Continual Learning

Dec 01, 2023

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions

Oct 24, 2023