Samy Jelassi

DMA, CIMS

Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs

Dec 15, 2025
Via arXiv

Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones

May 27, 2025
Via arXiv

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Apr 10, 2025
Via arXiv

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning

Apr 09, 2025
Via arXiv

The Role of Sparsity for Length Generalization in Transformers

Feb 24, 2025
Via arXiv

Collective Model Intelligence Requires Compatible Specialization

Nov 04, 2024
Via arXiv

Mixture of Parrots: Experts improve memorization more than reasoning

Oct 24, 2024
Via arXiv

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

Oct 16, 2024
Via arXiv

Universal Length Generalization with Turing Programs

Jul 03, 2024
Via arXiv

How Does Overparameterization Affect Features?

Jul 01, 2024
Via arXiv