
Frederic Sala

RIFT: A RubrIc Failure Mode Taxonomy and Automated Diagnostics

Apr 01, 2026
Via arXiv

Test-Time Scaling Makes Overtraining Compute-Optimal

Apr 01, 2026
Via arXiv

SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

Mar 25, 2026
Via arXiv

RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning

Mar 10, 2026
Via arXiv

Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models

Mar 09, 2026
Via arXiv

Weight Updates as Activation Shifts: A Principled Framework for Steering

Feb 28, 2026
Via arXiv

SkillOrchestra: Learning to Route Agents via Skill Transfer

Feb 23, 2026
Via arXiv

LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

Oct 16, 2025
Via arXiv

Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset

Jun 25, 2025
Via arXiv

Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation

Jun 12, 2025
Via arXiv