Aditi Raghunathan

Early Data Exposure Improves Robustness to Subsequent Fine-Tuning

May 12, 2026
Via arXiv

Annotations Mitigate Post-Training Mode Collapse

May 11, 2026
Via arXiv

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

May 04, 2026
Via arXiv

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories

Apr 19, 2026
Via arXiv

Hodoscope: Unsupervised Monitoring for AI Misbehaviors

Apr 13, 2026
Via arXiv

Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?

Apr 13, 2026
Via arXiv

The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data

Mar 17, 2026
Via arXiv

One-step Language Modeling via Continuous Denoising

Feb 18, 2026
Via arXiv

S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations

Feb 16, 2026
Via arXiv

Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs

Jul 31, 2025
Via arXiv