Virginia Smith

Open-Weight LLM Fine-Tuning Defenses are Susceptible to Simple Attacks

May 26, 2026
Via arXiv

Curriculum Learning for Safety Alignment

May 25, 2026
Via arXiv

Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?

Apr 13, 2026
Via arXiv

DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment

Mar 23, 2026
Via arXiv

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL

Mar 12, 2026
Via arXiv

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration

Jan 26, 2026
Via arXiv

Research in Collaborative Learning Does Not Serve Cross-Silo Federated Learning in Practice

Oct 14, 2025
Via arXiv

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

Jun 10, 2025
Via arXiv

Membership Inference Attacks for Unseen Classes

Jun 06, 2025
Via arXiv

Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs

May 26, 2025
Via arXiv