Percy Liang

Shammie

MLE-Smith: Scaling MLE Tasks with Automated Multi-Agent Pipeline

Oct 08, 2025
Via arXiv

Pre-training under infinite compute

Sep 18, 2025
Via arXiv

UQ: Assessing Language Models on Unsolved Questions

Aug 25, 2025
Via arXiv

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Jul 03, 2025
Via arXiv

MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

May 26, 2025
Via arXiv

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

May 21, 2025
Via arXiv

Extracting memorized pieces of (copyrighted) books from open-weight language models

May 18, 2025
Via arXiv

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

May 12, 2025
Via arXiv

Reliable and Efficient Amortized Model-based Evaluation

Mar 17, 2025
Via arXiv

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

Feb 27, 2025
Via arXiv