Nicolas Baldwin

AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

Feb 09, 2026
Via arXiv

Cmprsr: Abstractive Token-Level Question-Agnostic Prompt Compressor

Nov 15, 2025
Via arXiv

Interactive Evaluation of Large Language Models for Multi-Requirement Software Engineering Tasks

Aug 26, 2025
Via arXiv

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Jul 03, 2025
Via arXiv