Thomas Walshe

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Jan 17, 2026
Via arXiv

Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration

Jan 21, 2025
Via arXiv