Brian Jang

Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction

Dec 16, 2025
Via arXiv

ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents

Nov 10, 2025
Via arXiv