Minghe Shen

Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges

Jan 28, 2026
Via arXiv