Hongli Zhou

Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory

May 21, 2025
Via arXiv

Think-J: Learning to Think for Generative LLM-as-a-Judge

May 20, 2025
Via arXiv

Mitigating the Bias of Large Language Model Evaluation

Sep 25, 2024
Via arXiv