Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate

Jan 30, 2024
Steffi Chern, Ethan Chern, Graham Neubig, Pengfei Liu
