Yilun Zhao

SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

Jul 01, 2025, via arXiv

SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification

Jun 18, 2025, via arXiv

Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure

Jun 13, 2025, via arXiv

SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing

Jun 05, 2025, via arXiv

VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos

May 29, 2025, via arXiv

Table-R1: Inference-Time Scaling for Table Reasoning

May 29, 2025, via arXiv

Judging with Many Minds: Do More Perspectives Mean Less Prejudice?

May 26, 2025, via arXiv

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

May 21, 2025, via arXiv

Z1: Efficient Test-time Scaling with Code

Apr 01, 2025, via arXiv

MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search

Mar 26, 2025, via arXiv