Huayu Sha

LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models

Aug 07, 2025
Via arXiv

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Jun 04, 2025
Via arXiv