Jiayi Xiang

SiMing-Bench: Evaluating Procedural Correctness from Continuous Interactions in Clinical Skill Videos

Apr 10, 2026
Via arXiv

StatEval: A Comprehensive Benchmark for Large Language Models in Statistics

Oct 10, 2025
Via arXiv