Shuze Daniel Liu

AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification

May 07, 2026
Via arXiv

Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards

Apr 10, 2026
Via arXiv

MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics

Jan 30, 2026
Via arXiv