Yongan Yu

THiNK: Can Large Language Models Think-aloud?

May 26, 2025
Via arXiv

WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models

May 26, 2025
Via arXiv

From Recall to Reasoning: Automated Question Generation for Deeper Math Learning through Large Language Models

May 17, 2025
Via arXiv

CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation

Apr 30, 2025
Via arXiv

MaintainCoder: Maintainable Code Generation Under Dynamic Requirements

Mar 31, 2025
Via arXiv