Ethan Yu

Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks

Apr 14, 2026
Via arXiv