Sarah Elshabrawy

Constructing Evaluation Datasets for Procedural Reasoning: Balancing Naturalness, Grounding, and Multi-Hop Coverage

Jun 11, 2026
Via arXiv