Keyi Ding

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

May 05, 2025
Via arXiv

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025
Via arXiv