Keyi Ding

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

May 05, 2025
Via arXiv

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025
Via arXiv