Jiajun Shi

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

May 21, 2025
Via arXiv

P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark

May 21, 2025
Via arXiv

CryptoX: Compositional Reasoning Evaluation of Large Language Models

Feb 08, 2025
Via arXiv

MdEval: Massively Multilingual Code Debugging

Nov 04, 2024
Via arXiv