Johannes Mols

EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges

Feb 13, 2025
Via arXiv

MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs

Jan 29, 2025
Via arXiv