Pan Lu

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

Oct 10, 2024, via arXiv

VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

Jun 19, 2024, via arXiv

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Jun 13, 2024, via arXiv

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

May 30, 2024, via arXiv

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Mar 21, 2024, via arXiv

Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

Feb 27, 2024, via arXiv

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Feb 08, 2024, via arXiv

Model Editing Can Hurt General Abilities of Large Language Models

Jan 09, 2024, via arXiv

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Oct 03, 2023, via arXiv

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

Jul 20, 2023, via arXiv