Pan Lu

VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

Jun 19, 2024
Via arXiv

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Jun 13, 2024
Via arXiv

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

May 30, 2024
Via arXiv

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Mar 21, 2024
Via arXiv

Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

Feb 27, 2024
Via arXiv

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Feb 08, 2024
Via arXiv

Model Editing Can Hurt General Abilities of Large Language Models

Jan 09, 2024
Via arXiv

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Oct 03, 2023
Via arXiv

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

Jul 20, 2023
Via arXiv

TheoremQA: A Theorem-driven Question Answering dataset

May 23, 2023
Via arXiv