
Wanrong Zhu

MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

Jul 06, 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Jun 12, 2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Apr 25, 2024

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

Apr 23, 2024

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

Jan 17, 2024

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Nov 13, 2023

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

Aug 12, 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Aug 07, 2023

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

Jul 18, 2023

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

Jul 12, 2023