Wanrong Zhu

Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling

May 23, 2025

Towards Visual Text Grounding of Multimodal Large Language Model

Apr 07, 2025

MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

Jul 06, 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Jun 12, 2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Apr 25, 2024

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

Apr 23, 2024

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

Jan 17, 2024

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Nov 13, 2023

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

Aug 12, 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Aug 07, 2023