Wanrong Zhu

Text-Conditioned Background Generation for Editable Multi-Layer Documents

Dec 19, 2025

OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive

Nov 14, 2025

Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling

May 23, 2025

Towards Visual Text Grounding of Multimodal Large Language Model

Apr 07, 2025

MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

Jul 06, 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Jun 12, 2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Apr 25, 2024

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

Apr 23, 2024

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

Jan 17, 2024

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Nov 13, 2023