Huanyu Zhang

PEARL: Personalized Streaming Video Understanding Model

Mar 20, 2026
Via arXiv

CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

Mar 09, 2026
Via arXiv

GEBench: Benchmarking Image Generation Models as GUI Environments

Feb 09, 2026
Via arXiv

How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing

Feb 02, 2026
Via arXiv

Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning

Jan 28, 2026
Via arXiv

BaseReward: A Strong Baseline for Multimodal Reward Model

Sep 19, 2025
Via arXiv

11Plus-Bench: Demystifying Multimodal LLM Spatial Reasoning with Cognitive-Inspired Analysis

Aug 27, 2025
Via arXiv

Memory-Efficient Differentially Private Training with Gradient Random Projection

Jun 18, 2025
Via arXiv

A Call for New Recipes to Enhance Spatial Reasoning in MLLMs

Apr 21, 2025
Via arXiv

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

Jan 13, 2025
Via arXiv