Zeyu Wang

Do VLMs Perceive or Recall? Probing Visual Perception vs. Memory with Classic Visual Illusions

Jan 29, 2026
Via arXiv

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Jan 21, 2026
Via arXiv

ComfySearch: Autonomous Exploration and Reasoning for ComfyUI Workflows

Jan 07, 2026
Via arXiv

IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition

Dec 31, 2025
Via arXiv

Breaking the Passive Learning Trap: An Active Perception Strategy for Human Motion Prediction

Nov 18, 2025
Via arXiv

Hi-Reco: High-Fidelity Real-Time Conversational Digital Humans

Nov 16, 2025
Via arXiv

EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation

Nov 14, 2025
Via arXiv

LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

Oct 27, 2025
Via arXiv

Resolving Ambiguity in Gaze-Facilitated Visual Assistant Interaction Paradigm

Sep 26, 2025
Via arXiv

Follow-Your-Instruction: A Comprehensive MLLM Agent for World Data Synthesis

Aug 07, 2025
Via arXiv