Zhuowen Tu

SemLayer: Semantic-aware Generative Segmentation and Layer Construction for Abstract Icons

Mar 25, 2026

SG-VLA: Learning Spatially-Grounded Vision-Language-Action Models for Mobile Manipulation

Mar 24, 2026

CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models

Mar 16, 2026

Reinforcement-aware Knowledge Distillation for LLM Reasoning

Feb 26, 2026

Soft Tail-dropping for Adaptive Visual Tokenization

Jan 20, 2026

Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

Jan 08, 2026

CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning

Dec 09, 2025

Real Deep Research for AI, Robotics and Beyond

Oct 23, 2025

C3Editor: Achieving Controllable Consistency in 2D Model for 3D Editing

Oct 06, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Oct 02, 2025