Zhizheng Zhang

Southeast University, China

TrackVLA: Embodied Visual Tracking in the Wild

May 29, 2025

GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data

May 06, 2025

WeGen: A Unified Model for Interactive Multimodal Generation as We Chat

Mar 03, 2025

FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real

Feb 25, 2025

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Feb 18, 2025

Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks

Dec 09, 2024

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Dec 05, 2024

A General Theory for Compositional Generalization

May 20, 2024

Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

May 13, 2024

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

Mar 19, 2024