Ziyu Zhu

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

Jun 05, 2025
Via arXiv

DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy

May 19, 2025
Via arXiv

Seed1.5-VL Technical Report

May 11, 2025
Via arXiv

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding

Apr 28, 2025
Via arXiv

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Apr 01, 2025
Via arXiv

GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation

Mar 12, 2025
Via arXiv

On Domain-Specific Post-Training for Multimodal Large Language Models

Nov 29, 2024
Via arXiv

GarmentLab: A Unified Simulation and Benchmark for Garment Manipulation

Nov 02, 2024
Via arXiv

Task-oriented Sequential Grounding in 3D Scenes

Aug 07, 2024
Via arXiv

Unifying 3D Vision-Language Understanding via Promptable Queries

May 19, 2024
Via arXiv