Weikai Chen

MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns

Nov 16, 2025
Via arXiv

AvatarTex: High-Fidelity Facial Texture Reconstruction from Single-Image Stylized Avatars

Nov 10, 2025
Via arXiv

AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving

Nov 09, 2025
Via arXiv

SPGen: Spherical Projection as Consistent and Flexible Representation for Single Image 3D Shape Generation

Sep 16, 2025
Via arXiv

GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation

Apr 29, 2025
Via arXiv

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Mar 14, 2025
Via arXiv

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method

Dec 12, 2024
Via arXiv

GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details

Nov 05, 2024
Via arXiv

MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection

Jul 31, 2024
Via arXiv

OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation

Mar 26, 2024
Via arXiv