Zheng Zhu

Tencent, WeChat Pay

Spatial-Aware Reduction Framework: Towards Efficient and Faithful Visual State Space Models

Jun 18, 2026
Via arXiv

R2RDreamer: 3D-aware Data Augmentation for Spatially-generalized 2D Manipulation Policies

Jun 15, 2026
Via arXiv

ScoutVLA: UAV-Centric Active Perception via a Dual-Expert VLA Model for Open-World Embodied Question Answering

Jun 09, 2026
Via arXiv

iMaC: Translating Actions into Motion and Contact Images for Embodied World Models

Jun 08, 2026
Via arXiv

RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling

Jun 04, 2026
Via arXiv

WAM-Nav: Asymmetric Latent World-Action Modeling for Unified Visual Navigation

Jun 03, 2026
Via arXiv

SKIP: Sparse Keyframe Interpolation Paradigm for Efficient Embodied World Models

May 30, 2026
Via arXiv

SAFE-Pruner: Semantic Attention-Guided Future-Aware Token Pruning for Efficient Vision-Language-Action Manipulation

May 28, 2026
Via arXiv

StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement

Apr 20, 2026
Via arXiv

ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment

Apr 12, 2026
Via arXiv