Wenxuan Song

DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching

Mar 27, 2026
Via arXiv

Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance

Mar 27, 2026
Via arXiv

MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation

Mar 27, 2026
Via arXiv

VAMPO: Policy Optimization for Improving Visual Dynamics in Video Action Models

Mar 19, 2026
Via arXiv

S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight

Mar 17, 2026
Via arXiv

PROSPECT: Unified Streaming Vision-Language Navigation via Semantic–Spatial Fusion and Latent Predictive Representation

Mar 04, 2026
Via arXiv

Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline

Feb 26, 2026
Via arXiv

FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment

Feb 19, 2026
Via arXiv

Designing KRIYA: An AI Companion for Wellbeing Self-Reflection

Jan 21, 2026
Via arXiv

Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives

Dec 28, 2025
Via arXiv