Xinlong Wang

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

Apr 21, 2026
Via arXiv

Scaling World Model for Hierarchical Manipulation Policies

Feb 12, 2026
Via arXiv

MOSAIC: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation

Feb 09, 2026
Via arXiv

DECO: Decoupled Multimodal Diffusion Transformer for Bimanual Dexterous Manipulation with a Plugin Tactile Adapter

Feb 05, 2026
Via arXiv

EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models

Feb 04, 2026
Via arXiv

LINA: Linear Autoregressive Image Generative Models with Continuous Tokens

Jan 30, 2026
Via arXiv

Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner

Dec 11, 2025
Via arXiv

Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments

Oct 30, 2025
Via arXiv

Emu3.5: Native Multimodal Models are World Learners

Oct 30, 2025
Via arXiv

BrainMCLIP: Brain Image Decoding with Multi-Layer feature Fusion of CLIP

Oct 22, 2025
Via arXiv