Yixiang Chen

Improving Vision-Language-Action Model Fine-Tuning with Structured Stage and Keyframe Supervision

Jun 25, 2026

SKIP: Sparse Keyframe Interpolation Paradigm for Efficient Embodied World Models

May 30, 2026

Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model

Apr 03, 2026

BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks

Feb 03, 2026

VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation

Dec 18, 2025

EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation

Sep 26, 2025

DTPA: Dynamic Token-level Prefix Augmentation for Controllable Text Generation

Aug 06, 2025

FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization

Jul 17, 2025

EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow

Jul 08, 2025

BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

Jun 09, 2025