Xin Wen

University of Science and Technology of China

Vision Foundation Models as Generalist Tokenizers for Image Generation

May 18, 2026
Via arXiv

Beyond Visual Cues: Semantic-Driven Token Filtering and Expert Routing for Anytime Person ReID

Apr 16, 2026
Via arXiv

ComSim: Building Scalable Real-World Robot Data Generation via Compositional Simulation

Apr 13, 2026
Via arXiv

Policy-Aware Design of Large-Scale Factorial Experiments

Apr 09, 2026
Via arXiv

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

Apr 07, 2026
Via arXiv

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

Jan 28, 2026
Via arXiv

A Trajectory-free Crash Detection Framework with Generative Approach and Segment Map Diffusion

Nov 17, 2025
Via arXiv

The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning

Sep 16, 2025
Via arXiv

Angio-Diff: Learning a Self-Supervised Adversarial Diffusion Model for Angiographic Geometry Generation

Jun 24, 2025
Via arXiv

TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving

May 14, 2025
Via arXiv