Xilin Chen

BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Jun 09, 2025

un$^2$CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP

May 30, 2025

Jodi: Unification of Visual Generation and Understanding via Joint Modeling

May 25, 2025

Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

May 23, 2025

Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models

Apr 29, 2025

DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks

Apr 24, 2025

EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models

Mar 26, 2025

REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models

Mar 20, 2025

OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

Feb 28, 2025

MATS: An Audio Language Model under Text-only Supervision

Feb 20, 2025