Xilin Chen

GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition

Sep 19, 2025
Via arXiv

MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models

Jun 17, 2025
Via arXiv

BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Jun 09, 2025
Via arXiv

un$^2$CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP

May 30, 2025
Via arXiv

Jodi: Unification of Visual Generation and Understanding via Joint Modeling

May 25, 2025
Via arXiv

Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

May 23, 2025
Via arXiv

Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models

Apr 29, 2025
Via arXiv

DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks

Apr 24, 2025
Via arXiv

EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models

Mar 26, 2025
Via arXiv

REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models

Mar 20, 2025
Via arXiv