Xilin Chen

DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks

Apr 24, 2025

EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models

Mar 26, 2025

REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models

Mar 20, 2025

OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

Feb 28, 2025

MATS: An Audio Language Model under Text-only Supervision

Feb 20, 2025

Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation

Jan 08, 2025

M³oralBench: A MultiModal Moral Benchmark for LVLMs

Dec 30, 2024

Multi-P²A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models

Dec 27, 2024

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Nov 25, 2024

Semantic or Covariate? A Study on the Intractable Case of Out-of-Distribution Detection

Nov 18, 2024