Ruoyu Chen

School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China

Explaining multimodal LLMs via intra-modal token interactions

Sep 26, 2025

Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation

Sep 26, 2025

SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

Aug 12, 2025

Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding

Jun 14, 2025

APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs

Jun 09, 2025

Unpacking Positional Encoding in Transformers: A Spectral Analysis of Content-Position Coupling

May 19, 2025

FaceInsight: A Multimodal Large Language Model for Face Perception

Apr 22, 2025

Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection

Apr 09, 2025

Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking

Apr 04, 2025

Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection

Apr 01, 2025