Yingqi Fan

What Do Visual Tokens Really Encode? Uncovering Sparsity and Redundancy in Multimodal Large Language Models

Feb 28, 2026

HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit

Feb 27, 2026

On-Policy Supervised Fine-Tuning for Efficient Reasoning

Feb 13, 2026

ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention

Feb 07, 2026

SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling

Jun 04, 2025

LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding

May 22, 2025

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

Mar 08, 2025