Qi Fan

CLASP: Class-Adaptive Layer Fusion and Dual-Stage Pruning for Multimodal Large Language Models

Apr 14, 2026
Via arXiv

Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning

Apr 08, 2026
Via arXiv

VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning

Mar 26, 2026
Via arXiv

Prompt-Free Universal Region Proposal Network

Mar 18, 2026
Via arXiv

DreamWorld: Unified World Modeling in Video Generation

Feb 28, 2026
Via arXiv

PointAlign: Feature-Level Alignment Regularization for 3D Vision-Language Models

Feb 28, 2026
Via arXiv

Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning

Feb 27, 2026
Via arXiv

Pathwise Test-Time Correction for Autoregressive Long Video Generation

Feb 05, 2026
Via arXiv

VMonarch: Efficient Video Diffusion Transformers with Structured Attention

Jan 29, 2026
Via arXiv

FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing

Dec 30, 2025
Via arXiv