Fengyun Rao

Stage-adaptive Token Selection for Efficient Omni-modal LLMs

May 19, 2026

Semantic-Enriched Latent Visual Reasoning

May 19, 2026

OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

May 18, 2026

Beyond Chain-of-Thought: Rewrite as a Universal Interface for Generative Multimodal Embeddings

Apr 24, 2026

AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents

Mar 17, 2026

Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

Feb 22, 2026

D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning

Feb 08, 2026

SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback

Feb 05, 2026

ObjEmbed: Towards Universal Multimodal Object Embeddings

Feb 03, 2026

MMhops-R1: Multimodal Multi-hop Reasoning

Dec 16, 2025