Xuming Hu

Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding

Apr 19, 2026
Via arXiv

Correct Prediction, Wrong Steps? Consensus Reasoning Knowledge Graph for Robust Chain-of-Thought Synthesis

Apr 15, 2026
Via arXiv

Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation

Apr 14, 2026
Via arXiv

Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval

Apr 11, 2026
Via arXiv

StreamMeCo: Long-Term Agent Memory Compression for Efficient Streaming Video Understanding

Apr 10, 2026
Via arXiv

Unveiling Language Routing Isolation in Multilingual MoE Models for Interpretable Subnetwork Adaptation

Apr 04, 2026
Via arXiv

Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models

Mar 28, 2026
Via arXiv

AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

Mar 19, 2026
Via arXiv

Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models

Mar 18, 2026
Via arXiv

Panoramic Affordance Prediction

Mar 16, 2026
Via arXiv