Xuming Hu

May

AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

Mar 19, 2026
Via arXiv

Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models

Mar 18, 2026
Via arXiv

Panoramic Affordance Prediction

Mar 16, 2026
Via arXiv

EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next

Mar 12, 2026
Via arXiv

LatentGeo: Learnable Auxiliary Constructions in Latent Space for Multimodal Geometric Reasoning

Mar 12, 2026
Via arXiv

Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design

Mar 11, 2026
Via arXiv

Beyond the Grid: Layout-Informed Multi-Vector Retrieval with Parsed Visual Document Representations

Mar 02, 2026
Via arXiv

Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

Feb 23, 2026
Via arXiv

Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework

Feb 23, 2026
Via arXiv

SONIC: Segmented Optimized Nexus for Information Compression in Key-Value Caching

Jan 29, 2026
Via arXiv