Xuming Hu

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

Oct 29, 2025
Via arXiv

AI for Service: Proactive Assistance with AI Glasses

Oct 16, 2025
Via arXiv

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods

Oct 08, 2025
Via arXiv

SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models

Oct 08, 2025
Via arXiv

DeKeyNLU: Enhancing Natural Language to SQL Generation through Task Decomposition and Keyword Extraction

Sep 18, 2025
Via arXiv

PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era

Sep 16, 2025
Via arXiv

A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models

Aug 12, 2025
Via arXiv

GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning

Aug 06, 2025
Via arXiv

VLA-Mark: A cross modal watermark for large vision-language alignment model

Jul 18, 2025
Via arXiv

Da Yu: Towards USV-Based Image Captioning for Waterway Surveillance and Scene Understanding

Jun 24, 2025
Via arXiv