Xuming Hu

SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models

Oct 08, 2025

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods

Oct 08, 2025

DeKeyNLU: Enhancing Natural Language to SQL Generation through Task Decomposition and Keyword Extraction

Sep 18, 2025

PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era

Sep 16, 2025

A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models

Aug 12, 2025

GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning

Aug 06, 2025

VLA-Mark: A cross modal watermark for large vision-language alignment model

Jul 18, 2025

Da Yu: Towards USV-Based Image Captioning for Waterway Surveillance and Scene Understanding

Jun 24, 2025

Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis

May 30, 2025

Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities

May 27, 2025