
Xuming He

Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment

Mar 18, 2026 (via arXiv)

WikiCLIP: An Efficient Contrastive Baseline for Open-domain Visual Entity Recognition

Mar 10, 2026 (via arXiv)

AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis

Mar 09, 2026 (via arXiv)

Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum

Mar 05, 2026 (via arXiv)

DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations

Jan 02, 2026 (via arXiv)

Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding

Dec 25, 2025 (via arXiv)

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Dec 18, 2025 (via arXiv)

GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Oct 31, 2025 (via arXiv)

Incremental Human-Object Interaction Detection with Invariant Relation Representation Learning

Oct 30, 2025 (via arXiv)

Pack and Force Your Memory: Long-form and Consistent Video Generation

Oct 02, 2025 (via arXiv)