Xuri Ge

MCoT-MVS: Multi-level Vision Selection by Multi-modal Chain-of-Thought Reasoning for Composed Image Retrieval
Mar 18, 2026

Hierarchical Dual-Change Collaborative Learning for UAV Scene Change Captioning
Mar 13, 2026

Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues
Jan 28, 2026

Differentiable Semantic ID for Generative Recommendation
Jan 27, 2026

Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering
Jan 27, 2026

Reinforced Efficient Reasoning via Semantically Diverse Exploration
Jan 08, 2026

Focal-RegionFace: Generating Fine-Grained Multi-attribute Descriptions for Arbitrarily Selected Face Focal Regions
Jan 01, 2026

The 1st EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval
Apr 21, 2025

CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation
Apr 14, 2025

Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis
Apr 14, 2025