
Nan Xu

PLAS-Net: Pixel-Level Area Segmentation for UAV-Based Beach Litter Monitoring

Apr 23, 2026
Via arXiv

S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images

Apr 23, 2026
Via arXiv

From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR

Apr 22, 2026
Via arXiv

Wetland mapping from sparse annotations with satellite image time series and temporal-aware segment anything model

Jan 16, 2026
Via arXiv

FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models

Dec 08, 2025
Via arXiv

Vibe Checker: Aligning Code Evaluation with Human Preference

Oct 08, 2025
Via arXiv

ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering

Jun 11, 2025
Via arXiv

TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question Answering

Jun 04, 2025
Via arXiv

ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering

May 29, 2025
Via arXiv

MDDM: A Multi-view Discriminative Enhanced Diffusion-based Model for Speech Enhancement

May 19, 2025
Via arXiv