Xudong Liu

MEDSYN: Benchmarking Multi-EviDence SYNthesis in Complex Clinical Cases for Multimodal Large Language Models

Feb 25, 2026
Via arXiv

Mask What Matters: Mitigating Object Hallucinations in Multimodal Large Language Models with Object-Aligned Visual Contrastive Decoding

Feb 12, 2026
Via arXiv

WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation

Jan 13, 2026
Via arXiv

AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding

Dec 18, 2025
Via arXiv

On Path to Multimodal Historical Reasoning: HistBench and HistAgent

May 26, 2025
Via arXiv

Caesar: A Low-deviation Compression Approach for Efficient Federated Learning

Dec 28, 2024
Via arXiv

RobotDiffuse: Motion Planning for Redundant Manipulator based on Diffusion Model

Dec 27, 2024
Via arXiv

XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation

Dec 24, 2024
Via arXiv

Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Jun 26, 2024
Via arXiv

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Jun 04, 2024
Via arXiv