Xipeng Qiu

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Nov 06, 2025

MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval

Oct 31, 2025

Towards Global Retrieval Augmented Generation: A Benchmark for Corpus-Level Reasoning

Oct 30, 2025

MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance

Oct 02, 2025

MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation

Oct 01, 2025

Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue

Sep 18, 2025

UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets

Sep 18, 2025

Decoupled Proxy Alignment: Mitigating Language Prior Conflict for Multimodal Alignment in MLLM

Sep 18, 2025

CodecBench: A Comprehensive Benchmark for Acoustic and Semantic Evaluation

Aug 28, 2025

Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

Aug 28, 2025