Picture for Xixin Wu

Xixin Wu

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models

Add code
Aug 12, 2025
Viaarxiv icon

Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction

Add code
Jun 11, 2025
Viaarxiv icon

Naturalistic Language-related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder

Add code
Jun 10, 2025
Viaarxiv icon

WAKE: Watermarking Audio with Key Enrichment

Add code
Jun 06, 2025
Viaarxiv icon

RAG-Zeval: Towards Robust and Interpretable Evaluation on RAG Responses through End-to-End Rule-Guided Reasoning

Add code
May 28, 2025
Viaarxiv icon

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving

Add code
May 24, 2025
Viaarxiv icon

$C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction

Add code
Apr 01, 2025
Viaarxiv icon

UniSep: Universal Target Audio Separation with Language Models at Scale

Add code
Mar 31, 2025
Viaarxiv icon

Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution

Add code
Mar 03, 2025
Viaarxiv icon

Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

Add code
Jan 19, 2025
Figure 1 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Figure 2 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Figure 3 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Figure 4 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Viaarxiv icon