Tianrui Wang

Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought

May 11, 2026
Via arXiv

Evaluating the Expressive Appropriateness of Speech in Rich Contexts

May 10, 2026
Via arXiv

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

May 07, 2026
Via arXiv

VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models

May 06, 2026
Via arXiv

UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions

Apr 24, 2026
Via arXiv

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Apr 14, 2026
Via arXiv

MSR-HuBERT: Self-supervised Pre-training for Adaptation to Multiple Sampling Rates

Mar 24, 2026
Via arXiv

AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval

Feb 11, 2026
Via arXiv

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis

Jan 30, 2026
Via arXiv

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

Jan 06, 2026
Via arXiv