Xilin Jiang

ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior

May 17, 2025

Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation

May 11, 2025

Unsupervised Blind Speech Separation with a Diffusion Prior

May 08, 2025

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Feb 24, 2025

Exploring Finetuned Audio-LLM on Heart Murmur Features

Jan 23, 2025

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

Sep 16, 2024

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue

Sep 07, 2024

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

Aug 13, 2024

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

Jul 13, 2024

SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

May 20, 2024