Picture for Jingran Xie

Jingran Xie

TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis

Add code
Apr 24, 2026
Viaarxiv icon

VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents

Add code
May 27, 2025
Figure 1 for VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents
Figure 2 for VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents
Figure 3 for VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents
Figure 4 for VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents
Viaarxiv icon

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving

Add code
May 24, 2025
Viaarxiv icon

Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

Add code
Jan 19, 2025
Figure 1 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Figure 2 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Figure 3 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Figure 4 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Viaarxiv icon