Picture for Chunyu Qiang

Chunyu Qiang

Evaluating the Expressive Appropriateness of Speech in Rich Contexts

Add code
May 10, 2026
Viaarxiv icon

UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions

Add code
Apr 24, 2026
Viaarxiv icon

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

Add code
Jan 20, 2026
Viaarxiv icon

MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning

Add code
Jan 08, 2026
Viaarxiv icon

Klear: Unified Multi-Task Audio-Video Joint Generation

Add code
Jan 07, 2026
Viaarxiv icon

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Add code
Jun 24, 2025
Figure 1 for Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Figure 2 for Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Figure 3 for Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Figure 4 for Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Viaarxiv icon

Towards Flow-Matching-based TTS without Classifier-Free Guidance

Add code
Apr 29, 2025
Viaarxiv icon

Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models

Add code
Jan 24, 2025
Figure 1 for Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models
Figure 2 for Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models
Figure 3 for Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models
Figure 4 for Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models
Viaarxiv icon

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

Add code
Dec 11, 2024
Viaarxiv icon

EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis

Add code
Sep 27, 2024
Figure 1 for EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis
Figure 2 for EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis
Figure 3 for EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis
Figure 4 for EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis
Viaarxiv icon