Zhengyan Sheng

JoyVoice: Long-Context Conditioning for Anthropomorphic Multi-Speaker Conversational Synthesis

Dec 22, 2025

The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan

May 14, 2025

Introducing voice timbre attribute detection

May 14, 2025

UniSpeaker: A Unified Approach for Multimodality-driven Speaker Generation

Jan 11, 2025

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

Dec 13, 2024

Voice Attribute Editing with Text Prompt

Apr 13, 2024