Shengpeng Ji

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

Jun 25, 2024
Via arXiv

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Jun 03, 2024
Via arXiv

Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment

Mar 08, 2024
Via arXiv

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

Feb 20, 2024
Via arXiv

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech

Feb 14, 2024
Via arXiv

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

Aug 28, 2023
Via arXiv

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Jun 06, 2023
Via arXiv