Long Zhou

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

May 28, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Apr 10, 2024

WavLLM: Towards Robust and Adaptive Speech Large Language Model

Mar 31, 2024

Boosting Large Language Model for Speech Synthesis: An Empirical Study

Dec 30, 2023

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Sep 25, 2023

On decoder-only architecture for speech-to-text and large language model integration

Jul 14, 2023

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

May 25, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

May 24, 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

Mar 07, 2023

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

Mar 01, 2023