Ziyang Ma

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Jul 09, 2024
Via arXiv

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

Jun 22, 2024
Via arXiv

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

Jun 17, 2024
Via arXiv

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Jun 11, 2024
Via arXiv

MaLa-ASR: Multimedia-Assisted LLM-Based ASR

Jun 09, 2024
Via arXiv

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR

Jun 07, 2024
Via arXiv

1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

May 30, 2024
Via arXiv

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

May 29, 2024
Via arXiv

MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

Apr 29, 2024
Via arXiv

MuPT: A Generative Symbolic Music Pretrained Transformer

Apr 10, 2024
Via arXiv