Picture for Xuankai Chang

Xuankai Chang

Towards Robust Speech Representation Learning for Thousands of Languages

Add code
Jul 02, 2024
Viaarxiv icon

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Add code
Jun 12, 2024
Viaarxiv icon

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

Add code
Jun 11, 2024
Figure 1 for The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Figure 2 for The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Figure 3 for The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Figure 4 for The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Viaarxiv icon

A Large-Scale Evaluation of Speech Foundation Models

Add code
Apr 15, 2024
Figure 1 for A Large-Scale Evaluation of Speech Foundation Models
Figure 2 for A Large-Scale Evaluation of Speech Foundation Models
Figure 3 for A Large-Scale Evaluation of Speech Foundation Models
Figure 4 for A Large-Scale Evaluation of Speech Foundation Models
Viaarxiv icon

LV-CTC: Non-autoregressive ASR with CTC and latent variable models

Add code
Mar 28, 2024
Figure 1 for LV-CTC: Non-autoregressive ASR with CTC and latent variable models
Figure 2 for LV-CTC: Non-autoregressive ASR with CTC and latent variable models
Figure 3 for LV-CTC: Non-autoregressive ASR with CTC and latent variable models
Figure 4 for LV-CTC: Non-autoregressive ASR with CTC and latent variable models
Viaarxiv icon

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Add code
Feb 25, 2024
Viaarxiv icon

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Add code
Jan 30, 2024
Figure 1 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 2 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 3 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 4 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Viaarxiv icon

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Add code
Oct 11, 2023
Figure 1 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Figure 2 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Figure 3 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Figure 4 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Viaarxiv icon

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

Add code
Oct 09, 2023
Figure 1 for Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Figure 2 for Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Figure 3 for Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Figure 4 for Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Viaarxiv icon

HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model

Add code
Oct 06, 2023
Figure 1 for HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Figure 2 for HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Figure 3 for HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Figure 4 for HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Viaarxiv icon