Wei-Ning Hsu

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Jun 23, 2023

Scaling Speech Technology to 1,000+ Languages

May 22, 2023

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

May 17, 2023

Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech

Mar 20, 2023

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Mar 07, 2023

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

Feb 10, 2023

Scaling Laws for Generative Mixed-Modal Language Models

Jan 10, 2023

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

Dec 21, 2022

Efficient Speech Representation Learning with Low-Bit Quantization

Dec 14, 2022

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Dec 14, 2022