
Wei-Ning Hsu

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
May 17, 2023
Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass

Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
Mar 20, 2023
Maryam Fazel-Zarandi, Wei-Ning Hsu

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Mar 07, 2023
Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Feb 10, 2023
Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli

Scaling Laws for Generative Mixed-Modal Language Models
Jan 10, 2023
Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Dec 21, 2022
Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Dec 14, 2022
Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli

Efficient Speech Representation Learning with Low-Bit Quantization
Dec 14, 2022
Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Abdelrahman Mohamed

Continual Learning for On-Device Speech Recognition using Disentangled Conformers
Dec 13, 2022
Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed
