Picture for Vimal Manohar

Vimal Manohar

Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System

Add code
May 17, 2024
Viaarxiv icon

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Add code
Jun 23, 2023
Figure 1 for Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Figure 2 for Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Figure 3 for Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Figure 4 for Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Viaarxiv icon

Self-Supervised Representations for Singing Voice Conversion

Add code
Mar 21, 2023
Figure 1 for Self-Supervised Representations for Singing Voice Conversion
Figure 2 for Self-Supervised Representations for Singing Voice Conversion
Figure 3 for Self-Supervised Representations for Singing Voice Conversion
Figure 4 for Self-Supervised Representations for Singing Voice Conversion
Viaarxiv icon

Voice-preserving Zero-shot Multiple Accent Conversion

Add code
Nov 23, 2022
Figure 1 for Voice-preserving Zero-shot Multiple Accent Conversion
Figure 2 for Voice-preserving Zero-shot Multiple Accent Conversion
Figure 3 for Voice-preserving Zero-shot Multiple Accent Conversion
Figure 4 for Voice-preserving Zero-shot Multiple Accent Conversion
Viaarxiv icon

Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders

Add code
Oct 28, 2022
Figure 1 for Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Figure 2 for Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Figure 3 for Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Figure 4 for Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Viaarxiv icon

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

Add code
Oct 08, 2021
Figure 1 for Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings
Figure 2 for Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings
Figure 3 for Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings
Figure 4 for Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings
Viaarxiv icon

On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models

Add code
Jul 09, 2021
Figure 1 for On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Figure 2 for On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Figure 3 for On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Figure 4 for On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Viaarxiv icon

Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

Add code
Jun 14, 2021
Figure 1 for Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition
Figure 2 for Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition
Figure 3 for Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition
Figure 4 for Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition
Viaarxiv icon

Large scale weakly and semi-supervised learning for low-resource video ASR

Add code
May 16, 2020
Figure 1 for Large scale weakly and semi-supervised learning for low-resource video ASR
Figure 2 for Large scale weakly and semi-supervised learning for low-resource video ASR
Figure 3 for Large scale weakly and semi-supervised learning for low-resource video ASR
Viaarxiv icon

CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Add code
May 02, 2020
Figure 1 for CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings
Figure 2 for CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings
Figure 3 for CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings
Figure 4 for CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings
Viaarxiv icon