Vimal Manohar

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Jun 23, 2023
Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu

Via arXiv

Self-Supervised Representations for Singing Voice Conversion

Mar 21, 2023
Tejas Jayashankar, Jilong Wu, Leda Sari, David Kant, Vimal Manohar, Qing He

Via arXiv

Voice-preserving Zero-shot Multiple Accent Conversion

Nov 23, 2022
Mumin Jin, Prashant Serai, Jilong Wu, Andros Tjandra, Vimal Manohar, Qing He

Via arXiv

Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders

Oct 28, 2022
Jason Fong, Yun Wang, Prabhav Agrawal, Vimal Manohar, Jilong Wu, Thilo Köhler, Qing He

Via arXiv

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

Oct 08, 2021
Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf

Via arXiv

On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models

Jul 09, 2021
Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer

Via arXiv

Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

Jun 14, 2021
Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

Via arXiv

Large scale weakly and semi-supervised learning for low-resource video ASR

May 16, 2020
Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

Via arXiv

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

May 02, 2020
Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

Via arXiv

Automatic Speech Recognition and Topic Identification for Almost-Zero-Resource Languages

Jun 18, 2018
Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak, Sanjeev Khudanpur

Via arXiv