Boris Ginsburg

VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

Mar 14, 2023
Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro

Via arXiv

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

Feb 27, 2023
Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg

Via arXiv

ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations

Feb 16, 2023
Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg

Via arXiv

Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition

Dec 16, 2022
Aleksandr Laptev, Boris Ginsburg

Via arXiv

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models

Nov 09, 2022
Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg

Via arXiv

Multi-blank Transducers for Speech Recognition

Nov 04, 2022
Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg

Via arXiv

Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

Nov 01, 2022
Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg

Via arXiv

AmberNet: A Compact End-to-End Model for Spoken Language Identification

Oct 27, 2022
Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

Via arXiv

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition

Oct 06, 2022
Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg

Via arXiv

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

Jul 29, 2022
Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

Via arXiv