Rohan Badlani

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

Jun 25, 2024
Via arXiv

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Feb 02, 2024
Via arXiv

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

Jan 29, 2024
Via arXiv

VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

Mar 14, 2023
Via arXiv

Multilingual Multiaccented Multispeaker TTS with RADTTS

Jan 24, 2023
Via arXiv

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

Mar 07, 2022
Via arXiv

One TTS Alignment To Rule Them All

Aug 23, 2021
Via arXiv

Relation Extraction with Contextualized Relation Embedding (CRE)

Nov 19, 2020
Via arXiv

Framework for evaluation of sound event detection in web videos

Apr 04, 2018
Via arXiv

An Approach for Self-Training Audio Event Detectors Using Web Data

Jun 27, 2017
Via arXiv