Ambuj Mehrish

Constrained Dominant Sets for Multimodal Document Question Answering

Jun 05, 2026
Via arXiv

FLOWREADER: Min-Cost Flow Optimization for Multi-Modal Long Document Q&A

Jun 05, 2026
Via arXiv

DialogXpert: Driving Intelligent and Emotion-Aware Conversations through Online Value-Based Reinforcement Learning with LLM Priors

May 23, 2025
Via arXiv

PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control

Jan 10, 2025
Via arXiv

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Dec 30, 2024
Via arXiv

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech

Oct 17, 2024
Via arXiv

Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

Jun 25, 2024
Via arXiv

Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

Jun 25, 2024
Via arXiv

Improving Text-To-Audio Models with Synthetic Captions

Jun 18, 2024
Via arXiv

Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training

Jun 03, 2024
Via arXiv