Picture for Soumi Maiti

Soumi Maiti

Towards Robust Speech Representation Learning for Thousands of Languages

Add code
Jul 02, 2024
Viaarxiv icon

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Add code
Feb 25, 2024
Viaarxiv icon

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

Add code
Jan 31, 2024
Viaarxiv icon

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Add code
Jan 30, 2024
Viaarxiv icon

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Add code
Oct 02, 2023
Figure 1 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 2 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 3 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 4 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Viaarxiv icon

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

Add code
Oct 01, 2023
Figure 1 for Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Figure 2 for Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Viaarxiv icon

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Add code
Sep 28, 2023
Figure 1 for Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Figure 2 for Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Figure 3 for Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Figure 4 for Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Viaarxiv icon

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

Add code
Sep 27, 2023
Figure 1 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 2 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 3 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 4 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Viaarxiv icon

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

Add code
Sep 18, 2023
Figure 1 for Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Figure 2 for Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Figure 3 for Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Figure 4 for Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Viaarxiv icon

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Add code
Sep 15, 2023
Figure 1 for Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Figure 2 for Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Figure 3 for Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Figure 4 for Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Viaarxiv icon