Rama Doddipatla

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis

Jul 04, 2024

Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding

Jun 21, 2024

Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

Jan 08, 2024

Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues

Sep 21, 2023

Adversarial learning of neural user simulators for dialogue policy optimisation

Jun 01, 2023

Frame-wise and overlap-robust speaker embeddings for meeting diarization

Jun 01, 2023

A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures

Jun 01, 2023

Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition

Apr 24, 2023

Non-autoregressive End-to-end Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding

Apr 21, 2023

Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer

Jul 29, 2022