
Atsunori Ogawa

Applying LLMs for Rescoring N-best ASR Hypotheses of Casual Conversations: Effects of Domain Adaptation and Context Carry-over

Jun 27, 2024
Via arXiv

BLSTM-Based Confidence Estimation for End-to-End Speech Recognition

Dec 22, 2023
Via arXiv

Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models

Dec 20, 2023
Via arXiv

Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition

Oct 17, 2023
Via arXiv

NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization

Sep 22, 2023
Via arXiv

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

Jun 07, 2023
Via arXiv

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

May 25, 2023
Via arXiv

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

May 23, 2023
Via arXiv

Leveraging Large Text Corpora for End-to-End Speech Summarization

Mar 02, 2023
Via arXiv

Subjective intelligibility of speech sounds enhanced by ideal ratio mask via crowdsourced remote experiments with effective data screening

Mar 31, 2022
Via arXiv