Picture for Yui Sudo

Yui Sudo

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

Add code
Jun 23, 2024
Viaarxiv icon

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

Add code
Jun 05, 2024
Viaarxiv icon

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation

Add code
May 22, 2024
Viaarxiv icon

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Add code
Feb 20, 2024
Figure 1 for OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Figure 2 for OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Figure 3 for OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Figure 4 for OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Viaarxiv icon

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Add code
Jan 30, 2024
Figure 1 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 2 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 3 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 4 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Viaarxiv icon

Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search

Add code
Jan 19, 2024
Viaarxiv icon

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Add code
Oct 02, 2023
Figure 1 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 2 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 3 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 4 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Viaarxiv icon

Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation

Add code
May 29, 2023
Figure 1 for Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation
Figure 2 for Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation
Figure 3 for Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation
Figure 4 for Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation
Viaarxiv icon

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models

Add code
May 28, 2023
Figure 1 for DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
Figure 2 for DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
Figure 3 for DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
Viaarxiv icon

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

Add code
Dec 21, 2022
Figure 1 for 4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
Figure 2 for 4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
Figure 3 for 4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
Viaarxiv icon