
Yui Sudo

Streaming Translation and Transcription Through Speech-to-Text Causal Alignment

Mar 12, 2026
Via arXiv

DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

Mar 10, 2026
Via arXiv

AC/DC: LLM-based Audio Comprehension via Dialogue Continuation

Jun 12, 2025
Via arXiv

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary

Jun 11, 2025
Via arXiv

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

Jun 23, 2024
Via arXiv

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

Jun 05, 2024
Via arXiv

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation

May 22, 2024
Via arXiv

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Feb 20, 2024
Via arXiv

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Jan 30, 2024
Via arXiv

Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search

Jan 19, 2024
Via arXiv