Yusuke Fujita

Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis

Jun 24, 2026
Via arXiv

Does Translation-Enhanced Speech Encoder Pre-training Affect Speech LLMs?

Jun 24, 2026
Via arXiv

Evaluating Japanese Dialect Robustness Across Speech and Text-based Large Language Models

Jun 24, 2026
Via arXiv

Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization

Mar 13, 2026
Via arXiv

Streaming Translation and Transcription Through Speech-to-Text Causal Alignment

Mar 12, 2026
Via arXiv

DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

Mar 10, 2026
Via arXiv

AC/DC: LLM-based Audio Comprehension via Dialogue Continuation

Jun 12, 2025
Via arXiv

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary

Jun 11, 2025
Via arXiv

Music Tagging with Classifier Group Chains

Jan 09, 2025
Via arXiv

Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework

Jun 24, 2024
Via arXiv