Yusuke Fujita

Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization

Mar 13, 2026

Streaming Translation and Transcription Through Speech-to-Text Causal Alignment

Mar 12, 2026

DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

Mar 10, 2026

AC/DC: LLM-based Audio Comprehension via Dialogue Continuation

Jun 12, 2025

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary

Jun 11, 2025

Music Tagging with Classifier Group Chains

Jan 09, 2025

Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework

Jun 24, 2024

Audio Fingerprinting with Holographic Reduced Representations

Jun 19, 2024

Universal Score-based Speech Enhancement with High Content Preservation

Jun 18, 2024

Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System

May 17, 2024