Picture for Hayato Futami

Hayato Futami

Benchmarking Speech-to-Speech Translation Models

Add code
Jun 02, 2026
Viaarxiv icon

Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback

Add code
Jan 27, 2026
Viaarxiv icon

Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems

Add code
Oct 02, 2025
Viaarxiv icon

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs

Add code
Jun 12, 2025
Figure 1 for Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Figure 2 for Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Figure 3 for Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Figure 4 for Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Viaarxiv icon

Differentiable K-means for Fully-optimized Discrete Token-based ASR

Add code
May 22, 2025
Viaarxiv icon

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems

Add code
Mar 11, 2025
Figure 1 for ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
Figure 2 for ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
Figure 3 for ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
Figure 4 for ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
Viaarxiv icon

Task Arithmetic for Language Expansion in Speech Translation

Add code
Sep 17, 2024
Figure 1 for Task Arithmetic for Language Expansion in Speech Translation
Figure 2 for Task Arithmetic for Language Expansion in Speech Translation
Figure 3 for Task Arithmetic for Language Expansion in Speech Translation
Figure 4 for Task Arithmetic for Language Expansion in Speech Translation
Viaarxiv icon

Decoder-only Architecture for Streaming End-to-end Speech Recognition

Add code
Jun 23, 2024
Figure 1 for Decoder-only Architecture for Streaming End-to-end Speech Recognition
Figure 2 for Decoder-only Architecture for Streaming End-to-end Speech Recognition
Figure 3 for Decoder-only Architecture for Streaming End-to-end Speech Recognition
Viaarxiv icon

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

Add code
Jun 18, 2024
Figure 1 for Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
Figure 2 for Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
Figure 3 for Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
Figure 4 for Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model
Viaarxiv icon

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting

Add code
Jun 18, 2024
Figure 1 for Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
Figure 2 for Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
Figure 3 for Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
Figure 4 for Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
Viaarxiv icon