Naoyuki Kanda

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

May 21, 2023

Factual Consistency Oriented Speech Recognition

Feb 24, 2023

Simulating realistic speech overlaps improves multi-talker ASR

Nov 17, 2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts

Nov 11, 2022

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

Nov 10, 2022

Speech separation with large-scale self-supervised learning

Nov 09, 2022

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

Sep 12, 2022

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

Aug 27, 2022

i-Code: An Integrative and Composable Multimodal Learning Framework

May 05, 2022

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

Apr 07, 2022