Takuya Yoshioka

Simulating realistic speech overlaps improves multi-talker ASR

Oct 27, 2022
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

Sep 12, 2022
Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

Aug 27, 2022
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu

i-Code: An Integrative and Composable Multimodal Learning Framework

May 05, 2022
Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Ultra Fast Speech Separation Model with Teacher Student Learning

Apr 27, 2022
Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

Apr 07, 2022
Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka

Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation

Apr 02, 2022
Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

Mar 30, 2022
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

ICASSP 2022 Deep Noise Suppression Challenge

Feb 27, 2022
Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

Feb 05, 2022
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
