Takuya Yoshioka

Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables

Nov 01, 2023
Bandhav Veluri, Malek Itani, Justin Chan, Takuya Yoshioka, Shyamnath Gollakota

Profile-Error-Tolerant Target-Speaker Voice Activity Detection

Sep 21, 2023
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

Sep 15, 2023
Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

DiariST: Streaming Speech Translation with Speaker Diarization

Sep 14, 2023
Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Aug 14, 2023
Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

May 30, 2023
Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

i-Code Studio: A Configurable and Composable Framework for Integrative AI

May 23, 2023
Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, Ziyi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, Xuedong Huang

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

May 21, 2023
Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Target Sound Extraction with Variable Cross-modality Clues

Mar 15, 2023
Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

Factual Consistency Oriented Speech Recognition

Feb 24, 2023
Naoyuki Kanda, Takuya Yoshioka, Yang Liu
