Naoyuki Kanda

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Feb 12, 2024
Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Steven Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Jan 16, 2024
Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Oct 23, 2023
Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur

Profile-Error-Tolerant Target-Speaker Voice Activity Detection

Sep 21, 2023
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

Sep 15, 2023
Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

DiariST: Streaming Speech Translation with Speaker Diarization

Sep 14, 2023
Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Aug 14, 2023
Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

May 30, 2023
Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

May 21, 2023
Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Factual Consistency Oriented Speech Recognition

Feb 24, 2023
Naoyuki Kanda, Takuya Yoshioka, Yang Liu
