Tom Ko

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Oct 28, 2022
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

Personalized Dialogue Generation with Persona-Adaptive Attention

Oct 27, 2022
Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Oct 08, 2022
Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

Aug 03, 2022
Qibing Bai, Tom Ko, Yu Zhang

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

May 18, 2022
Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang

GigaST: A 10,000-hour Pseudo Speech Translation Corpus

Apr 08, 2022
Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

Mar 31, 2022
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

Mar 29, 2022
Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing

Oct 14, 2021
Junyi Ao, Rui Wang, Long Zhou, Shujie Liu, Shuo Ren, Yu Wu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

Multi-View Self-Attention Based Transformer for Speaker Recognition

Oct 11, 2021
Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang
