Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

M3ST: Mix at Three Levels for Speech Translation


Dec 07, 2022
Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou

Add code

* Submitted to ICASSP 2023 

   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email

Leveraging per Image-Token Consistency for Vision-Language Pre-training


Nov 20, 2022
Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang, Mingxuan Wang

Add code


   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention


Oct 28, 2022
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

Add code

* Submitted to ICASSP 2023 

   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email

Personalized Dialogue Generation with Persona-Adaptive Attention


Oct 27, 2022
Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang

Add code

* 8 pages, 3 figures 

   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning


Oct 08, 2022
Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li

Add code

* Submitted to ICASSP 2023 

   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis


Aug 03, 2022
Qibing Bai, Tom Ko, Yu Zhang

Add code

* Accepted by INTERSPEECH 2022 

   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation


May 18, 2022
Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang

Add code

* Submitted to INTERSPEECH 2022 

   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email

GigaST: A 10,000-hour Pseudo Speech Translation Corpus


Apr 08, 2022
Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao

Add code

* Submitted to Interspeech 2022. GigaST dataset is available at https://st-benchmark.github.io/resources/GigaST 

   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data


Mar 31, 2022
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei

Add code

* Submitted to INTERSPEECH 2022 

   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT


Mar 29, 2022
Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

Add code

* 5 pages, 2 figures, submitted to Insterspeech 2022 

   Access Paper or Ask Questions

  • Share via Twitter
  • Share via Facebook
  • Share via LinkedIn
  • Share via Whatsapp
  • Share via Messenger
  • Share via Email
1
2
3
>>