Tom Ko

Speech Translation with Large Language Models: An Industrial Practice

Dec 21, 2023
Zhichao Huang, Rong Ye, Tom Ko, Qianqian Dong, Shanbo Cheng, Mingxuan Wang, Hang Li

Via arXiv

RepCodec: A Speech Representation Codec for Speech Tokenization

Aug 31, 2023
Zhichao Huang, Chutong Meng, Tom Ko

Via arXiv

Recent Advances in Direct Speech-to-text Translation

Jun 20, 2023
Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu

Via arXiv

MOSPC: MOS Prediction Based on Pairwise Comparison

Jun 18, 2023
Kexin Wang, Yunlong Zhao, Qianqian Dong, Tom Ko, Mingxuan Wang

Via arXiv

PolyVoice: Language Models for Speech to Speech Translation

Jun 13, 2023
Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

Via arXiv

CTC-based Non-autoregressive Speech Translation

May 27, 2023
Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu

Via arXiv

DUB: Discrete Unit Back-translation for Speech Translation

May 19, 2023
Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou

Via arXiv

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

Mar 30, 2023
Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

Via arXiv

M3ST: Mix at Three Levels for Speech Translation

Dec 07, 2022
Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou

Via arXiv

Leveraging per Image-Token Consistency for Vision-Language Pre-training

Nov 20, 2022
Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang, Mingxuan Wang

Via arXiv