
Tom Ko

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

Mar 30, 2023

M3ST: Mix at Three Levels for Speech Translation

Dec 07, 2022

Leveraging per Image-Token Consistency for Vision-Language Pre-training

Nov 20, 2022

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Oct 28, 2022

Personalized Dialogue Generation with Persona-Adaptive Attention

Oct 27, 2022

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Oct 08, 2022

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

Aug 03, 2022

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

May 18, 2022

GigaST: A 10,000-hour Pseudo Speech Translation Corpus

Apr 08, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

Mar 31, 2022