David Harwath

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization

May 18, 2023
Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath

Continual Learning for On-Device Speech Recognition using Disentangled Conformers

Dec 13, 2022
Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed

Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models

Dec 03, 2022
Reem Gody, David Harwath

Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality

Nov 11, 2022
Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald

Phoneme Segmentation Using Self-Supervised Speech Models

Nov 02, 2022
Luke Strgar, David Harwath

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

Nov 02, 2022
Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Oct 07, 2022
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Oct 03, 2022
Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath
