Michael Picheny

Courant Computer Science and Center for Data Science, New York University

Improving Joint Speech-Text Representations Without Alignment

Aug 11, 2023
Via arXiv

A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale

Apr 19, 2023
Via arXiv

Dual Learning for Large Vocabulary On-Device ASR

Jan 11, 2023
Via arXiv

Towards Disentangled Speech Representations

Aug 28, 2022
Via arXiv

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

Nov 18, 2021
Via arXiv

Cascaded Multilingual Audio-Visual Learning from Videos

Nov 08, 2021
Via arXiv

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

Oct 08, 2021
Via arXiv

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

May 05, 2021
Via arXiv

Accented Speech Recognition Inspired by Human Perception

Apr 09, 2021
Via arXiv

Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

Apr 07, 2021
Via arXiv