Michael Picheny

Courant Computer Science and Center for Data Science, New York University

Improving Joint Speech-Text Representations Without Alignment

Aug 11, 2023
Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho

A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale

Apr 19, 2023
Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, Ronny Huang, Tara Sainath

Dual Learning for Large Vocabulary On-Device ASR

Jan 11, 2023
Cal Peyser, Ronny Huang, Tara Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho

Towards Disentangled Speech Representations

Aug 28, 2022
Cal Peyser, Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

Nov 18, 2021
Chunxi Liu, Michael Picheny, Leda Sarı, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

Cascaded Multilingual Audio-Visual Learning from Videos

Nov 08, 2021
Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

Oct 08, 2021
Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

May 05, 2021
Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

Accented Speech Recognition Inspired by Human Perception

Apr 09, 2021
Xiangyun Chu, Elizabeth Combs, Amber Wang, Michael Picheny
