Michael L. Seltzer

End-to-End Speech Recognition Contextualization with Large Language Models

Sep 19, 2023
Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

Jul 22, 2023
Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer

Improving Fast-slow Encoder based Transducer with Streaming Deliberation

Dec 15, 2022
Ke Li, Jay Mahadeokar, Jinxi Guo, Yangyang Shi, Gil Keren, Ozlem Kalinli, Michael L. Seltzer, Duc Le

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

Nov 10, 2022
Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer

Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers

Nov 02, 2022
Duc Le, Frank Seide, Yuhao Wang, Yang Li, Kjell Schubert, Ozlem Kalinli, Michael L. Seltzer

Deliberation Model for On-Device Spoken Language Understanding

Apr 04, 2022
Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer

Neural-FST Class Language Model for End-to-End Speech Recognition

Jan 31, 2022
Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer

Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric

Oct 11, 2021
Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

Collaborative Training of Acoustic Encoders for Speech Recognition

Jul 13, 2021
Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra