"speech recognition": models, code, and papers

Two-Pass Low Latency End-to-End Spoken Language Understanding

Jul 14, 2022
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe

Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition

Oct 31, 2018
David B. Ramsay, Kevin Kilgour, Dominik Roblek, Matthew Sharifi

Efficient Segmental Cascades for Speech Recognition

Aug 02, 2016
Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

Jun 16, 2022
Changan Chen, Carl Schissler, Sanchit Garg, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman

Improving noise robust automatic speech recognition with single-channel time-domain enhancement network

Mar 09, 2020
Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani

Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition

Jun 26, 2019
Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe

Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

Jul 24, 2015
Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays

Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions

Jun 30, 2019
Tejas Srinivasan, Ramon Sanabria, Florian Metze

Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective

Jun 29, 2022
Qingcheng Zeng, Dading Chong, Peilin Zhou, Jie Yang

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

Jul 11, 2022
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
